FxNLMSUser
Overview
Filtered-reference normalized LMS module which works with the StateAllocator and CoeffAllocator modules
Discussion
This module implements a filtered-reference LMS adaptive algorithm, also known as Filtered-X LMS or FxLMS. The implementation is MIMO and has optional input normalization and a leakage factor. The module is a "user" type and works in conjunction with other "user" and "allocator" modules.
FxNLMSUser adapts a set of FIR filter coefficients by first filtering the reference signal through a model of the secondary path, then using the resulting filtered-reference signal in the LMS gradient computation. Coefficients are adapted once per pump. One filtered-reference sample per channel triple (output, error, reference), per blocksize sample, is computed every pump and stored in an internal circular buffer. The full block is processed to keep a complete filtered-reference state, but adaptation is only performed with the latest available sample (i.e. Block LMS is not used). State and coefficients for the adaptive filter are shared by several modules and are hosted elsewhere. State is expected to come in forward-order and coefficients in reverse-order.
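The core idea of filtered-reference adaptation can be shown on a single channel. The sketch below is a simplified illustration, not the module's source. It assumes newest-first arrays, a shift register in place of the module's circular buffer, and the sign convention e = d - y, which this page does not specify. The function name `fxlms_step` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical single-channel FxLMS step (illustration only, not the
 * module's source). Arrays are newest-first:
 *   x_hist[k]  = x[n-k],  k = 0..q-1   (raw reference history)
 *   s[k]       = secondary-path model impulse response, length q
 *   xf_hist[k] = xf[n-k], k = 0..p-1   (filtered-reference history, p >= 1)
 *   w[k]       = adaptive filter taps, length p
 * Assumed sign convention: e = d - y, so the step adds mu * e * xf. */
void fxlms_step(const float *x_hist, const float *s, size_t q,
                float *xf_hist, float *w, size_t p,
                float e, float mu)
{
    /* 1. Filter the reference through the secondary-path model. */
    float xf = 0.0f;
    for (size_t k = 0; k < q; k++)
        xf += s[k] * x_hist[k];

    /* 2. Shift the filtered-reference history and insert the new sample.
     *    (The module keeps this history in a circular buffer instead.) */
    for (size_t k = p - 1; k > 0; k--)
        xf_hist[k] = xf_hist[k - 1];
    xf_hist[0] = xf;

    /* 3. LMS update driven by the filtered reference, not the raw one. */
    for (size_t k = 0; k < p; k++)
        w[k] += mu * e * xf_hist[k];
}
```

The only difference from plain LMS is step 3: the gradient uses `xf_hist` (reference seen through the secondary-path model) rather than `x_hist`.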
SRef and SErr are the reference and error state inputs respectively. They connect to StateAllocator modules.
StateAllocator modules host (allocate) state data and provide a reference to its internal buffer through an "S" frame.
StateAllocator for the reference input must be configured to hold at least numTapsSec + delaySampsRef - 1 delay samples, so that enough reference history is available for the secondary-path FIR and the configurable reference delay.
StateAllocator for the error input requires no additional delay samples beyond the block size, so delay should be configured to 1, which is the minimum allowed.
Both should be set to "deinterleave" mode and input is expected to be real numbers only.
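The StateAllocator sizing rules above reduce to simple arithmetic. The helper names below are hypothetical; with the default arguments (numTapsSec = 256, delaySampsRef = 1) the reference StateAllocator needs at least 256 delay samples.

```c
/* Hypothetical helpers returning the minimum ("at least") StateAllocator
 * delay settings described above. */
static int ref_state_min_delay(int numTapsSec, unsigned delaySampsRef)
{
    /* Secondary-path FIR history plus the configurable reference delay. */
    return numTapsSec + (int)delaySampsRef - 1;
}

static int err_state_min_delay(void)
{
    /* No extra history beyond the block; 1 is the minimum allowed. */
    return 1;
}
```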
C1 and C2 are coefficient inputs for the adaptive filter. They connect to CoeffAllocator modules.
CoeffAllocator modules host (allocate) coefficient data and provide a reference to its internal buffer through a "C" frame.
C1 and C2 alternate every pump, one as source and the other as destination. The coeffSelect variable reports 0 when C1 is source and 1 when C2 is source.
The same CoeffAllocator can be connected to both C1 and C2 to adapt coeffs "in place" (read and write from/to same buffer).
Two separate CoeffAllocator modules on C1 and C2 are useful in multi-thread/instance designs to avoid race conditions.
The output (C) forwards the "C" frame which was last used as destination (latest adapted coeffs). Forwarding only occurs if no error is reported.
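The C1/C2 alternation can be pictured as a ping-pong selector. This is a hypothetical sketch: the type `CoeffPingPong` and function `coeff_pump` are illustrative names, and the exact point in the pump at which the module toggles coeffSelect is an assumption. Connecting the same buffer to both inputs makes source and destination identical, which gives the in-place case.

```c
/* Hypothetical sketch of the C1/C2 alternation.
 * coeffSelect == 0: C1 is the source and C2 the destination. */
typedef struct {
    float *c1;
    float *c2;
    int coeffSelect;
} CoeffPingPong;

/* Picks source and destination for this pump, then toggles for the next
 * pump. Returns the destination, i.e. the frame the C output forwards. */
static float *coeff_pump(CoeffPingPong *pp, const float **src)
{
    float *dst;
    if (pp->coeffSelect == 0) {
        *src = pp->c1;
        dst = pp->c2;
    } else {
        *src = pp->c2;
        dst = pp->c1;
    }
    pp->coeffSelect ^= 1; /* assumed: toggle after selection */
    return dst;
}
```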
CoeffAllocator module(s) must be configured to have numChanRef*numChanOut channels and numTapsAdapt samples per channel.
The default set of coefficients is zeros, but the arrays are tunable and can contain non-zero coefficients to start with.
If setting startup coefficients, they must be stored in reverse order. The channels are ordered by output number first, then by reference number.
Example: for 3 output channels with 2 references, the channel order is O1R1, O1R2, O2R1, O2R2, O3R1, O3R2.
CoeffAllocators store the channels in non-interleaved format. Coefficients should be real numbers only.
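To load startup coefficients, each forward-order impulse response has to be written reversed into the channel slot for its (output, reference) pair. A minimal sketch, assuming 0-based indices and a hypothetical helper name `set_startup_coeffs`; the channel index follows the O1R1, O1R2, O2R1, ... ordering above.

```c
#include <stddef.h>

/* Hypothetical helper: write forward-order taps h[0..numTaps-1] for
 * output o and reference r (0-based) into a non-interleaved buffer.
 * Channel index = o * numChanRef + r; taps are stored reversed. */
static void set_startup_coeffs(float *buf, int numChanRef, int numTaps,
                               int o, int r, const float *h)
{
    float *chan = buf + (size_t)(o * numChanRef + r) * (size_t)numTaps;
    for (int k = 0; k < numTaps; k++)
        chan[numTaps - 1 - k] = h[k];
}
```

For 3 outputs and 2 references with 2 taps each, `set_startup_coeffs(buf, 2, 2, 1, 1, h)` fills channel 3 (O2R2), i.e. `buf[6]` and `buf[7]`.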
When the runtime status of the module is Muted, Bypassed, or Inactive, its process function is not called. Because the module keeps internal state for the reference filter, it will produce incorrect output due to incomplete history when it is set to Active again. Additionally, if the StateAllocators feeding it are set to inactive states at the same time, their filter history is interrupted as well, and incorrect output will occur for a few blocks after reactivation. It is recommended to freeze adaptation for a few blocks after switching back to Active from any other state. C1/C2 input toggling also stops in non-Active modes and resumes from its previous state when the module is Active again. Muted state sets the C output to zeros, Bypassed state forwards the C1 input to the C output, and Inactive state leaves the C output at its pre-Inactive value.
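One way to follow the freeze recommendation is a small control-side counter that reports stepSize = 0 for a chosen number of pumps after reactivation. This is a hypothetical helper, not part of the module; how many pumps to hold (`holdPumps`) is application-specific and left to the caller.

```c
/* Hypothetical control-side helper (not part of the module): hold the
 * step size at 0 for a few pumps after returning to Active so internal
 * history can refill before adaptation resumes. */
typedef struct {
    int holdPumps; /* pumps to keep adaptation frozen (application choice) */
    int remaining; /* pumps left in the current hold */
} AdaptHold;

static void adapt_hold_on_reactivate(AdaptHold *h)
{
    h->remaining = h->holdPumps;
}

/* Returns the step size to apply on this pump. */
static float adapt_hold_step(AdaptHold *h, float stepSize)
{
    if (h->remaining > 0) {
        h->remaining--;
        return 0.0f; /* frozen: secondary-path filtering still runs */
    }
    return stepSize;
}
```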
The module reports an error in the errorCode variable (shown in the inspector) if input frame values do not match the expected ones. Only the first error encountered is reported, so there may be others. The possible errors are listed further down. Adaptation does not start until all inputs are deemed correct. To reduce error-checking overhead, the module enters a locked state and runs full checking again only if heap numbers or offsets change. When an error is found, the output frame offset value is set to 0 (the output wire may contain other undefined values).
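The locked-state shortcut amounts to caching the last heap numbers and offsets and re-validating only when one of them changes. The sketch below illustrates that idea; the type `LockState` and function `needs_full_check` are hypothetical, although the module's pastHeap*/pastOffset* fields suggest a similar comparison.

```c
/* Hypothetical sketch of the locked fast path. Index order is assumed:
 * 0 = SRef, 1 = SErr, 2 = C1, 3 = C2. */
typedef struct {
    unsigned pastHeap[4];
    unsigned pastOffset[4];
    int locked;
} LockState;

/* Returns 1 if full validation must run on this pump. The caller sets
 * locked = 1 after all checks pass. */
static int needs_full_check(LockState *s, const unsigned heap[4],
                            const unsigned offset[4])
{
    int changed = !s->locked;
    for (int i = 0; i < 4; i++) {
        if (heap[i] != s->pastHeap[i] || offset[i] != s->pastOffset[i]) {
            changed = 1;
            s->pastHeap[i] = heap[i];
            s->pastOffset[i] = offset[i];
        }
    }
    if (changed)
        s->locked = 0;
    return changed;
}
```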
The argument NUMTAPSADAPT specifies the adaptive filter length. CoeffAllocator(s) samples-per-channel must be set to this same value.
Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance.
The argument NUMTAPSSEC specifies the secondary-path filter length. This filter is hosted within the module since the coeffs are not shared.
Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance.
Secondary-path coefficients (coeffsSec) must be stored in reverse order. Channels are ordered by output number first, then by error number.
Example: for 3 output channels with 2 error channels, the channel order is O1E1, O1E2, O2E1, O2E2, O3E1, O3E2.
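Storing coefficients in reverse order pairs naturally with forward-order (oldest-to-newest) state: one FIR output becomes a plain dot product. A minimal sketch with hypothetical names, plus the channel index for the output-first secondary-path ordering.

```c
/* Hypothetical sketch: coeffsRev[k] = h[numTaps-1-k] and state[k] runs
 * oldest to newest, so the dot product equals sum_j h[j] * x[n-j]. */
static float fir_reverse_dot(const float *coeffsRev, const float *state,
                             int numTaps)
{
    float acc = 0.0f;
    for (int k = 0; k < numTaps; k++)
        acc += coeffsRev[k] * state[k];
    return acc;
}

/* Secondary-path channel index for output o and error e (0-based),
 * following the O1E1, O1E2, O2E1, ... ordering. */
static int sec_chan(int o, int e, int numChanErr)
{
    return o * numChanErr + e;
}
```

With h = {1, 2, 3} stored as {3, 2, 1} and state {10, 20, 30}, the result is 1·30 + 2·20 + 3·10 = 100; `sec_chan(2, 1, 2)` is 5, i.e. O3E2.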
The argument NUMCHANREF specifies the number of reference channels. This must match the number of channels connected to the reference StateAllocator.
The argument NUMCHANERR specifies the number of error channels. This must match the number of channels connected to the error StateAllocator.
The argument NUMCHANOUT specifies the number of LMS output (or control) channels. This is used for calculating the total number of channels in the adaptive and secondary filters.
The expected coeffs channel count at the input is numChanRef*numChanOut. The secondary path filter is sized to have numChanErr*numChanOut channels.
The argument BLOCKSIZE specifies the number of new samples processed by the secondary-path filter per pump. This must match the blocksize of the signal going into the StateAllocators. The filtered-reference array is updated by this many new samples every pump regardless of the thread where this module is placed. This is so that there are no gaps in the state required by the adaptation stage.
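The channel counts and internal lengths implied by the arguments above can be collected in one place. The struct and function names are hypothetical; the formulas come from this page and the chanLenLinRef field comment.

```c
/* Hypothetical summary of sizes derived from the module arguments. */
typedef struct {
    int adaptChans;     /* numChanRef * numChanOut (CoeffAllocator channels) */
    int secChans;       /* numChanErr * numChanOut (secondary-path channels) */
    int filtRefTriples; /* numChanRef * numChanErr * numChanOut */
    int chanLenLinRef;  /* numTapsSec + blockSize - 1 */
} FxSizes;

static FxSizes fx_sizes(int numChanRef, int numChanErr, int numChanOut,
                        int numTapsSec, int blockSize)
{
    FxSizes s;
    s.adaptChans = numChanRef * numChanOut;
    s.secChans = numChanErr * numChanOut;
    s.filtRefTriples = numChanRef * numChanErr * numChanOut;
    s.chanLenLinRef = numTapsSec + blockSize - 1;
    return s;
}
```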
The argument MEMHEAP specifies the heap to use for internal arrays. This includes the secondary-path coefficients, filtered-reference state, and scratch buffers.
When AWE_HEAP_SHARED is used for state or coefficient buffers, the module invalidates the cache on source buffers before processing and writes back the destination coefficient buffer after adaptation, ensuring coherency in multi-core configurations.
The variable delaySampsRef specifies the delay in samples applied to the reference before secondary-path filtering.
This is meant to compensate for loop delay between the reference and error sensing points and depends on the specific application setup. This delay is additional to the one introduced by the secondary-path filter (if any).
When blockSize is greater than 1, this delay should normally be increased by blockSize - 1 samples.
Changing delaySampsRef at runtime resets the locked state, so a full input validation re-check is performed on the next pump.
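A starting value for delaySampsRef follows from the guidance above. The loop delay itself is application-specific and must be measured or estimated; the helper name below is hypothetical.

```c
/* Hypothetical helper: starting value for delaySampsRef from a measured
 * loop delay (in samples), adding blockSize - 1 when blockSize > 1. */
static unsigned suggested_delay_samps_ref(unsigned loopDelaySamps,
                                          unsigned blockSize)
{
    return loopDelaySamps + (blockSize > 1 ? blockSize - 1 : 0);
}
```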
The variable normalize controls the normalization denominator used in the coefficient update:
- 0: no normalization (standard LMS) - lowest relative cost
- 1: normalize by the power of the raw reference signal - medium relative cost
- 2: normalize by the power of the filtered reference signal (true NLMS denominator) - highest relative cost
Normalization mode 2 is usually the most expensive (see the Computational cost section below), but it is the most accurate because it accounts for the effect of the secondary path on the reference signal. It should normally give the fastest convergence.
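The three modes differ only in which signal's power forms the denominator. The single-channel sketch below is a simplification: the module sums over its channel triples, and the small regularizer `eps` is an assumption (this page does not say how division by zero is avoided). The function name is hypothetical.

```c
/* Hypothetical single-channel sketch of the normalization factor that
 * scales the step size. eps is an assumed regularizer, not a documented
 * module parameter. */
static float nlms_scale(unsigned mode, const float *xRaw, int q,
                        const float *xFilt, int p, float eps)
{
    float power = 0.0f;
    if (mode == 1) {
        for (int k = 0; k < q; k++)
            power += xRaw[k] * xRaw[k];   /* raw-reference power */
    } else if (mode == 2) {
        for (int k = 0; k < p; k++)
            power += xFilt[k] * xFilt[k]; /* filtered-reference power */
    } else {
        return 1.0f;                      /* mode 0: plain LMS */
    }
    return 1.0f / (power + eps);
}
```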
The variable stepSize specifies the adaptation learning rate. When it is set to 0, adaptation is frozen and the last adapted CoeffAllocator frame is forwarded (or the default frame on the first pass). Because the adaptation algorithm is not run, computational load is reduced. The secondary-path filter still runs while adaptation is frozen, however, so that the internal state does not contain gaps.
The variable leakFactor specifies the adaptation leakage factor. When set to 1, the algorithm is non-leaky.
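stepSize and leakFactor combine in a leaky LMS tap update. The sketch below covers one (output, reference) filter and assumes the gradient has already been accumulated over the error channels. The update form w = leak·w + mu·g and its sign are assumptions consistent with e = d - y, and the function name is hypothetical.

```c
/* Hypothetical sketch of one leaky (N)LMS update for a single
 * (output, reference) filter. norm is the normalization factor (1 when
 * normalize = 0). Returns 1 if dst was written, 0 if adaptation is
 * frozen (the caller keeps forwarding the previously adapted frame). */
static int leaky_update(const float *src, float *dst, const float *grad,
                        int numTaps, float stepSize, float norm,
                        float leakFactor)
{
    if (stepSize == 0.0f)
        return 0; /* frozen: skip the update entirely */

    float mu = stepSize * norm;
    for (int k = 0; k < numTaps; k++)
        dst[k] = leakFactor * src[k] + mu * grad[k]; /* leakFactor 1: non-leaky */
    return 1;
}
```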
Computational cost
Each pump executes three algorithmic steps.
Let B = blockSize, R = numChanRef, E = numChanErr, M = numChanOut, P = numTapsAdapt, Q = numTapsSec.
Step 1 — Secondary-path filtering of the reference:
One new filtered-reference sample is computed per channel triple (output, error, reference) for each of the B samples in the block, by evaluating the Q-tap secondary-path FIR at that reference sample position.
The result is stored in the internal circular buffer; the reference linear state buffer holds B + Q - 1 samples per channel to provide the needed history.
- MACs: B × R × M × E × Q
Step 2 — Error loading and normalization (once per pump):
The current error sample is read. If normalization is enabled, the power denominator is computed:
- Mode 0 (off): no MACs
- Mode 1 (raw reference power): R × Q multiply-accumulates (sum of squared raw-reference samples)
- Mode 2 (filtered reference power): R × M × E × P multiply-accumulates (sum of squared pump-rate filtered-reference samples in the circular buffer)
Step 3 — Coefficient update (once per pump):
For each (output, reference) adaptive filter, the gradient is accumulated over E error channels and P filtered-reference samples stored in the circular buffer (E × P MACs), then leakage and step are applied to all P taps (P MACs).
- MACs: R × M × P × (E + 1)
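The per-pump MAC counts above can be added up for a given configuration. This hypothetical helper encodes only the formulas stated in this section and ignores loads, stores, and other overhead.

```c
/* Hypothetical per-pump MAC estimate from the three steps above. */
static long fx_macs_per_pump(long B, long R, long E, long M, long P, long Q,
                             unsigned normalize)
{
    long step1 = B * R * M * E * Q;               /* secondary-path filtering */
    long step2 = normalize == 1 ? R * Q           /* raw-reference power */
               : normalize == 2 ? R * M * E * P   /* filtered-reference power */
               : 0;                               /* no normalization */
    long step3 = R * M * P * (E + 1);             /* coefficient update */
    return step1 + step2 + step3;
}
```

For example, with B = 1, R = 2, E = 4, M = 4, P = 128, and Q = 256, the formulas give 13312, 13824, and 17408 MACs per pump for normalize = 0, 1, and 2.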
Error list
| Code | Input | Meaning |
|---|---|---|
| 0 | — | No error |
| -1 | SRef (reference) | Heap number out of range |
| -2 | SErr (error) | Heap number out of range |
| -3 | C1 | Heap number out of range |
| -4 | C2 | Heap number out of range |
| -11 | SRef (reference) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -12 | SErr (error) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -13 | C1 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -14 | C2 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -21 | SRef (reference) | Incorrect format — deinterleaving, complexity, number of channels, or channel length |
| -22 | SErr (error) | Incorrect format — deinterleaving, complexity, number of channels, or channel length |
| -23 | C1 | Incorrect format — complexity, number of channels, or channel length |
| -24 | C2 | Incorrect format — complexity, number of channels, or channel length |
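The error codes in the table follow a regular pattern: the tens digit of the magnitude gives the error category and the units digit gives the input (1 = SRef, 2 = SErr, 3 = C1, 4 = C2). A hypothetical decoder based only on that pattern:

```c
/* Hypothetical errorCode decoder based on the table above. */
static const char *const kFxErrInput[] = {"?", "SRef", "SErr", "C1", "C2"};
static const char *const kFxErrCategory[] = {
    "heap number out of range",
    "heap missing, zero offset, or size too large",
    "incorrect format",
};

/* Returns 0 for no error, 1 if decoded, -1 if the code is not in the table.
 * On success, *input indexes kFxErrInput and *category kFxErrCategory. */
static int fx_err_decode(int code, int *input, int *category)
{
    if (code == 0)
        return 0;
    if (code > 0 || code < -24)
        return -1;
    int in = (-code) % 10;  /* units digit: which input */
    int cat = (-code) / 10; /* tens digit: which kind of error */
    if (in < 1 || in > 4)
        return -1;
    *input = in;
    *category = cat;
    return 1;
}
```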
Note: errors are checked and reported at run-time.
See also: FIRUser, FeNLMSUser, StateAllocator, CoeffAllocator
See usage examples in the AllocatorUser_*.awd layouts found in the \<AWE install folder>/Examples/Module_Usage folder
Module Pack
Advanced
ClassID
classID = 1463
Type Definition
typedef struct _ModuleFxNLMSUser
{
ModuleInstanceDescriptor instance; // Common Audio Weaver module instance structure
INT32 numTapsAdapt; // Length of adaptive filter.
INT32 numTapsSec; // Length of secondary path filter.
INT32 numChanRef; // Number of reference (x) channels.
INT32 numChanErr; // Number of error (e) channels.
INT32 numChanOut; // Number of output (y) channels.
INT32 blockSize; // Number of samples per block.
UINT32 delaySampsRef; // Reference-signal delay in samples.
UINT32 normalize; // Normalization mode: 0=off, 1=raw-reference power, 2=filtered-reference power.
FLOAT32 stepSize; // Adaptation step size.
FLOAT32 leakFactor; // Adaptation leakage factor (1.0 is non-leaky).
INT32 chanLenCircErr; // Length of error state per channel as set by StateAllocator.
INT32 chanLenCircRef; // Length of reference state per channel as set by StateAllocator.
UINT32 pastOffsetCoeffs1; // Keeps past received coeffs offset
UINT32 pastOffsetCoeffs2; // Keeps past received coeffs offset
UINT32 pastOffsetRef; // Keeps past received reference offset
UINT32 pastOffsetErr; // Keeps past received error offset
UINT32 pastHeapCoeffs1; // Keeps past received coeffs heap
UINT32 pastHeapCoeffs2; // Keeps past received coeffs heap
UINT32 pastHeapRef; // Keeps past received reference heap
UINT32 pastHeapErr; // Keeps past received error heap
INT32 locked; // Locks pointers and sizes safety checks.
INT32 invalidateCoeffs1; // Flag to invalidate cache for coeffs.
INT32 invalidateCoeffs2; // Flag to invalidate cache for coeffs.
INT32 invalidateRef; // Flag to invalidate cache for state.
INT32 invalidateErr; // Flag to invalidate cache for state.
INT32 errorCode; // Captures errors that could arise at run-time and is shown on the inspector
INT32 coeffSelect; // Coeff input selector. Toggled once per pump. When 0, C1 is the source and C2 the destination, else direction is reversed.
INT32 arrayHeap; // Heap in which to allocate secondary path coeffs and internal arrays.
INT32 chanLenLinRef; // Length of reference state per channel for internal linear buff (= numTapsSec + blockSize - 1).
INT32 filtRefStateIndex; // Circular write index into filtered reference state buffer.
FLOAT32* coeffsSec; // Secondary path filter coeff array in reverse order
FLOAT32* aScratch; // Scratch buffer for G_m computation
FLOAT32* aLinStateRef; // Reference linear state buffer (numTapsSec + blockSize - 1 samples per channel)
FLOAT32* aFiltRefState; // Filtered-reference state buffer (circular, one sample per pump per (l,m,e) channel triple)
FLOAT32* eValues; // Scratch buffer for preloaded error channel samples (numChanErr floats)
float * stateRef; // Keeps past calculated pointer
float * stateErr; // Keeps past calculated pointer
float * coeffsAdapt1; // Keeps past calculated pointer
float * coeffsAdapt2; // Keeps past calculated pointer
void * hardware_specific_struct_pointer; // General purpose pointer for target-specific cases
} ModuleFxNLMSUserClass;
Variables
Properties
| Name | Type | Usage | isHidden | Default Value | Range | Units |
|---|---|---|---|---|---|---|
| numTapsAdapt | int | const | 0 | 128 | Unrestricted | |
| numTapsSec | int | const | 0 | 256 | Unrestricted | |
| numChanRef | int | const | 0 | 1 | Unrestricted | |
| numChanErr | int | const | 0 | 1 | Unrestricted | |
| numChanOut | int | const | 0 | 1 | Unrestricted | |
| blockSize | int | const | 0 | 1 | Unrestricted | |
| delaySampsRef | uint | parameter | 0 | 1 | Unrestricted | samples |
| normalize | uint | parameter | 0 | 1 | 0:2 | |
| stepSize | float | parameter | 0 | 0.01 | 0:1 | |
| leakFactor | float | parameter | 0 | 1 | 0:1 | |
| chanLenCircErr | int | derived | 0 | 1 | Unrestricted | |
| chanLenCircRef | int | derived | 0 | 256 | Unrestricted | |
| pastOffsetCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetRef | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetErr | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapRef | uint | state | 0 | 0 | Unrestricted | |
| pastHeapErr | uint | state | 0 | 0 | Unrestricted | |
| locked | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs1 | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs2 | int | state | 0 | 0 | Unrestricted | |
| invalidateRef | int | state | 0 | 0 | Unrestricted | |
| invalidateErr | int | state | 0 | 0 | Unrestricted | |
| errorCode | int | state | 0 | 0 | Unrestricted | |
| coeffSelect | int | state | 0 | 0 | Unrestricted | |
| arrayHeap | int | const | 1 | 1 | Unrestricted | |
| chanLenLinRef | int | const | 1 | 256 | Unrestricted | |
| filtRefStateIndex | int | state | 1 | 0 | Unrestricted | |
| coeffsSec | float* | parameter | 0 | [256 x 1] | Unrestricted | |
| aScratch | float* | state | 0 | [256 x 1] | Unrestricted | |
| aLinStateRef | float* | state | 0 | [256 x 1] | Unrestricted | |
| aFiltRefState | float* | state | 0 | [130 x 1] | Unrestricted | |
| eValues | float* | state | 0 | [1 x 1] | Unrestricted | |
| stateRef | float * | state | 1 | | Unrestricted | |
| stateErr | float * | state | 1 | | Unrestricted | |
| coeffsAdapt1 | float * | state | 1 | | Unrestricted | |
| coeffsAdapt2 | float * | state | 1 | | Unrestricted | |
| hardware_specific_struct_pointer | void * | state | 1 | | Unrestricted | |
Pins
Input Pins
| Name | SRef |
|---|---|
| Description | Reference StateAllocator frame input |
| Data type | int |
| Channel range | 7 |
| Block size range | 1 |
| Sample rate range | Unrestricted |
| Complex support | Real |
| Name | SErr |
|---|---|
| Description | Error StateAllocator frame input |
| Data type | int |
| Channel range | 7 |
| Block size range | 1 |
| Sample rate range | Unrestricted |
| Complex support | Real |
| Name | C1 |
|---|---|
| Description | Adaptive coeffs CoeffAllocator frame input |
| Data type | int |
| Channel range | 5 |
| Block size range | 1 |
| Sample rate range | Unrestricted |
| Complex support | Real |
| Name | C2 |
|---|---|
| Description | Adaptive coeffs CoeffAllocator frame input |
| Data type | int |
| Channel range | 5 |
| Block size range | 1 |
| Sample rate range | Unrestricted |
| Complex support | Real |
Output Pins
| Name | C |
|---|---|
| Description | Adaptive coeffs CoeffAllocator frame out |
| Data Type | int |
Matlab Usage
File Name: fxnlms_user_module.m
M = fxnlms_user_module(NAME, NUMTAPSADAPT, NUMTAPSSEC, NUMCHANREF, NUMCHANERR, NUMCHANOUT, BLOCKSIZE, MEMHEAP)
Creates a module which implements a filtered-reference LMS adaptive algorithm
(also known as Multiple Error LMS or FxNLMS). The implementation is MIMO and
has optional input normalization and a leakage factor. This "user" module works
with other "user" and "allocator" modules.
Arguments:
NAME - name of the module.
NUMTAPSADAPT - adaptive filter number of taps
By default, NUMTAPSADAPT = 128;
NUMTAPSSEC - secondary path filter number of taps
By default, NUMTAPSSEC = 256;
NUMCHANREF - number of reference channels (x)
By default, NUMCHANREF = 1;
NUMCHANERR - number of error channels (e)
By default, NUMCHANERR = 1;
NUMCHANOUT - number of output channels (y)
By default, NUMCHANOUT = 1;
BLOCKSIZE - number of samples per block (must be 1 or greater)
By default, BLOCKSIZE = 1;
MEMHEAP - heap which hosts internal arrays. This is a string and
follows the memory allocation enumeration in Framework.h.
Allowable values are:
'AWE_HEAP_FAST' - always use internal DM memory (default).
'AWE_HEAP_FASTB' - always use internal PM memory.
'AWE_HEAP_SLOW' - always use external memory.
'AWE_HEAP_FAST2SLOW' - try DM, PM or external in that order
'AWE_HEAP_FASTB2SLOW' - try PM or external in that order
'AWE_HEAP_SHARED' - always use shared memory.
Copyright (c) 2026 DSP Concepts, Inc.