FxNLMSUser

Overview

Filtered-reference normalized LMS module which works with the StateAllocator and CoeffAllocator modules

Discussion

This module implements a filtered-reference LMS adaptive algorithm, also known as Filt-X LMS or FxLMS. The implementation is MIMO and has optional input normalization and a leakage factor. The module is a "user" type and works in conjunction with other "user" and "allocator" modules.

FxNLMSUser adapts a set of FIR filter coefficients by first filtering the reference signal through a model of the secondary path, then using the resulting filtered-reference signal in the LMS gradient computation. Coefficients are adapted once per pump. One filtered-reference sample per channel triple (output, error, reference), per blocksize sample, is computed every pump and stored in an internal circular buffer. The full block is processed to keep a complete filtered-reference state, but adaptation is only performed with the latest available sample (i.e. Block LMS is not used). State and coefficients for the adaptive filter are shared by several modules and are hosted elsewhere. State is expected to come in forward-order and coefficients in reverse-order.
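The forward-order state and reverse-order coefficient convention means that one FIR output reduces to a plain dot product over two contiguous arrays. The following is a minimal sketch of that idea only. The helper name `fir_dot_reversed` is hypothetical and not part of the module API, and the sketch assumes the state window holds the newest sample last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of the module API).
 * state:      num_taps samples, oldest first, newest last (forward order).
 * coeffs_rev: impulse response h[0..num_taps-1] stored reversed,
 *             i.e. coeffs_rev[k] == h[num_taps - 1 - k].
 * Returns the sum over k of h[k] * x[n - k] as a straight dot product. */
static float fir_dot_reversed(const float *state, const float *coeffs_rev,
                              size_t num_taps)
{
    assert(state != NULL && coeffs_rev != NULL);
    float acc = 0.0f;
    for (size_t k = 0; k < num_taps; ++k) {
        acc += state[k] * coeffs_rev[k];
    }
    return acc;
}
```

For example, with h = {1, 2, 3} stored as {3, 2, 1} and a state window {10, 20, 30} (newest sample 30), the result is 1*30 + 2*20 + 3*10 = 100.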

SRef and SErr are the reference and error state inputs respectively. They connect to StateAllocator modules. StateAllocator modules host (allocate) state data and provide a reference to its internal buffer through an "S" frame. StateAllocator for the reference input must be configured to hold at least numTapsSec + delaySampsRef - 1 delay samples, so that enough reference history is available for the secondary-path FIR and the configurable reference delay. StateAllocator for the error input requires no additional delay samples beyond the block size, so delay should be configured to 1, which is the minimum allowed. Both should be set to "deinterleave" mode and input is expected to be real numbers only.

C1 and C2 are coefficient inputs for the adaptive filter. They connect to CoeffAllocator modules. CoeffAllocator modules host (allocate) coefficient data and provide a reference to its internal buffer through a "C" frame. C1 and C2 alternate every pump, one as source and the other as destination. The coeffSelect variable reports 0 when C1 is source and 1 when C2 is source. The same CoeffAllocator can be connected to both C1 and C2 to adapt coeffs "in place" (read and write from/to same buffer). Two separate CoeffAllocator modules on C1 and C2 are useful in multi-thread/instance designs to avoid race conditions. The output (C) forwards the "C" frame which was last used as destination (latest adapted coeffs). Forwarding only occurs if no error is reported. CoeffAllocator module(s) must be configured to have numChanRef*numChanOut channels and numTapsAdapt samples per channel. The default set of coefficients is zeros, but the arrays are tunable and can contain non-zero coefficients to start with. If setting startup coefficients, they must be stored in reverse order. The channels are ordered by output number first, then by reference number. Example: for 3 output channels with 2 references, the channel order is O1R1, O1R2, O2R1, O2R2, O3R1, O3R2. CoeffAllocators store the channels in non-interleaved format. Coefficients should be real numbers only.
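The channel ordering described above (output number first, then reference number) can be expressed as a small index calculation. This is a sketch under stated assumptions: the helper names are hypothetical, indices are zero-based, and the buffer is treated as a flat non-interleaved array with numTapsAdapt samples per channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the module API).
 * Maps a zero-based (output, reference) pair to the channel index within the
 * CoeffAllocator buffer: outputs vary slowest, references vary fastest. */
static size_t adapt_chan_index(size_t out, size_t ref, size_t num_chan_ref)
{
    assert(ref < num_chan_ref);
    return out * num_chan_ref + ref;
}

/* Offset, in samples, of the first stored tap of that channel in a
 * non-interleaved buffer with num_taps_adapt samples per channel. */
static size_t adapt_chan_offset(size_t out, size_t ref, size_t num_chan_ref,
                                size_t num_taps_adapt)
{
    return adapt_chan_index(out, ref, num_chan_ref) * num_taps_adapt;
}
```

With 3 outputs and 2 references, O2R1 (out = 1, ref = 0) maps to channel index 2 and O3R2 (out = 2, ref = 1) maps to channel index 5, which matches the order O1R1, O1R2, O2R1, O2R2, O3R1, O3R2.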

When the runtime status of the module is Muted/Bypassed/Inactive, its process function is not called. This module keeps internal state for the reference filter, so it will produce incorrect output due to incomplete history when set to Active again. Additionally, if the StateAllocators feeding it are set to inactive states at the same time, filter history will be interrupted as well and incorrect output will occur for a few blocks after re-activating. It is recommended to freeze adaptation for a few blocks after toggling back to Active status from any other state. C1/C2 input toggling is also stopped during non-Active modes and resumes from its previous state when Active again. Muted state will set C1 output to zeros, Bypassed state will forward C1 input to C1 output and Inactive state will leave C1 output at its pre-Inactive-state value.

The module will report an error in the errorCode variable (shown in inspector) if input frame values do not match expected ones. The reported error is the first one encountered, so there may be more. A list of possible errors is detailed further down. Adaptation does not start until all inputs are deemed correct. To reduce error-checking overhead, the module will go into locked state and will only run full checking again if heap numbers or offsets change. When an error is found, the output frame offset value is set to 0 (output wire may contain other undefined values).

The argument NUMTAPSADAPT specifies the adaptive filter length. CoeffAllocator(s) samples-per-channel must be set to this same value. Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance.

The argument NUMTAPSSEC specifies the secondary-path filter length. This filter is hosted within the module since the coeffs are not shared. Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance. Coefficients must be stored in reverse order. Channels are ordered by output number first, then by error number. Example: for 3 output channels with 2 error channels, the channel order is O1E1, O1E2, O2E1, O2E2, O3E1, O3E2.
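When preparing secondary-path coefficients offline, each measured impulse response has to be reversed and placed in the slot for its (output, error) pair. The following sketch shows one way to do that. The helper `store_sec_path` is hypothetical, indices are zero-based, and the flat layout with numTapsSec contiguous samples per channel is an assumption rather than something this page states explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the module API).
 * Copies one measured impulse response h[0..num_taps_sec-1] (h[0] = first
 * tap in time) into a flat secondary-path array, reversed, at the slot for
 * the zero-based (output, error) pair. Outputs vary slowest. */
static void store_sec_path(float *coeffs_sec, const float *h,
                           size_t out, size_t err,
                           size_t num_chan_err, size_t num_taps_sec)
{
    assert(coeffs_sec != NULL && h != NULL && err < num_chan_err);
    float *dst = coeffs_sec + (out * num_chan_err + err) * num_taps_sec;
    for (size_t k = 0; k < num_taps_sec; ++k) {
        dst[k] = h[num_taps_sec - 1 - k];
    }
}
```

For 2 error channels and 4 taps, the pair O2E2 (out = 1, err = 1) starts at sample offset (1*2 + 1)*4 = 12, and h[0] ends up in the last sample of that slot.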

The argument NUMCHANREF specifies the number of reference channels. This must match the number of channels connected to the reference StateAllocator.

The argument NUMCHANERR specifies the number of error channels. This must match the number of channels connected to the error StateAllocator.

The argument NUMCHANOUT specifies the number of LMS output (or control) channels. This is used for calculating the total number of channels in the adaptive and secondary filters. The expected coeffs channel count at the input is numChanRef*numChanOut. The secondary path filter is sized to have numChanErr*numChanOut channels.

The argument BLOCKSIZE specifies the number of new samples processed by the secondary-path filter per pump. This must match the blocksize of the signal going into the StateAllocators. The filtered-reference array is updated by this many new samples every pump regardless of the thread where this module is placed. This is so that there are no gaps in the state required by the adaptation stage.

The argument MEMHEAP specifies the heap to use for internal arrays. This includes the secondary-path coefficients, filtered-reference state, and scratch buffers. When AWE_HEAP_SHARED is used for state or coefficient buffers, the module invalidates the cache on source buffers before processing and writes back the destination coefficient buffer after adaptation, ensuring coherency in multi-core configurations.

The variable delaySampsRef specifies the delay in samples applied to the reference before secondary-path filtering. This is meant to compensate for loop delay between the reference and error sensing points and depends on the specific application setup. This delay is additional to the one introduced by the secondary-path filter (if any). When blocksize is not 1, this delay should normally be increased by blocksize-1 samples. Changing delaySampsRef at runtime resets the locked state, so a full input validation re-check is performed on the next pump.
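The two sizing rules that involve delaySampsRef (the block-size compensation above and the minimum reference StateAllocator delay of numTapsSec + delaySampsRef - 1) are simple arithmetic. A short sketch, with hypothetical helper names that are not part of the module API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of the module API). */

/* Reference delay after the usual block-size compensation: the delay tuned
 * for blockSize == 1 plus (blockSize - 1) samples. */
static uint32_t compensated_ref_delay(uint32_t delay_at_block1,
                                      uint32_t block_size)
{
    assert(block_size >= 1);
    return delay_at_block1 + (block_size - 1);
}

/* Minimum delay, in samples, for the reference StateAllocator:
 * numTapsSec + delaySampsRef - 1. */
static uint32_t min_ref_allocator_delay(uint32_t num_taps_sec,
                                        uint32_t delay_samps_ref)
{
    assert(num_taps_sec >= 1);
    return num_taps_sec + delay_samps_ref - 1;
}
```

For example, a delay of 1 sample tuned at blockSize 1 becomes 32 samples at blockSize 32, and with numTapsSec = 256 the reference StateAllocator then needs at least 256 + 32 - 1 = 287 delay samples.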

The variable normalize controls the normalization denominator used in the coefficient update:
- 0: no normalization (standard LMS), lowest relative cost
- 1: normalize by the power of the raw reference signal, medium relative cost
- 2: normalize by the power of the filtered reference signal (true NLMS denominator), highest relative cost

Normalization mode 2 is usually the most expensive (see the Computational cost section below), but it is the most accurate because it accounts for the effect of the secondary path on the reference signal. It should normally allow the best convergence rate.

The variable stepSize specifies the adaptation learning rate. When set to 0, adaptation is frozen and the last adapted CoeffAllocator frame is forwarded (or default if first pass). Because the adaptation algorithm is not run, computational load is reduced. When freezing, however, the secondary-path filter is still run so that internal state does not contain gaps.

The variable leakFactor specifies the adaptation leakage factor. When set to 1, the algorithm is non-leaky.
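To show how stepSize, leakFactor and normalize interact, the sketch below applies a textbook leaky normalized LMS update to one adaptive filter with one error channel. It is not the module's implementation. The function name, the minus sign on the correction term (common in noise-control formulations), the `eps` regularizer, and the pairing of `w[k]` with `xf[k]` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook leaky normalized LMS tap update for ONE adaptive filter and ONE
 * error channel; a sketch, not the module's code. Assumptions: the sign of
 * the correction, the eps regularizer, and that norm_power is the chosen
 * power denominator (ignored when normalize == 0). */
static void leaky_nlms_update(float *w, const float *xf, size_t num_taps,
                              float err, float step_size, float leak_factor,
                              int normalize, float norm_power, float eps)
{
    assert(w != NULL && xf != NULL);
    float mu = step_size;
    if (normalize != 0) {
        mu = step_size / (norm_power + eps);
    }
    for (size_t k = 0; k < num_taps; ++k) {
        /* leak_factor == 1 keeps the previous tap value intact (non-leaky). */
        w[k] = leak_factor * w[k] - mu * err * xf[k];
    }
}
```

With leak_factor = 1 and step_size = 0 the taps are left unchanged, which corresponds to the non-leaky, frozen case described above. A leak factor below 1 pulls every tap toward zero on each update.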

Computational cost

Each pump executes three algorithmic steps. Let B = blockSize, R = numChanRef, E = numChanErr, M = numChanOut, P = numTapsAdapt, Q = numTapsSec.

Step 1 (secondary-path filtering of the reference): One new filtered-reference sample is computed per channel triple (output, error, reference), per block sample, by evaluating the Q-tap secondary-path FIR at the latest reference sample of the block. The result is stored in the internal circular buffer; the reference linear state buffer holds B + Q - 1 samples per channel to provide the needed history.
- MACs: B × R × M × E × Q

Step 2 (error loading and normalization, once per pump): The current error sample is read. If normalization is enabled, the power denominator is computed:
- Mode 0 (off): no MACs
- Mode 1 (raw reference power): R × Q multiply-accumulates (sum of squared samples in raw reference)
- Mode 2 (filtered reference power): R × M × E × P multiply-accumulates (sum of squared pump-rate filtered-reference samples in the circular buffer)

Step 3 (coefficient update, once per pump): For each (output, reference) adaptive filter, the gradient is accumulated over E error channels and P filtered-reference samples stored in the circular buffer (E × P MACs), then leakage and step are applied to all P taps (P MACs).
- MACs: R × M × P × (E + 1)
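The three per-step counts can be added up to estimate the MAC load of one pump for a given configuration. The helper below is a hypothetical convenience function that only restates the formulas of this section. It does not account for memory traffic, cache effects, or any target-specific optimization.

```c
#include <assert.h>

/* Per-pump MAC estimate from the three steps above; hypothetical helper.
 * B = blockSize, R = numChanRef, E = numChanErr, M = numChanOut,
 * P = numTapsAdapt, Q = numTapsSec, normalize = 0, 1 or 2. */
static long fxnlms_macs_per_pump(long B, long R, long E, long M,
                                 long P, long Q, int normalize)
{
    assert(normalize >= 0 && normalize <= 2);
    long step1 = B * R * M * E * Q;      /* secondary-path filtering */
    long step2 = 0;                      /* normalization denominator */
    if (normalize == 1) {
        step2 = R * Q;
    } else if (normalize == 2) {
        step2 = R * M * E * P;
    }
    long step3 = R * M * P * (E + 1);    /* gradient plus leakage and step */
    return step1 + step2 + step3;
}
```

With the default arguments (B = 1, R = E = M = 1, P = 128, Q = 256) the estimate is 512, 768 and 640 MACs per pump for normalization modes 0, 1 and 2 respectively.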

Error list

| Code | Input | Meaning |
| --- | --- | --- |
| 0 | | No error |
| -1 | SRef (reference) | Heap number out of range |
| -2 | SErr (error) | Heap number out of range |
| -3 | C1 | Heap number out of range |
| -4 | C2 | Heap number out of range |
| -11 | SRef (reference) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -12 | SErr (error) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -13 | C1 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -14 | C2 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -21 | SRef (reference) | Incorrect format: deinterleaving, complexity, number of channels, or channel length |
| -22 | SErr (error) | Incorrect format: deinterleaving, complexity, number of channels, or channel length |
| -23 | C1 | Incorrect format: complexity, number of channels, or channel length |
| -24 | C2 | Incorrect format: complexity, number of channels, or channel length |

Note: errors are checked and reported at run-time.
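The codes in the error list follow a regular pattern: the tens digit selects the failure class and the units digit selects the input. Host-side code that reads errorCode could decode it as sketched below. The enum and function names are hypothetical and not part of the module API, and the sketch assumes only the codes listed above are ever reported.

```c
#include <assert.h>

/* Hypothetical decoder for the errorCode values in the error list. */
typedef enum { FX_IN_NONE, FX_IN_SREF, FX_IN_SERR, FX_IN_C1, FX_IN_C2 } FxErrInput;
typedef enum { FX_ERR_NONE, FX_ERR_HEAP_RANGE, FX_ERR_HEAP_MISSING,
               FX_ERR_FORMAT, FX_ERR_UNKNOWN } FxErrClass;

/* Units digit: 1 = SRef, 2 = SErr, 3 = C1, 4 = C2. */
static FxErrInput fx_error_input(int code)
{
    assert(code <= 0);                   /* documented codes are never positive */
    int unit = (-code) % 10;
    return (unit >= 1 && unit <= 4) ? (FxErrInput)unit : FX_IN_NONE;
}

/* Tens digit: 0 = heap number out of range, 1 = heap missing, zero offset
 * or size too large, 2 = incorrect format. */
static FxErrClass fx_error_class(int code)
{
    if (code == 0) {
        return FX_ERR_NONE;
    }
    if (fx_error_input(code) == FX_IN_NONE) {
        return FX_ERR_UNKNOWN;
    }
    switch ((-code) / 10) {
    case 0:  return FX_ERR_HEAP_RANGE;
    case 1:  return FX_ERR_HEAP_MISSING;
    case 2:  return FX_ERR_FORMAT;
    default: return FX_ERR_UNKNOWN;
    }
}
```

For example, -23 decodes to an incorrect-format error on C1, and -12 decodes to a missing heap, zero offset, or oversized frame on SErr. Because only the first error encountered is reported, fixing it may reveal another code on the next pump.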

See also: FIRUser, FeNLMSUser, StateAllocator, CoeffAllocator

See usage examples in the AllocatorUser_*.awd layouts found in the <AWE install folder>/Examples/Module_Usage folder.

Module Pack

Advanced

ClassID

classID = 1463

Type Definition

typedef struct _ModuleFxNLMSUser
{
    ModuleInstanceDescriptor instance;            // Common Audio Weaver module instance structure
    INT32 numTapsAdapt;                           // Length of adaptive filter.
    INT32 numTapsSec;                             // Length of secondary path filter.
    INT32 numChanRef;                             // Number of reference (x) channels.
    INT32 numChanErr;                             // Number of error (e) channels.
    INT32 numChanOut;                             // Number of output (y) channels.
    INT32 blockSize;                              // Number of samples per block.
    UINT32 delaySampsRef;                         // Reference-signal delay in samples.
    UINT32 normalize;                             // Normalization mode: 0=off, 1=raw-reference power, 2=filtered-reference power.
    FLOAT32 stepSize;                             // Adaptation step size.
    FLOAT32 leakFactor;                           // Adaptation leakage factor (1.0 is non-leaky).
    INT32 chanLenCircErr;                         // Length of error state per channel as set by StateAllocator.
    INT32 chanLenCircRef;                         // Length of reference state per channel as set by StateAllocator.
    UINT32 pastOffsetCoeffs1;                     // Keeps past received coeffs offset
    UINT32 pastOffsetCoeffs2;                     // Keeps past received coeffs offset
    UINT32 pastOffsetRef;                         // Keeps past received reference offset
    UINT32 pastOffsetErr;                         // Keeps past received error offset
    UINT32 pastHeapCoeffs1;                       // Keeps past received coeffs heap
    UINT32 pastHeapCoeffs2;                       // Keeps past received coeffs heap
    UINT32 pastHeapRef;                           // Keeps past received reference heap
    UINT32 pastHeapErr;                           // Keeps past received error heap
    INT32 locked;                                 // Locks pointers and sizes safety checks.
    INT32 invalidateCoeffs1;                      // Flag to invalidate cache for coeffs.
    INT32 invalidateCoeffs2;                      // Flag to invalidate cache for coeffs.
    INT32 invalidateRef;                          // Flag to invalidate cache for state.
    INT32 invalidateErr;                          // Flag to invalidate cache for state.
    INT32 errorCode;                              // Captures errors that could arise at run-time and is shown on the inspector
    INT32 coeffSelect;                            // Coeff input selector. Toggled once per pump. When 0, C1 is the source and C2 the destination, else direction is reversed.
    INT32 arrayHeap;                              // Heap in which to allocate secondary path coeffs and internal arrays.
    INT32 chanLenLinRef;                          // Length of reference state per channel for internal linear buff (= numTapsSec + blockSize - 1).
    INT32 filtRefStateIndex;                      // Circular write index into filtered reference state buffer.
    FLOAT32* coeffsSec;                           // Secondary path filter coeff array in reverse order
    FLOAT32* aScratch;                            // Scratch buffer for G_m computation
    FLOAT32* aLinStateRef;                        // Reference linear state buffer (numTapsSec + blockSize - 1 samples per channel)
    FLOAT32* aFiltRefState;                       // Filtered-reference state buffer (circular, one sample per pump per (l,m,e) channel triple)
    FLOAT32* eValues;                             // Scratch buffer for preloaded error channel samples (numChanErr floats)
    float * stateRef;                             // Keeps past calculated pointer
    float * stateErr;                             // Keeps past calculated pointer
    float * coeffsAdapt1;                         // Keeps past calculated pointer
    float * coeffsAdapt2;                         // Keeps past calculated pointer
    void * hardware_specific_struct_pointer;      // General purpose pointer for target-specific cases
} ModuleFxNLMSUserClass;

Variables

Properties

| Name | Type | Usage | isHidden | Default Value | Range | Units |
| --- | --- | --- | --- | --- | --- | --- |
| numTapsAdapt | int | const | 0 | 128 | Unrestricted | |
| numTapsSec | int | const | 0 | 256 | Unrestricted | |
| numChanRef | int | const | 0 | 1 | Unrestricted | |
| numChanErr | int | const | 0 | 1 | Unrestricted | |
| numChanOut | int | const | 0 | 1 | Unrestricted | |
| blockSize | int | const | 0 | 1 | Unrestricted | |
| delaySampsRef | uint | parameter | 0 | 1 | Unrestricted | samples |
| normalize | uint | parameter | 0 | 1 | 0:2 | |
| stepSize | float | parameter | 0 | 0.01 | 0:1 | |
| leakFactor | float | parameter | 0 | 1 | 0:1 | |
| chanLenCircErr | int | derived | 0 | 1 | Unrestricted | |
| chanLenCircRef | int | derived | 0 | 256 | Unrestricted | |
| pastOffsetCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetRef | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetErr | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapRef | uint | state | 0 | 0 | Unrestricted | |
| pastHeapErr | uint | state | 0 | 0 | Unrestricted | |
| locked | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs1 | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs2 | int | state | 0 | 0 | Unrestricted | |
| invalidateRef | int | state | 0 | 0 | Unrestricted | |
| invalidateErr | int | state | 0 | 0 | Unrestricted | |
| errorCode | int | state | 0 | 0 | Unrestricted | |
| coeffSelect | int | state | 0 | 0 | Unrestricted | |
| arrayHeap | int | const | 1 | 1 | Unrestricted | |
| chanLenLinRef | int | const | 1 | 256 | Unrestricted | |
| filtRefStateIndex | int | state | 1 | 0 | Unrestricted | |
| coeffsSec | float* | parameter | 0 | [256 x 1] | Unrestricted | |
| aScratch | float* | state | 0 | [256 x 1] | Unrestricted | |
| aLinStateRef | float* | state | 0 | [256 x 1] | Unrestricted | |
| aFiltRefState | float* | state | 0 | [130 x 1] | Unrestricted | |
| eValues | float* | state | 0 | [1 x 1] | Unrestricted | |
| stateRef | float * | state | 1 | | Unrestricted | |
| stateErr | float * | state | 1 | | Unrestricted | |
| coeffsAdapt1 | float * | state | 1 | | Unrestricted | |
| coeffsAdapt2 | float * | state | 1 | | Unrestricted | |
| hardware_specific_struct_pointer | void * | state | 1 | | Unrestricted | |

Pins

Input Pins

| Name | Description | Data type | Channel range | Block size range | Sample rate range | Complex support |
| --- | --- | --- | --- | --- | --- | --- |
| SRef | Reference StateAllocator frame input | int | 7 | 1 | Unrestricted | Real |
| SErr | Error StateAllocator frame input | int | 7 | 1 | Unrestricted | Real |
| C1 | Adaptive coeffs CoeffAllocator frame input | int | 5 | 1 | Unrestricted | Real |
| C2 | Adaptive coeffs CoeffAllocator frame input | int | 5 | 1 | Unrestricted | Real |

Output Pins

| Name | Description | Data type |
| --- | --- | --- |
| C | Adaptive coeffs CoeffAllocator frame out | int |

Matlab Usage

File Name: fxnlms_user_module.m 
 M = fxnlms_user_module(NAME, NUMTAPSADAPT, NUMTAPSSEC, NUMCHANREF, NUMCHANERR, NUMCHANOUT, BLOCKSIZE, MEMHEAP) 
 Creates a module which implements a filtered-reference LMS adaptive algorithm 
 (also known as Multiple Error LMS or FxNLMS). The implementation is MIMO and 
 has optional input normalization and a leakage factor. This "user" module works 
 with other "user" and "allocator" modules. 
 Arguments: 
    NAME - name of the module. 
    NUMTAPSADAPT - adaptive filter number of taps 
         By default, NUMTAPSADAPT = 128; 
    NUMTAPSSEC - secondary path filter number of taps 
         By default, NUMTAPSSEC = 256; 
    NUMCHANREF - number of reference channels (x) 
         By default, NUMCHANREF = 1; 
    NUMCHANERR - number of error channels (e) 
         By default, NUMCHANERR = 1; 
    NUMCHANOUT - number of output channels (y) 
         By default, NUMCHANOUT = 1; 
    BLOCKSIZE - number of samples per block (must be 1 or greater) 
         By default, BLOCKSIZE = 1; 
    MEMHEAP - heap which hosts internal arrays. This is a string and 
         follows the memory allocation enumeration in Framework.h. 
         Allowable values are: 
            'AWE_HEAP_FAST' - always use internal DM memory (default). 
            'AWE_HEAP_FASTB' - always use internal PM memory. 
            'AWE_HEAP_SLOW' - always use external memory. 
            'AWE_HEAP_FAST2SLOW' - try DM, PM or external in that order 
            'AWE_HEAP_FASTB2SLOW' - try PM or external in that order 
            'AWE_HEAP_SHARED' - always use shared memory. 

Copyright (c) 2026 DSP Concepts, Inc.