FeNLMSUser
Overview
Filtered-error normalized LMS module which works with the StateAllocator and CoeffAllocator modules
Discussion
This module implements a filtered-error LMS adaptive algorithm. It is based on the Adjoint LMS by Eric A. Wan. The implementation is MIMO and has optional input normalization and a leakage factor. The module is a "user" type and works in conjunction with other "user" and "allocator" modules.
FeNLMSUser will adapt a set of FIR filter coefficients in the time domain following a filtered-error LMS approach. Coefficients are adapted only once per pump, regardless of the blocksize. The module will take only the latest available state sample to do so (i.e. Block LMS is not used). State and coefficients for the adaptive filter are shared by several modules and are hosted elsewhere. State is expected to come in forward-order and coefficients in reverse-order. In Adjoint LMS, the secondary-path filter is already a time-reversed model of the secondary path. Hence, in this implementation, reversing these coeffs results in using the model in original forward-order.
SRef and SErr are the reference and error state inputs respectively. They connect to StateAllocator modules.
StateAllocator modules host (allocate) state data and provide a reference to their internal buffers through an "S" frame.
StateAllocator for the error input must be configured to hold at least numTapsSec-1 delay samples.
StateAllocator for the reference input must be configured to hold at least numTapsAdapt+delaySampsRef-1 delay samples.
Both should be set to "deinterleave" mode and input is expected to be real numbers only.
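The two sizing rules above can be sketched as small helpers. This is only an illustration of the arithmetic stated in the text; the function names are hypothetical and are not part of the module API.

```c
#include <stdint.h>

/* Minimum delay samples the error StateAllocator must hold:
 * numTapsSec - 1 (rule stated in the text above). */
static int32_t min_delay_err(int32_t numTapsSec)
{
    return numTapsSec - 1;
}

/* Minimum delay samples the reference StateAllocator must hold:
 * numTapsAdapt + delaySampsRef - 1 (rule stated in the text above). */
static int32_t min_delay_ref(int32_t numTapsAdapt, uint32_t delaySampsRef)
{
    return numTapsAdapt + (int32_t)delaySampsRef - 1;
}
```

With the default values (numTapsAdapt = 128, numTapsSec = 256, delaySampsRef = 256) these give 255 delay samples for the error input and 383 for the reference input.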
C1 and C2 are coefficient inputs for the adaptive filter. They connect to CoeffAllocator modules.
CoeffAllocator modules host (allocate) coefficient data and provide a reference to their internal buffers through a "C" frame.
C1 and C2 alternate every pump, one as source and the other as destination. The coeffSelect variable reports 0 when C1 is source and 1 when C2 is source.
The same CoeffAllocator can be connected to both C1 and C2 to adapt coeffs "in place" (read and write from/to same buffer).
Two separate CoeffAllocator modules on C1 and C2 are useful in multi-thread/instance designs to avoid race conditions.
The output (C) forwards the "C" frame which was last used as destination (latest adapted coeffs). Forwarding only occurs if no error is reported.
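The C1/C2 alternation described above can be pictured with the following minimal sketch. The CoeffPair type and both helper names are hypothetical; only the meaning of coeffSelect (0 means C1 is the source) is taken from the text.

```c
#include <stdint.h>

/* Hypothetical pair of buffers used during one pump. */
typedef struct {
    float *src;  /* coefficients read during this pump */
    float *dst;  /* coefficients written during this pump */
} CoeffPair;

/* coeffSelect == 0: C1 is the source and C2 the destination, else reversed. */
static CoeffPair select_coeffs(int32_t coeffSelect, float *c1, float *c2)
{
    CoeffPair p;
    if (coeffSelect == 0) {
        p.src = c1;
        p.dst = c2;
    } else {
        p.src = c2;
        p.dst = c1;
    }
    return p;
}

/* Toggle once per pump so the buffer just written becomes the next source. */
static int32_t toggle_select(int32_t coeffSelect)
{
    return coeffSelect ? 0 : 1;
}
```

When the same CoeffAllocator feeds both C1 and C2, src and dst point to the same buffer and the toggle has no visible effect, which corresponds to the "in place" case.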
CoeffAllocator module(s) must be configured to have numChanRef*numChanOut channels and numTapsAdapt samples per channel.
The default set of coefficients is zeros, but the arrays are tunable and can contain non-zero coefficients to start with.
If setting startup coefficients, they must be stored in reverse order. The channels are ordered by output number first, then by reference number.
Example: for 3 output channels with 2 references, the channel order is O1R1, O1R2, O2R1, O2R2, O3R1, O3R2.
CoeffAllocators store the channels in non-interleaved format. Coefficients should be real numbers only.
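As an illustration of this layout, the sketch below computes where a given output/reference channel starts in a non-interleaved buffer and writes a set of startup coefficients in reverse order. Both function names are hypothetical, indices are 0-based, and the reading of "reverse order" as "tap k is stored at position P-1-k" is an assumption.

```c
#include <stddef.h>

/* Start of the channel for output m and reference r (both 0-based) in a
 * non-interleaved buffer ordered O1R1, O1R2, O2R1, ... with numTapsAdapt
 * samples per channel. */
static size_t coeff_channel_offset(size_t m, size_t r,
                                   size_t numChanRef, size_t numTapsAdapt)
{
    return (m * numChanRef + r) * numTapsAdapt;
}

/* Assumption: "reverse order" means tap k of the impulse response h is
 * stored at position numTapsAdapt-1-k within its channel. */
static void store_reversed(float *channel, const float *h, size_t numTapsAdapt)
{
    for (size_t k = 0; k < numTapsAdapt; ++k) {
        channel[numTapsAdapt - 1 - k] = h[k];
    }
}
```

For the 3-output, 2-reference example with 128 taps, O2R1 is the third channel and therefore starts at sample 256.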
When the runtime status of the module is Muted, Bypassed or Inactive, its process function is not called. This module does not keep internal state for the filters, so it will not produce incorrect outputs due to incomplete history. However, if the StateAllocators feeding it are made non-Active at the same time, filter history is interrupted and the secondary-path filter produces incorrect output for a few blocks after re-activation. It is therefore recommended to freeze adaptation (stepSize = 0) for a few blocks after switching back to Active from any other status. C1/C2 input toggling also stops in non-Active modes and resumes from its previous state when the module is Active again. In Muted status the C output is set to zeros, in Bypassed status the C1 input is forwarded to the C output, and in Inactive status the C output keeps its pre-Inactive value.
The module will report an error in the errorCode variable (shown in inspector) if input frame values do not match expected ones. The reported error is the first one encountered, so there may be more. A list of possible errors is detailed further down. Adaptation does not start until all inputs are deemed correct. To reduce error-checking overhead, the module will go into locked state and will only run full checking again if heap numbers or offsets change. When an error is found, the output frame offset value is set to 0 (output wire may contain other undefined values).
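The locked-state idea can be sketched as follows. This is not the module's actual code: the FrameRef type and both functions are hypothetical, and the sketch only shows one plausible way to compare the current heap numbers and offsets of the four inputs against the values remembered from the previous pump (the pastHeap* and pastOffset* variables).

```c
#include <stdint.h>

/* Hypothetical snapshot of one input frame: heap number and offset. */
typedef struct {
    uint32_t heap;
    uint32_t offset;
} FrameRef;

/* Returns 1 if the frame matches what was seen on the previous pump. */
static int frame_unchanged(FrameRef now, uint32_t pastHeap, uint32_t pastOffset)
{
    return now.heap == pastHeap && now.offset == pastOffset;
}

/* Returns 1 when full validation should run: the module is not locked yet,
 * or any of the four inputs (SRef, SErr, C1, C2) changed heap or offset. */
static int needs_full_check(int32_t locked, const FrameRef now[4],
                            const uint32_t pastHeap[4],
                            const uint32_t pastOffset[4])
{
    if (!locked) {
        return 1;
    }
    for (int i = 0; i < 4; ++i) {
        if (!frame_unchanged(now[i], pastHeap[i], pastOffset[i])) {
            return 1;
        }
    }
    return 0;
}
```

The design trades a handful of integer comparisons per pump for skipping the full heap and format validation once the inputs are known to be stable.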
The argument NUMTAPSADAPT specifies the adaptive filter length. CoeffAllocator(s) samples-per-channel must be set to this same value.
Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance.
The argument NUMTAPSSEC specifies the secondary-path filter length. This filter is hosted within the module since the coeffs are not shared.
Set this number to be a multiple of the internal memory alignment for a better chance at optimal performance.
As previously explained, the secondary filter coefficient order must match the secondary-path model order.
Channels are ordered by output number first, then by error number.
Example: for 3 output channels with 2 error channels, the channel order is O1E1, O1E2, O2E1, O2E2, O3E1, O3E2.
The argument NUMCHANREF specifies the number of reference channels. This must match the number of channels connected to the reference StateAllocator.
The argument NUMCHANERR specifies the number of error channels. This must match the number of channels connected to the error StateAllocator.
The argument NUMCHANOUT specifies the number of LMS output (or control) channels. This is used for calculating the total number of channels in the adaptive and secondary filters.
The expected coeffs channel count at the input is numChanRef*numChanOut. The secondary path filter is sized to have numChanErr*numChanOut channels.
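The channel counts above translate directly into buffer sizes. The helpers below are a hypothetical sketch of that arithmetic, useful for checking a CoeffAllocator configuration or the size of a coeffsSec array by hand.

```c
#include <stddef.h>

/* Channels expected on C1/C2: numChanRef * numChanOut. */
static size_t adapt_channels(size_t numChanRef, size_t numChanOut)
{
    return numChanRef * numChanOut;
}

/* Channels in the secondary-path filter: numChanErr * numChanOut. */
static size_t sec_channels(size_t numChanErr, size_t numChanOut)
{
    return numChanErr * numChanOut;
}

/* Total number of real coefficients for a given channel count and length. */
static size_t total_coeffs(size_t numChannels, size_t tapsPerChannel)
{
    return numChannels * tapsPerChannel;
}
```

For 3 outputs, 2 references and 2 error channels with the default filter lengths, this gives 6 adaptive channels (768 coefficients) and 6 secondary-path channels (1536 coefficients).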
The argument MEMHEAP specifies the heap to use for internal arrays. This includes the secondary-path coeffs and helper arrays.
When AWE_HEAP_SHARED is used for state or coefficient buffers, the module invalidates the cache on source buffers before processing and writes back the destination coefficient buffer after adaptation, ensuring coherency in multi-core configurations.
The variable delaySampsRef specifies the delay in samples to apply to the reference before adaptation.
This is meant to compensate for latency of the error vs. the ref and should normally be at least equal to NUMTAPSSEC, but might need to be larger (e.g. when ref/error blocksize is not 1, this delay might need to be increased by blocksize-1 samples).
Changing delaySampsRef at runtime resets the locked state, so a full input validation re-check is performed on the next pump.
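A starting value for delaySampsRef can be derived from the guidance above. The helper below is a hypothetical sketch that adds the blocksize-1 allowance to NUMTAPSSEC; the text only says the delay "might need" this increase, so treat the result as an initial guess to be tuned on the actual system.

```c
#include <stdint.h>

/* Starting-point estimate only: numTapsSec plus (blockSize - 1) samples.
 * The correct value depends on the real latency between ref and error. */
static uint32_t suggested_delay_samps_ref(uint32_t numTapsSec, uint32_t blockSize)
{
    uint32_t extra = (blockSize > 0u) ? (blockSize - 1u) : 0u;
    return numTapsSec + extra;
}
```

With numTapsSec = 256, a blocksize of 1 gives 256 samples and a blocksize of 32 gives 287 samples.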
The variable normalize enables normalization of the reference before adaptation when set to 1 (NLMS). Normalization has an added computational cost (see below) but allows the best convergence rate.
The variable stepSize specifies the adaptation learning rate. When set to 0, adaptation is frozen and the last adapted CoeffAllocator frame is forwarded (or default if first pass). Because the adaptation algorithm is not run, computational load is reduced significantly.
The variable leakFactor specifies the adaptation leakage factor. When set to 1, the algorithm is non-leaky.
Computational cost
Each pump executes three algorithmic steps.
Let R = numChanRef, E = numChanErr, M = numChanOut, P = numTapsAdapt, Q = numTapsSec.
Step 1 — Secondary-path filtering of the error (filtered error):
For each of the M output channels, the E error state channels are each filtered through a Q-tap FIR and summed together, producing one filtered-error scalar per output channel.
- MACs: M × E × Q
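Step 1 can be sketched as a set of dot products. This is a plain reference loop, not the module's optimized implementation. It assumes a hypothetical linear error buffer holding the Q most recent samples of each error channel (oldest first) and secondary-path coefficients laid out as (m*E + e)*Q + q, so that a straight dot product performs the filtering.

```c
#include <stddef.h>

/* Filtered error: one scalar per output channel.
 * errState : E channels of Q samples each (assumed linear, oldest first)
 * coeffsSec: M*E channels of Q taps, ordered by output then error channel
 * filtError: M results
 * Cost: M * E * Q multiply-accumulates. */
static void filtered_error(const float *errState, const float *coeffsSec,
                           float *filtError, size_t M, size_t E, size_t Q)
{
    for (size_t m = 0; m < M; ++m) {
        float acc = 0.0f;
        for (size_t e = 0; e < E; ++e) {
            const float *x = errState + e * Q;
            const float *c = coeffsSec + (m * E + e) * Q;
            for (size_t q = 0; q < Q; ++q) {
                acc += c[q] * x[q];
            }
        }
        filtError[m] = acc;
    }
}
```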
Step 2 — Reference loading and normalization (once per pump):
The P most recent (delayed) reference samples per channel are read from the StateAllocator into a linear buffer. If normalization is enabled, the reference power denominator is computed:
- Mode 0 (off): no MACs
- Mode 1 (reference power): R × P multiply-accumulates (sum of squared samples across all reference channels and taps)
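The normalization denominator of Step 2 can be sketched as below. The function names are hypothetical, and the eps guard against division by zero is an assumption of this sketch; the text does not say how the module handles a silent reference.

```c
#include <stddef.h>

/* Sum of squared reference samples across all R channels and P taps.
 * Cost: R * P multiply-accumulates. */
static float reference_power(const float *refState, size_t R, size_t P)
{
    float acc = 0.0f;
    for (size_t i = 0; i < R * P; ++i) {
        acc += refState[i] * refState[i];
    }
    return acc;
}

/* Mode 0: no normalization, the denominator is 1.
 * Mode 1: reference power plus a small eps (eps is an assumption). */
static float norm_denominator(int normalize, const float *refState,
                              size_t R, size_t P, float eps)
{
    if (!normalize) {
        return 1.0f;
    }
    return reference_power(refState, R, P) + eps;
}
```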
Step 3 — Coefficient update (once per pump):
For each output channel m, the filtered error and normalization factor are combined into a scalar step, step_m = filtError[m] × stepSize / norm. All R×P adaptive coefficients for that output are then updated in a single SAXPY-style pass: w[j] = leakFactor × w[j] + step_m × x[j].
- MACs: M × R × P
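The update of Step 3 can be written as the following reference loop. It is a sketch under assumptions: the function name is hypothetical, x is taken to be the linear reference buffer of R×P samples aligned with the coefficient order, and norm is 1 when normalization is off.

```c
#include <stddef.h>

/* Leaky (N)LMS coefficient update, one SAXPY-style pass per output channel.
 * src, dst : M blocks of R*P coefficients (may point to the same buffer)
 * x        : R*P reference samples shared by all outputs
 * filtError: M filtered-error scalars
 * Cost: M * R * P multiply-accumulates. */
static void update_coeffs(const float *src, float *dst, const float *x,
                          const float *filtError, float stepSize,
                          float leakFactor, float norm,
                          size_t M, size_t R, size_t P)
{
    const size_t n = R * P;
    for (size_t m = 0; m < M; ++m) {
        const float step = filtError[m] * stepSize / norm;
        const float *w = src + m * n;
        float *out = dst + m * n;
        for (size_t j = 0; j < n; ++j) {
            out[j] = leakFactor * w[j] + step * x[j];
        }
    }
}
```

Because each output coefficient depends only on the source coefficient at the same index, the loop also works when src and dst are the same buffer, which matches the "in place" configuration with a single CoeffAllocator.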
Error list
| Code | Input | Meaning |
|---|---|---|
| 0 | — | No error |
| -1 | SRef (reference) | Heap number out of range |
| -2 | SErr (error) | Heap number out of range |
| -3 | C1 | Heap number out of range |
| -4 | C2 | Heap number out of range |
| -11 | SRef (reference) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -12 | SErr (error) | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -13 | C1 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -14 | C2 | Heap does not exist on target, offset is zero, or reported size too large for heap |
| -21 | SRef (reference) | Incorrect format — deinterleaving, complexity, number of channels, or channel length |
| -22 | SErr (error) | Incorrect format — deinterleaving, complexity, number of channels, or channel length |
| -23 | C1 | Incorrect format — complexity, number of channels, or channel length |
| -24 | C2 | Incorrect format — complexity, number of channels, or channel length |
Note: errors are checked and reported at run-time
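The codes in the table follow a regular pattern: the last digit identifies the input and the tens digit identifies the error class. A host-side tool reading errorCode could decode it as in the sketch below; the function names are hypothetical and the strings are shortened from the table.

```c
#include <stddef.h>

/* Last digit of the code selects the input: 1=SRef, 2=SErr, 3=C1, 4=C2. */
static const char *error_input(int code)
{
    if (code == 0) {
        return "none";
    }
    switch ((-code) % 10) {
    case 1: return "SRef";
    case 2: return "SErr";
    case 3: return "C1";
    case 4: return "C2";
    default: return "unknown";
    }
}

/* Tens digit selects the error class listed in the table. */
static const char *error_meaning(int code)
{
    if (code == 0) {
        return "No error";
    }
    if (code > 0 || code < -24) {
        return "Unknown";
    }
    switch ((-code) / 10) {
    case 0: return "Heap number out of range";
    case 1: return "Heap missing, zero offset, or size too large";
    case 2: return "Incorrect format";
    default: return "Unknown";
    }
}
```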
See also: FIRUser FxNLMSUser StateAllocator CoeffAllocator
See usage examples in the AllocatorUser_*.awd layouts found in the \<AWE install folder>/Examples/Module_Usage folder
Module Pack
Advanced
ClassID
classID = 1461
Type Definition
typedef struct _ModuleFeNLMSUser
{
ModuleInstanceDescriptor instance; // Common Audio Weaver module instance structure
INT32 numTapsAdapt; // Length of adaptive filter.
INT32 numTapsSec; // Length of secondary path filter.
INT32 numChanRef; // Number of reference (x) channels.
INT32 numChanErr; // Number of error (e) channels.
INT32 numChanOut; // Number of output (y) channels.
UINT32 delaySampsRef; // Reference-signal delay in samples.
UINT32 normalize; // When true, reference is normalized for adaptation.
FLOAT32 stepSize; // Adaptation step size.
FLOAT32 leakFactor; // Adaptation leakage factor (1.0 is non-leaky).
INT32 chanLenCircErr; // Length of error state per channel as set by StateAllocator.
INT32 chanLenCircRef; // Length of reference state per channel as set by StateAllocator.
UINT32 pastOffsetCoeffs1; // Keeps past received coeffs offset
UINT32 pastOffsetCoeffs2; // Keeps past received coeffs offset
UINT32 pastOffsetRef; // Keeps past received reference offset
UINT32 pastOffsetErr; // Keeps past received error offset
UINT32 pastHeapCoeffs1; // Keeps past received coeffs heap
UINT32 pastHeapCoeffs2; // Keeps past received coeffs heap
UINT32 pastHeapRef; // Keeps past received reference heap
UINT32 pastHeapErr; // Keeps past received error heap
INT32 locked; // Locks pointers and sizes safety checks.
INT32 invalidateCoeffs1; // Flag to invalidate cache for coeffs.
INT32 invalidateCoeffs2; // Flag to invalidate cache for coeffs.
INT32 invalidateRef; // Flag to invalidate cache for state.
INT32 invalidateErr; // Flag to invalidate cache for state.
INT32 errorCode; // Captures errors that could arise at run-time and is shown on the inspector
INT32 coeffSelect; // Coeff input selector. Toggled once per pump. When 0, C1 is the source and C2 the destination, else direction is reversed.
INT32 chanLenLinErr; // Length of error state per channel for internal linear buff.
INT32 chanLenLinRef; // Length of reference state per channel for internal linear buff.
INT32 arrayHeap; // Heap in which to allocate secondary path coeffs and internal arrays.
FLOAT32* coeffsSec; // Secondary path filter coeff array in reverse order
FLOAT32* aFiltErrorOut; // Filtered error output array
FLOAT32* aLinStateErr; // Error linear state buffer
FLOAT32* aLinStateRef; // Reference linear state buffer
float * stateRef; // Keeps past calculated pointer
float * stateErr; // Keeps past calculated pointer
float * coeffsAdapt1; // Keeps past calculated pointer
float * coeffsAdapt2; // Keeps past calculated pointer
void * hardware_specific_struct_pointer_fir; // General purpose pointer for target-specific cases
void * hardware_specific_struct_pointer; // General purpose pointer for target-specific cases
} ModuleFeNLMSUserClass;
Variables
Properties
| Name | Type | Usage | isHidden | Default Value | Range | Units |
|---|---|---|---|---|---|---|
| numTapsAdapt | int | const | 0 | 128 | Unrestricted | |
| numTapsSec | int | const | 0 | 256 | Unrestricted | |
| numChanRef | int | const | 0 | 1 | Unrestricted | |
| numChanErr | int | const | 0 | 1 | Unrestricted | |
| numChanOut | int | const | 0 | 1 | Unrestricted | |
| delaySampsRef | uint | parameter | 0 | 256 | Unrestricted | samples |
| normalize | uint | parameter | 0 | 1 | 0:1 | |
| stepSize | float | parameter | 0 | 0.01 | 0:1 | |
| leakFactor | float | parameter | 0 | 1 | 0:1 | |
| chanLenCircErr | int | derived | 0 | 256 | Unrestricted | |
| chanLenCircRef | int | derived | 0 | 128 | Unrestricted | |
| pastOffsetCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetRef | uint | state | 0 | 0 | Unrestricted | |
| pastOffsetErr | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs1 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapCoeffs2 | uint | state | 0 | 0 | Unrestricted | |
| pastHeapRef | uint | state | 0 | 0 | Unrestricted | |
| pastHeapErr | uint | state | 0 | 0 | Unrestricted | |
| locked | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs1 | int | state | 0 | 0 | Unrestricted | |
| invalidateCoeffs2 | int | state | 0 | 0 | Unrestricted | |
| invalidateRef | int | state | 0 | 0 | Unrestricted | |
| invalidateErr | int | state | 0 | 0 | Unrestricted | |
| errorCode | int | state | 0 | 0 | Unrestricted | |
| coeffSelect | int | state | 0 | 0 | Unrestricted | |
| chanLenLinErr | int | const | 1 | 256 | Unrestricted | |
| chanLenLinRef | int | const | 1 | 128 | Unrestricted | |
| arrayHeap | int | const | 1 | 1 | Unrestricted | |
| coeffsSec | float* | parameter | 0 | [256 x 1] | Unrestricted | |
| aFiltErrorOut | float* | state | 0 | [1 x 1] | Unrestricted | |
| aLinStateErr | float* | state | 0 | [256 x 1] | Unrestricted | |
| aLinStateRef | float* | state | 0 | [128 x 1] | Unrestricted | |
| stateRef | float * | state | 1 | | Unrestricted | |
| stateErr | float * | state | 1 | | Unrestricted | |
| coeffsAdapt1 | float * | state | 1 | | Unrestricted | |
| coeffsAdapt2 | float * | state | 1 | | Unrestricted | |
| hardware_specific_struct_pointer_fir | void * | state | 1 | | Unrestricted | |
| hardware_specific_struct_pointer | void * | state | 1 | | Unrestricted | |
Pins
Input Pins
| Name | Description | Data type | Channel range | Block size range | Sample rate range | Complex support |
|---|---|---|---|---|---|---|
| SRef | Reference StateAllocator frame input | int | 7 | 1 | Unrestricted | Real |
| SErr | Error StateAllocator frame input | int | 7 | 1 | Unrestricted | Real |
| C1 | Adaptive coeffs CoeffAllocator frame input | int | 5 | 1 | Unrestricted | Real |
| C2 | Adaptive coeffs CoeffAllocator frame input | int | 5 | 1 | Unrestricted | Real |
Output Pins
| Name | Description | Data type |
|---|---|---|
| C | Adaptive coeffs CoeffAllocator frame out | int |
Matlab Usage
File Name: fenlms_user_module.m
M = fenlms_user_module(NAME, NUMTAPSADAPT, NUMTAPSSEC, NUMCHANREF, NUMCHANERR, NUMCHANOUT, MEMHEAP)
Creates a module which implements a filtered-error LMS adaptive algorithm
(also known as Adjoint LMS). The implementation is MIMO and has optional
input normalization and a leakage factor. This "user" module works with
other "user" and "allocator" modules.
Arguments:
NAME - name of the module.
NUMTAPSADAPT - adaptive filter number of taps
By default, NUMTAPSADAPT = 128;
NUMTAPSSEC - secondary path filter number of taps
By default, NUMTAPSSEC = 256;
NUMCHANREF - number of reference channels (x)
By default, NUMCHANREF = 1;
NUMCHANERR - number of error channels (e)
By default, NUMCHANERR = 1;
NUMCHANOUT - number of output channels (y)
By default, NUMCHANOUT = 1;
MEMHEAP - heap which hosts internal arrays. This is a string and
follows the memory allocation enumeration in Framework.h.
Allowable values are:
'AWE_HEAP_FAST' - always use internal DM memory (default).
'AWE_HEAP_FASTB' - always use internal PM memory.
'AWE_HEAP_SLOW' - always use external memory.
'AWE_HEAP_FAST2SLOW' - try DM, PM or external in that order
'AWE_HEAP_FASTB2SLOW' - try PM or external in that order
'AWE_HEAP_SHARED' - always use shared memory.
Copyright (c) 2026 DSP Concepts, Inc.