FIR & IIR Filters for Audio#

Audio filtering on an MCU is dominated by two filter types: FIR (Finite Impulse Response) and IIR (Infinite Impulse Response). FIR filters offer linear phase and unconditional stability but require many taps for sharp cutoffs: a low-pass filter running at 48 kHz with a narrow transition band may need 200+ taps, each costing one multiply-accumulate per sample. IIR filters achieve comparable magnitude shaping with far fewer coefficients (5 per biquad section, with a few cascaded sections for a steep cutoff), but they introduce phase distortion and can become unstable with quantized fixed-point coefficients. In embedded audio, IIR biquads handle the bulk of the work (EQ, crossovers, high-pass DC removal), while FIR filters appear in sample rate conversion, anti-aliasing, and wherever linear phase is required.

Biquad IIR Filter#

The biquad (second-order IIR section) is the building block of embedded audio filtering. A single biquad implements one of the standard filter types — low-pass, high-pass, bandpass, notch, peaking EQ, or shelving — using five coefficients (b0, b1, b2, a1, a2) and a two-sample state memory.

Direct Form II Transposed#

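For floating-point, the preferred implementation is Direct Form II Transposed (DF2T): it needs only two state variables per section and behaves well numerically. In fixed-point, Direct Form I (DF1) is usually the safer choice, because all products are summed in one wide accumulator and there is no intermediate state that can overflow. The floating-point DF2T form looks like this: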

/* Biquad DF2T — floating-point reference implementation */
typedef struct {
    float b0, b1, b2, a1, a2;
    float z1, z2;  /* State variables (delay elements) */
} biquad_t;

float biquad_process(biquad_t *f, float x)
{
    float y = f->b0 * x + f->z1;
    f->z1 = f->b1 * x - f->a1 * y + f->z2;
    f->z2 = f->b2 * x - f->a2 * y;
    return y;
}

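Each output sample requires 5 multiplications and 4 additions, approximately 10 cycles in a tight loop on a Cortex-M4 with DSP extensions, or ~480,000 cycles per second per section at 48 kHz. A Cortex-M4 at 168 MHz has 3,500 cycles per sample, so the arithmetic alone would allow about 350 sections. With a more conservative planning figure of roughly 100 cycles per section (covering loads and stores, call overhead, and less optimized code), that is roughly 35 biquad sections in series on a mono channel before saturating.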

Q15 Biquad Implementation#

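/* Biquad DF1, Q15 data with a 64-bit accumulator */
/* Coefficients: divided by a0, a1/a2 stored negated, scaled to Q15. */
/* Assumes every coefficient magnitude is below 1.0; larger values need a post-shift. */
#include "arm_math.h"  /* q15_t, q31_t, q63_t, __SSAT */

typedef struct {
    q15_t b0, b1, b2, a1, a2;  /* a1, a2 hold the negated feedback coefficients */
    q15_t x1, x2, y1, y2;      /* DF1 state */
} biquad_q15_t;

q15_t biquad_q15_process(biquad_q15_t *f, q15_t x)
{
    q63_t acc = 0;  /* the sum of five Q15 x Q15 products can exceed 32 bits */
    acc += (q31_t)f->b0 * x;
    acc += (q31_t)f->b1 * f->x1;
    acc += (q31_t)f->b2 * f->x2;
    acc += (q31_t)f->a1 * f->y1;  /* a1, a2 are negated in storage, so add */
    acc += (q31_t)f->a2 * f->y2;

    /* Back to Q15, saturating to 16 bits */
    q15_t y = (q15_t)__SSAT((q31_t)(acc >> 15), 16);

    f->x2 = f->x1; f->x1 = x;
    f->y2 = f->y1; f->y1 = y;
    return y;
}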

CMSIS-DSP provides arm_biquad_cascade_df1_q15() and arm_biquad_cascade_df2T_f32() that process entire blocks with optimized loop unrolling and SIMD instructions.

Coefficient Calculation#

Filter coefficients are calculated from the desired frequency, Q factor (bandwidth), and gain using the Robert Bristow-Johnson audio EQ cookbook formulas. These are typically computed offline or at initialization — not per-sample.

Common Filter Types#

Type	Parameters	Application
Low-pass	Fc, Q	Anti-aliasing, subwoofer crossover
High-pass	Fc, Q	DC removal, bass roll-off
Bandpass	Fc, BW	Vocal isolation, band selection
Notch	Fc, BW	Hum removal (50/60 Hz), feedback suppression
Peaking EQ	Fc, Q, Gain (dB)	Parametric equalizer bands
Low shelf	Fc, Gain (dB)	Bass boost/cut
High shelf	Fc, Gain (dB)	Treble boost/cut

# Python — Biquad coefficient calculation (peaking EQ)
import math

def peaking_eq(fs, fc, gain_db, Q):
    """Calculate biquad coefficients for peaking EQ filter."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * Q)

    b0 = 1 + alpha * A
    b1 = -2 * math.cos(w0)
    b2 = 1 - alpha * A
    a0 = 1 + alpha / A
    a1 = -2 * math.cos(w0)
    a2 = 1 - alpha / A

    # Normalize by a0
    return (b0/a0, b1/a0, b2/a0, a1/a0, a2/a0)

# 1 kHz peaking EQ, +6 dB, Q=1.4, at 48 kHz
coeffs = peaking_eq(48000, 1000, 6.0, 1.4)
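A quick way to sanity-check calculated coefficients before exporting them is to evaluate the transfer function on the unit circle. The sketch below repeats the peaking EQ formulas so that it runs on its own; `magnitude_db` is a hypothetical helper name, not part of any library.

```python
import cmath
import math

def peaking_eq(fs, fc, gain_db, Q):
    """RBJ peaking EQ, normalized so a0 = 1 (same formulas as above)."""
    A = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * Q)
    a0 = 1 + alpha / A
    return ((1 + alpha * A) / a0, -2 * math.cos(w0) / a0, (1 - alpha * A) / a0,
            -2 * math.cos(w0) / a0, (1 - alpha / A) / a0)

def magnitude_db(coeffs, fs, f):
    """|H| in dB for H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))

coeffs = peaking_eq(48000, 1000, 6.0, 1.4)
gain_at_fc = magnitude_db(coeffs, 48000, 1000)  # should match gain_db
gain_at_dc = magnitude_db(coeffs, 48000, 0)     # should be close to 0 dB
```

For a peaking filter the gain at the center frequency should equal the requested gain and the gain at DC should be 0 dB, which makes a convenient two-point check.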

Converting to Q15 Coefficients#

Biquad coefficients for audio filters can exceed 1.0 (especially a1, which approaches -2.0 for low-frequency filters at high sample rates). Fitting these into Q15 requires scaling:

Find the maximum absolute coefficient value.
Determine a post-shift value that scales all coefficients into Q15 range.
Apply the inverse shift to the accumulator after the MAC loop.

CMSIS-DSP’s arm_biquad_cascade_df1_q15() supports this through the postShift argument of arm_biquad_cascade_df1_init_q15(). A postShift of 1 means all coefficients are stored as Q14 (divided by 2), and the accumulator result is left-shifted by 1 after filtering.
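The scaling steps above can be sketched in Python as follows. `to_cmsis_q15` is a hypothetical helper; it assumes a0-normalized coefficients in standard form and produces the six-value-per-stage layout with negated feedback terms that the CMSIS-DSP Q15 DF1 cascade is documented to expect. The input values are rounded versions of the 1 kHz peaking EQ example and are only illustrative.

```python
def to_cmsis_q15(b0, b1, b2, a1, a2):
    """Quantize one a0-normalized biquad for a Q15 DF1 cascade.

    Returns (coeffs, post_shift) where coeffs is the integer list
    [b0, 0, b1, b2, -a1, -a2] scaled by 2**(15 - post_shift).
    """
    values = [b0, b1, b2, -a1, -a2]            # feedback terms are stored negated
    peak = max(abs(v) for v in values)
    post_shift = 0
    while peak / (1 << post_shift) >= 1.0:     # smallest shift that fits Q15 range
        post_shift += 1
    scale = 1 << (15 - post_shift)
    q = [max(-32768, min(32767, round(v * scale))) for v in values]
    return [q[0], 0, q[1], q[2], q[3], q[4]], post_shift

# Approximate coefficients of the 1 kHz, +6 dB, Q=1.4 peaking EQ at 48 kHz
q15_coeffs, post_shift = to_cmsis_q15(1.0318, -1.9195, 0.9043, -1.9195, 0.9361)
```

Because the largest magnitude here is about 1.92, the helper settles on a post-shift of 1, so the values are effectively stored as Q14.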

Cascading Biquad Sections#

Higher-order filters are built by cascading multiple biquad sections in series. A 4th-order Butterworth low-pass requires two biquad sections; a 10-band parametric equalizer requires 10 sections.

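/* CMSIS-DSP: 3-section biquad cascade, Q15 */
#include "arm_math.h"

#define NUM_STAGES 3
#define POST_SHIFT 1   /* coefficients stored as Q14 */

/* The Q15 DF1 cascade takes six values per stage, including a zero pad
   after b0; a1 and a2 are negated relative to the standard form. */
static q15_t coeffs[6 * NUM_STAGES];   /* {b0, 0, b1, b2, -a1, -a2} × 3 */
static q15_t state[4 * NUM_STAGES];    /* {x[n-1],x[n-2],y[n-1],y[n-2]} × 3 */
static arm_biquad_casd_df1_inst_q15 filter;

void eq_init(void)
{
    /* Fill coeffs[] with calculated values for each section */
    arm_biquad_cascade_df1_init_q15(&filter, NUM_STAGES, coeffs, state, POST_SHIFT);
}

void eq_process(q15_t *buffer, uint32_t block_size)
{
    /* In-place processing: source and destination are the same buffer */
    arm_biquad_cascade_df1_q15(&filter, buffer, buffer, block_size);
}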

Section ordering matters for fixed-point: place sections with the highest Q (narrowest bandwidth / highest gain) last, so they process signals that have already been attenuated by earlier sections. This reduces the chance of intermediate overflow.
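As an illustration of cascading and ordering, the sketch below builds a 4th-order Butterworth low-pass from two cookbook low-pass biquads and puts the higher-Q section last. The helper names are hypothetical, and the 1 kHz cutoff is just an example value.

```python
import cmath
import math

def lowpass(fs, fc, Q):
    """RBJ low-pass biquad, normalized so a0 = 1."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * Q)
    cw = math.cos(w0)
    a0 = 1 + alpha
    return ((1 - cw) / 2 / a0, (1 - cw) / a0, (1 - cw) / 2 / a0,
            -2 * cw / a0, (1 - alpha) / a0)

def cascade_gain(sections, fs, f):
    """Linear magnitude of biquad sections in series at frequency f."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1.0
    for b0, b1, b2, a1, a2 in sections:
        h *= (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return abs(h)

# 4th-order Butterworth section Qs: 1 / (2 cos(pi/8)) and 1 / (2 cos(3 pi/8))
qs = sorted([1 / (2 * math.cos(math.pi / 8)), 1 / (2 * math.cos(3 * math.pi / 8))])
sections = [lowpass(48000, 1000, q) for q in qs]  # lowest Q first, highest Q last

gain_at_cutoff = cascade_gain(sections, 48000, 1000)
gain_at_dc = cascade_gain(sections, 48000, 0)
```

The two section Qs come out near 0.54 and 1.31, and the cascade sits at the expected -3 dB point (a linear gain of about 0.707) at the cutoff frequency.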

FIR Filters#

FIR filters compute each output sample as a weighted sum of the current and past N-1 input samples:

y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[N-1]*x[n-N+1]
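A minimal Python model of this sum, useful as a reference when checking MCU output, might look like the following. It assumes samples before the start of the input are zero; `fir_filter` is a hypothetical name, and the 4-tap moving average is only an example.

```python
def fir_filter(b, x):
    """Direct-form FIR: y[n] = sum of b[k] * x[n-k], with x[n] = 0 for n < 0."""
    history = [0.0] * len(b)              # history[k] holds x[n-k]
    y = []
    for sample in x:
        history = [sample] + history[:-1]  # shift the delay line by one sample
        y.append(sum(bk * xk for bk, xk in zip(b, history)))
    return y

# 4-tap moving average as an example coefficient set
taps = [0.25, 0.25, 0.25, 0.25]
impulse_response = fir_filter(taps, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
step_response = fir_filter(taps, [1.0] * 6)
```

The impulse response reproduces the coefficients and then falls to zero, which is the "finite" in FIR; the step response settles after N samples.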

When to Use FIR Over IIR#

Criterion	FIR	IIR (Biquad)
Phase response	Linear (symmetric coefficients)	Non-linear
Stability	Always stable	Can become unstable with quantization
Coefficients for sharp cutoff	100–500+	5–25 (cascaded biquads)
CPU per sample	N MACs	5 MACs per section
Latency	(N-1)/2 samples (group delay)	1–2 samples per section
Best for	Sample rate conversion, linear-phase crossovers	EQ, dynamics, real-time filtering

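/* CMSIS-DSP FIR filter, Q15 */
#include "arm_math.h"

#define FIR_TAPS   64   /* Q15 version needs an even tap count of at least 4 */
#define BLOCK_SIZE 64   /* samples per call (example value) */

static q15_t fir_coeffs[FIR_TAPS];
/* CMSIS-DSP documents numTaps + blockSize for Q15 on Cortex-M3/M4
   (numTaps + blockSize - 1 on Cortex-M0); the larger size is safe on both. */
static q15_t fir_state[FIR_TAPS + BLOCK_SIZE];
static arm_fir_instance_q15 fir_inst;

void fir_init(void)
{
    /* Returns ARM_MATH_ARGUMENT_ERROR if the tap count is not supported */
    (void)arm_fir_init_q15(&fir_inst, FIR_TAPS, fir_coeffs, fir_state, BLOCK_SIZE);
}

void fir_process(q15_t *input, q15_t *output, uint32_t block_size)
{
    /* block_size must not exceed BLOCK_SIZE, which sized the state buffer */
    arm_fir_q15(&fir_inst, input, output, block_size);
}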

Cycle Budgets#

At 48 kHz mono, the CPU budget per sample is the clock frequency divided by 48,000:

MCU	Clock	Cycles/Sample (48 kHz)	Approx. Biquad Sections	Approx. FIR Taps
Cortex-M0	48 MHz	1,000	~10	~50
Cortex-M4	168 MHz	3,500	~35	~300
Cortex-M7	480 MHz	10,000	~100	~800
ESP32 (one core)	240 MHz	5,000	~50	~400
ESP32-S3 (one core)	240 MHz	5,000	~50	~400

These estimates assume mono processing. Stereo halves the available budget per channel. Overhead from DMA callbacks, RTOS scheduling, and other tasks consumes 10–30% of the total budget.
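The budget arithmetic can be sketched as below. The default of 100 cycles per biquad section is an assumption inferred from the table (for example 3,500 cycles and ~35 sections on the Cortex-M4), and both function names are hypothetical.

```python
def cycles_per_sample(clock_hz, sample_rate_hz, channels=1, overhead=0.0):
    """Cycles available per sample per channel after reserving an overhead fraction."""
    return clock_hz * (1.0 - overhead) / (sample_rate_hz * channels)

def max_sections(budget_cycles, cycles_per_section=100):
    """Whole biquad sections that fit into a per-sample cycle budget."""
    return int(budget_cycles // cycles_per_section)

# Cortex-M4 at 168 MHz, 48 kHz: mono with no overhead, then stereo with 25% overhead
mono = cycles_per_sample(168_000_000, 48_000)
stereo = cycles_per_sample(168_000_000, 48_000, channels=2, overhead=0.25)
mono_sections = max_sections(mono)
stereo_sections = max_sections(stereo)
```

Under these assumptions the mono case gives 3,500 cycles and 35 sections, while stereo with a quarter of the CPU reserved drops to about 1,312 cycles and 13 sections per channel.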

Tips#

Design filter coefficients in Python or MATLAB, export as C arrays, and verify the frequency response on the MCU with a swept sine test signal. Coefficient quantization to Q15 can shift the actual cutoff frequency and Q from the design values.
For DC removal (high-pass at a very low frequency), a single first-order IIR filter is sufficient and cheaper than a biquad: y[n] = alpha * (y[n-1] + x[n] - x[n-1]) where alpha is close to 1.0. The cutoff is approximately (1 - alpha) * fs / (2 * pi), so at 48 kHz alpha = 0.999 gives about 7.6 Hz and alpha = 0.995 gives about 38 Hz.
Use block processing (process 64–256 samples per call) rather than sample-by-sample calls to reduce function call overhead and enable SIMD optimization.
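For the first-order DC blocker mentioned in the tips above, the relationship between alpha and the cutoff can be worked out by treating the filter as a discretized RC high-pass. The sketch below is built on that assumption; the function names are hypothetical.

```python
import math

def dc_blocker_alpha(fs, fc):
    """alpha for y[n] = alpha * (y[n-1] + x[n] - x[n-1]), from an RC high-pass model."""
    rc = 1.0 / (2 * math.pi * fc)
    return rc / (rc + 1.0 / fs)

def dc_blocker_cutoff(fs, alpha):
    """Inverse of dc_blocker_alpha: cutoff frequency in Hz for a given alpha."""
    return (1 - alpha) * fs / (2 * math.pi * alpha)

def dc_block(x, alpha):
    """Apply the first-order DC blocker to a list of samples."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        y_prev = alpha * (y_prev + sample - x_prev)
        x_prev = sample
        y.append(y_prev)
    return y

cutoff_0995 = dc_blocker_cutoff(48000, 0.995)
cutoff_0999 = dc_blocker_cutoff(48000, 0.999)
settled = dc_block([1.0] * 48000, 0.995)  # constant (DC) input for one second
```

With this model, alpha = 0.995 corresponds to a cutoff near 38 Hz at 48 kHz and alpha = 0.999 to one near 7.6 Hz, and a constant input decays toward zero at the output.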

Caveats#

IIR filters with low cutoff frequencies at high sample rates produce coefficients very close to the limits of Q15 representation. A 20 Hz high-pass at 48 kHz may require Q31 coefficients to avoid instability. If the filter oscillates or produces DC offset, coefficient quantization is likely the cause.
Changing biquad coefficients while audio is flowing (e.g., real-time EQ adjustment) can produce clicks if the state variables are not handled. Cross-fading between two filter instances, or using coefficient smoothing (updating coefficients gradually over 10–50 ms), avoids the discontinuity.
CMSIS-DSP arm_biquad_cascade_df1_q15() expects six values per stage in the order {b0, 0, b1, b2, a1, a2}, with a1 and a2 negated relative to the standard transfer-function form. Providing standard-form coefficients without the zero padding or the negation produces a filter with completely wrong behavior but no error indication.
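The cross-fading approach for live coefficient changes mentioned in the caveats above can be modeled in a few lines. This is a sketch rather than a production design: `Biquad` and `crossfade` are hypothetical names, the fade is linear, and the example uses trivial gain-only "filters" so the numbers are easy to follow. At 48 kHz, a 10–50 ms fade corresponds to 480–2,400 samples.

```python
class Biquad:
    """Floating-point DF2T biquad, mirroring biquad_process() from earlier."""
    def __init__(self, b0, b1, b2, a1, a2):
        self.c = (b0, b1, b2, a1, a2)
        self.z1 = 0.0
        self.z2 = 0.0

    def process(self, x):
        b0, b1, b2, a1, a2 = self.c
        y = b0 * x + self.z1
        self.z1 = b1 * x - a1 * y + self.z2
        self.z2 = b2 * x - a2 * y
        return y

def crossfade(old, new, samples, fade_len):
    """Run both filters on the same input and fade linearly from old to new."""
    out = []
    for n, x in enumerate(samples):
        g = min(1.0, (n + 1) / fade_len)  # weight of the new filter, 0..1
        out.append((1.0 - g) * old.process(x) + g * new.process(x))
    return out

# Example: fade from unity gain to a gain of 0.5 over 4 samples
old = Biquad(1.0, 0.0, 0.0, 0.0, 0.0)
new = Biquad(0.5, 0.0, 0.0, 0.0, 0.0)
faded = crossfade(old, new, [1.0] * 6, fade_len=4)
```

Both instances see the full input during the fade, so the new filter's state is already settled by the time it carries the whole output. The cost is running two filters for the duration of the fade.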

In Practice#

Filter output develops a DC offset that grows over time: a sign of IIR instability caused by coefficient quantization. The poles have moved onto or outside the unit circle due to rounding. Switching to Q31 coefficients or redesigning with a slightly higher cutoff frequency typically resolves it.
EQ adjustment produces an audible click — the biquad state variables (z1, z2) contain energy from the previous coefficient set. Instantaneously changing coefficients creates a discontinuity in the output. Implementing coefficient interpolation or double-buffered filter instances with cross-fading eliminates the artifact.
Filter response does not match the design curve — Q15 coefficient quantization shifts the center frequency and Q of narrow-band filters (high Q). Measuring the actual frequency response on the target hardware with a swept sine confirms whether quantization is the source of the deviation.
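Whether quantization has pushed the poles out of the stable region can be checked offline before flashing. The sketch below applies the standard stability condition for a second-order denominator (|a2| < 1 and |a1| < 1 + a2) to a 20 Hz cookbook high-pass at 48 kHz. The helper names are hypothetical, and 14 and 30 fractional bits are assumed here to model Q15 and Q31 storage with a post-shift of 1.

```python
import math

def highpass_feedback(fs, fc, Q):
    """a1, a2 of an RBJ high-pass biquad, normalized so a0 = 1."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * Q)
    a0 = 1 + alpha
    return -2 * math.cos(w0) / a0, (1 - alpha) / a0

def quantize(value, frac_bits):
    """Round to the nearest value representable with frac_bits fractional bits."""
    return round(value * (1 << frac_bits)) / (1 << frac_bits)

def is_stable(a1, a2):
    """True if both poles of 1 + a1 z^-1 + a2 z^-2 lie strictly inside the unit circle."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2

a1, a2 = highpass_feedback(48000, 20.0, 1 / math.sqrt(2))
stable_float = is_stable(a1, a2)
stable_q14 = is_stable(quantize(a1, 14), quantize(a2, 14))  # Q15 with postShift 1
stable_q30 = is_stable(quantize(a1, 30), quantize(a2, 30))  # Q31 with postShift 1
```

In this example the 14-bit version fails the check: rounding lands a pole on the unit circle at DC, which matches the growing-offset symptom, while the 30-bit version keeps its margin.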