TinyML on Microcontrollers on Embedded Systems Development

Inference on Cortex-M

The Arm Cortex-M family is the dominant target for TinyML inference. Typical deployments on these processors run bare-metal or under a lightweight RTOS, with no virtual memory and no heap allocator. Execution is deterministic: a given model on a given input produces the same result in the same number of cycles every time, provided memory placement is controlled. All memory management is static: model weights reside in flash, the tensor arena occupies a fixed region of SRAM, and the application firmware shares both memories. The entire inference pipeline, from sensor input to classification output, runs in a single-threaded main loop or in a dedicated RTOS task that is not preempted mid-inference, so context-switch overhead does not affect latency.
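The static memory layout described above can be sketched in portable C. This is a minimal illustration, not the allocator of any particular inference runtime: `ARENA_SIZE`, `tensor_arena`, `model_weights`, and the `arena_*` helpers are all hypothetical names, and the sizes are placeholders. It shows a fixed buffer reserved at link time and a bump allocator that hands out aligned slices of it without ever calling `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena size; a real value comes from the model's needs. */
#define ARENA_SIZE 1024u

/* Fixed SRAM region reserved at link time (no heap involved). */
static _Alignas(16) uint8_t tensor_arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hypothetical stand-in for model weights. On typical Cortex-M
 * toolchains, const data like this is placed in flash by the linker. */
static const int8_t model_weights[4] = {12, -7, 3, 1};

const int8_t *model_data(void)
{
    return model_weights;
}

/* Bump allocation: round the current offset up to `align`, then
 * reserve `size` bytes. Returns NULL if the arena is exhausted. */
void *arena_alloc(size_t size, size_t align)
{
    /* Alignment must be a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t start = (arena_used + (align - 1)) & ~(align - 1);
    if (start > ARENA_SIZE || size > ARENA_SIZE - start) {
        return NULL;
    }
    arena_used = start + size;
    return &tensor_arena[start];
}

/* Release everything at once, e.g. before planning a new model. */
void arena_reset(void)
{
    arena_used = 0;
}

size_t arena_bytes_used(void)
{
    return arena_used;
}
```

Because the arena is a single static array, its worst-case footprint is visible in the linker map, and a failed allocation surfaces as a `NULL` return at initialization time rather than as heap fragmentation later.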

Inference on ESP32

The ESP32 family from Espressif offers a different trade-off from the Cortex-M ecosystem: built-in Wi-Fi and Bluetooth, dual-core processors on several variants, external PSRAM support, and a FreeRTOS-based runtime, all at a price point under $5. These features make the ESP32 a natural platform for ML inference applications that need to collect data, run a model, and transmit results over a wireless link without additional networking hardware. The cost is a less deterministic execution environment than bare-metal Cortex-M: FreeRTOS scheduling, Wi-Fi stack interrupts, and shared-bus memory access all introduce latency variability that must be managed through careful task partitioning.
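Before partitioning tasks, it helps to quantify how much inference latency actually varies. The following is a minimal sketch in portable C, assuming per-inference durations have already been recorded in microseconds (on ESP-IDF the timestamps would typically come from `esp_timer_get_time()`). The `latency_stats_t` type and `latency_summarize` function are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a batch of inference durations, in microseconds. */
typedef struct {
    uint32_t min_us;
    uint32_t max_us;
    uint32_t mean_us;   /* integer mean, truncated */
    uint32_t jitter_us; /* max - min: worst-case spread */
} latency_stats_t;

/* Summarize `count` recorded durations. `count` must be > 0. */
latency_stats_t latency_summarize(const uint32_t *samples_us, size_t count)
{
    assert(samples_us != NULL && count > 0);

    latency_stats_t s = { samples_us[0], samples_us[0], 0, 0 };
    uint64_t sum = 0; /* 64-bit so long runs cannot overflow */

    for (size_t i = 0; i < count; i++) {
        uint32_t v = samples_us[i];
        if (v < s.min_us) s.min_us = v;
        if (v > s.max_us) s.max_us = v;
        sum += v;
    }

    s.mean_us = (uint32_t)(sum / count);
    s.jitter_us = s.max_us - s.min_us;
    return s;
}
```

Comparing `jitter_us` with the radio active and idle, or with the inference task on a different core, gives a rough indication of which source of variability dominates.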

Arduino & Edge Impulse

Edge Impulse is an end-to-end development platform for embedded machine learning. It provides a web-based Studio interface that covers the entire pipeline: data collection from physical sensors, signal processing configuration, neural network training, model validation, and deployment as a compiled library for Arduino, ESP32, STM32, Nordic nRF, and other microcontroller targets. The platform abstracts much of the complexity of quantization, memory fitting, and operator optimization, making it possible to go from raw sensor data to a running inference on an Arduino Nano 33 BLE Sense or an ESP32-S3 DevKit without writing any training code or manually configuring TensorFlow Lite.

Designing for Memory Constraints

On microcontrollers, memory is the first constraint and the last. A model that achieves 95% accuracy on a desktop means nothing if it requires 200 KB of SRAM on a device with 128 KB. Unlike cloud or even mobile ML, where memory is abundant and dynamically allocated, MCU inference demands that every byte be accounted for at compile time. There is no virtual memory, no swap space, no fallback. The model weights must fit in flash, the tensor arena must fit in SRAM, and both must coexist with the firmware, peripheral drivers, communication stacks, and application logic. A design process that starts with the memory budget and works backward to the model architecture avoids the costly iteration of training a model that cannot deploy.
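The budget-first approach can be expressed as a small check in C. This is a sketch under stated assumptions: the `mem_budget_t` structure, the `budget_fits` and `max_tensor_arena` helpers, and every size in the `EXAMPLE_*` macros are hypothetical and chosen only to illustrate the arithmetic. The `_Static_assert` shows how the same check can fail the build instead of failing on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KB(n) ((size_t)(n) * 1024u)

/* Hypothetical memory budget for one firmware image. */
typedef struct {
    size_t sram_total;
    size_t firmware_sram;  /* stacks, drivers, comm stacks, app state */
    size_t tensor_arena;   /* activations and scratch buffers */
    size_t flash_total;
    size_t firmware_flash; /* code and constant data besides the model */
    size_t model_weights;
} mem_budget_t;

/* Largest arena that still leaves room for the firmware's own SRAM. */
size_t max_tensor_arena(const mem_budget_t *b)
{
    assert(b != NULL);
    return (b->firmware_sram <= b->sram_total)
               ? b->sram_total - b->firmware_sram
               : 0;
}

/* True only if both the SRAM and the flash budgets are satisfied. */
bool budget_fits(const mem_budget_t *b)
{
    assert(b != NULL);
    bool sram_ok = b->tensor_arena <= max_tensor_arena(b);
    bool flash_ok = b->firmware_flash <= b->flash_total &&
                    b->model_weights <= b->flash_total - b->firmware_flash;
    return sram_ok && flash_ok;
}

/* Compile-time variant with illustrative (assumed) numbers:
 * 128 KB of SRAM, 40 KB for firmware, 64 KB for the arena. */
#define EXAMPLE_SRAM_TOTAL    KB(128)
#define EXAMPLE_FIRMWARE_SRAM KB(40)
#define EXAMPLE_ARENA_BYTES   KB(64)

_Static_assert(EXAMPLE_FIRMWARE_SRAM + EXAMPLE_ARENA_BYTES <= EXAMPLE_SRAM_TOTAL,
               "tensor arena does not fit in SRAM");
```

With these assumed figures, a model needing a 200 KB arena is rejected on a 128 KB device, and `max_tensor_arena` reports the 88 KB ceiling that the model architecture has to be designed against.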