Embedded OS Landscape#

Once an MPU-class platform is chosen, the operating system defines how applications interact with hardware. The choice spans from full Linux distributions to lightweight RTOSes running on MPU-capable chips. Each option makes tradeoffs across boot time, image size, real-time capability, package availability, and long-term maintainability. Understanding the practical characteristics of each OS option — not just the marketing pitch — is essential for avoiding costly migrations later in a project.

The landscape divides broadly into three categories:

  • Pre-built Linux distributions (Raspberry Pi OS, Ubuntu Server, Armbian) — ready to flash, immediate productivity, but carry the weight of a general-purpose OS
  • Custom Linux build systems (Yocto, Buildroot) — produce tailored images with only the required components, at the cost of build system expertise and longer development cycles
  • Non-Linux alternatives (Zephyr, NuttX, Android, Windows IoT) — each addresses a specific niche: hard real-time, POSIX compatibility on bare metal, rich UI, or enterprise Windows integration

The decision is rarely permanent. A common trajectory starts with a pre-built distribution for prototyping, evaluates custom build systems as the product matures, and may incorporate an RTOS on a secondary core for time-critical I/O. Understanding the full landscape from the beginning prevents architectural decisions that make later transitions unnecessarily painful.


Linux Distributions for Embedded Use#

Linux dominates the embedded MPU space because it provides a complete networking stack, filesystem support, driver ecosystem, and user-space tooling out of the box. The kernel alone supports over 30 filesystem types, hundreds of network protocols, and thousands of hardware drivers. The differences between distributions matter most at the edges: boot time, image size, update mechanisms, and the depth of board support.

The distributions below are listed in order of decreasing abstraction — from ready-to-flash images that boot in minutes to build systems that require days of configuration before producing the first image.

Raspberry Pi OS#

Raspberry Pi OS is a Debian-based distribution maintained by the Raspberry Pi Foundation, purpose-built for Pi hardware. It ships with the apt package manager and full access to the Debian package archive, making it straightforward to install development tools, libraries, and services.

Variants:

VariantDesktopSize (compressed)Typical Use
Desktop (Full)LXDE + recommended software~2.5 GBGUI-based development, kiosks
Desktop (Standard)LXDE, minimal apps~1.1 GBLighter GUI applications
LiteHeadless, no GUI~450 MBServers, headless appliances

Practical characteristics:

  • First-class hardware support for all Pi models — GPIO, SPI, I2C, camera, DSI display all work without additional configuration
  • raspi-config provides a single tool for enabling interfaces, setting boot behavior, configuring Wi-Fi country, and resizing the root partition
  • The kernel is a Raspberry Pi Foundation fork of mainline Linux, with patches for VideoCore GPU, PCIe on Pi 5, and RP1 south bridge
  • apt update && apt upgrade pulls security and feature updates from the Pi Foundation’s package mirrors
  • Default boot time on a Pi 4 with Lite: approximately 20-30 seconds from power-on to shell prompt
  • The root filesystem image, once expanded, occupies 1.5-2 GB minimum for Lite

Update and maintenance model:

  • The Pi Foundation maintains its own package mirror, rebasing periodically on Debian stable releases (Bullseye, Bookworm)
  • rpi-update provides bleeding-edge kernel and firmware updates, but is intended for testing — not production use — as it pulls unreleased builds that may introduce regressions
  • The vcgencmd utility exposes GPU temperature, clock speeds, throttling status, and codec information — useful for monitoring thermal behavior in enclosed deployments
  • Automatic partition resize on first boot expands the root filesystem to fill the SD card or eMMC, eliminating manual partitioning

Strengths and limits:

Raspberry Pi OS is the fastest path from unboxing to a working system. For prototyping, sensor integration, and bench development, it is the default starting point on Pi hardware. The tradeoff is image size and boot time — both become problematic when the target is a commercial product that needs to boot in under 5 seconds or fit on a small eMMC. The Lite variant still includes hundreds of packages that a purpose-built appliance does not need — locale data, man pages, documentation, and development headers that consume storage and increase the attack surface.

Ubuntu Server (ARM)#

Canonical publishes official Ubuntu Server images for ARM platforms, including Raspberry Pi (3, 4, 5, Zero 2W), NVIDIA Jetson (Orin, Xavier), and select Qualcomm boards. The familiar apt package ecosystem extends to ARM without modification, and Canonical’s snap package system provides containerized application distribution.

Practical characteristics:

  • Same package archive as desktop Ubuntu — most x86-targeted tutorials and Ansible playbooks work without modification on ARM
  • Long-term support (LTS) releases receive 5 years of security updates, extendable to 10 years with Ubuntu Pro
  • snap packages provide isolated, auto-updating application containers — useful for deploying services like Mosquitto, Grafana, or Node-RED
  • Kernel is the Ubuntu generic ARM kernel, not board-vendor-optimized — this means some hardware-specific features may require manual kernel module installation or vendor PPA additions
  • Image size is larger than Pi OS Lite: approximately 700 MB compressed, expanding to 3+ GB on disk
  • Boot time on a Pi 4: approximately 25-40 seconds to shell prompt

Cloud-init and provisioning:

  • Ubuntu Server ARM images include cloud-init by default, enabling automated provisioning from a user-data file on the boot partition
  • Network configuration, SSH keys, package installation, and custom scripts can all be specified declaratively before first boot
  • This model aligns with fleet provisioning workflows — the same cloud-init configuration used for cloud VMs works on ARM edge devices
  • netplan handles network configuration via YAML files, replacing the traditional /etc/network/interfaces approach

When it fits:

Ubuntu Server ARM is a practical choice when the development team already operates Ubuntu infrastructure and wants consistency between cloud, desktop, and edge deployments. The snap ecosystem simplifies application packaging for fleet updates. The cost is a heavier base image and longer boot time compared to Pi OS Lite or a custom-built distribution.

Armbian#

Armbian is a community-maintained project that provides Debian- and Ubuntu-based images for over 100 ARM single-board computers. For boards outside the Raspberry Pi ecosystem — Orange Pi, Banana Pi, Radxa Rock, Pine64, Khadas, and dozens of others — Armbian is often the best-supported Linux option available.

Practical characteristics:

  • Two kernel tracks per board: vendor kernel (often based on an older Linux version with the SoC manufacturer’s patches) and mainline kernel (upstream Linux with varying levels of board support)
  • armbian-config provides a configuration tool similar in purpose to raspi-config
  • Hardware support quality varies dramatically between boards — some boards have complete GPU, NPU, Wi-Fi, and Bluetooth support, while others have known-broken features documented in the board’s status page
  • Community-driven development means release cadence and bug-fix responsiveness depend on volunteer contributor availability
  • Image sizes are comparable to Debian/Ubuntu base images: 400-800 MB compressed

Kernel track decision:

TrackKernel Version (typical)Hardware SupportStability
Vendor4.x or 5.x (SoC-specific)Full SoC peripherals, GPU accelerationOften stable but unpatched
Mainline6.x (upstream)Varies — may lack GPU, NPU, or specific peripheral driversBetter security posture, ongoing updates

The vendor kernel typically provides more complete hardware support, especially for GPU acceleration and NPU access, but may be based on a Linux version that no longer receives upstream security patches. The mainline kernel offers better long-term security but may not support all board peripherals.

Board support tiers:

Armbian classifies board support into tiers that indicate the level of testing and maintenance:

TierMeaningPractical Implication
SupportedActively maintained, tested each releaseSafe for development and light production use
Standard SupportMaintained, less frequent testingLikely works but may have release-to-release regressions
CommunityVolunteer-maintained, minimal testingExpect issues — check forums before relying on it
WIP / CSCWork in progress or end-of-lifeUsable for experimentation only

When it fits:

Armbian is the default starting point for non-Pi ARM boards. Before committing to a specific SBC, checking the Armbian board support page reveals whether the target board has active maintainer support or is effectively orphaned. The tier classification provides an honest assessment of what to expect — a board listed as “Community” may have an image that boots, but driver issues and kernel regressions are the norm rather than the exception.

Yocto Project#

The Yocto Project is a build system — not a distribution — for creating custom Linux images tailored to specific hardware and application requirements. It uses a layer-based architecture where Board Support Packages (BSPs), middleware, and application components are organized into separate meta-layers that can be composed, versioned, and shared independently.

Architecture overview:

┌─────────────────────────────────────────────────────┐
│                   Yocto Build System                │
│                                                     │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  meta-poky   │  │ meta-raspberrypi│ │ meta-app  │ │
│  │ (reference   │  │ (BSP layer)    │  │ (custom   │ │
│  │  distro)     │  │               │  │  recipes)  │ │
│  └──────┬──────┘  └──────┬───────┘  └─────┬──────┘ │
│         │               │                │         │
│         ▼               ▼                ▼         │
│  ┌─────────────────────────────────────────────┐   │
│  │           BitBake Build Engine              │   │
│  │  (fetches sources, cross-compiles, packages)│   │
│  └─────────────────────┬───────────────────────┘   │
│                        │                            │
│                        ▼                            │
│  ┌─────────────────────────────────────────────┐   │
│  │         Output: rootfs, kernel, bootloader   │   │
│  │         (custom image, SDK, packages)        │   │
│  └─────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

Practical characteristics:

  • Builds the entire Linux stack from source: bootloader, kernel, root filesystem, and application packages
  • A first build on a modern workstation (8+ cores, 32 GB RAM, SSD) takes 2-4 hours and consumes 50-100 GB of disk space
  • Subsequent incremental builds are faster — BitBake’s shared-state cache (sstate) reuses previously built components
  • The layer system allows teams to separate BSP changes from application changes, enabling independent versioning and reuse across products
  • Output images can be extremely minimal: a functional Linux image with BusyBox, a custom application, and networking support can fit in 30-50 MB
  • Recipe-based: each software component (package) is defined by a recipe (.bb file) that specifies source location, build steps, dependencies, and installation paths
  • Yocto releases are named (e.g., Kirkstone, Scarthgap) and version-locked — layers must target the same release to be compatible

Standard layers:

LayerPurpose
meta (openembedded-core)Core recipes — glibc, busybox, systemd, kernel
meta-pokyReference distribution policies
meta-yocto-bspReference BSP for QEMU and BeagleBone
meta-raspberrypiBSP for Raspberry Pi boards
meta-tiBSP for TI processors (AM335x, AM62x, etc.)
meta-freescaleBSP for NXP i.MX processors
meta-tegraBSP for NVIDIA Jetson platforms
meta-openembeddedAdditional packages (networking, Python, multimedia)

Development workflow:

A typical Yocto development cycle involves:

  1. Set up the build environment: source oe-init-build-env
  2. Configure layers in conf/bblayers.conf
  3. Set machine, distribution, and image targets in conf/local.conf
  4. Build the image: bitbake core-image-minimal (or a custom image recipe)
  5. Flash the output image to SD card, eMMC, or deploy via network boot
  6. Iterate: modify recipes, rebuild incrementally

The devtool utility streamlines common development tasks — adding new recipes, modifying existing packages, and creating patches without manually editing recipe files. For kernel development, devtool modify virtual/kernel extracts the kernel source into a workspace directory where changes can be made, tested, and converted back into patches for the BSP layer.

Image classes:

ImageContentsTypical Size
core-image-minimalBusyBox, init system, base utilities30-50 MB
core-image-baseAbove + package manager, hardware support80-150 MB
core-image-full-cmdlineAbove + standard Linux utilities150-300 MB
core-image-satoAbove + Sato UI (GTK-based reference GUI)300-500 MB

When it fits:

Yocto is the standard choice for commercial and industrial embedded Linux products where image size, boot time, reproducibility, and long-term maintenance matter. The investment in learning the build system and maintaining custom layers pays off when shipping products at scale. For one-off prototypes or small-batch projects, the overhead is rarely justified.

Buildroot#

Buildroot is a simpler alternative to Yocto for generating custom Linux root filesystem images. Where Yocto uses a layer-based architecture with hundreds of configuration variables and a custom build engine, Buildroot uses make menuconfig (the same interface as the Linux kernel configuration) and a straightforward Makefile-based build process.

Practical characteristics:

  • Configuration is done through make menuconfig — a single, navigable menu covering toolchain, kernel, bootloader, root filesystem, and packages
  • Build times are significantly shorter than Yocto: a minimal image builds in 15-30 minutes on a modern workstation
  • Output is a single root filesystem image (ext4, squashfs, or initramfs) plus kernel and bootloader binaries
  • Package selection is curated — approximately 2,500 packages compared to Yocto/OpenEmbedded’s 10,000+
  • Minimal images can be under 10 MB, booting in 1-2 seconds on a Cortex-A-class processor
  • No package manager in the generated image by default — the image is built once and deployed as a whole
  • External tree mechanism (BR2_EXTERNAL) allows maintaining custom board definitions and packages outside the Buildroot source tree

Buildroot vs. Yocto comparison:

CharacteristicBuildrootYocto
Configuration modelSingle make menuconfigLayer-based recipes
First build time15-30 minutes2-4 hours
Disk usage during build5-15 GB50-100 GB
Minimum image size~8 MB~30 MB
Package count~2,500~10,000+
Learning curveModerateSteep
SDK generationCross-compilation toolchainFull extensible SDK
ReproducibilityGood (with version pinning)Excellent (sstate, hash equivalence)
Commercial adoptionModerateHigh (industry standard)

Filesystem format options:

Buildroot can generate the root filesystem in several formats, each suited to different deployment models:

FormatCharacteristicsUse Case
ext4Read-write, journaledDevelopment, devices needing runtime persistence
squashfsRead-only, compressedProduction appliances, reliable power-loss recovery
initramfs (cpio)Loaded entirely into RAM at bootFastest boot, no storage dependency after boot
UBI/UBIFSFlash-aware, wear levelingRaw NAND flash (not eMMC or SD)

The squashfs + overlayfs combination is a common production pattern: the base root filesystem is read-only (squashfs), and a writable overlay (stored on a separate partition) captures runtime changes. This provides both the reliability of a read-only root and the flexibility of persistent configuration.

When it fits:

Buildroot is the practical starting point for teams building their first custom embedded Linux image, especially for appliance-type devices running a single application. The simpler configuration model and faster build times reduce the initial barrier. The tradeoff is less flexibility and a smaller package ecosystem compared to Yocto — projects that outgrow Buildroot’s capabilities often migrate to Yocto.


Non-Linux Alternatives#

Not every MPU-class chip needs to run Linux. Several alternative operating systems target the same hardware with different tradeoffs — prioritizing real-time determinism, POSIX compatibility, rich UI frameworks, or enterprise integration over Linux’s general-purpose flexibility.

The key question is whether the application requirements exceed what Linux provides (hard real-time, minimal footprint) or require something Linux does not naturally offer (Android app ecosystem, Windows enterprise integration). In many cases, the answer is a hybrid: Linux on the main application core with a lightweight RTOS on a secondary core.

Zephyr RTOS on MPU-Class Hardware#

Zephyr is primarily an RTOS for microcontrollers (Cortex-M, RISC-V RV32), but it also runs on certain MPU-capable chips. The typical scenario is an SoC with multiple cores where one core runs Linux and another runs Zephyr — but Zephyr can also serve as the sole OS on simpler MPU-class chips where the application needs deterministic real-time scheduling without the overhead of Linux.

Practical characteristics:

  • Minimal footprint: a Zephyr application image can be as small as 8 KB for a basic kernel, though real applications with networking and drivers typically range from 64-256 KB
  • Deterministic scheduling with configurable tick rate — interrupt-to-task latency is typically under 10 microseconds on Cortex-M, somewhat higher on Cortex-A without MMU
  • POSIX compatibility layer allows porting some Linux user-space code, though coverage is incomplete
  • Device tree-based hardware description, similar to Linux — the same .dts file format, though Zephyr’s device tree processing differs
  • West meta-tool manages the build system, SDK, and multi-repository workspace

MPU-class targets where Zephyr runs:

PlatformCoreTypical Use Case
nRF5340 (network core)Cortex-M33BLE controller alongside application core
STM32MP1 (Cortex-M4 core)Cortex-M4Real-time I/O coprocessor
i.MX RT1170Cortex-M7 + M4High-performance MCU with MPU-class peripherals

Zephyr’s device tree processing:

Unlike Linux, which parses the device tree at runtime, Zephyr processes the device tree at compile time through a set of Python scripts that generate C header files. This means:

  • Device tree changes require recompilation — there is no runtime device tree overlay mechanism
  • The generated headers provide direct memory-mapped register addresses as compile-time constants, enabling the compiler to optimize peripheral access
  • Zephyr device tree bindings (.yaml files) are separate from Linux bindings and may not cover the same set of properties
  • A device tree source file (.dts) can be shared between Linux and Zephyr in a heterogeneous design, but each OS extracts different nodes from it

When it fits:

Zephyr on MPU-class hardware makes sense when the application requires hard real-time guarantees (sub-millisecond response times) that Linux, even with PREEMPT_RT, cannot reliably provide. The most common deployment is as a coprocessor OS in a heterogeneous multicore design, not as the sole OS on a board that could run Linux.

NuttX#

NuttX is a POSIX-compatible RTOS that aims to feel like a small Unix system. It provides a VFS layer, standard C library, pthreads, sockets, and many other POSIX interfaces — making it possible to port significant amounts of Linux user-space code to NuttX with minimal modification.

Practical characteristics:

  • Strong POSIX compliance — applications using standard POSIX APIs (file I/O, pthreads, sockets, select/poll) often compile with no changes
  • Runs on both MCU targets (Cortex-M, RISC-V) and MPU targets (Cortex-A, MIPS)
  • NuttShell (NSH) provides a command-line interface with familiar Unix commands (ls, cat, ps, ifconfig)
  • Flat and protected build modes — protected mode uses the MPU to isolate kernel and user space, similar to Linux’s memory protection model but without a full MMU
  • Smaller community than Zephyr or FreeRTOS — fewer third-party libraries and board ports, but active development under the Apache Software Foundation
  • Build system uses make menuconfig, similar to the Linux kernel and Buildroot

NuttX vs. Zephyr comparison:

CharacteristicNuttXZephyr
API modelPOSIX (file descriptors, pthreads, sockets)Native Zephyr API (k_thread, k_sem, k_fifo)
ShellNuttShell (NSH) — Unix-likeZephyr shell module — command-based
FilesystemVFS with FAT, littlefs, NFS, procfsLimited VFS, littlefs, FAT
NetworkingBSD sockets (native)BSD sockets (compatibility layer over native)
Build systemmake menuconfig + MakefilesCMake + west
Community sizeSmaller, Apache FoundationLarger, Linux Foundation
Board ports~200+~500+
Commercial backingSony, Xiaomi, Samsung (contributors)Intel, Nordic, NXP (members)

When it fits:

NuttX is a strong option when an application needs POSIX compatibility on a resource-constrained platform. Porting existing Linux-based application code to NuttX is significantly easier than porting to Zephyr or FreeRTOS, where the API model is fundamentally different. The tradeoff is a smaller ecosystem and fewer pre-built board support packages.

Android (IoT / Embedded)#

Android-based platforms target embedded devices that need rich graphical interfaces, touchscreen input, and access to the Android application ecosystem. While Android started as a mobile phone OS, vendor-modified builds appear on industrial HMIs, point-of-sale terminals, vehicle infotainment systems, and smart displays.

Practical characteristics:

  • Full graphics stack: SurfaceFlinger compositor, OpenGL ES / Vulkan GPU acceleration, hardware video decode
  • Application framework provides Activities, Services, Content Providers, and Broadcast Receivers — a complete model for building complex UI-driven applications
  • Minimum practical hardware requirements: Cortex-A53 or higher, 1 GB RAM (2+ GB recommended), 8+ GB storage
  • Boot time on embedded hardware: 15-45 seconds depending on hardware and vendor modifications
  • OTA update framework (A/B partitioning) is built into the platform, though vendor implementations vary in reliability
  • Security patch cadence depends entirely on the SBC vendor — many embedded Android images ship with known CVEs and never receive updates

Common embedded Android platforms:

PlatformSoCRAMAndroid Version (typical)
Raspberry Pi 4/5 (community)BCM2711/BCM27124-8 GBAndroid 13-14 (unofficial)
Radxa Rock 5BRK35884-16 GBAndroid 12-13
Khadas VIM4Amlogic A311D28 GBAndroid 11-12
NVIDIA Jetson OrinOrin SoC8-64 GBAndroid (via NVIDIA BSP)

Build and customization:

  • Full AOSP (Android Open Source Project) builds require a powerful workstation: 64+ GB RAM, 300+ GB disk, 16+ CPU cores
  • Build time for a full AOSP image: 2-6 hours depending on hardware
  • Customization involves modifying the device tree, HAL (Hardware Abstraction Layer) implementations, and system properties
  • repo tool manages the hundreds of Git repositories that compose the AOSP source tree
  • Vendor BSPs typically provide a manifest file that specifies exact repository versions and patches for the target board

When it fits:

Android makes sense for embedded products that are fundamentally UI-driven and can benefit from the Android application ecosystem — digital signage, kiosks, smart home panels, and infotainment systems. The cost is heavy resource requirements, vendor dependency for BSP and security updates, and a complex build system (AOSP) that dwarfs even Yocto in build time and disk usage.

Windows IoT#

Microsoft offers Windows IoT variants for edge and embedded devices, primarily targeting x86 hardware but with limited ARM support.

Current variants:

VariantArchitectureStatusTarget
Windows IoT Enterprisex86-64ActiveIndustrial PCs, edge gateways, kiosks
Windows IoT Enterprise LTSCx86-64ActiveLong-lifecycle embedded devices (10-year support)
Windows IoT CoreARM, x86DeprecatedWas for Pi, DragonBoard — no longer actively developed

Practical characteristics:

  • Windows IoT Enterprise is essentially full Windows with embedded-specific licensing (lockdown features, kiosk mode, custom shell)
  • LTSC (Long-Term Servicing Channel) releases receive 10 years of security updates without feature changes — critical for industrial deployments
  • UWP (Universal Windows Platform) and WinUI application frameworks for building embedded UIs
  • Full .NET runtime, PowerShell, and Win32 API access
  • Driver ecosystem is mature for x86 hardware — USB devices, industrial I/O cards, and PCI peripherals generally work without custom driver development
  • Not practical for most ARM-based SBCs — the ecosystem is x86-centric

Lockdown and kiosk features:

  • Unified Write Filter (UWF) redirects disk writes to a RAM overlay, protecting the base image from modification — similar in concept to squashfs + overlayfs in Linux but managed through Windows APIs
  • Keyboard Filter blocks specific key combinations (Ctrl+Alt+Del, Windows key) to prevent users from escaping the kiosk application
  • Shell Launcher replaces the Windows desktop shell with a custom application, providing a dedicated-purpose appearance
  • AppLocker restricts which executables can run, preventing unauthorized software installation

When it fits:

Windows IoT Enterprise is relevant for edge devices in environments already committed to Microsoft tooling: Active Directory, Azure IoT Hub, System Center management. It also fits when the application requires Windows-specific software (proprietary industrial applications, specific .NET frameworks). For ARM-based SBCs and resource-constrained devices, Linux is the more practical choice.


OS Selection Criteria#

The following table summarizes key selection criteria across the major embedded OS options. Values represent typical configurations — specific hardware, kernel versions, and application stacks shift these numbers. No single criterion drives the decision in isolation — the weighting depends on the project phase (prototype vs. production), deployment scale (bench vs. fleet), and operational requirements (boot time, update cadence, regulatory compliance).

CriterionRaspberry Pi OSUbuntu Server ARMArmbianYocto (custom)Buildroot (custom)ZephyrNuttXAndroid IoT
Package Managerapt (Debian)apt + snapapt (Debian/Ubuntu)opkg or noneNone (monolithic image)NoneNoneAPK (Android)
Real-Time CapableNo (soft RT with PREEMPT_RT patch)No (soft RT with PREEMPT_RT patch)No (soft RT with PREEMPT_RT patch)Configurable (PREEMPT_RT optional)Configurable (PREEMPT_RT optional)Yes (hard RT)Yes (hard RT)No
Typical Boot Time20-30 s25-40 s20-35 s3-8 s (minimal)1-3 s (minimal)<1 s<1 s15-45 s
Min Storage Footprint~500 MB (Lite)~700 MB~400 MB30-100 MB8-50 MB64-256 KB64-256 KB4-8 GB
Security Update CadenceRegular (Pi Foundation)Regular (Canonical, 5-10 yr LTS)Community-dependentSelf-managed (CVE tracking required)Self-managedRegular (Zephyr project)Community-dependentVendor-dependent (often poor)
Long-Term MaintenanceGood (Pi Foundation backed)Excellent (Canonical commercial support)Variable (community volunteer)Excellent (self-controlled)Good (self-controlled)Good (Linux Foundation backed)Moderate (Apache Foundation)Poor to moderate (vendor-dependent)
Learning CurveLowLowLow-moderateSteepModerateModerate-steepModerateSteep (AOSP build system)

Reading the table:

  • Boot time is measured from power-on to application readiness, not just kernel boot. Pre-built distributions include service managers (systemd) that start dozens of services sequentially. Custom images can eliminate unnecessary services and use parallel initialization to reach application readiness faster.
  • Storage footprint is the installed size on disk, not the compressed download size. Compressed images are typically 30-50% of the installed size for Linux distributions, and even smaller for squashfs-based builds.
  • Real-time capable distinguishes between hard real-time (deterministic, bounded latency guaranteed by the scheduler) and soft real-time (best-effort low latency via PREEMPT_RT). Linux with PREEMPT_RT can achieve sub-millisecond scheduling latency under controlled conditions, but cannot guarantee it under all workloads.
  • Security update cadence matters most for network-connected devices. A device behind a firewall with no internet exposure has different update requirements than one exposed to the public internet.

RTOS-on-MPU Patterns#

Many modern SoCs integrate both application-class cores (Cortex-A) and real-time cores (Cortex-M) on the same die. This heterogeneous multicore architecture enables running Linux on the application core for networking, UI, and general-purpose computing, while an RTOS on the real-time core handles time-critical I/O with deterministic latency. This pattern is increasingly common in industrial, automotive, and IoT SoCs — it represents a convergence of the MCU and MPU worlds onto a single chip.

Common Heterogeneous Multicore Platforms#

SoCApplication CoreReal-Time CoreTypical LinuxTypical RTOS
STM32MP157Cortex-A7 (dual, 650 MHz)Cortex-M4 (209 MHz)Yocto (meta-st-stm32mp)FreeRTOS, Zephyr, bare-metal
STM32MP135Cortex-A7 (single, 1 GHz)— (no M core)Yocto (meta-st-stm32mp)N/A
i.MX 8M PlusCortex-A53 (quad, 1.8 GHz)Cortex-M7 (800 MHz)Yocto (meta-freescale)FreeRTOS, Zephyr
nRF5340Cortex-M33 (128 MHz, app)Cortex-M33 (64 MHz, net)N/A (no Linux)Zephyr on both cores
TI AM62xCortex-A53 (quad, 1.4 GHz)Cortex-M4F (400 MHz)Yocto (meta-ti)FreeRTOS, bare-metal
Renesas RZ/G2LCortex-A55 (dual, 1.2 GHz)Cortex-M33 (200 MHz)Yocto (meta-renesas)FreeRTOS, bare-metal

When This Pattern Applies#

The heterogeneous multicore pattern is appropriate when a system needs both:

  • Network-connected application logic — TCP/IP stack, MQTT/HTTP clients, cloud connectivity, web server, database — tasks where Linux’s networking stack and user-space tooling provide significant value
  • Hard real-time I/O — motor control loops, sensor sampling at fixed intervals, protocol timing (CAN, industrial fieldbus), safety-critical monitoring — tasks where jitter above tens of microseconds causes functional failures

Running both workloads on Linux alone (even with PREEMPT_RT) risks priority inversion, garbage collection pauses (in managed-language applications), or kernel scheduling latency spikes that violate real-time deadlines. Running both workloads on an RTOS alone sacrifices the networking stack, filesystem, and user-space tooling that make Linux productive for application development.

Inter-Core Communication#

The two cores share a physical SoC but run independent operating systems with separate memory spaces. Communication between them uses hardware-supported mechanisms:

RPMsg (Remote Processor Messaging):

  • Standard Linux kernel framework for inter-core messaging
  • Built on top of the VirtIO transport layer
  • Each core sees named communication channels (endpoints)
  • Message-passing model — small messages (typically 256-512 bytes) sent through shared memory buffers
  • Linux side uses /dev/rpmsgX character devices or the rpmsg kernel subsystem
  • RTOS side uses vendor-provided RPMsg libraries (e.g., OpenAMP)

Shared Memory:

  • Direct shared memory regions mapped into both cores’ address spaces
  • No protocol overhead — the fastest communication path
  • Requires explicit synchronization (mailbox interrupts, spinlocks, or lock-free data structures)
  • Risky without careful memory barrier management — cache coherency between cores is not always automatic

Mailbox Peripherals:

  • Hardware interrupt-based signaling between cores
  • Typically used to notify one core that data is available in shared memory
  • Very low latency (single interrupt cycle)
  • Limited payload (often just a few registers — 32-bit values)
  • Usually combined with shared memory: mailbox signals “data ready,” shared memory holds the actual data

Typical communication flow:

┌──────────────────────┐          ┌──────────────────────┐
│   Linux (Cortex-A)   │          │   RTOS (Cortex-M)    │
│                      │          │                      │
│  Application         │          │  Real-time task      │
│       │              │          │       │              │
│       ▼              │          │       ▼              │
│  RPMsg endpoint      │          │  RPMsg endpoint      │
│       │              │          │       │              │
│       ▼              │          │       ▼              │
│  VirtIO ring buffer  │◄────────►│  VirtIO ring buffer  │
│  (shared memory)     │          │  (shared memory)     │
│       │              │          │       │              │
│       ▼              │          │       ▼              │
│  Mailbox IRQ ────────┼──────────┼─► Mailbox IRQ       │
└──────────────────────┘          └──────────────────────┘

Lifecycle and Boot Sequence#

In most heterogeneous multicore SoCs, the application core (running Linux) controls the lifecycle of the real-time core:

  1. The SoC boots, and the bootloader (U-Boot) starts the Linux kernel on the application core
  2. Linux loads the RTOS firmware image for the real-time core (often stored in /lib/firmware/)
  3. The remoteproc kernel subsystem loads the firmware into the real-time core’s memory and starts execution
  4. RPMsg channels are established after both cores are running
  5. Linux can stop, reload, and restart the real-time core firmware without rebooting — useful for firmware updates

This model means the RTOS firmware is a file managed by Linux, simplifying updates and version management. The real-time core does not boot independently in most configurations, though some SoCs support early boot of the M core before Linux starts (useful for safety-critical applications that must be operational immediately).

Linux-side management commands (STM32MP1 example):

# Load firmware into the Cortex-M4
echo /lib/firmware/my_rtos_app.elf > /sys/class/remoteproc/remoteproc0/firmware

# Start the real-time core
echo start > /sys/class/remoteproc/remoteproc0/state

# Check current state
cat /sys/class/remoteproc/remoteproc0/state
# Output: running

# Stop the real-time core (for firmware update)
echo stop > /sys/class/remoteproc/remoteproc0/state

# Update firmware and restart
echo /lib/firmware/my_rtos_app_v2.elf > /sys/class/remoteproc/remoteproc0/firmware
echo start > /sys/class/remoteproc/remoteproc0/state

Memory Partitioning#

Heterogeneous multicore SoCs require careful memory partitioning between cores. The device tree on the Linux side reserves memory regions for the RTOS core, shared memory buffers, and inter-core communication structures. Incorrect memory reservation leads to Linux overwriting RTOS memory or vice versa — a class of bug that produces intermittent crashes with no obvious pattern.

Typical memory layout (STM32MP157 example):

RegionAddress RangeSizeOwner
DDR (Linux)0xC0000000 - 0xCFFFFFFF256 MBLinux kernel + user space
DDR (RTOS)0xD0000000 - 0xD003FFFF256 KBCortex-M4 firmware
Shared memory0xD0040000 - 0xD004FFFF64 KBRPMsg buffers
SRAM0x10000000 - 0x1001FFFF128 KBCortex-M4 (fast access)
RETRAM0x00000000 - 0x0000FFFF64 KBCortex-M4 (retained in standby)

The RTOS firmware’s linker script must match the memory regions defined in the Linux device tree. A mismatch — even by a single byte — causes one core to corrupt the other’s memory space. Vendor tooling (e.g., STM32CubeMX for STMicroelectronics SoCs) can generate both the device tree fragment and the linker script from a single configuration, reducing the risk of misalignment.


Tips#

The following tips apply across the embedded OS options, focusing on decisions that commonly affect project timelines and deployment success.

  • For prototyping, start with a pre-built distribution (Pi OS, Armbian) — custom images add complexity that only pays off at production scale or when boot time and image size matter.

  • Yocto’s initial build can take 2-4 hours and 50+ GB of disk space. Allocating an SSD with at least 200 GB free and a machine with 32+ GB of RAM avoids build failures from exhausted resources. CI/CD pipelines benefit from preserving the sstate-cache and downloads directories between builds.

  • Buildroot is a better starting point than Yocto when the team has no prior embedded Linux build system experience. The make menuconfig interface is familiar to anyone who has configured a Linux kernel, and the build completes in minutes rather than hours.

  • Before committing to a specific SBC and OS combination, check whether the SBC vendor provides a BSP layer for Yocto. Building a BSP layer from scratch requires deep knowledge of the SoC’s boot sequence, device tree, and driver stack — typically weeks of effort for a new SoC family.

  • For fleet deployment, a custom Yocto or Buildroot image containing only the required packages reduces attack surface, boot time, and storage requirements. Removing unused services, kernel modules, and libraries also eliminates potential sources of instability.

  • When evaluating RTOS-on-MPU designs, start with the SoC vendor’s reference implementation (e.g., STMicroelectronics’ OpenSTLinux with M4 firmware examples) before building a custom RTOS application. The reference implementation validates that inter-core communication works on the specific hardware revision.

  • The meta-openembedded layer collection provides thousands of additional Yocto recipes. Adding the meta-oe, meta-python, and meta-networking layers covers most common application dependencies without writing custom recipes.

  • Armbian’s armbian-monitor tool collects system information (kernel version, dmesg, hardware details) into a single paste for forum troubleshooting. Running armbianmonitor -u generates a shareable link — far more efficient than manually copying logs when reporting board-specific issues.

  • For Buildroot projects targeting read-only production deployments, the squashfs + overlayfs combination provides crash resilience. The read-only squashfs root cannot be corrupted by power loss, and the writable overlay (on a separate partition) captures only the delta of runtime changes. A factory reset simply erases the overlay partition.

  • When comparing boot times across OS options, measure at the application level, not the shell prompt. A consistent method is to toggle a GPIO pin from the application’s initialization code and measure the time from power-on using an oscilloscope or logic analyzer. This captures the full boot chain: bootloader, kernel, init system, and application startup.

  • NuttX’s apps/ repository contains example applications covering networking, filesystems, sensors, and graphics. Building and running these examples on the target board before writing custom application code validates that the board port and peripheral drivers are functional.


Caveats#

  • “Runs Linux” does not mean “runs any Linux” — kernel version, device tree support, and driver availability vary dramatically between boards. A board that runs the vendor’s Ubuntu 22.04 image may fail to boot a mainline 6.x kernel due to missing device tree bindings or out-of-tree driver dependencies.

  • Armbian support quality varies by board. Some boards are “supported” only in the sense that an image exists, with known bugs and missing drivers listed on the board’s status page. Checking the Armbian forum and GitHub issues for the specific board before purchasing avoids surprises.

  • Yocto layer compatibility is version-locked. Mixing layers from different Yocto releases (e.g., a Kirkstone BSP layer with Scarthgap core layers) causes subtle build failures — recipes referencing removed variables, incompatible class inheritance, or mismatched package versions. All layers in a build must target the same Yocto release.

  • Android IoT images from SBC vendors are often outdated and missing security patches. A board shipped with Android 11 in 2023 may never receive an Android 12 update, leaving known kernel and framework CVEs unpatched. Evaluate the vendor’s update track record before selecting Android for a product.

  • Zephyr on MPU-class hardware is a niche use case with limited board support. The Zephyr project’s primary focus is Cortex-M and RISC-V microcontrollers — verify that the specific MPU target has an actively maintained board port before planning around it.

  • Buildroot’s lack of a package manager in the generated image means every change requires rebuilding and reflashing the entire image. For devices that need frequent in-field application updates without full image replacement, this model adds deployment friction.

  • RTOS-on-MPU inter-core communication adds a message-passing boundary that complicates debugging. Standard Linux debugging tools (gdb, strace, perf) cannot see into the RTOS core, and RTOS-side debugging requires a separate JTAG/SWD connection. Correlating timestamps between cores requires shared clock references or hardware trace (e.g., ARM CoreSight).

  • Custom Yocto images that strip too aggressively can remove packages needed for runtime debugging and diagnostics. Maintaining a separate “development” image recipe with SSH, strace, gdb, and tcpdump alongside the minimal “production” image avoids having to rebuild when issues arise in the field.

  • Ubuntu Server ARM’s snap auto-refresh mechanism can update packages at unpredictable times, causing service restarts that disrupt embedded applications. For deployed devices, configuring snap set system refresh.hold or snap set system refresh.timer to a maintenance window prevents unexpected downtime.

  • Pre-built distributions often ship with kernel configurations that enable every possible module. When migrating to Yocto or Buildroot, starting from the vendor’s defconfig and trimming unused drivers — rather than starting from a minimal config and adding — avoids missing obscure dependencies that only manifest at runtime (e.g., USB gadget mode requiring specific UDC drivers, or Wi-Fi requiring specific cfg80211 and rfkill options).


In Practice#

  • A product that ships on Raspberry Pi OS during prototyping often migrates to a Yocto-built image for production. The trigger is usually boot time (Pi OS: 20-40 seconds, Yocto minimal: 3-8 seconds) or image size (Pi OS Lite: ~500 MB, Yocto minimal: 30-100 MB). The migration surfaces configuration assumptions that worked in Debian but require explicit recipe additions in Yocto — Python packages installed via pip at development time, systemd service files that relied on Debian-specific paths, or kernel modules that were auto-loaded by udev rules present in the Pi OS kernel config but absent in a minimal Yocto kernel.

  • Armbian on an Orange Pi 5 commonly shows kernel panics or missing features that work fine on the vendor-provided Ubuntu image. This typically traces back to mainline kernel support gaps for the RK3588 SoC. The vendor kernel includes Rockchip-specific patches for GPU (Mali G610), NPU (RKNN), and HDMI output that have not been upstreamed. Switching to Armbian’s vendor kernel track (when available) often resolves these issues, at the cost of running a kernel version that may not receive upstream security fixes.

  • Heterogeneous multicore designs (Linux + RTOS) commonly surface communication timing issues during integration. RPMsg latency between cores often shows up as dropped messages under load. The root cause is typically the Linux side not servicing VirtIO ring buffer interrupts quickly enough during periods of high CPU load (e.g., network traffic spikes, filesystem writes). This appears as missing sensor readings, delayed control responses, or watchdog timeouts on the RTOS side. Careful buffer sizing, flow control (backpressure signaling from the RTOS core), and elevated IRQ thread priority on the Linux side mitigate the issue.

  • Buildroot images that boot in under 2 seconds on a CM4 sometimes mask slow application startup. The kernel and init system reach a ready state quickly, but the application itself (especially if it initializes network connections, loads configuration from flash, or waits for hardware peripherals to enumerate) may not be functional for several additional seconds. systemd-analyze blame (if systemd is included) or timestamped init scripts help separate kernel boot from application readiness. Measuring time-to-first-useful-output rather than time-to-shell-prompt gives a more accurate picture of actual boot performance.

  • Yocto builds that work locally often fail in CI pipelines due to resource constraints. BitBake’s parallel build scheduling can exhaust RAM on machines with fewer than 32 GB, causing OOM kills that manifest as cryptic compiler errors or missing output files rather than clear memory-related messages. Setting BB_NUMBER_THREADS and PARALLEL_MAKE to match the CI runner’s core count and memory — rather than relying on autodetection — prevents these failures. A common rule of thumb is 2 GB of RAM per parallel build thread.

  • NuttX-to-Linux migration paths often appear smoother than expected on paper but stall on driver availability. NuttX’s POSIX compliance means application-layer code ports to Linux with minimal changes, but custom peripheral drivers written against NuttX’s driver framework require a complete rewrite for Linux’s driver model (character devices, platform drivers, device tree bindings). The reverse is also true — Linux drivers do not port to NuttX without significant rework.

  • Android IoT devices that perform well during demo but degrade over months in the field often trace to storage wear from excessive logging, unconstrained app data growth, or filesystem fragmentation on low-endurance eMMC. Configuring logd buffer limits, application data quotas, and periodic fstrim (on supported eMMC controllers) prevents gradual performance degradation that otherwise appears as random slowdowns and ANR (Application Not Responding) events.

  • Ubuntu Server ARM on a Jetson Orin sometimes exhibits GPU compute failures that do not appear on Raspberry Pi OS or Armbian. This commonly traces to a mismatch between the Ubuntu generic ARM kernel and NVIDIA’s proprietary kernel modules. The NVIDIA JetPack SDK provides a specific kernel version with out-of-tree modules for CUDA, TensorRT, and the display driver — installing Ubuntu’s kernel updates via apt upgrade can replace the NVIDIA-patched kernel with a generic one, breaking GPU functionality. Pinning the kernel package (apt-mark hold linux-image-*) or using NVIDIA’s L4T (Linux for Tegra) base image avoids this issue.

  • Buildroot images deployed on devices with eMMC rather than SD cards sometimes show faster boot times by 2-5 seconds due to eMMC’s higher sequential read performance (100-300 MB/s vs. 30-50 MB/s for typical SD cards). The kernel decompression and root filesystem mount phases benefit most from this speedup. However, the eMMC write endurance rating becomes the limiting factor for devices that perform frequent writes — consumer-grade eMMC rated for 3,000 P/E cycles can wear out within 1-2 years under heavy logging workloads.

  • Windows IoT Enterprise deployments on industrial x86 edge devices often require disabling Windows Update entirely in production environments. An uncontrolled update can reboot the device mid-process, change driver behavior, or consume bandwidth on metered connections. The LTSC variant mitigates this by limiting updates to security patches without feature changes, but even these require validation against the deployed application stack before rollout.

  • STM32MP1 projects that work correctly with Linux + Cortex-M4 bare-metal firmware sometimes break when migrating the M4 firmware to FreeRTOS or Zephyr. The RTOS introduces its own interrupt management and timer configuration that can conflict with the hardware settings established by the Linux device tree. A common symptom is the M4 core hanging during initialization because the RTOS reconfigures a clock or peripheral that the Linux side expects to control. The fix involves carefully partitioning peripheral ownership in the device tree’s m4_rproc node and ensuring the RTOS firmware only touches peripherals explicitly assigned to it.

Page last modified: March 1, 2026