<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Inference Frameworks &amp; Runtimes on Embedded Systems Development</title><link>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/</link><description>Recent content in Inference Frameworks &amp; Runtimes on Embedded Systems Development</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/index.xml" rel="self" type="application/rss+xml"/><item><title>TensorFlow Lite Micro</title><link>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-micro/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-micro/</guid><description>&lt;h1 id="tensorflow-lite-micro"&gt;TensorFlow Lite Micro&lt;a class="anchor" href="#tensorflow-lite-micro"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;TensorFlow Lite for Microcontrollers (TFLM) is an inference runtime designed for bare-metal environments with no operating system, no heap allocator, and often only tens of kilobytes of RAM (the core runtime itself fits in roughly 16 KB of flash on an Arm Cortex-M3). The runtime loads a pre-trained model stored as a FlatBuffer in flash memory, allocates all intermediate tensor storage from a single pre-allocated byte array (the tensor arena), and executes the model&amp;rsquo;s operations sequentially through a lightweight interpreter. No dynamic memory allocation occurs at any point during inference: every byte comes from the arena, and the arena size is fixed at compile time. This makes TFLM&amp;rsquo;s memory behavior deterministic and suitable for hard real-time systems, but it also means the developer bears full responsibility for sizing the arena correctly and registering exactly the operators the model requires.&lt;/p&gt;</description></item><item><title>TensorFlow Lite for Linux</title><link>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-linux/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-linux/</guid><description>&lt;h1 id="tensorflow-lite-for-linux"&gt;TensorFlow Lite for Linux&lt;a class="anchor" href="#tensorflow-lite-for-linux"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;TensorFlow Lite (TFLite) on Linux is the full-featured inference runtime for edge devices with an operating system, a filesystem, and megabytes to gigabytes of RAM. Unlike &lt;a href="https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-micro/"&gt;TensorFlow Lite Micro&lt;/a&gt;, which targets bare-metal microcontrollers with static memory allocation, TFLite for Linux uses dynamic memory, supports multi-threaded inference, and — critically — provides a &lt;strong&gt;delegate architecture&lt;/strong&gt; that offloads computation to hardware accelerators like GPUs, NPUs, and DSPs. This delegate system is what makes TFLite viable on platforms ranging from a Raspberry Pi 4 running object detection on the CPU to a Jetson Orin Nano offloading an entire model graph to TensorRT.&lt;/p&gt;</description></item><item><title>ONNX Runtime &amp; Edge Variants</title><link>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/onnx-runtime/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/onnx-runtime/</guid><description>&lt;h1 id="onnx-runtime--edge-variants"&gt;ONNX Runtime &amp;amp; Edge Variants&lt;a class="anchor" href="#onnx-runtime--edge-variants"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;ONNX (Open Neural Network Exchange) is an open model interchange format that allows models trained in one framework — PyTorch, TensorFlow, scikit-learn, XGBoost — to run on a shared inference runtime. The format defines a standard set of operators (organized into versioned &amp;ldquo;opsets&amp;rdquo;), a graph-based computation representation, and a serialization scheme using Protocol Buffers. ONNX Runtime (ORT) is Microsoft&amp;rsquo;s high-performance inference engine for ONNX models, supporting execution on CPUs, GPUs, NPUs, and DSPs through a pluggable &lt;strong&gt;execution provider&lt;/strong&gt; architecture.&lt;/p&gt;</description></item><item><title>Edge Runtime Selection Guide</title><link>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/runtime-selection/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/runtime-selection/</guid><description>&lt;h1 id="edge-runtime-selection-guide"&gt;Edge Runtime Selection Guide&lt;a class="anchor" href="#edge-runtime-selection-guide"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Selecting an inference runtime for edge deployment is a multi-dimensional decision that involves the target hardware, model format, operator coverage, memory budget, and the training framework that produced the model. There is no single &amp;ldquo;best&amp;rdquo; runtime; the choice is constrained by the intersection of what the hardware supports, what the model requires, and what the deployment environment allows. A model that runs well on &lt;a href="https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-linux/"&gt;TensorFlow Lite for Linux&lt;/a&gt; with the XNNPACK delegate on a Raspberry Pi 4 may be far too large for a Cortex-M4 microcontroller, where a bare-metal runtime such as &lt;a href="https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/tflite-micro/"&gt;TensorFlow Lite Micro&lt;/a&gt; is required instead. Conversely, a PyTorch model that exports cleanly to ONNX and runs on &lt;a href="https://applied-ee.github.io/embedded/docs/edge-ai/inference-frameworks/onnx-runtime/"&gt;ONNX Runtime&lt;/a&gt; may have no direct path to TFLite, and the intermediate conversion step it needs risks operator compatibility issues.&lt;/p&gt;</description></item></channel></rss>