Generated: 2026-05-04 · API: Gemini 2.5 Flash · Modes: Summary


Optimizing Frontier Small Models for Edge AI: Liquid AI’s Innovations

Clip title: Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI Author / channel: AI Engineer URL: https://www.youtube.com/watch?v=fLUtUkqYHnQ

Summary

The presentation by Maxime Labonne of Liquid AI, titled “Everything I Learned Training Frontier Small Models,” offers an in-depth look into the development and optimization of small AI models, specifically tailored for on-device or “edge” deployment. Liquid AI focuses on models ranging from 350 million to 24 billion parameters, asserting that these models are fundamentally different from their larger counterparts and demand unique approaches. Labonne outlines three main characteristics of edge models: they are memory-bound (typically under 3 billion parameters, leading to low knowledge capacity), task-specific (not designed for general-purpose chatbots), and latency-sensitive (requiring sub-100ms responses). These distinctions necessitate a specialized development strategy rather than simply scaling down larger models.

Architecturally, Labonne illustrated Liquid AI’s innovative approach by comparing their LFM2.5-350M LLM with Google’s Gemma 3 270M and Qwen 3.5 0.8B. He highlighted a critical observation: the embedding layer can consume a significant portion of a small model’s total parameters (e.g., 63% for Gemma 3 270M), meaning the “effective size” for reasoning and knowledge capacity is often much smaller. Liquid AI optimized their LFM2.5 architecture, reducing the embedding layer’s proportion to 19% of parameters. This efficiency was achieved through “on-device profiling,” testing models directly on target hardware like AMD Ryzen and Samsung Galaxy devices. The integration of a Gated Short Convolution Block (ShortConv) proved particularly effective, offering superior inference speed and lower memory usage compared to other attention mechanisms like Sliding Window Attention (SWA) and Gated DeltaNet (GDN).

Regarding training, the LFM2.5-350M recipe involves extensive pre/mid-training on 28 trillion tokens, followed by a multi-stage post-training process including Supervised Fine-Tuning (SFT), Preference Alignment (PA), and Reinforcement Learning (RL). Labonne noted that, contrary to some traditional scaling laws, smaller models benefit significantly from greater pre-training data, with performance continuing to improve. A major challenge for small models is “doom looping,” where the model gets stuck repeating text. Liquid AI tackled this through a two-pronged solution: during DPO (Direct Preference Optimization) data generation, they use an LLM jury to identify and reject looping responses, training the model not to repeat. Additionally, during RL, they implemented n-gram repetition penalties and verifiable rewards for correct answers, drastically reducing the doom loop ratio from ~15% to a mere 0.36%.

In conclusion, Labonne emphasized that edge models possess unique challenges, but these are largely addressable with creative solutions. A forward-looking strategy involves integrating edge models with “agentic tools,” such as web search capabilities. While small models have limited internal knowledge and struggle with long-context tasks, external tools allow them to overcome these limitations by fetching information and taking “shortcuts” in complex environments. The core takeaway is that optimizing small models requires moving beyond direct scaling from larger models, focusing instead on tailored architectures, innovative training methodologies like those addressing doom looping, and empowering them with agentic tools to leverage external knowledge and reasoning.

Description

A new class of small models is emerging with the ability to reliably follow instructions and call tools while running on-device under 1 GB of memory. In this talk, we’ll break down how to post-train frontier small models using the LFM2.5 recipe: on-policy preference alignment, agentic reinforcement learning, and curriculum training with iterative model merging. We’ll cover training challenges unique to the 1B scale, like doom loops, capability interference, and how to fix them. The goal is to give you a concrete playbook to fine-tune and deploy small models for your own use cases, from structured data extraction to multi-turn tool use.

Speaker info:

Timestamps: 0:00:00 - Start 0:00:14 - Introduction to frontier small models at Liquid AI 0:01:02 - Characteristics: memory-bound, task-specific, latency-sensitive 0:02:20 - Architecture: why large embedding layers are inefficient 0:04:01 - LFM2 architecture: using gated short convolutions for speed 0:06:09 - LFM 2.5 recipe: 28T tokens and post-training stages 0:08:34 - Post-training: SFT, preference alignment, and RL best practices 0:10:43 - Identifying “doom loops” in reasoning models 0:11:34 - Solutions: mitigating loops via preference alignment and RL 0:15:29 - Future focus: using agentic tools to overcome memory limits 0:17:58 - Q&A: real-world applications for small vs. large models

Tags

ai, ai engineer, ai engineering, software development, tech, startups, software architecture, machine learning