Tiiny AI Pocket Lab: Running Large Language Models Locally and Privately

Generated: 2026-05-26 · API: Gemini 2.5 Flash · Modes: Summary


Clip title: This Shouldn’t Be Able to Run 120B Locally
Author / channel: Alex Ziskind
URL: https://www.youtube.com/watch?v=RkzCAaIV_cQ

Summary

The video introduces the Tiiny AI Pocket Lab, a compact device designed to run large language models (LLMs) locally and privately, without the large, expensive GPU hardware such models usually call for. The presenter opens with the trend toward ever-bigger GPUs and servers for AI, then reveals the pocket-sized device, which is claimed to handle models of up to 120 billion parameters, and sets out to verify that claim.

The Tiiny AI Pocket Lab weighs just 305 grams. It pairs an ARM v9.2 CPU with a Neural Processing Unit (NPU) rated at 30 INT8 TOPS, 80GB of LPDDR5X memory, and a 1TB PCIe 4.0 SSD. Models are therefore stored and run on the device itself rather than consuming the host computer’s limited resources. The device connects to a host computer over USB-C (the demo uses a MacBook Neo with only 8GB of RAM), and its TiinyOS software provides the user interface on the host. On its own, the MacBook could only run a 4-billion-parameter model at 9 tokens/second. With the Tiiny device attached, the GPT-OSS-120B model (which typically requires 60-80GB of VRAM) ran locally at a decoding speed of 18.86 tokens/second without stressing the MacBook’s memory.
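The 60-80GB figure and the two decoding speeds can be sanity-checked with simple arithmetic. The sketch below assumes that the weights dominate memory use and that the model is stored at roughly 4 to 5 bits per parameter; both are illustrative assumptions, not figures stated in the video. The function names are hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead (assumption).
    """
    return num_params * bits_per_param / 8 / 1e9


def ms_per_token(tokens_per_second: float) -> float:
    """Average time spent generating one token, in milliseconds."""
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    params = 120e9  # 120 billion parameters, as quoted in the summary
    for bits in (4, 5, 16):
        size = weight_footprint_gb(params, bits)
        print(f"{bits:>2} bits/param -> about {size:.0f} GB of weights")

    # Decoding speeds quoted in the summary
    for label, tps in (("Tiiny, 120B model", 18.86), ("MacBook, 4B model", 9.0)):
        print(f"{label}: {ms_per_token(tps):.1f} ms per token")
```

Under these assumptions a 120-billion-parameter model needs about 60 to 75GB for weights alone, which fits within 80GB of device memory but not within the host’s 8GB. At 16 bits per parameter the same model would need about 240GB, so some form of low-bit storage is implied.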

Beyond simple chat, TiinyOS offers an “Agent Store” with pre-built AI applications such as ChatMemo (an AI assistant), Presenton, RAGFlow, SD Web UI (for Stable Diffusion), and TiinyBot. It also provides an SDK and a command-line interface, so developers can call models programmatically from Python or directly from the terminal. Models are downloaded straight to the Tiiny device over Wi-Fi (an initial internet connection is required) and then run completely offline, which keeps prompts and data private. The dashboard tracks token usage, which helps developers estimate costs if they later deploy to cloud-based services. The device handles several model types, including coding models (such as Qwen3-Coder-30B, integrated into VS Code) and text-to-image models, loading and unloading models as needed to manage resources.
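As a rough illustration of how tracked token counts translate into a cloud cost estimate, the sketch below multiplies prompt and completion tokens by per-million-token prices. The `TokenUsage` type, the function name, and the prices are all hypothetical; they are not part of TiinyOS or its SDK, and real prices vary by provider and model.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts as a dashboard might report them (hypothetical shape)."""
    prompt_tokens: int
    completion_tokens: int


def estimate_cloud_cost(
    usage: TokenUsage,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated cost in the price's currency for the given token usage."""
    input_cost = usage.prompt_tokens * input_price_per_million
    output_cost = usage.completion_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Made-up usage and prices, for illustration only
    usage = TokenUsage(prompt_tokens=2_000_000, completion_tokens=500_000)
    cost = estimate_cloud_cost(
        usage, input_price_per_million=0.50, output_price_per_million=1.50
    )
    print(f"Estimated cost: {cost:.2f}")
```

Input and output tokens are priced separately here because cloud providers commonly charge different rates for each.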

The underlying technology, PowerInfer, is available on GitHub. It is a CPU/GPU LLM inference engine that manages model activation selectively, keeping frequently used parts of the model “hot” and rarely used parts “asleep” to improve performance and reduce power consumption. The Tiiny AI Pocket Lab is not intended to replace high-end GPU rigs, but its ability to bring powerful, private, local AI to less capable laptops or mini PCs makes it a compelling option for developers and users who want portable, on-device AI. It is currently available through a Kickstarter campaign and represents a significant step toward making large language models accessible for personal and mobile use.
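The hot/asleep idea can be illustrated with a toy partition: rank units of the model by how often they activate, then keep the most active ones in a fixed-size fast pool. This is only a sketch of the general idea described above, not PowerInfer’s actual implementation; the function names, the activation counts, and the budget are hypothetical.

```python
def split_hot_cold(
    activation_counts: dict[int, int], hot_budget: int
) -> tuple[set[int], set[int]]:
    """Split unit ids into a hot set (most frequently active) and a cold set.

    hot_budget is how many units fit in the fast pool (assumption).
    Ties are broken by the lower unit id so the result is deterministic.
    """
    ranked = sorted(activation_counts, key=lambda n: (-activation_counts[n], n))
    return set(ranked[:hot_budget]), set(ranked[hot_budget:])


def hot_hit_rate(activation_counts: dict[int, int], hot: set[int]) -> float:
    """Fraction of all recorded activations served by the hot set."""
    total = sum(activation_counts.values())
    if total == 0:
        return 0.0
    return sum(activation_counts[n] for n in hot) / total


if __name__ == "__main__":
    # Made-up, heavily skewed activation counts for five units
    counts = {0: 900, 1: 50, 2: 30, 3: 15, 4: 5}
    hot, cold = split_hot_cold(counts, hot_budget=2)
    print("hot:", sorted(hot), "cold:", sorted(cold))
    print(f"hot hit rate: {hot_hit_rate(counts, hot):.2f}")
```

When activations are skewed like this, a small hot set covers most of the work, which is why keeping only the frequently used parts readily available can pay off.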

Video Description & Links

Description

I paired a tiny AI box with the MacBook Neo—and it seriously changed what I thought was possible with local AI. Tiiny box: https://tiiny.ai

👀 My favorite external drive (dependable): https://amzn.to/3Os9Wi3
👀 Thunderbolt 4 dock: https://amzn.to/3yVRicC

⚡ Other gear I use: https://www.amazon.com/shop/alexziskind

🎥 Related Videos 🎥
🧬🐍 Mac Studio CLUSTER vs M3 Ultra 🤯 - https://youtu.be/d8yS-2OyJhw
🧳🧰 Mini PC portable setup - https://youtu.be/4RYmsrarOSw
🍎💻 Dev setup on Mac - https://youtu.be/KiKUN4i1SeU
💸🧠 Cheap mini runs a 70B LLM 🤯 - https://youtu.be/xyKEQjUzfAk
🧪🔥 RAM torture test on Mac - https://youtu.be/l3zIwPgan7M
🍏⚡ FREE Local LLMs on Apple Silicon | FAST! - https://youtu.be/bp2eev21Qfo
🧠📉 REALITY vs Apple’s Memory Claims | vs RTX4090m - https://youtu.be/fdvzQAWXU7A
⚡💥 Thunderbolt 5 BREAKS Apple’s Upcharge - https://youtu.be/nHqrvxcRc7o
🧠🚀 INSANE Machine Learning on Neural Engine - https://youtu.be/Y2FOUg_jo7k
🧱🖥️ Mac Mini Cluster - https://youtu.be/GBR6pHZ68Ho

🛠️ Developer productivity Playlist - https://www.youtube.com/playlist?list=PLPwbI_iIX3aQCRdFGM7j4TY_7STfv2aXX

— — — — — — — — —

❤️ SUBSCRIBE TO MY YOUTUBE CHANNEL 📺

Click here to subscribe: https://www.youtube.com/@AZisk?sub_confirmation=1

Join this channel to get access to perks: https://www.youtube.com/channel/UCajiMK_CY9icRhLepS8_3ug/join

— — — — — — — — —

📱LET’S CONNECT ON SOCIAL MEDIA

ALEX ON TWITTER: https://twitter.com/digitalix

— — — — — — — — —

Tags

macstudio tiiny llm

URLs

YouTube Playlist URLs

https://www.youtube.com/playlist?list=PLPwbI_iIX3aQCRdFGM7j4TY_7STfv2aXX

Related Concepts

Tiiny AI Pocket Lab
Large Language Models (LLMs)
Local and Private Computing
GPU Hardware
Local LLM Inference
Neural Processing Unit (NPU)
LPDDR5X Memory
PowerInfer Engine
Retrieval-Augmented Generation (RAG)
Stable Diffusion
Offload Computing
ARM v9.2 Architecture
Privacy-Preserving AI
Model Quantization (INT8)
Edge AI Hardware
SDK Development
Token Decoding Speed
Offline Processing
AI Agent Store
USB-C Interface

Related Entities

Alex Ziskind
Tiiny AI Pocket Lab
GPT-OSS-120B
TiinyOS
Qwen3-Coder-30B
MacBook Neo
RAGFlow
ChatMemo
Presenton
SD Web UI
TiinyBot
Kickstarter

NemoClaw Knowledge Wiki
