CPU optimised TTS - Kitten AI - Sam Witteveen channel

https://www.youtube.com/watch?v=YpQWdrfzSzQ

🐱 Kitten TTS - Model Overview & Review

Kitten TTS is a new, open-source text-to-speech framework developed by Kitten ML. The primary focus of this project is extreme efficiency, small file sizes, and CPU optimization, making it ideal for edge computing and browser-based applications.

🚀 Key Features

Ultra-Lightweight: The smallest model is under 25MB.
CPU Optimized: Designed to run without a GPU.
Edge Ready: Can run in browsers, mobile phones, and IoT devices with minimal RAM.
Open Source: Released under the permissive Apache 2.0 License.
Fast Inference: Optimized for real-time speech synthesis.

📦 Model Sizes & Variations

Kitten TTS offers three distinct model sizes, plus a quantized version of the smallest model.


Model Name	Parameters	Disk Size	Description
Kitten-TTS-Mini	80 Million	~80 MB	The “largest” model available.
Kitten-TTS-Micro	40 Million	~41 MB	Mid-range balance of size/quality.
Kitten-TTS-Nano	15 Million	~56 MB	The smallest base model.
Nano (Int8)	15 Million	< 25 MB	8-bit quantized version. Extremely portable.
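
One way to read the table is as storage cost per parameter. The sketch below divides the approximate disk sizes by the parameter counts listed above. It assumes 1 MB = 10^6 bytes and treats the "< 25 MB" entry as an upper bound of 25 MB. The files may also hold non-weight data, so these are rough figures and not a statement about the models' internal precision.

```python
# Rough bytes-per-parameter figures derived from the table above.
# Values are (parameter count, approximate disk size in MB).
MODELS = {
    "Kitten-TTS-Mini": (80_000_000, 80),
    "Kitten-TTS-Micro": (40_000_000, 41),
    "Kitten-TTS-Nano": (15_000_000, 56),
    "Nano (Int8)": (15_000_000, 25),  # "< 25 MB" treated as an upper bound
}


def bytes_per_param(params: int, size_mb: float) -> float:
    """Approximate on-disk bytes per parameter, assuming 1 MB = 10**6 bytes."""
    return size_mb * 1_000_000 / params


for name, (params, size_mb) in MODELS.items():
    print(f"{name:18s} {bytes_per_param(params, size_mb):.2f} bytes/param")
```

On these numbers the Nano base file costs more bytes per parameter than Mini or Micro, which is why it is larger on disk than Micro despite having fewer parameters.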

🧪 Performance & Audio Quality

The video demonstrated a comparison between the models using a Google Colab notebook (running entirely on CPU).

General Quality: While not achieving the hyper-realism of massive models (like Qwen TTS or ElevenLabs), the quality is impressive relative to the tiny file size.
Size vs. Quality: Surprisingly, there is not a massive degradation in voice character between the 80M (Mini) and 15M (Nano) models.
The 8-Bit Quantized Model:
- Pros: Runs very fast; the file is under 25 MB.
- Cons: Introduces some audio artifacts; struggles slightly with punctuation and pausing (sometimes results in run-on sentences).
Voices: Voices are provided as embeddings, similar to the approach in Kokoro TTS. Available voices include:
- Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo.
- Notable mentions: Hugo (formal/news anchor style) and Luna (storytelling style) performed well.
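
A common way to put a number on CPU speed comparisons like the one described above is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The video summary does not report RTF figures. The helper below is a generic sketch: `synthesize` and `fake_synthesize` are hypothetical placeholders, and the 24 kHz sample rate is an assumption, not something taken from Kitten TTS.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return wall-clock synthesis time divided by the audio duration."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    if len(samples) == 0:
        raise ValueError("synthesizer returned no audio")
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> list[float]:
    # Hypothetical stand-in for a real TTS call:
    # returns one second of silence at an assumed 24 kHz.
    return [0.0] * 24_000


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Hello from a tiny model.", 24_000)
    print(f"RTF: {rtf:.4f}")
```

To compare model sizes, the same helper can be called once per model with the same text, passing each model's synthesis function in place of `fake_synthesize`.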

🛠️ Technical Details

Format: The models are packaged as ONNX files, contributing to their portability.
Installation: Can be installed via pip:
    pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl
    pip install soundfile
Development Status: Currently in Developer Preview (Version 0.8 tested in video).
Team: Appears to be a very small team (potentially a solo developer) based on the GitHub contributors list.
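
With the two packages installed, a synthesis script might look like the sketch below. It makes several assumptions that should be checked against the release you install: that the package exposes a `KittenTTS` class taking a model identifier, that it has a `generate(text, voice=...)` method returning an audio array, and that the output sample rate is 24 kHz. The model identifier is left as a required argument because this summary does not give one. The voice names come from the list in this summary, and `check_voice` and `synthesize_to_wav` are illustrative helper names.

```python
# Voice names as listed in this summary.
VOICES = ["Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo"]

SAMPLE_RATE = 24_000  # Assumption: verify against the project's documentation.


def check_voice(voice: str) -> str:
    """Return the voice name unchanged, or raise if it is not in the list."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    return voice


def synthesize_to_wav(text: str, model_id: str, voice: str = "Hugo",
                      path: str = "output.wav") -> str:
    """Generate speech and write it to a WAV file; returns the output path."""
    # Imported lazily so the helpers above work without the packages installed.
    from kittentts import KittenTTS  # assumed class name
    import soundfile as sf

    model = KittenTTS(model_id)  # assumed constructor signature
    audio = model.generate(text, voice=check_voice(voice))  # assumed method
    sf.write(path, audio, SAMPLE_RATE)
    return path
```

A call would then look like `synthesize_to_wav("Hello from a tiny model.", model_id=..., voice="Luna")`, with `model_id` set to whichever identifier the project's README specifies for the desired model size.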

💭 Conclusion

Kitten TTS represents a shift toward TinyML in the audio space. It proves that TTS systems are becoming efficient enough to run fully client-side (in-browser or on-device) without relying on heavy cloud APIs or expensive GPUs. While the audio quality has minor artifacts in the smallest versions, the trade-off for a <25MB footprint makes it a game-changer for mobile and web apps.
