NPU support

The ability of software and frameworks to run AI model inference on specialized Neural Processing Units (NPUs), improving speed and power efficiency.

Key Implementations

Edge Models

  • Google Gemma 4:
    • Multimodal open-source models (Apache 2.0).
    • Edge-optimized variants (E2B, E4B) for on-device AI.
    • 2.3B-parameter architecture, reportedly designed to approach the performance of much larger models (e.g., 70B).
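
Why parameter count matters on edge hardware can be sketched with simple arithmetic: weight storage is roughly parameters × bytes per parameter. The sketch below applies that to the 2.3B and 70B figures from the list above. The precisions (fp16, int8, int4) and the helper name weight_memory_gb are illustrative assumptions, not details from the model's documentation, and the estimate ignores activations, KV cache, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Covers weights only; activations, KV cache, and runtime overhead are ignored.

# Assumed storage precisions (illustrative, not taken from any model card).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts come from the note: a 2.3B edge model vs. a 70B model.
    for name, params in [("2.3B edge model", 2.3e9), ("70B model", 70e9)]:
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gb(params, precision)
            print(f"{name:16s} {precision}: {size:7.2f} GB")
```

Under these assumptions the 2.3B model needs about 4.6 GB of weights at fp16 and about 1.15 GB at int4, while a 70B model needs about 140 GB at fp16. That gap is why small models are the practical target for NPU-equipped edge devices.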

Source Notes