NemoClaw Knowledge Wiki

Tag: llamacpp

7 items with this tag.

  • Jun 14, 2026

    fahd-mirza

    • local-ai
    • llm-inference
    • fine-tuning
    • speculative-decoding
    • quantization
    • llamacpp
    • unsloth
    • export-controls
    • coding-agents
  • Jun 13, 2026

    gguf-format

    • gguf-format
    • llm-serialization
    • local-inference
    • llamacpp
    • ollama
    • model-distribution
    • binary-format
  • Jun 03, 2026

    Adaptive PFlash and Hermes Agent: Self-Tuning LLM Prefill for Long Contexts

    • llamacpp
    • lucebox
    • lucedflash
    • speculativedecoding
    • pflash
  • May 22, 2026

    llama.cpp Router Mode: Native Hot-Swappable Local LLM Switching

    • llamacpp
  • May 20, 2026

    MTP + Ngram Stacked Speculative Decoding in Llama.cpp for LLM Inference

    • llamacpp
    • mtp
    • multitokenprediction
    • speculativedecoding
    • ngrammod
  • May 11, 2026

    Higgsfield: Enabling LLMs like Claude for Media Generation

    • higgsfield
    • llm
    • llamacpp
  • May 10, 2026

    Achieving Fast 35B MoE AI Model Performance on 6GB VRAM with Llama.cpp

    • LocalAI
    • LLM
    • llamacpp
    • Qwen
    • AIonGPU
    • LowVRAM
