NemoClaw Knowledge Wiki

Tag: multi-token-prediction

6 items with this tag.

  • Jun 14, 2026

    large-language-models

    • neural-networks
    • natural-language-processing
    • transformer-models
    • prompt-engineering
    • model-parameters
    • text-generation
    • speculative-decoding
    • multi-token-prediction
    • inference-optimization
    • quantization
    • memory-management
    • energy-based-models
    • constraint-satisfaction
    • harness-design
    • ai-coding-agents
    • local-inference
    • model-variants
    • attention-mechanisms
    • residual-connections
    • edge-ai
    • privacy-preserving-ai
    • prompt-caching
    • kv-cache
    • fine-tuning
    • open-source-tools
    • unsloth
    • evolution-strategies
    • gradient-free-optimization
    • test-time-compute
    • inference-time-reasoning
  • Jun 14, 2026

    speculative-decoding

    • speculative-decoding
    • llm-inference
    • drafting-models
    • token-verification
    • multi-token-prediction
  • Jun 14, 2026

    token-generation-speed

    • LLM
    • inference
    • performance
    • llama.cpp
    • tokenization
    • token-generation
    • inference-speed
    • llm-performance
    • quantization
    • speculative-decoding
    • multi-token-prediction
  • Jun 14, 2026

    google-gemma-4

    • large-language-model
    • open-weight
    • multi-token-prediction
    • speculative-decoding
    • google-gemma
    • inference-optimization
  • Jun 13, 2026

    local-llm

    • local-llm
    • ai-coding
    • privacy
    • inference-speed
    • ollama
    • llama.cpp
    • multi-token-prediction
    • open-source-projects
  • Jun 13, 2026

    multi-token-prediction-mtp-drafter-models

    • multi-token-prediction
    • speculative-decoding
    • llm-inference
    • model-acceleration
    • drafter-models
    • inference-optimization
    • llama.cpp
