NemoClaw Knowledge Wiki

Jul 12, 2026 · 1 min read

ai
llm
moe
qwen
local-inference
llama-cpp
vram-optimization
quantization
gguf
low-vram
coding-agent
mtp

Qwen 3.6 35B-A3B

Overview

A 35B-parameter mixture-of-experts (MoE) language model from the Qwen series
The A3B routing variant activates ~3B parameters per token, so per-token compute is close to that of a 3B model while all 35B parameters must still be stored
Architecture: sparse MoE with dense attention, optimized expert gating, and instruction tuning for reasoning and code
Training: multilingual corpus with heavy code synthesis, aligned for complex tool use and long-context retention
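
To make the "~3B active parameters per token" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count (8), the value of k (2), the router logits, and the scalar "experts" are all hypothetical toy values chosen for illustration; they are not the actual Qwen 3.6 35B-A3B configuration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, router_logits, experts, k):
    """Mix the outputs of only the selected experts; the others never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(logits, k=2)
output = moe_layer(3.0, logits, experts, k=2)
print(selected, output)
```

Only the selected experts are evaluated for a given token, which is why compute scales with the active parameter count rather than the total.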

Specialized Variants: Qwopus Coder

Qwopus 3.6-35B-A3B-Coder: A specialized derivative developed by Jackrong, built on the Qwen 3.6 35B-A3B base
Agentic Self-Correction: Supports a “thinking-off” mode and token-efficient coding-agent behavior
MTP Integration: Uses Multi-Token Prediction (MTP) to improve efficiency and to support the model in fixing its own bugs
Performance: Demonstrated high throughput (up to 160 tokens/s) in coding tasks while maintaining a low VRAM footprint
See “Qwopus Coder: Agentic Code Self-Correction and MTP-Driven Efficiency” for detailed analysis
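
One common way multi-token prediction improves throughput is draft-and-verify decoding: cheap extra heads guess several tokens ahead, and the full model keeps only the guesses it agrees with. The sketch below illustrates that general idea with toy stand-ins. It assumes greedy decoding, and `target_next`, `draft_tokens`, and the injected error pattern are hypothetical; it is not a description of how Qwopus or llama.cpp implement MTP.

```python
def target_next(context):
    """Stand-in for the full model's greedy next token (hypothetical toy rule)."""
    return (sum(context) * 31 + len(context)) % 7

def draft_tokens(context, k):
    """Stand-in for MTP heads: cheap guesses that are right most of the time."""
    guesses, ctx = [], list(context)
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 4 == 0:           # inject an occasional wrong guess
            tok = (tok + 1) % 7
        guesses.append(tok)
        ctx.append(tok)
    return guesses

def greedy_decode(prompt, n_new):
    """Baseline: one full-model step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out

def speculative_decode(prompt, n_new, k=3):
    """Greedy draft-and-verify: keep the matching prefix, then one verified token."""
    out, steps = list(prompt), 0
    while len(out) - len(prompt) < n_new:
        steps += 1                       # counts one simulated verification pass
        ctx = list(out)
        for guess in draft_tokens(out, k):
            correct = target_next(ctx)
            ctx.append(correct)          # equals the guess when it was accepted
            if guess != correct:         # first mismatch: keep the correction, stop
                break
        else:
            ctx.append(target_next(ctx))  # all drafts accepted: one bonus token
        out = ctx
    return out[:len(prompt) + n_new], steps

tokens, steps = speculative_decode([1, 2, 3], 20)
print(tokens == greedy_decode([1, 2, 3], 20), steps)
```

The output matches plain greedy decoding token for token; the gain is that several tokens can be committed per verification pass when the drafts are accurate. In a real system the verification is a single batched forward pass, which this toy simulates with repeated calls.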

Local Deployment & Performance

Validated inference under a 6 GB VRAM constraint via llama.cpp GGUF pipelines
“Achieving Fast 35B MoE AI Model Performance on 6GB VRAM with Llama.cpp” documents:
- Successful execution on 8-year-old consumer GPU hardware
- Q4_K_M / Q
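
A rough back-of-the-envelope estimate helps explain why a 6 GB card is a tight fit. The sketch below assumes an average of about 4.85 bits per weight for a Q4_K_M-style quantization; that figure is an assumption, and real GGUF files mix tensor types and also need room for the KV cache and buffers, so treat the numbers as order-of-magnitude only.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes) for a quantized model."""
    return params_billion * bits_per_weight / 8

BITS_PER_WEIGHT = 4.85  # assumed average for a Q4_K_M-style quant
VRAM_GB = 6

total_gb = weights_gb(35, BITS_PER_WEIGHT)   # every expert, resident somewhere
active_gb = weights_gb(3, BITS_PER_WEIGHT)   # ~3B parameters touched per token

fits_entirely = total_gb <= VRAM_GB
print(f"all weights:    {total_gb:.1f} GB")
print(f"active weights: {active_gb:.1f} GB")
print(f"fits fully in {VRAM_GB} GB VRAM: {fits_entirely}")
```

Under these assumptions the full weight set is several times larger than 6 GB, while the per-token active slice is well under it. That suggests most expert weights have to live outside VRAM (for example in system RAM), with the sparse activation pattern keeping per-token work manageable.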

References

Qwopus Coder: Agentic Code Self-Correction and MTP-Driven Efficiency
