Codacus
Content creator and educator specializing in local large language model (LLM) deployment, optimization, and resource-constrained inference. Known for tutorials on running high-parameter models on consumer-grade hardware.
Key Works & Demonstrations
- Achieving Fast 35B MoE AI Model Performance on 6GB VRAM with Llama.cpp (2026-05-10)
- Channel guide: “Running a 35B AI Model on 6GB VRAM, FAST (llama.cpp Guide)”
- Demonstrated inference of qwen-36-35b-a3b (35B parameters, mixture-of-experts architecture) on hardware with only 6GB VRAM
- Leveraged llama.cpp for [[concepts/paramet
- Budget GPU Local Coding Agent Performance Optimization Report (2026-05-31)
- Clip title: “Build Powerful Local Coding Agent on Budget GPU with Llama.cpp and Pi”
- Focuses on deploying mid-tier coding agents on budget GPU hardware to achieve responsive performance comparable to cloud solutions
- Uses llama.cpp to optimize local inference