NemoClaw Knowledge Wiki

Jul 22, 2026

creator
ai-educator
local-llm
llama-cpp
optimization
moe
content-creator
llm-optimization
local-inference
quantization
resource-constrained-computing
moe-models
coding-agents
budget-hardware
ai-memory
persistent-memory
benchmarking
model-comparison

Codacus

Codacus is a content creator and educator specializing in local large language model (LLM) deployment, optimization, and resource-constrained inference, and is known for tutorials on running high-parameter models on consumer-grade hardware.

Key Works & Demonstrations

Achieving Fast 35B MoE AI Model Performance on 6GB VRAM with Llama.cpp (2026-05-10)
- Channel guide: “Running a 35B AI Model on 6GB VRAM, FAST (llama.cpp Guide)”
- Demonstrated inference of qwen-36-35b-a3b (35B parameters, mixture-of-experts architecture) on hardware with only 6 GB of VRAM
- Leveraged quantization (fewer bits per weight) and MoE sparsity (only a subset of experts is active for each token) to work within the VRAM limit
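
The settings used in the guide are not recorded on this page, so the following is only a rough arithmetic sketch of why quantization combined with MoE sparsity can help on a small GPU. The 4.5 bits-per-weight figure and the reading of "a3b" as roughly 3B active parameters per token are assumptions made for illustration, not values taken from the video.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of model weights in GB (10^9 bytes).

    Ignores the KV cache, activations, and file-format overhead.
    """
    return params_billion * bits_per_weight / 8


TOTAL_PARAMS_B = 35.0    # total parameters, from the model description above
ACTIVE_PARAMS_B = 3.0    # ASSUMPTION: "a3b" read as ~3B active parameters per token
BITS_PER_WEIGHT = 4.5    # ASSUMPTION: a 4-bit-class quantization, illustrative only
VRAM_GB = 6.0            # VRAM budget, from the guide title

fp16_gb = weight_footprint_gb(TOTAL_PARAMS_B, 16)
total_gb = weight_footprint_gb(TOTAL_PARAMS_B, BITS_PER_WEIGHT)
active_gb = weight_footprint_gb(ACTIVE_PARAMS_B, BITS_PER_WEIGHT)

print(f"16-bit weights:          {fp16_gb:.1f} GB")
print(f"quantized, all experts:  {total_gb:.1f} GB")
print(f"quantized, active/token: {active_gb:.1f} GB")
print(f"fits in VRAM (all experts)?  {total_gb <= VRAM_GB}")
print(f"fits in VRAM (active only)?  {active_gb <= VRAM_GB}")
```

Under these assumptions the full quantized model is still larger than 6 GB, while the weights touched for any single token are much smaller. That gap is what makes it plausible to keep part of the model in system RAM and still get usable speed, though how the guide actually splits the model is not stated here.
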
Bonsai 27B vs. Qwen 35B: LLM Performance and Replacement Feasibility Benchmarks (2026-07-22)
- Channel guide: “Can a 3.5GB model replace my 35B daily driver? (Bonsai 27B)”
- Benchmarked Bonsai 27B against Qwen 35B to evaluate trade-offs between model size, inference speed, and real-world applicability
- Investigated whether a smaller, more aggressively optimized model can replace a high-parameter daily driver for specific use cases
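
As a back-of-the-envelope illustration of the size trade-off, the sketch below derives the effective bits per weight implied by the video title. It assumes that "3.5GB" refers to the Bonsai 27B weight file, that "27B" is the parameter count, that GB means 10^9 bytes, and that the file holds only weights. None of these assumptions is confirmed on this page.

```python
def effective_bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return file_size_gb * 8 / params_billion


# ASSUMPTION: both figures are read directly from the video title.
bonsai_bpw = effective_bits_per_weight(file_size_gb=3.5, params_billion=27.0)
print(f"Bonsai 27B: about {bonsai_bpw:.2f} bits per weight")
```

A figure near one bit per weight would sit far below common 4-bit quantization levels, which is presumably why the benchmark asks whether quality holds up well enough for daily use.
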

References

Bonsai 27B vs. Qwen 35B: LLM Performance and Replacement Feasibility Benchmarks

Source Notes

2026-07-22: Bonsai 27B vs. Qwen 35B: LLM Performance and Replacement Feasibility Benchmarks
2026-07-13: Developing Persistent, Intelligent Memory for Local AI with a Librarian System
2026-05-31: Budget GPU Local Coding Agent Performance Optimization Report
2026-05-10: Achieving Fast 35B MoE AI Model Performance on 6GB VRAM with Llama.cpp
