Ollama + Claude + GLM (channel: Sam Witteveen)
https://www.youtube.com/watch?v=NA5U06WuO34
A Markdown summary and guide based on the video content.
Running Claude Code Locally with Ollama and GLM-4.7-Flash
This guide covers how to use the new Anthropic API compatibility in Ollama to run Claude Code locally using the GLM-4.7-Flash model.
Overview
Ollama now supports the Anthropic API, allowing users to hook local models into tools designed for Claude. This demonstration tests the GLM-4.7-Flash model (a 30B parameter Mixture-of-Experts model with 3B active parameters) to see if it can function as a local coding assistant.
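To make the compatibility concrete, here is a minimal sketch of sending an Anthropic-style Messages request straight to a local Ollama server. It assumes Ollama is listening on its default port (11434), that the Anthropic-compatible endpoint is served at /v1/messages, and that the placeholder API key is ignored locally; verify the exact path and headers against your Ollama version's documentation.

```bash
# Hedged sketch: Anthropic-style Messages request against a local Ollama server.
# Assumptions: default port 11434, endpoint path /v1/messages, and a placeholder
# API key (a local server typically does not validate it).
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{
    "model": "glm-4.7-flash",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write a one-line Python hello world."}
    ]
  }'
```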
New Feature: ollama launch
Ollama released a new command called ollama launch. This feature simplifies the process of connecting local models to coding environments. It supports:
- Claude Code
- Codex
- Droid
- OpenCode
Prerequisites
- Ollama Version: Update to the latest release; the ollama launch command and the Anthropic-compatible API require a recent version.
- Hardware: A Mac with Apple Silicon (M-series) or a machine with a powerful GPU is recommended. Tested on a Mac Mini Pro (32GB RAM).
Step-by-Step Setup
1. Pull the Model
Open your terminal and pull the GLM model:
ollama pull glm-4.7-flash
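Before wiring the model into Claude Code, you can confirm the download with the standard Ollama CLI commands below (the model tag is taken from the video; adjust it if yours differs):

```bash
# Verify the pull completed and inspect the model.
ollama list                  # should show glm-4.7-flash among the local models
ollama show glm-4.7-flash    # prints model details such as parameters and context length
```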
2. Configure Context Length (Crucial Step)
By default, Ollama uses a context length of 4096 tokens. This is insufficient for coding agents like Claude Code and causes the model to forget instructions or fail to use tools.
- Open the Ollama application menu bar icon.
- Go to Settings (or specific model settings).
- Change the Context Length to at least 64k (64000). A command-line alternative is sketched below.
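If you prefer to set the context length outside the GUI, the sketch below shows two common approaches. Both are assumptions to verify against your Ollama version's docs: the OLLAMA_CONTEXT_LENGTH environment variable exists only in recent releases, and the glm-4.7-flash-64k variant name is just an illustrative label.

```bash
# Option 1 (hedged): set a server-wide default context length, assuming your
# Ollama release supports the OLLAMA_CONTEXT_LENGTH environment variable.
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Option 2: create a model variant with a larger context window via a Modelfile
# (PARAMETER num_ctx is a documented Modelfile option). The -64k tag is a
# hypothetical name chosen for this example.
cat > Modelfile <<'EOF'
FROM glm-4.7-flash
PARAMETER num_ctx 65536
EOF
ollama create glm-4.7-flash-64k -f Modelfile
```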
3. Launch Claude Code
Run the following command in your terminal to initialize the connection:
ollama launch claude
Alternatively, you can run ollama launch by itself to see a menu of available integrations.
Once launched, you can interact with Claude Code using the local model just as you would with the hosted version (e.g., using /plan mode).
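If you want to point Claude Code at the local server without ollama launch, a manual setup roughly like the sketch below may work. It assumes Claude Code honors the ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL environment variables and that Ollama's Anthropic-compatible API is on port 11434; treat it as an illustration rather than the documented path.

```bash
# Hedged sketch: start Claude Code against a local Ollama backend by hand.
# Assumes Claude Code reads ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN / ANTHROPIC_MODEL
# and that Ollama serves an Anthropic-compatible API on localhost:11434.
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama        # placeholder; a local server typically ignores it
export ANTHROPIC_MODEL=glm-4.7-flash
claude
```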
Performance Review
The Setup Tested: Mac Mini Pro with 32GB RAM.
✅ The Good
- It Works: The integration successfully connects; Claude Code boots up and recognizes the local model.
- Tool Recognition: The model is capable of identifying and attempting to use MCP (Model Context Protocol) tools installed on the system.
- Cost: It provides a free local “backup” to the paid Anthropic API.
❌ The Bad
- Speed: It is significantly slower than the hosted Claude API. Both “pre-fill” (processing context) and “decoding” (generating text) take a long time on local hardware.
- Accuracy: While it attempts to use tools, the smaller quantized model sometimes hallucinates incorrect tool arguments (unlike Claude Opus or Sonnet 3.5).
- Resource Intensive: Running a 64k context window locally requires significant RAM/VRAM.
Verdict
While ollama launch is an excellent feature for ease of use, running Claude Code locally with current open-weights models on consumer hardware is not yet “prime time” ready for professional workflows.
- It is currently too slow and prone to minor hallucinations compared to the paid API.
- It serves as a great proof of concept.
- Future optimized coding models (like Gemini 4 or Qwen 4) may make this viable soon.