Ollama + Claude + GLM. Channel: Sam Witteveen



A summary and guide based on the video: https://www.youtube.com/watch?v=NA5U06WuO34

Running Claude Code Locally with Ollama and GLM-4.7-Flash

This guide shows how to use Ollama's new Anthropic API compatibility to run Claude Code locally with the GLM-4.7-Flash model.

Overview

Ollama now supports the Anthropic API, allowing users to hook local models into tools designed for Claude. This demonstration tests the GLM-4.7-Flash model (a 30B parameter Mixture-of-Experts model with 3B active parameters) to see if it can function as a local coding assistant.
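
Before connecting any tools, you can sanity-check the compatibility layer directly. The sketch below assumes Ollama mirrors Anthropic's Messages API at /v1/messages on its default port (11434); the x-api-key value is a placeholder, since the local server does not require a real key:

curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "glm-4.7-flash",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'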

New Feature: ollama launch

Ollama released a new command called ollama launch. This feature simplifies the process of connecting local models to coding environments. It supports:

  • Claude Code
  • Codex
  • Droid
  • OpenCode

Prerequisites

  • Ollama Version: Update to a recent release that includes Anthropic API compatibility and the ollama launch command (check your version with the command shown after this list).
  • Hardware: Recommended to have a Mac with Apple Silicon (M-series) or a machine with a powerful GPU. Tested on Mac Mini Pro (32GB RAM).
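
To check your installed version:

ollama -v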

Step-by-Step Setup

1. Pull the Model

Open your terminal and pull the GLM model:

ollama pull glm-4.7-flash
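
Once the pull completes, you can verify the download and inspect the model's metadata, including its default context length:

ollama show glm-4.7-flash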


2. Configure Context Length (Crucial Step)

By default, Ollama uses a context length of 4096 tokens. This is far too small for coding agents like Claude Code, whose system prompt and tool definitions alone can overflow that budget, causing the model to forget its instructions or fail to call tools.

  1. Open the Ollama application menu bar icon.
  2. Go to Settings (or specific model settings).
  3. Change the Context Length to at least 64k tokens (a command-line alternative is sketched after this list).
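
If you prefer the terminal to the app's settings, one command-line sketch is a custom Modelfile; PARAMETER num_ctx is Ollama's documented Modelfile option for context length, and the glm-4.7-flash-64k tag is just an illustrative name:

FROM glm-4.7-flash
PARAMETER num_ctx 65536

Save this as Modelfile, create the wrapped model, and pick the new tag when launching:

ollama create glm-4.7-flash-64k -f Modelfile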

3. Launch Claude Code

Run the following command in your terminal to initialize the connection:

ollama launch claude


Alternatively, you can run ollama launch with no arguments to see a menu of available integrations. Once launched, you can interact with Claude Code through the local model just as you would with the hosted version (e.g., using /plan mode).
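
Under the hood, ollama launch appears to point the coding tool at the local server. A rough manual equivalent for Claude Code, assuming its documented ANTHROPIC_BASE_URL, ANTHROPIC_API_KEY, and ANTHROPIC_MODEL environment variables (the exact values Ollama sets are an assumption):

export ANTHROPIC_BASE_URL="http://localhost:11434"   # local Ollama server
export ANTHROPIC_API_KEY="ollama"                    # placeholder; a local server has no real key (assumption)
export ANTHROPIC_MODEL="glm-4.7-flash"               # serve requests with the local model
claude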

Performance Review

Test setup: Mac Mini Pro with 32GB RAM.

✅ The Good

  • It Works: The integration successfully connects; Claude Code boots up and recognizes the local model.
  • Tool Recognition: The model is capable of identifying and attempting to use MCP (Model Context Protocol) tools installed on the system.
  • Cost: Running locally is free, making it a useful fallback to the paid Anthropic API.

❌ The Bad

  • Speed: It is significantly slower than the hosted Claude API. Both prefill (processing the prompt and context) and decoding (generating tokens) take a long time on local hardware.
  • Accuracy: While it attempts to use tools, the smaller quantized model sometimes hallucinates incorrect tool arguments, something Claude Opus or Sonnet 3.5 rarely does.
  • Resource Intensive: Running a 64k context window locally requires significant RAM/VRAM.

Verdict

While ollama launch is an excellent ease-of-use feature, running Claude Code locally with current open-weights models on consumer hardware is not yet ready for "prime time" professional workflows.

  • It is currently too slow and prone to minor hallucinations compared to the paid API.
  • It serves as a great proof of concept.
  • Future optimized coding models (such as Gemma 4 or Qwen 4) may make this viable soon.