Every developer who has used Claude Code knows how transformative it feels. You describe what you want, and it reads files, writes code, runs terminal commands, and manages context across your entire project — all on its own. It's not a chatbot. It's an agent.
The problem? Most people assume Claude Code is locked behind Anthropic's Pro or Max subscription. And while the default setup does require a paid plan, the underlying tool itself is completely free to install. What you're actually paying for is the AI model behind it — and that part is entirely swappable.
Claude Code is the interface, not the brain. You can replace the brain (the AI model) with a free, locally-running alternative and keep all the same features: file editing, terminal commands, and multi-step task execution.
What Is Claude Code, Exactly?
Claude Code is a terminal-based agentic coding tool built by Anthropic. Unlike ChatGPT or Claude's web interface, it doesn't just respond to messages — it actively does things. It can open files, edit code, run shell commands, search your project, and chain together multi-step tasks without you needing to babysit every move.
Think of it as having a senior developer sitting in your terminal who actually executes work, not just suggests it.
The architecture is two-layered. First, there's the Claude Code harness — the CLI application that handles file access, tool execution, and task management. This layer is free and installable by anyone. Second, there's the language model that does the actual thinking. By default, this points to Anthropic's servers and uses Claude Sonnet or Opus, which costs money.
The trick? You can tell Claude Code to point somewhere else entirely.
How the Free Setup Actually Works
Claude Code exposes two critical configuration points that make this possible:
- The --model flag lets you specify which model to use when launching Claude Code
- The ANTHROPIC_BASE_URL environment variable lets you redirect API calls to any compatible endpoint, including a local one
Ollama is a tool that runs open-weight AI models directly on your machine. It starts a local server — by default at localhost:11434 — that speaks the same API language as Anthropic's servers. Set the environment variable to point there, and Claude Code has no idea it's talking to a local model instead of the cloud.
The result is a fully functional agentic coding environment running entirely on your hardware, with zero API costs and complete privacy — your code never leaves your machine.
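Because the redirect is just an environment variable, normal shell scoping decides which commands see it. The sketch below uses `sh -c` as a stand-in for `claude` (an assumption, so it runs without Claude Code installed) to contrast a one-command prefix with an `export` that lasts for the whole session:

```shell
# Prefix form: the variable is set only in the environment of this one command.
ANTHROPIC_BASE_URL=http://localhost:11434 sh -c 'echo "child sees: ${ANTHROPIC_BASE_URL:-<unset>}"'

# The current shell is unaffected by the prefix form
# (prints <unset> unless you exported the variable earlier).
echo "parent sees: ${ANTHROPIC_BASE_URL:-<unset>}"

# Export form: every later command in this shell session inherits the value,
# so every subsequent `claude` launch talks to the local server.
export ANTHROPIC_BASE_URL=http://localhost:11434
sh -c 'echo "child sees: ${ANTHROPIC_BASE_URL:-<unset>}"'
```

The prefix form suits switching between the paid and local setups often; `export` (or a line in your shell profile) suits making the local model your default.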
Step-by-Step: How to Set It Up
1. Install Claude Code
Claude Code installs as a global npm package. You'll need Node.js 18+ on your machine. Run this in your terminal:
npm install -g @anthropic-ai/claude-code
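If you want the install step to stop early on an old Node.js, a small guard like the one below can run first. It is a minimal sketch: `node_version_ok` is a hypothetical helper name, and it only parses the `vMAJOR.MINOR.PATCH` string that `node --version` prints.

```shell
# Hypothetical helper: succeeds when a `node --version` string (e.g. "v20.11.1")
# reports major version 18 or newer.
node_version_ok() {
  major=${1#v}          # "v20.11.1" -> "20.11.1"
  major=${major%%.*}    # "20.11.1"  -> "20"
  case $major in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: not a usable version
  esac
  [ "$major" -ge 18 ]
}

if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node.js $(node --version) is new enough for:"
  echo "  npm install -g @anthropic-ai/claude-code"
else
  echo "Node.js 18+ not found; install or upgrade Node.js first." >&2
fi
```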
2. Install Ollama
Head to ollama.com and download Ollama for your OS (Mac, Linux, or Windows). Install and run it — it starts a local server automatically in the background.
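To confirm the background server actually started, query it over HTTP. This sketch assumes `curl` is installed and uses Ollama's `/api/version` endpoint on the default port; `ollama_up` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if an Ollama server answers at the given base URL.
# Defaults to Ollama's standard local address.
ollama_up() {
  url=${1:-http://localhost:11434}
  # -f: fail on HTTP errors, -sS: quiet but keep error text, --max-time: never hang
  curl -fsS --max-time 3 "$url/api/version" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is running at http://localhost:11434"
else
  echo "No Ollama server found; start the Ollama app or run: ollama serve" >&2
fi
```

If nothing answers, `ollama serve` starts the server in the foreground so you can watch its log.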
3. Download a Coding Model
Pull one of the recommended models. Qwen3.6 is purpose-built for agentic coding tasks and is the top recommendation for this setup:
# Recommended: Qwen3.6 (agentic coding specialist)
ollama pull qwen3.6

# Alternative: Gemma 4 by Google DeepMind
ollama pull gemma4

# For 16GB RAM or less (lightweight version)
ollama pull gemma4:4b
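Before launching, you can check that the pull actually finished. The sketch below parses the table that `ollama list` prints (a header row, then one model per line with the name in the first column); `has_model` is a hypothetical helper, and the column layout is an assumption based on current Ollama output.

```shell
# Hypothetical helper: reads `ollama list` output on stdin and succeeds if the
# named model is present. A bare name like "qwen3.6" also matches "qwen3.6:latest",
# because Ollama uses the "latest" tag when none is given.
has_model() {
  awk -v want="$1" '
    NR == 1 { next }                              # skip the NAME/ID/SIZE header row
    $1 == want || $1 == (want ":latest") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

if command -v ollama >/dev/null 2>&1 && ollama list | has_model qwen3.6; then
  echo "qwen3.6 is ready"
else
  echo "qwen3.6 not found; run: ollama pull qwen3.6" >&2
fi
```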
4. Launch Claude Code with Ollama
Run this single command to start Claude Code using your local model. It will prompt you to select the model you downloaded:
ollama launch claude

# Or manually set the endpoint and model:
ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3.6
That's it. You're inside Claude Code. The interface is identical — file editing, terminal commands, context management — all working the same way as the paid version, just powered by a local model.
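If you switch between the paid and local setups, a shell function keeps the local launch to one word. This is a sketch under stated assumptions: `claude_local`, `CLAUDE_LOCAL_MODEL`, and `OLLAMA_URL` are hypothetical names, and `ANTHROPIC_AUTH_TOKEN=ollama` is a placeholder credential, on the assumption that a local Ollama server does not validate it.

```shell
# Hypothetical wrapper: start Claude Code against a local Ollama server.
# ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are Claude Code environment settings;
# the token value "ollama" is a placeholder (assumed not to be checked locally).
claude_local() {
  model=${CLAUDE_LOCAL_MODEL:-qwen3.6}   # hypothetical variable for the default model
  ANTHROPIC_BASE_URL=${OLLAMA_URL:-http://localhost:11434} \
  ANTHROPIC_AUTH_TOKEN=ollama \
    claude --model "$model" "$@"
}
```

Add it to your shell profile, then run `claude_local` for an interactive session or `claude_local -p "Summarize README.md"` for a one-shot answer in Claude Code's print mode. Plain `claude` keeps using your Anthropic account as long as ANTHROPIC_BASE_URL is not exported in your session.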
Which Models Work Best?
Not every open-weight model is suited for agentic coding tasks. Claude Code needs a model that handles long context windows, follows complex multi-step instructions, and properly uses tool-call syntax. Here's what's worth your time right now:
| Model | Size | RAM Needed | Best For | Agentic? |
|---|---|---|---|---|
| Qwen3.6 27B | 27B params | ~17GB | Frontend, repos | ✓ Excellent |
| Qwen3.6 35B | 35B params | ~24GB | Complex codebases | ✓ Best |
| Gemma 4 31B | 31B params | ~22GB | General coding | ✓ Good |
| Gemma 4 E4B | 4B active | ~5GB | Low-end devices | ~ Decent |
| Claude Sonnet (paid) | Cloud | No local GPU | Everything | ✓ Best-in-class |
Qwen3.6 is the standout choice for this setup. It was designed specifically for agentic workflows and handles repository-level reasoning and frontend code better than most open-weight alternatives at this size.
Ollama defaults to a small context window, which breaks multi-file tasks. Raise it to 64K (65,536 tokens) before launching: Ollama's documented server-wide setting is OLLAMA_CONTEXT_LENGTH=65536, and the per-model equivalent is the num_ctx parameter. Complex tasks will fail without this.
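Two ways to apply the larger window are sketched below, assuming a recent Ollama release: a server-wide default via `OLLAMA_CONTEXT_LENGTH`, or a named model variant whose Modelfile sets `PARAMETER num_ctx`. The helper `long_context_modelfile` and the variant name `qwen3.6-64k` are hypothetical.

```shell
# Option A: raise the default for every model the server loads.
# OLLAMA_CONTEXT_LENGTH is read by the server, so set it where `ollama serve` starts:
#   OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Option B: bake the setting into a named variant via a Modelfile.
# Hypothetical helper that prints a Modelfile deriving from a base model.
long_context_modelfile() {
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$1" "${2:-65536}"
}

# Usage:
#   long_context_modelfile qwen3.6 65536 > Modelfile
#   ollama create qwen3.6-64k -f Modelfile
#   ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3.6-64k
```

A larger window costs extra memory for the KV cache, so on tight hardware the 64K setting may push you toward a smaller model.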
What Hardware Do You Actually Need?
This is the honest part. Running a large language model locally is one of the most demanding workloads a consumer machine handles. Before committing to this setup, here's what you're working with:
Apple Silicon Macs (Best Option)
Apple's unified memory architecture gives Macs a significant advantage for local AI. The CPU and GPU share the same RAM pool, which means a 32GB M-series Mac can comfortably run the quantized Qwen3.6 and Gemma 4 builds listed in the table. Even a 16GB M2 or M3 can handle the smaller Gemma 4 E4B variant.
Windows / Linux with Dedicated GPU
If you're on a PC, VRAM is everything. An RTX 3090 or RTX 4090 (24GB VRAM each) handles the 27B models well: the 4-bit quantized Qwen3.6 27B takes around 17GB, a tight but workable fit once you leave room for the context window. On 16GB VRAM cards like the RTX 4080, stick to the quantized versions of smaller models.
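These memory figures follow from simple arithmetic. As a rough rule of thumb (an approximation, not how Ollama sizes models), weights take parameters × bits per weight ÷ 8 bytes, so a 27B model at 4 bits needs about 13.5GB for weights alone. Real 4-bit builds keep some tensors at higher precision, and the context window's KV cache comes on top, which is why the table lists about 17GB. `weights_gb` is a hypothetical helper:

```shell
# Rough lower bound for weight memory in GB:
#   parameters (billions) x bits per weight / 8
# KV cache for the context window and runtime overhead are NOT included.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 27 4    # 4-bit 27B model: 13.5
weights_gb 27 16   # the same model unquantized at 16-bit: 54.0
```

The 16-bit figure (54GB) is why 27B-class models on consumer hardware are run quantized.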
One reader reported that a recent Claude Code release may have broken this integration. Before committing significant time to the setup, verify that your installed version still honors the ANTHROPIC_BASE_URL override. The core method remains valid; just double-check version compatibility.
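One quick way to run that check: print both versions, then send a single non-interactive prompt through the local endpoint using Claude Code's `-p` (print) flag. `smoke_test` and `OLLAMA_URL` are hypothetical names and the prompt text is arbitrary; if the final call errors out, your Claude Code version may no longer honor the override.

```shell
# Hypothetical helper: print both tool versions, then send one prompt through
# the local endpoint. Claude Code's -p (print) flag answers once and exits.
smoke_test() {
  for tool in claude ollama; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not installed" >&2
      return 1
    fi
    echo "$tool: $("$tool" --version)"
  done
  # If this call fails, the ANTHROPIC_BASE_URL override may not be honored.
  ANTHROPIC_BASE_URL=${OLLAMA_URL:-http://localhost:11434} \
    claude --model "${1:-qwen3.6}" -p "Reply with the word OK"
}

# Usage: smoke_test qwen3.6
```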
Free vs Paid: Honest Comparison
Let's be direct. The free local setup is genuinely capable, but it's not identical to running Claude Sonnet or Opus. Here's where the gap is real and where it barely matters:
Where the Gap Is Real
- Complex, multi-file refactors across large codebases — frontier models handle these with more precision
- Nuanced debugging where context and reasoning depth matter
- Generating long, architecturally complex outputs from scratch
Where Free Local Models Hold Their Own
- Routine coding tasks — writing functions, creating components, adding features
- File editing and search — these are harness-level tasks, not model-dependent
- Simple automation and scripting — shell scripts, config files, boilerplate
- Privacy-sensitive projects — your code stays on your machine, always
For most day-to-day development work, the free setup with Qwen3.6 is surprisingly competitive. The gap has shrunk considerably as open-weight models have matured.
Advanced Setup: Dedicated Inference Server
If you work across multiple devices or want to avoid taxing your main machine, consider running Ollama on a dedicated machine or home server and connecting to it over your local network.
# Point Claude Code to your inference server
ANTHROPIC_BASE_URL=http://192.168.1.100:11434 claude --model qwen3.6
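Replace 192.168.1.100 with your server's local IP. By default Ollama listens only on 127.0.0.1, so on the server start it with OLLAMA_HOST=0.0.0.0 to accept connections from other machines. This way the heavy model runs on the server, and your laptop stays cool and fast. You get the performance of a beefy machine delivered to any device on your network.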
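If the server is sometimes offline (a desktop that sleeps, a laptop away from home), a small launcher can fall back to a local Ollama instance. This is a sketch: `pick_base_url` is a hypothetical helper, it assumes `curl` is installed, and it checks reachability once at launch rather than switching mid-session.

```shell
# On the server, Ollama must listen beyond 127.0.0.1 (Ollama's OLLAMA_HOST setting):
#   OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On the laptop: hypothetical helper that prefers the remote server and falls
# back to a local Ollama instance if the remote one does not answer in time.
pick_base_url() {
  remote=$1
  local_url=${2:-http://localhost:11434}
  if curl -fsS --max-time 2 "$remote/api/version" >/dev/null 2>&1; then
    echo "$remote"
  else
    echo "$local_url"
  fi
}

# Usage (192.168.1.100 is the example address from the text):
#   ANTHROPIC_BASE_URL=$(pick_base_url http://192.168.1.100:11434) claude --model qwen3.6
```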
Conclusion: Own the Tool, Not the Subscription
The subscription economy has trained us to think of AI as a monthly rental. Claude Code breaks that assumption. The harness is free. The workflow is yours. The only thing you were paying for was the model — and now you have alternatives.
For developers on a budget, this setup is a complete game-changer. Install Ollama, pull Qwen3.6, set one environment variable, and you're running a full agentic coding environment at zero cost. For developers already on Claude Pro, it's still worth setting up as a fallback for tasks where you'd rather not burn through tokens.
The models aren't as powerful as Sonnet. That's the honest truth. But free and surprisingly capable is a completely different value proposition than $20/month and excellent. Try it first, then judge.
1. npm install -g @anthropic-ai/claude-code → 2. Install Ollama → 3. ollama pull qwen3.6 → 4. ollama launch claude. That's your entire setup.