How to Use Claude Code for Free — No Subscription, No API Bill

Claude Code is one of the most powerful agentic coding tools available today — but most people think it requires a $20/month Pro plan. It doesn't. Here's exactly how to run it for free using local AI models on your own machine.

Every developer who has used Claude Code knows how transformative it feels. You describe what you want, and it reads files, writes code, runs terminal commands, and manages context across your entire project — all on its own. It's not a chatbot. It's an agent.

The problem? Most people assume Claude Code is locked behind Anthropic's Pro or Max subscription. And while the default setup does require a paid plan, the underlying tool itself is completely free to install. What you're actually paying for is the AI model behind it — and that part is entirely swappable.

💡 Key Insight

Claude Code is the interface, not the brain. You can replace the brain (the AI model) with a free, locally running alternative and keep the same harness features: file editing, terminal commands, and multi-step task execution.

What Is Claude Code, Exactly?

Claude Code is a terminal-based agentic coding tool built by Anthropic. Unlike ChatGPT or Claude's web interface, it doesn't just respond to messages — it actively does things. It can open files, edit code, run shell commands, search your project, and chain together multi-step tasks without you needing to babysit every move.

Think of it as having a senior developer sitting in your terminal who actually executes work, not just suggests it.

The architecture is two-layered. First, there's the Claude Code harness — the CLI application that handles file access, tool execution, and task management. This layer is free and installable by anyone. Second, there's the language model that does the actual thinking. By default, this points to Anthropic's servers and uses Claude Sonnet or Opus, which costs money.

The trick? You can tell Claude Code to point somewhere else entirely.

[Figure: Claude Code → Ollama local server at localhost:11434 · Model: Qwen3.6 / Gemma 4 · Cost: $0. How Claude Code connects to a local Ollama instance instead of Anthropic's paid API servers.]

How the Free Setup Actually Works

Claude Code exposes two critical configuration points that make this possible:

--model flag — lets you specify which model to use when launching Claude Code
ANTHROPIC_BASE_URL environment variable — lets you redirect API calls to any compatible endpoint, including a local one

Ollama is a tool that runs open-weight AI models directly on your machine. It starts a local server, by default at localhost:11434, and recent releases expose an Anthropic-compatible API on it. Set the environment variable to point there, and Claude Code sends its requests to the local model instead of the cloud.

The result is a fully functional agentic coding environment running entirely on your hardware, with zero API costs and complete privacy — your code never leaves your machine.
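Before redirecting Claude Code, it helps to confirm that the local server is actually answering. A minimal sketch, assuming curl is installed and Ollama is on its default port; the function name is made up for illustration, while /api/tags is Ollama's endpoint for listing local models:

```shell
# Probe an Ollama server by requesting its list of local models.
# Returns 0 if the server answers, non-zero otherwise.
ollama_is_up() {
  base_url="${1:-http://localhost:11434}"
  # -s: silent, -f: treat HTTP errors as failures,
  # --max-time: do not hang on an unreachable host.
  curl -sf --max-time 3 "$base_url/api/tags" >/dev/null 2>&1
}

# Usage:
#   ollama_is_up && echo "Ollama is reachable" || echo "Start Ollama first"
```

A failed probe usually means Ollama is not running or is listening on a different port.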

Step-by-Step: How to Set It Up

1. Install Claude Code

Claude Code installs as a global npm package. You'll need Node.js 18+ on your machine. Run this in your terminal:

Terminal

npm install -g @anthropic-ai/claude-code
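The Node.js 18+ requirement can be checked before installing. A small sketch, assuming node --version prints a string such as v20.11.1; the helper names are hypothetical:

```shell
# Extract the major version from a string like "v18.19.0".
node_major() {
  version="${1#v}"          # drop the leading "v"
  echo "${version%%.*}"     # keep everything before the first dot
}

# Succeed only if the given version string is 18 or newer.
node_ok() {
  [ "$(node_major "$1")" -ge 18 ]
}

# Usage:
#   node_ok "$(node --version)" && npm install -g @anthropic-ai/claude-code
```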

2. Install Ollama

Head to ollama.com and download Ollama for your OS (Mac, Linux, or Windows). Install and run it — it starts a local server automatically in the background.

3. Download a Coding Model

Pull one of the recommended models. Qwen3.6 is purpose-built for agentic coding tasks and is the top recommendation for this setup:

Terminal

# Recommended: Qwen3.6 (agentic coding specialist)
ollama pull qwen3.6

# Alternative: Gemma 4 by Google DeepMind
ollama pull gemma4

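# For 16GB RAM or less (lightweight version, listed as Gemma 4 E4B in the table below)
# Confirm the exact tag on the model's page in the Ollama library before pulling.
ollama pull gemma4:4b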
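To avoid downloading a multi-gigabyte model twice, a script can check what is already installed. A sketch, assuming ollama list prints a header row followed by one model per line with the name (model:tag) in the first column; has_model is a hypothetical helper:

```shell
# Read `ollama list` output on stdin and succeed if the named model is present.
# Matches either the full "model:tag" name or just the part before the colon.
has_model() {
  awk -v want="$1" '
    NR > 1 {                       # skip the header row
      split($1, parts, ":")        # first column looks like "model:tag"
      if ($1 == want || parts[1] == want) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# Usage:
#   ollama list | has_model qwen3.6 || ollama pull qwen3.6
```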

4. Launch Claude Code with Ollama

Run this single command to start Claude Code using your local model. It will prompt you to select the model you downloaded:

Terminal

ollama launch claude

# Or manually set the endpoint and model:
ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3.6

That's it. You're inside Claude Code. The interface is identical — file editing, terminal commands, context management — all working the same way as the paid version, just powered by a local model.
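The manual launch can be wrapped in a small shell function so the endpoint does not have to be retyped. This is a sketch rather than an official recipe: claude_local is a made-up name, and the ANTHROPIC_AUTH_TOKEN placeholder is an assumption, included because Claude Code may otherwise ask you to log in; drop it if your version does not need it.

```shell
# Launch Claude Code against a local (or custom) Anthropic-compatible endpoint.
#   $1: model name (default: qwen3.6)
#   $2: base URL   (default: local Ollama server)
claude_local() {
  model="${1:-qwen3.6}"
  base_url="${2:-http://localhost:11434}"
  # Assumption: a dummy token keeps Claude Code from requesting a real login.
  ANTHROPIC_BASE_URL="$base_url" \
  ANTHROPIC_AUTH_TOKEN="${ANTHROPIC_AUTH_TOKEN:-ollama}" \
    claude --model "$model"
}

# Usage:
#   claude_local            # qwen3.6 on localhost
#   claude_local gemma4     # a different local model
```

The variables are set only for that one invocation, so a normal claude launch in the same shell still talks to Anthropic's servers.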

$ ollama launch claude
Select a model:
▶ qwen3.6
  gemma4
  gemma4:4b
✓ Claude Code initialized with qwen3.6
✓ Connected to local Ollama instance
Type your first task or /help for commands...
❯ █

Claude Code terminal prompt after connecting to a local Ollama model; the interface is identical to the paid version.

Which Models Work Best?

Not every open-weight model is suited for agentic coding tasks. Claude Code needs a model that handles long context windows, follows complex multi-step instructions, and properly uses tool-call syntax. Here's what's worth your time right now:

Model	Size	RAM Needed	Best For	Agentic?
Qwen3.6 27B	27B params	~17GB	Frontend, repos	✓ Excellent
Qwen3.6 35B	35B params	~24GB	Complex codebases	✓ Best
Gemma 4 31B	31B params	~22GB	General coding	✓ Good
Gemma 4 E4B	4B active	~5GB	Low-end devices	~ Decent
Claude Sonnet (paid)	Cloud	No local GPU	Everything	✓ Best-in-class

Qwen3.6 is the standout choice for this setup. It was designed specifically for agentic workflows and handles repository-level reasoning and frontend code better than most open-weight alternatives at this size.

💡 Pro Tip

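Ollama defaults to a small context window, which breaks multi-file tasks. Raise it to 64K before launching, for example by setting OLLAMA_CONTEXT_LENGTH=65536 in the environment of the Ollama server process (this is the variable name current Ollama releases document; num_ctx is the per-model parameter). Complex tasks are likely to fail without this.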
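If the environment setting does not take effect in your setup, the context size can also be baked into a model variant through a Modelfile. A minimal sketch, assuming the base model tag qwen3.6 from step 3; the helper name and the variant name qwen3.6-64k are made up for illustration:

```shell
# Print a Modelfile that derives a variant of a base model with a larger
# context window. num_ctx is Ollama's per-model context-length parameter.
make_modelfile() {
  base_model="${1:-qwen3.6}"
  num_ctx="${2:-65536}"
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base_model" "$num_ctx"
}

# Usage (variant name is hypothetical):
#   make_modelfile qwen3.6 65536 > Modelfile
#   ollama create qwen3.6-64k -f Modelfile
#   ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3.6-64k
```

A larger context window also increases memory use, so it is worth checking that the model still fits after the change.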

What Hardware Do You Actually Need?

This is the honest part. Running a large language model locally is one of the most demanding workloads a consumer machine handles. Before committing to this setup, here's what you're working with:

Apple Silicon Macs (Best Option)

Apple's unified memory architecture gives Macs a significant advantage for local AI. The CPU and GPU share the same RAM pool, which means a 32GB M-series Mac can comfortably run the quantized Qwen3.6 27B or Gemma 4 31B builds from the table above (about 17GB and 22GB), though not the unquantized 16-bit weights. Even a 16GB M2 or M3 can handle the smaller Gemma 4 E4B variant.

Windows / Linux with Dedicated GPU

If you're on PC, VRAM is everything. An RTX 3090 (24GB VRAM) or RTX 4090 handles the 27B models well: the 4-bit quantized Qwen3.6 27B pulls around 17GB, so it fits with room left for context. For 16GB VRAM cards like the RTX 4080, stick to quantized versions of smaller models; the 27B build only runs there if part of it spills into system RAM, which is workable but noticeably slower.
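As a rough cross-check of these figures, weight memory scales as parameters × bits per weight ÷ 8. The sketch below assumes the weights dominate and ignores KV cache and runtime overhead, which is presumably why real-world figures such as the ~17GB quoted above come out higher than the raw result:

```shell
# Approximate weight memory in GB:
#   billions of parameters x bits per weight / 8
# This is a lower bound; KV cache and runtime overhead come on top.
weights_gb() {
  LC_ALL=C awk -v params="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

weights_gb 27 4     # 27B model at 4-bit  -> prints 13.5
weights_gb 27 16    # 27B model at 16-bit -> prints 54.0
```

The 16-bit result shows why the unquantized 27B weights are out of reach for a 24GB card or a 32GB Mac.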

⚠️ Important Caveat

One reader noted that the latest Claude Code release may have closed this integration. Before committing significant time to the setup, verify that your installed version still accepts the ANTHROPIC_BASE_URL override. The core method remains valid — just double check version compatibility.

Free vs Paid: Honest Comparison

Let's be direct. The free local setup is genuinely capable, but it's not identical to running Claude Sonnet or Opus. Here's where the gap is real and where it barely matters:

Where the Gap Is Real

Complex, multi-file refactors across large codebases — frontier models handle these with more precision
Nuanced debugging where context and reasoning depth matter
Generating long, architecturally complex outputs from scratch

Where Free Local Models Hold Their Own

Routine coding tasks — writing functions, creating components, adding features
File editing and search — these are harness-level tasks, not model-dependent
Simple automation and scripting — shell scripts, config files, boilerplate
Privacy-sensitive projects — your code stays on your machine, always

For most day-to-day development work, the free setup with Qwen3.6 is surprisingly competitive. The gap has shrunk considerably as open-weight models have matured.

Advanced Setup: Dedicated Inference Server

If you work across multiple devices or want to avoid taxing your main machine, consider running Ollama on a dedicated machine or home server and connecting to it over your local network.

Terminal — From Your Laptop

# Point Claude Code to your inference server
ANTHROPIC_BASE_URL=http://192.168.1.100:11434 claude --model qwen3.6

Replace 192.168.1.100 with your server's local IP. This way the heavy model runs on the server, and your laptop stays cool and fast. You get the performance of a beefy machine delivered to any device on your network.
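Two practical details are worth sketching here. First, Ollama listens only on 127.0.0.1 by default, so the server side typically needs OLLAMA_HOST=0.0.0.0 set before other devices can connect. Second, a laptop that leaves the network can fall back to a local instance. The helper names below are hypothetical, and the probe assumes curl is installed:

```shell
# On the server (not the laptop), Ollama usually needs to listen on all
# interfaces, for example:
#   OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Succeed if an Ollama server answers at the given base URL.
probe() {
  curl -sf --max-time 3 "$1/api/tags" >/dev/null 2>&1
}

# Print the remote URL if it is reachable, otherwise the local fallback.
pick_endpoint() {
  remote_url="$1"
  local_url="${2:-http://localhost:11434}"
  if probe "$remote_url"; then
    echo "$remote_url"
  else
    echo "$local_url"
  fi
}

# Usage:
#   ANTHROPIC_BASE_URL="$(pick_endpoint http://192.168.1.100:11434)" claude --model qwen3.6
```

Exposing the port to the whole network has security implications, so this is best kept to a trusted LAN.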

More AI Tools on Toolyfi

If you're exploring free AI tools, Toolyfi has several utilities worth bookmarking alongside your Claude Code setup:

AI Assistant — Claude-powered chat, no signup required
AI Tool Finder — discover the right AI tool for any task
AI Prompt Generator — craft better prompts for coding assistants
JSON Formatter — clean and validate JSON output from AI tools

Conclusion: Own the Tool, Not the Subscription

The subscription economy has trained us to think of AI as a monthly rental. Claude Code breaks that assumption. The harness is free. The workflow is yours. The only thing you were paying for was the model — and now you have alternatives.

For developers on a budget, this setup is a complete game-changer. Install Ollama, pull Qwen3.6, set one environment variable, and you're running a full agentic coding environment at zero cost. For developers already on Claude Pro, it's still worth setting up as a fallback for tasks where you'd rather not burn through tokens.

The models aren't as powerful as Sonnet. That's the honest truth. But free and surprisingly capable is a completely different value proposition than $20/month and excellent. Try it first, then judge.

🚀 Quick Start Summary

1. npm install -g @anthropic-ai/claude-code → 2. Install Ollama → 3. ollama pull qwen3.6 → 4. ollama launch claude. That's your entire setup.
