Harnesses, Not Frameworks — The New Shape of AI Tools
On April 18, 2026, Gregor Zunic — co-founder of Browser Use — posted this:
> Introducing: Browser Harness. A self-healing harness that can complete virtually any browser task. We got tired of browser frameworks restricting the LLM. So we removed the framework.
No framework. Direct CDP. One websocket to Chrome. A helpers.py the agent edits on the fly. Drop-in for Claude Code and Codex.
This isn’t just a browser automation tool. It’s the clearest statement yet of a pattern that’s been quietly taking over AI tooling in 2026: the harness.
What’s a Harness?
A harness is the minimum wrapping around an LLM that lets it do useful work. It exposes a tool surface — usually filesystem, shell, maybe HTTP — and then gets out of the way.
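A minimal sketch of what "a tool surface and nothing else" means in practice. This is illustrative, not any real harness's code: three raw tools and a one-line dispatcher, with no workflow layer on top.

```python
import subprocess

# Hypothetical harness sketch: expose raw tools, let the LLM decide the rest.

def shell(cmd: str) -> str:
    """Run a shell command and return its combined output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def write_file(path: str, content: str) -> str:
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

TOOLS = {"shell": shell, "read": read_file, "write": write_file}

def dispatch(name: str, *args: str) -> str:
    """The entire 'framework': look up the tool the LLM asked for and run it."""
    return TOOLS[name](*args)
```

Everything else — planning, retries, sequencing — is left to the model.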
Compare the two shapes:
| Framework | Harness |
|---|---|
| Defines workflows, steps, DAGs | No workflow. The LLM decides. |
| Abstracts away the underlying tools | Exposes raw tools (shell, CDP, fs) |
| Prescribes what the agent should do | Prescribes what the agent can do |
| Breaks when the task doesn’t fit the template | Bends, because there’s no template |
| Optimizes for dumb models | Optimizes for smart models |
Frameworks made sense in 2023. Models weren’t reliable enough to trust with raw capability, so you built rails. LangChain, AutoGPT, CrewAI — all variations on “let me hand-hold this LLM through a pipeline.”
Models got smarter. The rails started costing more than they saved.
Claude Code Was the First Real Harness
Claude Code shipped in early 2025 with a radical design: no orchestration, no planner module, no memory graph. Just an LLM with Bash, Read, Edit, Write, Grep, and a few web tools. That’s it.
The bet was that a smart enough model, given file system access and a shell, could do the orchestration itself. And it could. Karpathy called it “the only AI tool I actually use every day.”
Codex landed on the same shape a few months later. Different model, same philosophy: give the LLM a sandbox and tools, not a framework.
Browser Harness is this pattern arriving in browser automation. Instead of Selenium-style step definitions or Playwright-style APIs wrapped in agent scaffolding, you get a raw Chrome DevTools Protocol connection and a helpers file the agent rewrites when something breaks.
That’s the “self-healing” part. There’s no retry logic, no fallback strategy, no parser for error states. The LLM reads the error, edits the helper, tries again. The code base is the memory.
Why Harnesses Are Winning
Three things shifted in parallel:
- Tool use got reliable. Claude 4 and GPT-5 follow tool schemas consistently enough that you don’t need a validator layer catching malformed calls.
- Context windows stopped being scarce. A 1M-token context means you can load the whole codebase, the whole browser DOM, the whole doc set — and let the model re-read instead of pre-chunking.
- Models learned to recover. When a call fails, a modern LLM edits the tool, writes a new helper, or changes approach. Framework authors used to write that recovery logic by hand. The model does it better.
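The recovery behavior in that last point can be sketched as a loop. This is a toy illustration, not any tool's implementation: `ask_model` stubs out the real LLM call that would receive the traceback and return a patched helpers file.

```python
import importlib.util
import pathlib

# Toy self-healing loop: run a helper; on failure, let the 'model' rewrite it.

def ask_model(error: str, source: str) -> str:
    """Stub standing in for an LLM call. A real harness would send the error
    and current source to the model and get back a rewritten file."""
    return source.replace("1 / 0", "42")  # canned 'fix' for the demo below

def run_with_self_heal(helpers_path: pathlib.Path, func: str, attempts: int = 3):
    for _ in range(attempts):
        # Reload helpers.py fresh each attempt -- the file IS the memory.
        spec = importlib.util.spec_from_file_location("helpers", helpers_path)
        mod = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(mod)
            return getattr(mod, func)()
        except Exception as err:
            source = helpers_path.read_text()
            helpers_path.write_text(ask_model(repr(err), source))
    raise RuntimeError("helper still failing after repairs")
```

Notice there is no hand-written recovery logic specific to any failure: the error text goes to the model, and the patched file comes back.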
Once those three are true, every abstraction layer between the LLM and the raw tool is a liability. It’s code that you maintain, that the model has to work around, that breaks when the task is even slightly off-pattern.
Zunic’s line is the tell: “I challenge anyone to find a task that DOESN’T work.” Frameworks have known failure modes. Harnesses don’t — or rather, their failure mode is the LLM itself, which keeps getting better.
The Harness Stack in 2026
If you squint, you can see the stack forming:
- Coding harness: Claude Code, Codex, Cursor agent mode
- Browser harness: Browser Harness (Browser Use)
- Research harness: Karpathy’s autoresearch — program.md + Claude Code
- Data harness: Emerging — direct DB access + shell
The common shape: LLM + raw tool + persistent working directory. The working directory is where context accumulates, where helpers get written, where the model’s memory lives between turns.
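That common shape might look like this on disk. The layout is purely illustrative, not any tool's actual structure:

```
project/
├── CLAUDE.md        # standing instructions the harness loads each session
├── helpers.py       # tools the agent writes and repairs for itself
├── notes/           # accumulated context: docs, quirks, past decisions
│   └── auth-flow.md
└── src/             # the actual work product
```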
Harnesses Run on Context
Here’s the part that matters if you’re building with these tools: a harness is only as good as the context it’s handed.
Claude Code without a CLAUDE.md is a generic coding assistant. Claude Code with a well-curated CLAUDE.md, a library of reference docs, and a knowledge folder it can grep — that’s what Karpathy uses. That’s the 10x version.
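For a sense of what "well-curated" means, a CLAUDE.md might read like this (contents invented for illustration, not from any real project):

```markdown
# Project notes for the agent

- Run tests with `make test`; never commit without a green run.
- API docs live in notes/api/ — grep there before guessing an endpoint.
- Auth quirk: staging login requires the LEGACY_SSO=1 env var.
- Past decisions are logged in notes/decisions.md; read before refactoring.
```

Each line saves the model a round of rediscovery.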
Same for Browser Harness. The helpers.py it edits on the fly starts from somewhere. If you seed that somewhere with patterns, auth flows, site-specific quirks you’ve documented — the harness gets leverage. If you hand it a blank file, it has to rediscover everything.
The harness does the work. The context library is where your advantage lives.
Where Save Fits
Every harness we’ve talked about reads Markdown from disk. CLAUDE.md, AGENTS.md, reference docs, saved documentation pages, API notes — all Markdown, all sitting in a folder the agent can see.
Save is a one-click converter from any webpage to clean Markdown. Documentation pages, blog posts, Stack Overflow answers, GitHub READMEs, API references — whatever the next harness you run will need to read.
The people getting the most out of Claude Code and Browser Harness in 2026 aren’t building more framework. They’re curating better libraries. The harness is free. The context is the moat.
Save turns any webpage into Markdown your AI harness can read — install the extension and start building the library that makes your agents smarter.