Context Engineering Is the New Prompt Engineering
In September 2025, Anthropic published an engineering post that quietly retired a discipline.
Context engineering is the natural progression of prompt engineering. Where prompt engineering refers to methods for writing and organizing LLM instructions, context engineering refers to the strategies for curating and maintaining the optimal set of tokens during LLM inference.
The post wasn’t framed as a category death. But it was one. The thing people had been doing for two years under the name “prompt engineering” was now a subset of something larger — and the larger thing was where the actual leverage lived.
A few months earlier, Andrej Karpathy had laid the conceptual foundation in a single sentence.
The LLM is the CPU. The context window is the RAM. Context engineering is the operating system curating what goes into RAM at each step.
That analogy did more for the field than a thousand prompt-engineering tutorials.
What Prompt Engineering Actually Was
Strip away the mystique and prompt engineering was three things:
- Wording tricks. “Take a deep breath.” “You are an expert in X.” “Think step by step.” These worked on weaker models that needed nudging into reasoning mode.
- Format coercion. Forcing structured output via examples, delimiters, role-play scaffolding. Necessary when models couldn’t reliably follow schemas.
- Jailbreak adjacency. Half the discipline was about getting around the safety layer, the other half about getting around the formatting layer.
By 2026, all three are mostly solved at the model level. Claude 4 and GPT-5 reason without being told to. They follow JSON schemas natively. They produce well-formed tool calls. The prompt wording barely matters anymore — what matters is what’s around the prompt.
What Context Engineering Actually Is
Context engineering is the design of everything that fits into the LLM’s working memory for a given turn. The prompt is one component. The other components are where the leverage is.
The full context envelope, in 2026:
- System prompt: identity, behavior, hard rules, output format.
- Tool definitions: what functions the model can call, what they do, when to use them.
- Retrieved knowledge: documents, snippets, embeddings pulled from a vector store or a file system.
- Working directory state: the files the model can read, edit, or grep.
CLAUDE.md,AGENTS.md,helpers.py, project READMEs. - Message history: prior turns, prior tool calls and their results.
- Live data: API responses, search results, page DOMs, database rows.
- Examples: few-shot demonstrations of desired behavior.
Context engineering is deciding which of these matter for this task, in what order, at what density, refreshed how often.
A prompt engineer asked: how do I phrase this so the model does the right thing?
A context engineer asks: what configuration of tokens around the prompt makes the right thing the obvious next step?
The Anthropic post puts the question even more sharply.
What configuration of context is most likely to generate our model’s desired behavior?
That’s the entire discipline in one sentence.
Why It Replaced Prompt Engineering So Fast
Three things shifted in parallel between mid-2024 and end of 2025.
Context windows got cheap. A 1M-token context means you can drop the entire codebase, the entire doc set, the entire conversation history into the model — and the model can re-read on demand. The pre-chunking and retrieval gymnastics that dominated 2023’s prompt engineering became optional. You could just load the file.
Tool use got reliable. Models started producing well-formed function calls without a validator layer. That made the tool surface a first-class part of context, not an afterthought layered on top of a prompt.
Agents stopped being demos. When you’re running a 200-turn agent loop, the prompt at turn 1 is irrelevant by turn 50. What matters is the cumulative state of the working directory, the message history compaction strategy, what got pruned, what got summarized, what got re-fetched. That’s context engineering. It has nothing to do with “how do I word my instructions.”
Once those three were true, the locus of optimization moved up the stack. Prompts became cheap. Context became the bottleneck.
Karpathy’s OS Analogy, Fully Unpacked
Karpathy’s CPU/RAM framing is worth taking seriously, because it dictates how the discipline works in practice.
If the LLM is the CPU, then:
- The context window is RAM. Finite, fast, expensive per byte, gets flushed every turn.
- Files on disk are the hard drive. Slow to read but unlimited. You curate which ones get loaded into RAM for the current step.
- System prompts are firmware. Set once per session, always present, define what the CPU can and can’t do.
- Tool definitions are device drivers. Expose what the CPU can talk to (filesystem, HTTP, browser, database).
- Retrieval is paging. Pull bytes from disk into RAM when needed, evict when not.
- The agent loop is the scheduler. Decides what gets loaded next, what gets evicted, what gets summarized.
The job of the context engineer is to design the scheduler. Which files get loaded for which kinds of tasks. How history gets compacted. When to re-fetch fresh data vs trust stale data. What lives in firmware (system prompt) vs what gets streamed in.
This isn’t an abstraction. It’s exactly what Anthropic does inside Claude Code. It’s exactly what Cursor does in agent mode. It’s exactly what the harness pattern formalizes — a tool surface plus a working directory that the model curates as it goes.
The Components Worth Engineering, Ranked
If you’re building anything that uses LLMs in 2026, here’s where the gains are, in rough order of impact:
-
The working directory. What files exist, how they’re named, what their first 20 lines look like. The agent is going to grep, read, and edit these. They shape every decision it makes. This is the single highest-leverage place to spend time.
-
CLAUDE.md/AGENTS.md/ project meta-files. The agent’s orientation document. What the project is, what the conventions are, what to do and not to do. Every coding agent in 2026 reads one of these first. -
Tool design. Not which tools exist — that’s binary. But what they’re named, what their docstrings say, what their error messages look like. The model uses these signals to decide when to call what.
-
Retrieval strategy. When to load a file vs grep for a string vs query an embedding. Most projects over-engineer this. A
grepover a well-organized folder beats most RAG setups. -
History compaction. How prior turns get summarized as the conversation grows. The default in most agent loops is naive truncation. A better strategy preserves the right signal density.
-
System prompt. Still matters, but less than people think. A good system prompt sets identity and hard rules in 50 lines. Anything longer is usually doing something the working directory should be doing.
Notice what’s not on the list. The user-facing prompt. The wording. The “you are a senior engineer” preamble. Those are commodity now.
Why Markdown Won
One pattern shows up everywhere context engineering is taken seriously: Markdown on disk.
Claude Code reads Markdown. Codex reads Markdown. Cursor’s agent mode reads Markdown. Browser Harness reads Markdown. Karpathy’s autoresearch loop reads program.md. Every harness shipped in 2026 expects a folder of Markdown files describing what’s in it.
This isn’t an accident. Markdown is the right substrate for context because:
- It’s human-readable, so you can audit what you’re feeding the model.
- It’s structured enough (headings, lists, code blocks) that the model parses it cleanly.
- It’s diff-friendly, so you can version it in Git.
- It’s plain text, so it works across every tool, every platform, every shell.
- It compresses well in context windows — no XML bloat, no JSON ceremony.
If you’re context-engineering anything serious, you’re maintaining a Markdown library. The question is whether it’s your Markdown library or someone else’s.
Where Save Fits
Every context-engineered system in 2026 reads Markdown from disk. The CLAUDE.md it grounds against, the docs it grepped, the API reference it consulted, the API I just wrote-up — all sitting in a folder the agent can see.
Save is the one-click converter from any webpage to clean Markdown. Documentation pages, GitHub READMEs, Anthropic’s context engineering post, Karpathy’s threads, Stack Overflow answers — whatever shows up in your research that the next agent you run is going to need to read.
The harness is open source. The model is a commodity. The context library is what makes one agent better than the next — and that library is just Markdown, sitting in a folder, ready to be loaded into RAM when needed.
Context engineering is the discipline. A curated Markdown library is the artifact.
Save turns any webpage into Markdown your AI context engine can read — install the extension and start building the library that makes your agents smarter.
## Continue reading
Karpathy's 'Two Groups' of AI Users --- Which One Are You?
Andrej Karpathy says there's a growing gap in understanding of AI capability. One group thinks AI is a toy. The other is experiencing 'AI Psychosis.' Here's what separates them --- and how to cross the divide.
How AI Agents Use Your Obsidian Vault in 2026 (MCP + Markdown)
Connect AI agents like Claude Code to your Obsidian vault via MCP. Turn your saved Markdown notes into context that makes AI smarter about your work.
Karpathy's Autoresearch & PROGRAM.md: AI That Runs Experiments While You Sleep
Andrej Karpathy's autoresearch lets AI agents run 100+ ML experiments overnight, guided by a single Markdown file called program.md. Here's how it works and why it matters.
Harnesses, Not Frameworks — The New Shape of AI Tools
Greg Zunic just open-sourced Browser Harness. It's the same pattern as Claude Code and Codex: strip the framework, hand the LLM raw tools, let it figure things out. Why the harness is replacing the framework — and what it runs on.
Written by
Jean-Sébastien Wallez
I've been making internet products for 10+ years. Built Save on weekends because I wanted my own reading library in clean markdown for Claude and Obsidian. Write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.