
How to Write a Good program.md: A Practical Guide for AI Agent Instructions

Save Team

Tags: markdown, ai, program-md, autoresearch, karpathy, guide, best-practices

Andrej Karpathy’s autoresearch showed that a well-written Markdown file can direct AI agents to make real scientific progress overnight. But not all program.md files are created equal.

The quality of your Markdown instructions directly determines the quality of the AI agent’s output. A vague program.md produces random, undirected experiments. A precise one produces focused improvements that compound.

Here’s how to write a program.md that actually works.

The Structure of a Good program.md

Every effective program.md needs five sections, whether you’re doing ML research or any other agent-directed work.

1. Context: What Does the Agent Need to Know?

The agent starts with zero understanding of your project. Your first job is giving it enough context to make intelligent decisions.

What to include:

  • What the project does
  • What the codebase looks like
  • Key files and their purposes
  • Domain-specific terminology
  • Current state and known issues

What to skip:

  • Obvious information the LLM already knows
  • Implementation details it can read from the code
  • History that doesn’t affect current decisions
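Put together, a Context section might look like the sketch below. All file names, sizes, and project details here are illustrative placeholders, not a prescribed layout:

```markdown
## Context

This repo trains a small character-level language model on web text.

- `train.py`: training loop and model definition (the file you will modify)
- `data/`: pre-tokenized shards; do not regenerate these
- `eval.py`: computes val_bpb on a held-out set

Terminology: "val_bpb" means validation bits per byte; lower is better.
Known issue: training occasionally diverges at high learning rates.
```

Note how it names files, defines one piece of jargon, and flags a known issue, without restating anything the agent can read from the code itself.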

2. Goals: What Should the Agent Optimize?

This is the most critical section. The agent needs a clear, measurable objective.

In autoresearch, the goal is straightforward: reduce val_bpb (validation bits per byte). The agent can measure this after every 5-minute training run.

For your own projects, define success in terms the agent can evaluate:

  • “Reduce page load time below 2 seconds”
  • “Increase test coverage above 80%”
  • “Reduce bundle size by at least 15%”

Vague goals like “make the code better” produce vague results. Measurable goals produce focused improvements.
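In program.md form, a Goals section following this advice might read as follows. The baseline and threshold numbers are made up for illustration; yours come from an actual baseline run:

```markdown
## Goals

Primary objective: reduce val_bpb on the held-out set.

- Baseline: 1.42 val_bpb after a 5-minute training run
- Success: any change that lowers val_bpb by 0.005 or more
- Secondary: keep wall-clock training time under 6 minutes
```

Stating a baseline alongside the target gives the agent a concrete number to beat rather than an abstract direction.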

3. Constraints: What Should the Agent Never Do?

Constraints are just as important as goals. Without them, the agent might find creative solutions you don’t want, like deleting all tests to “improve” build speed.

Common constraints:

  • Don’t modify test files or evaluation code
  • Don’t change the public API
  • Don’t introduce new dependencies
  • Don’t exceed a memory budget
  • Keep the code readable and maintainable

In autoresearch, the key constraint is that only train.py can be modified. The data pipeline, evaluation code, and test set are locked. This prevents the agent from gaming the metrics.
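A Constraints section modeled on that setup could look like this. The specific files and the memory budget are example values, not autoresearch’s actual configuration:

```markdown
## Constraints

- Only `train.py` may be modified
- Do not touch `eval.py`, the data pipeline, or the test set
- No new dependencies beyond the existing requirements
- Peak GPU memory must stay under 8 GB
- Keep the code readable; no single function over ~50 lines
```

Phrasing each constraint as a hard rule, rather than a preference, makes it harder for the agent to rationalize its way around them.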

4. Strategy: How Should the Agent Approach the Problem?

This is where your domain expertise shines. You know things the agent doesn’t: which directions are promising and which are dead ends.

Good strategy instructions:

  • “Start with hyperparameter tuning before architectural changes”
  • “Focus on the attention mechanism; the current implementation may be suboptimal”
  • “Try regularization techniques first: dropout, weight decay, layer norm”
  • “Avoid changes that increase training time by more than 10%”

Bad strategy instructions:

  • “Try everything” (too vague)
  • “Change the learning rate to 0.001” (too specific: you’re micromanaging)

The sweet spot is directional guidance that lets the agent explore within productive boundaries.
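One way to hit that sweet spot is an ordered Strategy section: it ranks directions without dictating implementations. The ordering below is an illustrative example, not a universal recipe:

```markdown
## Strategy

1. Start with hyperparameter tuning: learning rate schedule, batch size
2. Then try regularization: dropout, weight decay, layer norm placement
3. Only after that, consider architectural changes to attention
4. Skip anything that increases training time by more than 10%
```

The agent still chooses the concrete experiments; the list just tells it where to spend its budget first.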

5. Evaluation: How Should the Agent Judge Success?

The agent needs to know how to measure whether its changes helped. In autoresearch, this is built into the loop: if val_bpb improves, keep the change. If not, revert.

For other contexts, define your evaluation criteria:

  • Which metrics matter?
  • What threshold counts as an improvement?
  • How should the agent handle ambiguous results?
  • When should the agent stop and report back?
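Answering those four questions explicitly might produce an Evaluation section like this. The thresholds and stopping rules are placeholders to adapt to your own setup:

```markdown
## Evaluation

- After each change, run the evaluation script and record val_bpb
- Keep a change only if val_bpb improves by at least 0.001; otherwise revert
- If a result looks noisy or ambiguous, rerun once before deciding
- Stop and write a summary after 10 experiments or 2 hours, whichever comes first
```

A written keep/revert threshold prevents the agent from accumulating changes that each “sort of” helped.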

Common Mistakes

Being Too Vague

“Make the model better” gives the agent no direction. Be specific about what “better” means, how to measure it, and which approaches to try first.

Being Too Specific

“Change line 47 to use a learning rate of 3e-4” defeats the purpose of agentic engineering. You’re supposed to set direction, not dictate implementation. Let the agent explore.

Forgetting Constraints

Without constraints, agents will find the path of least resistance, which often isn’t what you want. An agent told to “reduce training time” might skip half the training data if you don’t say otherwise.

Not Iterating

Your first program.md won’t be perfect. Watch what the agent does, see where it goes wrong, and update your instructions. The best program.md files evolve over dozens of iterations.

The Iteration Loop

Writing program.md isn’t a one-shot process. It’s a loop:

  1. Write your initial program.md
  2. Run the agent
  3. Review what the agent did
  4. Update your instructions based on what worked and what didn’t
  5. Repeat

Each iteration makes your instructions more precise. After a few rounds, you’ll have a program.md that consistently produces good results.
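As a starting point for step 1, a minimal skeleton tying the five sections together might look like this. The angle-bracket placeholders mark what you fill in and then refine each round:

```markdown
# program.md

## Context
<what the project does, key files, terminology, current state>

## Goals
<one primary measurable objective, plus any secondary metrics>

## Constraints
<files that are off-limits, budgets, invariants to preserve>

## Strategy
<ordered list of promising directions; dead ends to avoid>

## Evaluation
<how to measure a change, the keep/revert threshold, when to stop>
```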

This is the core skill of agentic engineering: not writing code, but writing increasingly effective agent instructions through iteration.

Building Your Reference Library

The best program.md files don’t come from thin air. They’re built on deep knowledge of the domain: documentation, papers, best practices, and examples.

When you encounter useful reference material on the web, save it as Markdown. Then when you’re writing your program.md, you can pull in relevant context, cite specific techniques, and give the agent the background knowledge it needs.

The researchers getting the best results from autoresearch aren’t just good writers. They’re domain experts with well-organized reference material that they can synthesize into clear agent instructions.


Save converts any webpage to clean Markdown, perfect for building the reference library that powers effective program.md files. Try Save free.