The URL-to-Markdown API built for AI agents and RAG
Every retrieval pipeline eventually hits the same wall: the web is HTML, and your model wants clean text. Feed raw HTML to an LLM and you burn tokens on <div> soup, navigation, and cookie banners — and you dilute the signal you actually care about. The Save API turns any URL into clean Markdown so your agent reads content, not markup.
Why Markdown is the right format for LLMs
Markdown is the sweet spot for model context: it preserves structure (headings, lists, tables, code) that helps a model understand a document, while dropping the styling and scripting that waste tokens and confuse attention. It’s compact, it’s readable, and every major model is heavily trained on it.
That’s why “give me this page as Markdown” is one of the most common primitives in agent tooling. The Save API makes it a single call.
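To make the compactness point concrete, here is a minimal sketch that compares the same small piece of content written as HTML and as Markdown. The two snippets and the `size_ratio` helper are invented for illustration, and character count is only a crude stand-in for tokens; real ratios depend on the page and the tokenizer.

```python
# Illustrative content only: the same heading and list, once as HTML and once as Markdown.
html_snippet = """<div class="post-body"><h2 class="title">Install</h2>
<ul class="steps"><li><span>Download the package</span></li>
<li><span>Run <code>pip install requests</code></span></li></ul></div>"""

markdown_snippet = """## Install

- Download the package
- Run `pip install requests`"""


def size_ratio(html: str, markdown: str) -> float:
    """Characters of HTML per character of Markdown (a rough proxy for token cost)."""
    return len(html) / len(markdown)


print(f"HTML: {len(html_snippet)} chars, Markdown: {len(markdown_snippet)} chars")
print(f"ratio: {size_ratio(html_snippet, markdown_snippet):.1f}x")
```

The structure a model needs (a heading, a list, inline code) survives in both versions; only the wrapper markup is gone.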
The agent loop
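import os

import requests


def read_url(url: str) -> str:
    """Convert a URL to Markdown with the Save API and return the Markdown text."""
    r = requests.post(
        "https://api.savemarkdown.co/v1/convert",
        headers={"Authorization": f"Bearer {os.environ['SAVE_API_KEY']}"},
        json={"url": url},
        timeout=60,  # requests has no default timeout; a stalled page would block the agent
    )
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return r.json()["markdown"]


# In your agent: tool call → read_url(url) → put markdown in context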
Put read_url behind a tool definition and your agent can browse the open web and get clean context back on every call. For a RAG ingest job, map it over your source URLs and feed the Markdown straight into your vector store’s document loader.
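Both uses can be sketched in a few lines. The tool schema below follows the JSON-Schema style many agent frameworks accept, but its exact shape is an assumption, so adapt it to your framework. `handle_tool_call` and `ingest` are hypothetical helpers, not part of the Save API. Each takes the reader function as a parameter, so you can pass in `read_url` in production and a stub in tests.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical tool schema; field names vary between agent frameworks.
READ_URL_TOOL = {
    "name": "read_url",
    "description": "Fetch a web page and return its content as clean Markdown.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute URL to read"},
        },
        "required": ["url"],
    },
}


def handle_tool_call(name: str, arguments: dict, reader: Callable[[str], str]) -> str:
    """Dispatch a tool call from the model to the reader function."""
    if name != READ_URL_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    return reader(arguments["url"])


def ingest(urls: Iterable[str], reader: Callable[[str], str], out_dir: Path) -> list[dict]:
    """Convert each URL and write one .md file per page.

    Failures are recorded and skipped so one bad page does not abort the batch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for i, url in enumerate(urls):
        try:
            markdown = reader(url)
        except Exception as exc:
            records.append({"url": url, "ok": False, "error": str(exc)})
            continue
        path = out_dir / f"doc-{i:05d}.md"
        path.write_text(markdown, encoding="utf-8")
        records.append({"url": url, "ok": True, "path": str(path)})
    return records
```

Writing plain `.md` files is a deliberate simplification: most document loaders can read a directory of Markdown, so the files act as a neutral handoff point between conversion and embedding.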
Why not just scrape it yourself?
You can. But the long tail is brutal:
- JavaScript pages return an empty shell to a plain fetch. You need a headless browser — but spinning one up for every URL is slow and expensive.
- Bot walls block datacenter IPs. You need realistic headers and, sometimes, a residential proxy.
- Boilerplate removal is a research problem. Readability heuristics get you 70% of the way and then break on the page that matters.
- Maintenance never ends. Sites change, your extractor rots.
The Save API handles this with a tiered engine: cheap fetch first, headless render only when a page is genuinely a JS shell, then boilerplate stripped. You get the result without owning the pipeline — and the response tells you which tier ran, so cost stays predictable.
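The tiering idea can be sketched as follows. This is not the Save API's actual implementation: the shell heuristic, the 200-character threshold, the tier labels, and the three injected callables (`cheap_fetch`, `headless_render`, `strip_boilerplate`) are all assumptions made for illustration.

```python
import re
from typing import Callable


def looks_like_js_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: the page has very little visible text once scripts,
    styles, and tags are removed. The threshold is an arbitrary assumption."""
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    visible = " ".join(text.split())
    return len(visible) < min_text_chars


def fetch_tiered(
    url: str,
    cheap_fetch: Callable[[str], str],
    headless_render: Callable[[str], str],
    strip_boilerplate: Callable[[str], str],
) -> dict:
    """Try the cheap tier first and escalate only when the result is a JS shell."""
    html = cheap_fetch(url)
    tier = "fetch"  # hypothetical tier label
    if looks_like_js_shell(html):
        html = headless_render(url)  # the expensive path, taken only when needed
        tier = "render"  # hypothetical tier label
    return {"tier": tier, "content": strip_boilerplate(html)}
```

Returning the tier alongside the content mirrors the point above: a caller can log which path ran and see where the expensive renders are going.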
Walled gardens, handled honestly
YouTube routes through its official transcript channel. X, Instagram and TikTok are best-effort. We do not use ghost accounts or scrape behind logins — that’s a legal and reliability minefield, and it’s not what a dependable API should be built on. For content you have legitimate access to, bring-your-own-session support is on the roadmap.
Discoverable by agents, by design
If your agent discovers tools dynamically, the Save API is already advertised in our agent-skills index (save.api.url-to-markdown), API catalog, and llms.txt.
Written by
Jean-Sébastien Wallez
I've been making internet products for 10+ years. I built Save on weekends because I wanted my own reading library in clean Markdown for Claude and Obsidian. I write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.