The URL-to-Markdown API built for AI agents and RAG

Every retrieval pipeline eventually hits the same wall: the web is HTML, and your model wants clean text. Feed raw HTML to an LLM and you burn tokens on <div> soup, navigation, and cookie banners — and you dilute the signal you actually care about. The Save API turns any URL into clean Markdown so your agent reads content, not markup.

Why Markdown is the right format for LLMs

Markdown is the sweet spot for model context: it preserves structure (headings, lists, tables, code) that helps a model understand a document, while dropping the styling and scripting that waste tokens and confuse attention. It’s compact, it’s readable, and it is well represented in the public text (READMEs, documentation, forum posts) that major models are trained on.
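To make the token argument concrete, here is a small sketch that compares the same content as HTML and as Markdown. The snippet is invented for illustration, and character count is only a rough proxy for token count, so treat the ratio as directional rather than as a measurement.

```python
# Illustrative only: the same two-item list as typical CMS HTML and as Markdown.
html_snippet = (
    '<div class="post-body"><h2 class="title" style="margin:0">Install</h2>'
    '<ul class="steps"><li><span class="step">Download the binary</span></li>'
    '<li><span class="step">Run the installer</span></li></ul></div>'
)

markdown_snippet = "## Install\n\n- Download the binary\n- Run the installer\n"


def overhead_ratio(raw: str, clean: str) -> float:
    """How many times larger the raw form is than the clean form, by characters."""
    return len(raw) / len(clean)


ratio = overhead_ratio(html_snippet, markdown_snippet)
print(
    f"HTML: {len(html_snippet)} chars, "
    f"Markdown: {len(markdown_snippet)} chars, "
    f"ratio: {ratio:.1f}x"
)
```

On real pages the gap is usually wider, because navigation, scripts, and banners add markup that carries no content at all.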

That’s why “give me this page as Markdown” is one of the most common primitives in agent tooling. The Save API makes it a single call.

The agent loop

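import os

import requests

def read_url(url: str) -> str:
    """Convert a web page to Markdown via the Save API."""
    r = requests.post(
        "https://api.savemarkdown.co/v1/convert",
        headers={"Authorization": f"Bearer {os.environ['SAVE_API_KEY']}"},
        json={"url": url},
        timeout=30,  # requests has no default timeout; avoid hanging the agent loop
    )
    r.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return r.json()["markdown"]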

# In your agent: tool call → read_url(url) → put markdown in context

Drop read_url behind a tool definition and your agent can read pages on the open web and get clean Markdown back as context. For a RAG ingest job, map it over your source URLs and pass the resulting Markdown to your vector store’s document loader.
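Both uses can be sketched in a few lines. The tool schema below follows the JSON-Schema style that many tool-calling APIs accept, but its exact shape depends on your model provider, so treat `READ_URL_TOOL`, `handle_tool_call`, and `ingest` as hypothetical names rather than part of the Save API. The `reader` argument is where `read_url` would go; a stub stands in here so the example runs without a network call or API key.

```python
import json
from typing import Callable

# Hypothetical tool definition; adapt the envelope to your provider's format.
READ_URL_TOOL = {
    "name": "read_url",
    "description": "Fetch a web page and return its content as clean Markdown.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute http(s) URL"}
        },
        "required": ["url"],
    },
}


def handle_tool_call(
    name: str,
    arguments: str,
    reader: Callable[[str], str],
    max_chars: int = 20_000,
) -> str:
    """Run one tool call and return text to place in the model's context."""
    if name != READ_URL_TOOL["name"]:
        return f"error: unknown tool {name!r}"
    try:
        url = json.loads(arguments)["url"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "error: arguments must be a JSON object with a 'url' field"
    try:
        markdown = reader(url)
    except Exception as exc:  # report failures to the model instead of crashing
        return f"error: could not read {url}: {exc}"
    return markdown[:max_chars]  # crude cap so one page cannot flood the context


def ingest(urls: list[str], reader: Callable[[str], str]):
    """Map the reader over source URLs, collecting documents and failures."""
    docs, failures = [], []
    for url in urls:
        try:
            docs.append({"source": url, "text": reader(url)})
        except Exception as exc:
            failures.append((url, str(exc)))
    return docs, failures


def stub_reader(url: str) -> str:
    """Stand-in for read_url so the sketch runs offline."""
    return f"# Example page\n\nContent fetched from {url}"


print(handle_tool_call("read_url", '{"url": "https://example.com"}', stub_reader))
docs, failures = ingest(["https://example.com/a", "https://example.com/b"], stub_reader)
print(len(docs), "documents,", len(failures), "failures")
```

Returning errors as strings keeps the agent loop alive: the model sees that a page could not be read and can try another source. The `docs` list is deliberately plain so it can be handed to whichever document loader your vector store provides.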

Why not just scrape it yourself?

You can. But the long tail is brutal:

  • JavaScript pages return an empty shell to a plain fetch. You need a headless browser — but spinning one up for every URL is slow and expensive.
  • Bot walls block datacenter IPs. You need realistic headers and, sometimes, a residential proxy.
  • Boilerplate removal is a research problem. Readability heuristics get you 70% of the way and then break on the page that matters.
  • Maintenance never ends. Sites change and your extractor rots.

The Save API handles this with a tiered engine: cheap fetch first, headless render only when a page is genuinely a JS shell, then boilerplate stripped. You get the result without owning the pipeline — and the response tells you which tier ran, so cost stays predictable.
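The tiering idea can be sketched as follows. This is not the Save API's actual implementation: the visible-text heuristic, the 200-character threshold, and the names `looks_like_js_shell` and `fetch_tiered` are all assumptions made for illustration, and the boilerplate-stripping step is left out. The two fetchers are passed in as plain callables so the example runs without a network or a browser.

```python
from html.parser import HTMLParser
from typing import Callable


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <noscript>."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text_length(html: str) -> int:
    """Count characters of text a reader would actually see."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks)


def looks_like_js_shell(html: str, min_chars: int = 200) -> bool:
    """Assumed heuristic: almost no visible text means content is rendered by JS."""
    return visible_text_length(html) < min_chars


def fetch_tiered(
    url: str,
    cheap_fetch: Callable[[str], str],
    headless_render: Callable[[str], str],
    min_chars: int = 200,
) -> tuple[str, str]:
    """Try the cheap tier first; pay for a render only when the page is a shell."""
    html = cheap_fetch(url)
    if not looks_like_js_shell(html, min_chars):
        return "fetch", html
    return "render", headless_render(url)


# Stub fetchers standing in for a plain HTTP GET and a headless browser.
shell = '<html><head><script>boot();</script></head><body><div id="root"></div></body></html>'
article = "<html><body><p>" + "word " * 100 + "</p></body></html>"

tier, _ = fetch_tiered("https://example.com", lambda u: shell, lambda u: article)
print(tier)
```

Returning the tier label alongside the HTML mirrors the point above about predictable cost: the caller can log how often the expensive path runs.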

Walled gardens, handled honestly

YouTube routes through its official transcript channel. X, Instagram and TikTok are best-effort. We do not use ghost accounts or scrape behind logins — that’s a legal and reliability minefield, and it’s not what a dependable API should be built on. For content you have legitimate access to, bring-your-own-session support is on the roadmap.

Discoverable by agents, by design

If your agent discovers tools dynamically, the Save API is already advertised in our agent-skills index (save.api.url-to-markdown), API catalog, and llms.txt.

See the docs →

Written by

Jean-Sébastien Wallez

I've been making internet products for 10+ years. Built Save on weekends because I wanted my own reading library in clean markdown for Claude and Obsidian. Write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.