How the Save API renders JavaScript pages to Markdown

The hard part of turning a URL into Markdown isn’t the Markdown. It’s the fetch. Much of the modern web renders its content with JavaScript, so a plain HTTP request often gets back an empty shell. The naive fix, rendering every page in a headless browser, is reliable but slow and expensive. The Save API takes a tiered approach so most pages cost almost nothing and only the ones that need it pay for a render.

Tier 1: server-side fetch

Every request starts with a plain fetch() using a real browser fingerprint. Before anything goes out, the URL passes an SSRF guard that blocks private, loopback, and cloud-metadata addresses — a non-negotiable for an endpoint that fetches arbitrary URLs on a customer’s behalf.
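As a rough illustration of what such a guard checks, here is a minimal TypeScript sketch. It is not the Save API's actual implementation: the function names (`isUrlAllowed`, `parseIPv4`, and so on) and the exact blocklist are assumptions, and it only inspects the literal host in the URL. A production guard would also need to resolve DNS and re-check the resolved address, and re-validate on every redirect.

```typescript
// Hypothetical SSRF pre-check (illustrative names, not the Save API's code).
// Only literal hosts are inspected; DNS resolution is out of scope here.

function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

function isBlockedIPv4(octets: number[]): boolean {
  const a = octets[0];
  const b = octets[1];
  return (
    a === 0 ||                            // "this network"
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // loopback
    (a === 169 && b === 254) ||           // link-local, incl. 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168)              // 192.168.0.0/16 private
  );
}

function isBlockedIPv6(addr: string): boolean {
  return (
    addr === "::" ||
    addr === "::1" ||                     // loopback
    addr.startsWith("::ffff:") ||         // IPv4-mapped: blocked wholesale in this sketch
    /^f[cd][0-9a-f]{2}:/.test(addr) ||    // fc00::/7 unique local
    /^fe[89ab][0-9a-f]:/.test(addr)       // fe80::/10 link-local
  );
}

function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  // The URL parser already normalizes forms like 0x7f.0.0.1 to 127.0.0.1.
  const host = url.hostname.toLowerCase().replace(/\.$/, "");
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host === "metadata.google.internal") return false;
  if (host.startsWith("[")) return !isBlockedIPv6(host.slice(1, -1));

  const v4 = parseIPv4(host);
  return v4 === null || !isBlockedIPv4(v4);
}
```

Running the check before the request goes out, rather than after, is the point: a blocked URL should never cause any outbound traffic at all.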

The HTML is parsed with a streaming parser (Cloudflare’s HTMLRewriter, so there is no DOM library and no full document tree held in memory) that strips script, style, nav, footer, aside, and other noise while collecting the content elements: headings, paragraphs, lists, blockquotes, code, links, and images. Entities are decoded; titles are cleaned of site-name suffixes. For static and server-rendered pages, which make up most of the web, this is the whole story, and it’s fast and cheap.
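The two text clean-up steps are easy to picture in isolation. The sketch below is an assumption about how they might look, not the engine's real code: `decodeEntities` handles only numeric entities and a handful of named ones, and `cleanTitle` assumes the site name is already known (for example from page metadata) and only strips it when it follows a separator at the end of the title.

```typescript
// Illustrative helpers; names and the supported entity list are assumptions.

const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00a0",
};

// Decodes numeric entities (&#8211;, &#xe9;) and a small set of named ones.
// Unknown entities are left untouched.
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body: string) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES[body] ?? match;
  });
}

// Strips a trailing " | Site", " - Site", or similar suffix from a title.
// If the site name is simply part of the title, the title is returned as-is.
function cleanTitle(title: string, siteName: string): string {
  const t = title.trim();
  const site = siteName.trim();
  if (site === "" || !t.toLowerCase().endsWith(site.toLowerCase())) return t;
  const head = t.slice(0, t.length - site.length);
  // Separators: pipe, hyphen, en dash, em dash, middle dot.
  const stripped = head.replace(/\s+[|\-\u2013\u2014\u00b7]\s*$/, "");
  return stripped !== head && stripped !== "" ? stripped : t;
}
```

For example, `cleanTitle(decodeEntities("Tom &amp; Jerry - Example"), "Example")` yields `Tom & Jerry`, while a title like `How to Save` is left alone for a site named `Save`.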

The density check

After Tier 1 extraction, the engine asks one question: is this actually content, or an empty app shell? It looks at a few signals:

  • Is the extracted text suspiciously short?
  • Is the text-to-markup ratio tiny (i.e. mostly <div>s, little prose)?
  • Does the HTML look like a single-page-app mount point with an empty root node?
  • Did the server return a bot-wall status (403/429/503)?

If the page passes, you get the Tier 1 result. If it looks like a JS shell, the request escalates.
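The four signals above can be sketched as a single function. Everything specific here is an assumption made for illustration: the `Tier1Result` shape, the thresholds, the list of mount-point ids, and the rule that any one signal is enough to escalate. The real engine may weigh the signals differently.

```typescript
// Illustrative density check. Field names and thresholds are assumptions,
// not the Save API's real values.
interface Tier1Result {
  status: number; // HTTP status of the Tier 1 fetch
  html: string;   // raw HTML returned by the server
  text: string;   // text collected by the extractor
}

const MIN_TEXT_CHARS = 200;  // below this, the text counts as suspiciously short
const MIN_TEXT_RATIO = 0.02; // extracted text length divided by raw HTML length
const BOT_WALL_STATUSES = new Set([403, 429, 503]);
// Matches an empty mount node such as <div id="root"></div>.
const EMPTY_MOUNT = /<div[^>]*\bid=["']?(?:root|app|__next)["']?[^>]*>\s*<\/div>/i;

function densityCheck(page: Tier1Result): { escalate: boolean; reasons: string[] } {
  const reasons: string[] = [];
  const text = page.text.trim();

  if (BOT_WALL_STATUSES.has(page.status)) reasons.push("bot-wall-status");
  if (text.length < MIN_TEXT_CHARS) reasons.push("short-text");
  if (text.length / Math.max(page.html.length, 1) < MIN_TEXT_RATIO) {
    reasons.push("low-text-ratio");
  }
  if (EMPTY_MOUNT.test(page.html)) reasons.push("empty-spa-root");

  // Assumption: a single signal is enough to escalate to Tier 2.
  return { escalate: reasons.length > 0, reasons };
}
```

Returning the reasons alongside the decision makes it easier to debug why a particular page was, or was not, sent to the renderer.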

Tier 2: headless render

Escalation runs the page through a headless browser, waits for the network to settle, grabs the fully rendered HTML, and feeds it back through the same extraction pipeline. Same Markdown quality, now with the JavaScript-rendered content included.

Crucially, this only happens when the density check says it’s needed — or when you explicitly pass render: "always". Pass render: "never" to disable it entirely. That’s how the cheap tier stays cheap: you’re not paying for a browser on pages that didn’t need one.
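Put as code, the escalation rule is a three-way switch. This is a sketch under assumptions: the function name `shouldRender` is hypothetical, and treating an omitted `render` option as "let the density check decide" is inferred from the description above rather than taken from the API reference.

```typescript
// "always" and "never" come from the article; an omitted option is assumed
// to mean "defer to the density check".
type RenderOption = "always" | "never";

function shouldRender(render: RenderOption | undefined, looksLikeJsShell: boolean): boolean {
  if (render === "always") return true;  // caller forces a headless render
  if (render === "never") return false;  // caller forbids escalation entirely
  return looksLikeJsShell;               // default: only render when needed
}
```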

Platform adapters

Some URLs are better served by an official channel than by scraping HTML. When the host matches a known platform, the engine routes to an adapter instead of fetching the page — YouTube, for example, goes through its transcript channel. This is also where we draw a hard line: no ghost accounts, no scraping behind logins.
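Host-based routing of this kind might look like the sketch below. The adapter name `youtube-transcript`, the host list, and the `pickAdapter` function are all hypothetical; the only detail taken from the text is that YouTube URLs go to a transcript adapter instead of the HTML fetch.

```typescript
// Hypothetical adapter registry; names and host lists are assumptions.
interface PlatformAdapter {
  name: string;
  hosts: Set<string>;
}

const ADAPTERS: PlatformAdapter[] = [
  {
    name: "youtube-transcript",
    hosts: new Set(["youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"]),
  },
];

// Returns the adapter name for a known platform, or null to fall through
// to the regular Tier 1 fetch.
function pickAdapter(rawUrl: string): string | null {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return null;
  }
  for (const adapter of ADAPTERS) {
    // Exact host match, so look-alike domains are not routed by mistake.
    if (adapter.hosts.has(hostname)) return adapter.name;
  }
  return null;
}
```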

Caching

Results are cached by URL plus options for 24 hours, so repeated reads of the same page are instant. Pass fresh: true to bypass the cache and refetch.
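A minimal sketch of that behavior, assuming an in-memory store: the key is built from the URL plus the options that affect the output, entries expire after 24 hours, and `fresh: true` skips the lookup but still writes the new result. The names (`cacheKey`, `TtlCache`, `readThrough`), the choice to drop the URL fragment, and the choice to exclude `fresh` from the key are assumptions, not documented behavior.

```typescript
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface CacheOptions {
  render?: "always" | "never";
  fresh?: boolean;
}

// Key = normalized URL + sorted options. `fresh` only controls cache
// behavior, so it is left out of the key (assumption).
function cacheKey(url: string, options: CacheOptions): string {
  const parsed = new URL(url);
  parsed.hash = ""; // fragments do not change the fetched document
  const entries = Object.entries(options)
    .filter(([key, value]) => key !== "fresh" && value !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify([parsed.toString(), entries]);
}

class TtlCache<V> {
  private readonly store = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Synchronous for brevity; a real conversion would be async.
function readThrough(
  cache: TtlCache<string>,
  url: string,
  options: CacheOptions,
  convert: () => string,
): string {
  const key = cacheKey(url, options);
  if (!options.fresh) {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
  }
  const value = convert();
  cache.set(key, value);
  return value;
}
```

Because the options are part of the key, a request with `render: "always"` never returns a result that was cached for the default path.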

Why this shape matters

The tiered design is what makes the pricing honest. A plain Markdown conversion costs us a fraction of a cent, so it’s $2 per 1,000 pages. A render costs more, so it’s a separate, higher tier — and we only charge it when a render actually ran. The meta.tier field in every response tells you exactly which path handled your page.

Written by

Jean-Sébastien Wallez

I've been making internet products for 10+ years. I built Save on weekends because I wanted my own reading library in clean Markdown for Claude and Obsidian. I write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.