How to Convert arXiv Papers to Markdown for AI Research
arXiv papers are PDFs. PDFs are terrible for AI workflows. They don’t search well, they waste tokens when fed to LLMs, and they can’t be easily combined with other research materials in a knowledge base.
If you’re doing AI research, or work in any field that relies on arXiv, converting papers to Markdown makes them easier to search, cheaper to feed to LLMs, and simple to combine with the rest of your notes.
Why Markdown for Research Papers?
LLMs understand Markdown natively. Feed Claude or ChatGPT a PDF and it can struggle with formatting, page breaks, and two-column layouts. Feed it Markdown and the structure survives: headings, equations, code blocks, and references all arrive as plain, ordered text.
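Far fewer tokens. A typical arXiv paper is 200-500 KB as a PDF, much of it embedded fonts, figures, and layout data. The same text in Markdown is often 10-30 KB. File size is not the same as token count, but dropping that overhead means you can fit many more papers in a single Claude context window.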
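The arithmetic behind that claim can be sketched in a few lines. Byte size is only a proxy, since token counts depend on each model's tokenizer, so this sketch assumes a rough rule of thumb of about 4 characters per token for English text and a 200,000-token context window (a size several Claude models offer; check yours). Both numbers are assumptions, and the helper names are illustrative.

```python
# Rough token arithmetic for a Markdown paper library.
# Assumptions (not exact): ~4 characters per token for English prose,
# and a 200,000-token context window. Real tokenizers vary by model.

CHARS_PER_TOKEN = 4.0
CONTEXT_WINDOW = 200_000


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from the character count (heuristic only)."""
    return round(len(text) / chars_per_token)


def papers_per_window(avg_paper_tokens: int,
                      window: int = CONTEXT_WINDOW,
                      reserve: int = 10_000) -> int:
    """How many papers of a given size fit in the window, keeping
    `reserve` tokens free for your question and the model's answer."""
    if avg_paper_tokens <= 0:
        raise ValueError("avg_paper_tokens must be positive")
    return max(0, (window - reserve) // avg_paper_tokens)


if __name__ == "__main__":
    # A 20 KB Markdown paper is roughly 20,000 characters of ASCII text.
    md_tokens = estimate_tokens("x" * 20_000)
    print(md_tokens)                     # 5000
    print(papers_per_window(md_tokens))  # 38
```

For exact numbers, swap the heuristic for your model provider's tokenizer or token-counting tool.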
Searchable across your entire library. With 50 papers as Markdown files in a folder, you can grep for any concept across all of them in milliseconds. Try that with PDFs.
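From a terminal, `grep -rin "kv cache" research/` already does this. The same search can also be sketched in Python, for example as the first step of a larger script. A minimal version, assuming the papers live under a hypothetical `research/` folder; `search_library` and `matching_lines` are illustrative names, not part of any tool:

```python
# Minimal grep-style search across a folder of Markdown papers.
import re
from pathlib import Path


def matching_lines(text: str, regex: re.Pattern):
    """Yield (line_number, line) for each line of `text` that matches."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            yield line_no, line


def search_library(root: str, pattern: str, ignore_case: bool = True):
    """Yield (path, line_number, line) for matches in every .md file
    under `root`, including subfolders. Yields nothing if `root` is missing."""
    base = Path(root)
    if not base.is_dir():
        return
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in sorted(base.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in matching_lines(text, regex):
            yield path, line_no, line


if __name__ == "__main__":
    # Hypothetical library folder and query.
    for path, line_no, line in search_library("research", r"kv[- ]cache"):
        print(f"{path}:{line_no}: {line}")
```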
Works with Obsidian. Papers as Markdown files in Obsidian become linked, tagged, and searchable. Add your own notes inline. Create connections between papers with [[wikilinks]].
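Obsidian tracks these links itself, and its backlinks panel shows which notes point at a paper. If you want the same link graph outside Obsidian, for example to hand an LLM a list of related papers, you could parse the `[[...]]` syntax directly. A minimal sketch with hypothetical helper names, handling only the basic link forms:

```python
# Extract Obsidian-style [[wikilinks]] and build a backlink index.
# Handles [[Note]], [[Note|alias]] and [[Note#Heading]]; edge cases such as
# escaped pipes in tables or links inside code blocks are not handled.
import re
from collections import defaultdict

WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_wikilinks(text: str) -> list[str]:
    """Return link targets in order of appearance, minus aliases and headings."""
    targets = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).split("|", 1)[0].split("#", 1)[0].strip()
        if target:  # [[#Heading]] points inside the same note, so skip it
            targets.append(target)
    return targets


def backlinks(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each linked note name to the set of notes that link to it."""
    index = defaultdict(set)
    for name, text in notes.items():
        for target in extract_wikilinks(text):
            index[target].add(name)
    return dict(index)
```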
How to Save arXiv Papers as Markdown
Method 1: Save Extension (Recommended)
Save converts the arXiv abstract page (and many HTML-rendered papers) to clean Markdown.
- Open the arXiv paper page (e.g., arxiv.org/abs/2401.12345)
- Click the Save extension icon
- Get a Markdown file with the title, authors, abstract, and whatever full text the page provides
For papers with HTML versions (increasingly common on arXiv), Save extracts the full paper content, including equations, figure references, and citations.
Method 2: arXiv HTML + Save
Many recent papers have an HTML version on arXiv (look for the “HTML” link next to the PDF link). Open the HTML version and use Save to get the full paper as clean Markdown.
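If you keep arXiv links or bare IDs in your notes, a small helper can turn any of them into the abstract, HTML, and PDF URLs to try in the browser. This sketch assumes arXiv's current `arxiv.org/abs/`, `/html/`, and `/pdf/` URL patterns and new-style IDs only; `arxiv_urls` is a hypothetical name, and arXiv does not provide an HTML rendering for every paper.

```python
# Normalize an arXiv link or ID and build abs/html/pdf URLs.
# Assumes new-style IDs like 2401.12345 or 2401.12345v2; old-style IDs
# such as hep-th/9901001 are not handled.
import re

# Four digits, a dot, four or five digits, optional version, no extra digits.
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(v\d+)?(?!\d)")


def arxiv_urls(link_or_id: str) -> dict[str, str]:
    match = ARXIV_ID.search(link_or_id)
    if match is None:
        raise ValueError(f"no new-style arXiv ID found in {link_or_id!r}")
    paper_id = match.group(1) + (match.group(2) or "")
    return {
        "abs": f"https://arxiv.org/abs/{paper_id}",
        "html": f"https://arxiv.org/html/{paper_id}",
        "pdf": f"https://arxiv.org/pdf/{paper_id}",
    }
```

If the `html` URL doesn't show the full paper, fall back to the abstract page as in Method 1.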
Method 3: Semantic Scholar or Papers With Code
These sites often have HTML paper pages that convert cleanly, though some show only the abstract and metadata rather than the full text. Open the paper page and use Save.
Building a Research Knowledge Base
The real power comes from accumulating papers over time:
research/
  attention/
    attention-is-all-you-need.md
    flash-attention-v2.md
    multi-head-latent-attention.md
  scaling/
    chinchilla-scaling-laws.md
    scaling-data-constrained.md
  agents/
    toolformer.md
    react-prompting.md
    mcp-protocol.md
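To keep names consistent as the library grows, you could derive file names from paper titles. A minimal sketch with hypothetical `slugify` and `paper_path` helpers; the slug rules are one reasonable choice, not anything Save does. The names in the tree above look hand-picked (e.g. `flash-attention-v2.md`), so treat automatic slugs as a starting point.

```python
# Turn a paper title into a file path in the topic-folder layout.
import re
import unicodedata
from pathlib import PurePosixPath


def slugify(title: str) -> str:
    """Fold accents to ASCII, lowercase, and collapse other characters to '-'."""
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "untitled"


def paper_path(root: str, topic: str, title: str) -> PurePosixPath:
    # PurePosixPath keeps forward slashes when printed; wrap the result
    # in pathlib.Path(...) when you actually create the file.
    return PurePosixPath(root) / slugify(topic) / f"{slugify(title)}.md"
```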
Point Claude Code at this folder:
cd research
claude
Now you can ask: “Compare the attention mechanisms in these papers” or “What are the key findings on scaling laws?” Claude Code searches and reads the relevant files, then synthesizes answers grounded in the papers you saved.
The Karpathy Pattern
Andrej Karpathy described this approach: build a personal wiki of Markdown files and let an LLM research across them. For AI researchers, this means:
- Save every important paper as Markdown
- Organize by topic
- Add your own notes and annotations
- Let Claude or ChatGPT work with the full collection
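Claude Code can open the files in the folder on its own. For a chat interface where you paste or upload text instead, one option is to bundle a topic folder into a single file first. A minimal sketch, with hypothetical `bundle` and `load_folder` helpers and a budget based on a rough 4-characters-per-token assumption; set `max_tokens` below your model's real context window.

```python
# Concatenate a topic folder into one text you can paste or upload to a
# chat model, stopping at an estimated token budget.
from pathlib import Path


def bundle(files: dict[str, str], max_tokens: int = 150_000,
           chars_per_token: float = 4.0) -> tuple[str, list[str]]:
    """`files` maps a relative path to its Markdown text. Returns the bundled
    text and the paths skipped because they would exceed the budget."""
    parts, skipped, used = [], [], 0.0
    for name in sorted(files):
        # A plain separator line so it doesn't clash with the papers' headings.
        section = f"=== {name} ===\n{files[name].strip()}\n"
        cost = len(section) / chars_per_token  # heuristic token estimate
        if used + cost > max_tokens:
            skipped.append(name)
            continue
        parts.append(section)
        used += cost
    return "\n".join(parts), skipped


def load_folder(root: str) -> dict[str, str]:
    """Read every .md file under `root`, keyed by its relative path."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): p.read_text(encoding="utf-8")
            for p in sorted(base.rglob("*.md"))}


if __name__ == "__main__":
    folder = Path("research/attention")  # hypothetical topic folder
    if folder.is_dir():
        text, skipped = bundle(load_folder(str(folder)))
        Path("attention-bundle.md").write_text(text, encoding="utf-8")
        print(f"wrote attention-bundle.md; skipped: {skipped}")
```

Files that would overflow the budget are listed rather than silently truncated, so you can split a large topic into two bundles.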
After a few months, you have a personal research assistant that can draw on every paper you’ve saved, along with your own notes.
Get Started
Install Save and start with the next arXiv paper you read. Over time, your Markdown research library compounds into something a generic AI assistant doesn’t have: the papers you chose, annotated with your own notes.
Turn arXiv papers into a searchable, AI-readable knowledge base. Install Save (free to start).