How to Convert arXiv Papers to Markdown for AI Research

arXiv papers ship primarily as PDFs, and PDFs are a poor fit for AI workflows. They search badly, they waste tokens when fed to LLMs, and they are hard to combine with other research materials in a knowledge base.

If you do AI research, or work in any field that relies on arXiv, converting papers to Markdown removes most of that friction.

Why Markdown for Research Papers?

LLMs understand Markdown natively. Feed Claude or ChatGPT a PDF and it can struggle with formatting, page breaks, and two-column layouts. Feed it Markdown and the structure comes through cleanly: equations, code blocks, and references.

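Up to 10x fewer tokens. A typical arXiv paper is 200-500KB as a PDF, while the same content in Markdown is 10-30KB. File size is only a rough proxy for token count, because much of a PDF is fonts, images, and layout data, but text extracted from a PDF also carries repeated headers, footers, and broken line wraps that cost tokens. With smaller, cleaner inputs you can fit several times more papers in a single Claude context window.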
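
To check the token savings against your own papers, a rough estimate is enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text rather than an exact figure, and a hypothetical 200,000-token context window. Both are parameters to adjust for the model you use.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers vary by model and content."""
    return round(len(text) / chars_per_token)


def papers_per_context(paper_chars: int, context_tokens: int = 200_000,
                       chars_per_token: float = 4.0) -> int:
    """Estimate how many papers of a given size fit in one context window."""
    tokens_per_paper = max(1, round(paper_chars / chars_per_token))
    return context_tokens // tokens_per_paper


# A 30 KB Markdown paper is roughly 30,000 characters of mostly ASCII text.
print(estimate_tokens("x" * 30_000))   # 7500
print(papers_per_context(30_000))      # 26
```

For real numbers, replace the heuristic with the tokenizer that matches your model.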

Searchable across your entire library. With 50 papers as Markdown files in a folder, you can grep for any concept across all of them in milliseconds. Try that with PDFs.
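
If you prefer a script to grep, the same search takes a few lines of Python. This is a minimal sketch: the function name `search_library` is hypothetical, and it assumes the papers are UTF-8 `.md` files somewhere under one root folder.

```python
from pathlib import Path


def search_library(root: str, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each case-insensitive match."""
    needle = term.lower()
    root_path = Path(root)
    hits = []
    # rglob walks every subfolder, so topic folders are searched too.
    for path in sorted(root_path.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                relative = path.relative_to(root_path).as_posix()
                hits.append((relative, number, line.strip()))
    return hits
```

Calling `search_library("research", "attention")` returns one tuple per matching line, which you can print in the familiar `file:line: text` form.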

Works with Obsidian. Papers as Markdown files in Obsidian become linked, tagged, and searchable. Add your own notes inline. Create connections between papers with [[wikilinks]].
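
Because wikilinks are plain text, you can also read the connections between papers outside Obsidian. The sketch below pulls link targets out of a note with a regular expression. It assumes the common `[[Target]]`, `[[Target|alias]]`, and `[[Target#heading]]` forms and does not try to cover every Obsidian link variant.

```python
import re

# Captures only the target of [[Target]], [[Target|alias]], or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]\|#]+)(?:[#|][^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return the unique link targets in a note, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


note = "Builds on [[attention-is-all-you-need]] and [[flash-attention-v2|FlashAttention 2]]."
print(extract_wikilinks(note))  # ['attention-is-all-you-need', 'flash-attention-v2']
```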

How to Save arXiv Papers as Markdown

Method 1: Save Extension (Recommended)

Save converts the arXiv abstract page (and many HTML-rendered papers) to clean Markdown.

1. Open the arXiv paper page (e.g., arxiv.org/abs/2401.12345)
2. Click the Save extension icon
3. Get a Markdown file with the title, authors, abstract, and available content

For papers with HTML versions (increasingly common on arXiv), Save extracts the full paper content, including equations, figure references, and citations.

Method 2: arXiv HTML + Save

Many recent papers have an HTML version on arXiv (look for the “HTML” link next to the PDF). Open the HTML version and use Save to get the full paper as clean Markdown.
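
If you start from an abstract or PDF link, a small helper can produce the other addresses to try. This is a sketch under assumptions: the function name `arxiv_links` is hypothetical, it handles only new-style identifiers such as 2401.12345, and it assumes the HTML rendering lives under `/html/` when one exists. It does not check whether the paper actually has an HTML version.

```python
import re

# New-style arXiv identifiers look like 2401.12345, optionally followed by a version such as v2.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf|html)/(\d{4}\.\d{4,5})(v\d+)?")


def arxiv_links(url: str) -> dict[str, str]:
    """Build abstract, PDF, and candidate HTML URLs from an arXiv paper URL."""
    match = ARXIV_ID.search(url)
    if match is None:
        raise ValueError(f"no new-style arXiv identifier found in {url!r}")
    paper_id = match.group(1) + (match.group(2) or "")
    return {
        "abs": f"https://arxiv.org/abs/{paper_id}",
        "pdf": f"https://arxiv.org/pdf/{paper_id}",
        # Assumption: the HTML version, if the paper has one, is served under /html/.
        "html": f"https://arxiv.org/html/{paper_id}",
    }


print(arxiv_links("arxiv.org/abs/2401.12345")["html"])  # https://arxiv.org/html/2401.12345
```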

Method 3: Semantic Scholar or Papers With Code

These sites often have cleaner HTML renderings of papers. Open the paper page and use Save.

Building a Research Knowledge Base

The real power comes from accumulating papers over time:

research/
  attention/
    attention-is-all-you-need.md
    flash-attention-v2.md
    multi-head-latent-attention.md
  scaling/
    chinchilla-scaling-laws.md
    scaling-data-constrained.md
  agents/
    toolformer.md
    react-prompting.md
    mcp-protocol.md
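
The file names above follow a lowercase, hyphen-separated pattern. One way to keep that consistent is to derive the name from the paper title. The helpers below are a sketch, and `slugify` and `paper_path` are hypothetical names rather than part of any tool mentioned here.

```python
import re
import unicodedata
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a paper title into a lowercase, hyphen-separated file stem."""
    # Drop accents and other non-ASCII characters so names stay portable.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def paper_path(root: str, topic: str, title: str) -> Path:
    """Return root/topic/slug.md, creating the topic folder if needed."""
    folder = Path(root) / slugify(topic)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{slugify(title)}.md"


print(slugify("Attention Is All You Need"))  # attention-is-all-you-need
```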

Point Claude Code at this folder:

cd research
claude

Now you can ask: “Compare the attention mechanisms in these papers” or “What are the key findings on scaling laws?” Claude can read the papers in the folder and synthesize answers grounded in the actual research.

The Karpathy Pattern

Andrej Karpathy described this approach: build a personal wiki of Markdown files, then let an LLM research across them. For AI researchers, this means:

1. Save every important paper as Markdown
2. Organize by topic
3. Add your own notes and annotations
4. Let Claude or ChatGPT work with the full collection
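
Once papers are organized by topic, a generated index gives both you and the LLM a map of the collection. The sketch below assumes one folder per topic with `.md` files directly inside it. The function name `build_index` and the index layout are illustrative choices, not a format any tool requires.

```python
from pathlib import Path


def build_index(root: str) -> str:
    """Return Markdown listing every paper under each topic folder as a wikilink."""
    lines = ["# Research index", ""]
    for topic in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        papers = sorted(topic.glob("*.md"))
        if not papers:
            continue  # skip folders that hold no papers yet
        lines.append(f"## {topic.name} ({len(papers)})")
        lines.extend(f"- [[{paper.stem}]]" for paper in papers)
        lines.append("")
    return "\n".join(lines)
```

Writing the result to a file such as `research/index.md` with `Path.write_text` keeps the index next to the papers, and rerunning the function after each new paper keeps it current.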

After a few months, you have a personal research assistant that knows every paper you’ve read.

Get Started

Install Save and start with the next arXiv paper you read. Over time, your Markdown research library compounds into something no generic AI can match.

Turn arXiv papers into a searchable, AI-readable knowledge base. Install Save, free to start.

Jean-Sébastien Wallez