How to Save a YouTube Video as Markdown (Transcript, Summary, Timestamps)
YouTube doesn’t want you to leave with the content. There’s no export button, no transcript download, no “copy to notes” option. The closed-captions sidebar gives you raw subtitle blobs with no punctuation. If you’ve ever tried to drop a YouTube video into Claude or ChatGPT as context, you know the problem --- pasting the URL gives the model nothing, because the model can’t watch.
This guide covers every method to convert a YouTube video to clean Markdown --- from a single talk to a multi-hour podcast.
Why Save YouTube Videos as Markdown?
Markdown is the format that works wherever a transcript needs to go:
- Feed it to an LLM --- Claude, ChatGPT, Gemini, and local models all read Markdown natively as context
- Drop it into Obsidian or Notion --- one file, fully searchable, properly headed
- Quote a specific timestamp --- jumping back to “minute 34” in a 2-hour talk is one search away
- Archive a talk before it gets pulled --- channels get removed, videos get privated, your notes shouldn’t depend on YouTube’s uptime
- Translate a foreign-language video --- once it’s text, any translation tool works on it
The use case driving most YouTube-to-Markdown traffic in 2026 is the first one: people want to ask an LLM questions about a video they just watched, and pasting the URL doesn’t work.
Method 1: Save (Fastest, One Click)
Save is a Chrome extension that turns any YouTube page into a Markdown file with one click. It transcribes the audio with a Whisper-class model, runs a short cleanup pass, and produces something that actually reads like prose, not raw captions.
How it works:
- Open the YouTube video in Chrome
- Click the Save extension icon in your toolbar
- A `.md` file downloads instantly (or lands in your Save Vault if connected)
What you get:
- AI-generated summary at the top so you can scan before reading
- Key points as a bullet list
- Full transcript with timestamps every few minutes
- Chapter headings when the video has them
- Frontmatter with title, channel, publish date, duration, and URL
- Speaker labels when there’s more than one voice
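A frontmatter block like this is easy to generate yourself. The short Python sketch below is a hypothetical helper, not Save's code. It assumes the metadata has already been collected into a plain dict with simple keys. Every value goes through JSON encoding, and a JSON string is also a valid YAML double-quoted scalar, so a title containing brackets or colons can't break the YAML.

```python
import json


def frontmatter(meta: dict) -> str:
    """Render a metadata dict as a YAML frontmatter block.

    Every value goes through json.dumps: a JSON string is also a valid YAML
    double-quoted scalar, so brackets, colons, and quotes in titles can't
    break the YAML. Keys are assumed to be simple identifiers.
    """
    lines = ["---"]
    for key, value in meta.items():
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines)


# Assumption: metadata already gathered into a dict; field names mirror the example output.
print(frontmatter({
    "title": "[1hr Talk] Intro to Large Language Models",
    "channel": "Andrej Karpathy",
    "url": "https://youtube.com/watch?v=zjkBMFhNj_g",
    "duration": "60m",
}))
```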
What gets removed:
- Recommended videos sidebar and YouTube nav chrome
- Ad breaks and sponsor segments inside the transcript
- Comments (unless explicitly opted in)
- Repeated caption artefacts from auto-generated subtitles
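Removing an ad break or sponsor read amounts to dropping the transcript segments that fall inside known time ranges. How Save detects those ranges isn't described here. The Python sketch below assumes you already have them as (start, end) pairs in seconds, for example marked by hand, and that the transcript is a list of timed segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def drop_ranges(segments: list[Segment], ranges: list[tuple[float, float]]) -> list[Segment]:
    """Remove segments that overlap any (start, end) range, e.g. a sponsor read.

    Assumption: the ranges are supplied by the caller; this does not detect them.
    """

    def overlaps(seg: Segment) -> bool:
        # Two intervals overlap when each one starts before the other ends.
        return any(seg.start < r_end and r_start < seg.end for r_start, r_end in ranges)

    return [seg for seg in segments if not overlaps(seg)]


segments = [
    Segment(0.0, 5.0, "Welcome back to the channel."),
    Segment(5.0, 40.0, "This episode is sponsored by..."),
    Segment(40.0, 45.0, "Anyway, on to the topic."),
]
kept = drop_ranges(segments, [(5.0, 40.0)])
print([s.text for s in kept])
```

Dropping any segment that *overlaps* a range, rather than only those fully inside it, errs on the side of removing a few seconds of real content instead of leaving half a sponsor sentence behind.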
Best for: Researchers, AI users, students, podcast listeners. If you need a clean transcript that you’ll paste into Claude or read in Obsidian, this is the cleanest path.
Example Output
Saving a 60-minute Karpathy talk produces:
```markdown
---
title: "[1hr Talk] Intro to Large Language Models"
channel: Andrej Karpathy
url: https://youtube.com/watch?v=zjkBMFhNj_g
duration: 60m
date: 2024-01-15
---
## Summary
Karpathy walks through what an LLM is at the level of bytes on a hard drive,
how training works in practice, and where the discipline is heading. The
core framing: LLMs are file compressors with a thinking layer on top, the
training stack is straightforward but the data work is brutal, and prompt
engineering is becoming software engineering.
## Key Points
- An LLM at rest is two files (parameters and run.c)
- Training is next-token prediction on the internet
- Fine-tuning is what makes models useful for a task
- Scaling laws still hold, but data quality matters more now
- Tool use is the next leap
## Full Transcript
[00:00] Hi everyone, so I've been wanting to do this talk for a while.
We have a lot of really exciting topics to cover...
[02:34] So let's start with what an LLM actually is, at the level of
bytes on a hard drive...
```
That file is one paste away from being usable Claude context, one keystroke away from being a permanent Obsidian note.
Method 2: YouTube’s Closed Captions (Free, Messy)
YouTube exposes auto-generated captions through the CC sidebar. You can extract them and reformat manually.
Steps:
- Open the video, click the `...` menu, and choose Open transcript
- Copy the timestamped lines into a text editor
- Strip the timestamps, add punctuation, and fix the speaker boundaries by hand
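The timestamp-stripping step is easy to script. This Python sketch assumes the copied text has timestamps like `0:00` or `1:02:03` either on their own lines or at the start of caption lines (the exact copy format depends on YouTube's current UI). It joins what remains into one block of text. It does not add punctuation or speaker breaks; those still need a human or a model.

```python
import re

# Assumption: timestamps look like "0:00", "12:34" or "1:02:03" at the start of a line.
TIMESTAMP = re.compile(r"^\s*(?:\d{1,2}:)?\d{1,2}:\d{2}\s*")


def strip_timestamps(copied: str) -> str:
    """Drop leading caption timestamps and join the remaining lines with spaces."""
    words = []
    for line in copied.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if text:  # lines that held only a timestamp are now empty and skipped
            words.append(text)
    return " ".join(words)


copied = """0:00
hi everyone so I've been wanting to do
0:03 this talk for a while
"""
print(strip_timestamps(copied))
```

One caveat of a line-start pattern: a caption that genuinely begins with a clock time ("10:30 in the morning") would lose it.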
Problems with this approach:
- Auto-captions have no punctuation and no sentence boundaries
- Speaker changes aren’t marked at all
- Music, applause, and silence get represented as `[Music]` / `[Applause]` artefacts
- Long pauses and filler words (“um”, “uh”, “like”) aren’t stripped
- The output is rarely usable as LLM context without 30 minutes of cleanup
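Part of that cleanup can be automated. A minimal Python sketch, assuming the input is already de-timestamped caption text: it drops bracketed tags such as `[Music]` and `[Applause]` and a small, hand-picked (hypothetical) list of filler words. "like" is deliberately left out because it is only sometimes a filler; a fixed word list can't tell the difference, and punctuation and speaker boundaries still need manual work.

```python
import re

# Bracketed caption tags such as [Music], [Applause], [Laughter].
TAGS = re.compile(r"\[[^\]]*\]")
# Hypothetical filler list; \b keeps words like "umbrella" intact.
FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)


def clean_captions(text: str) -> str:
    text = TAGS.sub(" ", text)
    text = FILLERS.sub("", text)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(clean_captions("[Music] so um today we're uh going to talk [Applause] about LLMs"))
```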
Workable for a 3-minute clip. Falls apart on anything longer.
Method 3: yt-dlp + Whisper Locally
For full control, you can run Whisper yourself on the audio.
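```bash
# Download the audio only, named after the video ID so the next command can find it
yt-dlp -x --audio-format mp3 -o "%(id)s.%(ext)s" "https://youtube.com/watch?v=VIDEO_ID"
# Transcribe it; writes VIDEO_ID.txt to the current directory
whisper VIDEO_ID.mp3 --model medium --output_format txt
```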
Best for: Engineering teams transcribing at scale, or anyone running Whisper offline for privacy. Requires a Python environment, a few GB of disk for the model, and either a GPU or patience.
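Whisper can also write its segments as JSON (`--output_format json` in the command above), where each entry in `segments` has `start`, `end`, and `text` fields. Turning those into a Markdown transcript with a timestamp every couple of minutes, similar in spirit to the example output earlier, takes a few lines of Python. A sketch, assuming a `VIDEO_ID.json` file from that command; the 120-second interval is an arbitrary choice.

```python
import json


def fmt_time(seconds: float) -> str:
    """Format seconds as [MM:SS], or [H:MM:SS] past the hour."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[{h}:{m:02d}:{s:02d}]" if h else f"[{m:02d}:{s:02d}]"


def segments_to_markdown(segments: list[dict], every: float = 120.0) -> str:
    """Group Whisper segments into paragraphs, starting a new timestamped
    paragraph once `every` seconds have passed since the previous mark."""
    paragraphs: list[str] = []
    current: list[str] = []
    last_mark = None
    for seg in segments:
        if last_mark is None or seg["start"] - last_mark >= every:
            if current:
                paragraphs.append(" ".join(current))
            current = [fmt_time(seg["start"])]
            last_mark = seg["start"]
        current.append(seg["text"].strip())
    if current:
        paragraphs.append(" ".join(current))
    return "## Full Transcript\n\n" + "\n\n".join(paragraphs) + "\n"


def whisper_json_to_markdown(path: str) -> str:
    # Assumption: path points at a JSON file written by `whisper ... --output_format json`.
    with open(path, encoding="utf-8") as f:
        return segments_to_markdown(json.load(f)["segments"])


demo = [
    {"start": 0.0, "end": 4.2, "text": " Hi everyone."},
    {"start": 4.2, "end": 9.0, "text": " Let's get started."},
    {"start": 154.0, "end": 160.0, "text": " So what is an LLM?"},
]
print(segments_to_markdown(demo))
```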
Problems with this approach:
- No summary, no key points, no clean structure --- just raw transcript text
- Speaker diarization needs a separate model (`pyannote.audio` or similar)
- Chapter markers from the YouTube page aren’t recovered
- Cleanup pass (punctuation, paragraphs, filler removal) is a separate step
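Merging a diarization model's output with Whisper's segments is commonly done by time overlap: each transcript segment gets the speaker whose turns cover most of its span. This Python sketch assumes the diarization output has already been converted to a list of (start, end, speaker) turns in seconds; the turns, speaker names, and segments below are made up for illustration.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker who talks most during it.

    segments: list of (start, end, text) tuples from the transcriber.
    turns: list of (start, end, speaker) tuples from a diarization model (assumed format).
    Returns a list of (speaker, text); speaker is None when no turn overlaps.
    """
    labelled = []
    for seg_start, seg_end, text in segments:
        overlap_by_speaker: dict[str, float] = {}
        for turn_start, turn_end, speaker in turns:
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > 0:
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0.0) + overlap
        best = max(overlap_by_speaker, key=overlap_by_speaker.get) if overlap_by_speaker else None
        labelled.append((best, text))
    return labelled


segments = [(0.0, 4.0, "Thanks for having me."), (4.0, 9.0, "So tell us about the paper.")]
turns = [(0.0, 4.5, "Guest"), (4.5, 9.0, "Host")]
for speaker, text in assign_speakers(segments, turns):
    print(f"**{speaker}:** {text}")
```

Majority overlap is what makes rapid back-and-forth hard: when two people speak within one transcript segment, the whole segment still goes to a single speaker.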
This is the right method if you’re building a pipeline. It’s overkill for one video.
Method 4: Third-Party Transcription Services
Tools like Descript, Otter.ai, and Sonix can ingest a YouTube URL and produce a transcript.
Best for: Podcasters and content teams who also need editing, speaker identification, and team collaboration on the transcript.
Problems for the Markdown use case:
- Output is usually in a proprietary format (Descript project, Otter notes), not clean Markdown
- Most are paid services with per-minute fees that add up fast
- The transcript is rarely structured into summary + key points + body
- Designed for video editing workflows, not for feeding AI models
Which Method Should You Use?
| Scenario | Best Method |
|---|---|
| Paste a video into Claude or ChatGPT | Save --- one click, structured output |
| Save a podcast to read later | Save --- summary makes long content scannable |
| Quote a specific moment in a 2-hour talk | Save --- timestamps preserved |
| Build an internal transcription pipeline | yt-dlp + Whisper --- programmatic and offline |
| Transcribe for video editing | Descript or Otter --- designed for that workflow |
| Get a quick rough transcript of a 3-min clip | YouTube CC --- free, fast, messy |
For most people, especially anyone using YouTube content as AI context, Save is the answer. It produces the cleanest Markdown with zero setup, and a 3-hour podcast takes the same single click as a 3-minute clip.
Edge Cases Save Handles
- Long videos (2 to 4 hours). Save splits the audio into chunks and re-stitches the transcript with continuous timestamps. The summary at the top is the key piece. Without it, no one’s reading 30,000 words.
- Multiple speakers. Whisper on its own doesn’t separate speakers, so Save adds speaker labels in a separate step when there’s more than one voice. Not always perfect on interview shows with rapid back-and-forth, but usually right on podcasts and conference panels.
- Multilingual videos. If the audio is French, the transcript stays in French. No forced translation. If you want it in English, ask Claude to translate after.
- Auto-captions disabled. Doesn’t matter. Save transcribes the audio directly and doesn’t depend on YouTube’s CC track.
- Shorts. Same pipeline, just faster. Output is shorter but still has the metadata frontmatter and a summary.
- Restricted or member-only videos. Save sees what your logged-in browser sees. If you can watch it, Save can transcribe it.
- Live streams (after they end). Works on the archived VOD once YouTube finishes processing it. Live streams in progress aren’t supported.
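Chunking long audio is where timestamps usually go wrong: each chunk is transcribed on its own, so its segment times restart at zero and have to be shifted by the chunk's start offset. How Save actually chunks audio isn't documented here. This Python sketch assumes fixed-length, non-overlapping chunks whose segments are (start, end, text) tuples relative to their own chunk.

```python
def stitch_chunks(chunks, chunk_seconds):
    """Merge per-chunk transcripts into one continuous timeline.

    chunks: list of segment lists, one per audio chunk, in order; each segment
        is (start, end, text) measured from the start of its own chunk.
    chunk_seconds: length of every chunk (assumption: fixed length, no overlap).
    """
    stitched = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds  # where this chunk starts in the full video
        for start, end, text in segments:
            stitched.append((start + offset, end + offset, text))
    return stitched


# Two 10-minute chunks: the second chunk's 0:05 is really 10:05 in the video.
chunks = [
    [(0.0, 4.0, "Welcome to the show."), (590.0, 600.0, "Let's take a look.")],
    [(5.0, 9.0, "Here's the first result.")],
]
print(stitch_chunks(chunks, 600))
```

Real pipelines often overlap chunks by a few seconds so no word is cut in half; that variant also needs a step that drops the segments duplicated in the overlap.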
Pair It With Your Workflow
The Markdown output works wherever you need it:
- Claude / ChatGPT / Gemini --- paste the file in, ask follow-up questions about the video
- Obsidian --- drop it in your vault, link it to related notes, search across all your saved talks
- Notion --- paste directly, headings and code blocks render correctly
- Apple Notes --- clean import via the Markdown share extension
- Save Vault --- if you’ve connected one, every YouTube save lands there automatically with backlinks and tags
FAQ
Does Save work on the YouTube mobile site or app? The extension is desktop Chrome only for now. On mobile, copy the URL and open it on desktop, or paste it into a Save Vault on Mac (which has a URL handler).
What about YouTube Music or playlists? Single videos only. Playlists aren’t crawled as one document. Music videos work, but the transcript is just the lyrics if there are any.
Can I get just the summary, without the full transcript? Yes. The extension lets you pick: transcript only, summary only, or both. Default is both, because both are short on most videos.
Does it preserve chapters? If the video has chapter markers, Save uses them as section headings in the transcript. Long videos become much easier to navigate.
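Using chapters as headings comes down to bucketing transcript segments by chapter start time. For a do-it-yourself pipeline, yt-dlp's `--write-info-json` option saves the video's metadata, including a `chapters` list (entries with `start_time`, `end_time`, and `title`) when the video has chapters. This Python sketch is an illustration, not Save's implementation; it assumes that list has been loaded and that the transcript is a list of (start, text) segments.

```python
import bisect


def transcript_with_chapters(chapters, segments):
    """Render segments under '### <chapter title>' headings.

    chapters: dicts with 'start_time' (seconds) and 'title', sorted by time.
    segments: list of (start, text) tuples.
    Segments before the first chapter are placed under the first chapter.
    """
    if not chapters:
        return " ".join(text for _, text in segments)
    starts = [c["start_time"] for c in chapters]
    grouped = [[] for _ in chapters]
    for start, text in segments:
        # Index of the last chapter that begins at or before this segment.
        i = max(bisect.bisect_right(starts, start) - 1, 0)
        grouped[i].append(text)
    parts = [f"### {c['title']}\n\n" + " ".join(texts) for c, texts in zip(chapters, grouped)]
    return "\n\n".join(parts)


chapters = [
    {"start_time": 0.0, "title": "Intro"},
    {"start_time": 150.0, "title": "What is an LLM?"},
]
segments = [(0.0, "Hi everyone."), (154.0, "Let's start with the files."), (160.0, "There are two.")]
print(transcript_with_chapters(chapters, segments))
```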
Does the transcript include filler words? The cleanup pass removes most “um”, “uh”, and false starts. It keeps the speaker’s voice and tone, just stripped of the verbal noise that makes raw transcripts hard to read.
Is the transcript accurate enough to quote? For normal-paced speech, yes. For very technical content with rare proper nouns, double-check the spelling against the video. Save uses a Whisper-class model, which is state of the art for English and very good for most major languages.
How much does it cost? Save has a free tier so you can try it on a few videos. After that, a small subscription covers the transcription costs.
Related Save Guides
- Save Reddit Threads as Markdown --- threads with the comment nesting preserved
- Save ChatGPT Conversations as Markdown --- every turn, with code blocks intact
- Save GitHub Repos and Issues as Markdown --- README, issues, PR discussions, all as one file
- Save Notion Pages as Markdown --- toggles expanded, databases as tables
- Save Twitter / X Threads as Markdown --- every tweet, in order, with attribution
Written by
Jean-Sébastien Wallez
I've been making internet products for 10+ years. Built Save on weekends because I wanted my own reading library in clean markdown for Claude and Obsidian. Write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.