How to Save a YouTube Video as Markdown (Transcript, Summary, Timestamps)

YouTube doesn’t want you to leave with the content. There’s no export button, no transcript download, no “copy to notes” option. The closed-captions sidebar gives you raw subtitle blobs with no punctuation. If you’ve ever tried to drop a YouTube video into Claude or ChatGPT as context, you know the problem --- pasting the URL gives the model nothing, because the model can’t watch.

This guide covers every method to convert a YouTube video to clean Markdown --- from a single talk to a multi-hour podcast.

Why Save YouTube Videos as Markdown?

Markdown is the format that works wherever a transcript needs to go:

Feed it to an LLM --- Claude, ChatGPT, Gemini, and local models all read Markdown natively as context
Drop it into Obsidian or Notion --- one file, fully searchable, properly headed
Quote a specific timestamp --- jumping back to “minute 34” in a 2-hour talk is one search away
Archive a talk before it gets pulled --- channels get removed, videos get privated, your notes shouldn’t depend on YouTube’s uptime
Translate a foreign-language video --- once it’s text, any translation tool works on it

The use case driving most YouTube-to-Markdown traffic in 2026 is the first one: people want to ask an LLM questions about a video they just watched, and pasting the URL doesn’t work.

Method 1: Save (Fastest, One Click)

Save is a Chrome extension that turns any YouTube page into a Markdown file with one click. It transcribes the audio with a Whisper-class model, runs a short cleanup pass, and produces something that actually reads like prose, not raw captions.

How it works:

Open the YouTube video in Chrome
Click the Save extension icon in your toolbar
A .md file downloads instantly (or lands in your Save Vault if connected)

What you get:

AI-generated summary at the top so you can scan before reading
Key points as a bullet list
Full transcript with timestamps every few minutes
Chapter headings when the video has them
Frontmatter with title, channel, publish date, duration, and URL
Speaker labels when there’s more than one voice

What gets removed:

Recommended videos sidebar and YouTube nav chrome
Ad breaks and sponsor segments inside the transcript
Comments (unless explicitly opted in)
Repeated caption artefacts from auto-generated subtitles

Best for: Researchers, AI users, students, podcast listeners. If you need a clean transcript that you’ll paste into Claude or read in Obsidian, this is the cleanest path.

Example Output

Saving a 60-minute Karpathy talk produces:

---
title: "[1hr Talk] Intro to Large Language Models"
channel: Andrej Karpathy
url: https://youtube.com/watch?v=zjkBMFhNj_g
duration: 60m
date: 2024-01-15
---

## Summary

Karpathy walks through what an LLM is at the level of bytes on a hard drive,
how training works in practice, and where the discipline is heading. The
core framing: LLMs are file compressors with a thinking layer on top, the
training stack is straightforward but the data work is brutal, and prompt
engineering is becoming software engineering.

## Key Points

- An LLM at rest is two files (parameters and run.c)
- Training is next-token prediction on the internet
- Fine-tuning is what makes models useful for a task
- Scaling laws still hold, but data quality matters more now
- Tool use is the next leap

## Full Transcript

[00:00] Hi everyone, so I've been wanting to do this talk for a while.
We have a lot of really exciting topics to cover...

[02:34] So let's start with what an LLM actually is, at the level of
bytes on a hard drive...

That file is one paste away from being usable Claude context, one keystroke away from being a permanent Obsidian note.
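As a rough illustration of how a file with this shape could be assembled from metadata and timestamped text, here is a minimal Python sketch. It is not Save's actual code: the `Segment` type and the `render_markdown` and `format_timestamp` functions are hypothetical names chosen for this example, and the section headings simply mirror the sample output above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: int  # offset from the start of the video
    text: str


def format_timestamp(seconds: int) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the offset passes an hour."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_markdown(meta: dict[str, str], summary: str,
                    key_points: list[str], segments: list[Segment]) -> str:
    """Build frontmatter, summary, key points, and a timestamped transcript."""
    lines = ["---"]
    for key, value in meta.items():
        # Quote every value: titles often contain ':' or '[' which break bare YAML.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    lines += ["---", "", "## Summary", "", summary, "", "## Key Points", ""]
    lines += [f"- {point}" for point in key_points]
    lines += ["", "## Full Transcript", ""]
    for seg in segments:
        lines += [f"[{format_timestamp(seg.start_seconds)}] {seg.text}", ""]
    return "\n".join(lines).rstrip() + "\n"
```

Quoting every frontmatter value is a deliberate simplification here; it keeps titles with brackets or colons valid without pulling in a YAML library.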

Method 2: YouTube’s Closed Captions (Free, Messy)

YouTube exposes auto-generated captions through the CC sidebar. You can extract them and reformat manually.

Steps:

Open the video, expand the description (or click the ... menu), and choose Show transcript
Copy the timestamped lines into a text editor
Strip the timestamps, add punctuation, fix the speaker boundaries by hand

Problems with this approach:

Auto-captions have no punctuation and no sentence boundaries
Speaker changes aren’t marked at all
Music, applause, and silence get represented as [Music] / [Applause] artefacts
Long pauses and filler words (“um”, “uh”, “like”) aren’t stripped
The output is rarely usable as LLM context without 30 minutes of cleanup

Workable for a 3-minute clip. Falls apart on anything longer.
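If you do go the manual route, part of the cleanup can be scripted. The sketch below is a minimal example, assuming the copied transcript has timestamps either on their own line or at the start of a line; `clean_captions` is a hypothetical helper, and the regular expressions are guesses at common patterns, not an exhaustive list. It deliberately leaves "like" alone, because removing it blindly would also delete legitimate uses of the word, and it cannot restore punctuation or speaker boundaries.

```python
import re

# Leading timestamps such as "0:00", "12:34" or "1:02:03".
TIMESTAMP = re.compile(r"^\s*(?:\d{1,2}:)?\d{1,2}:\d{2}\s*")
# Bracketed non-speech markers that auto-captions insert.
ARTIFACT = re.compile(r"\[(?:Music|Applause|Laughter)\]", re.IGNORECASE)
# Standalone "um" / "uh" (and stretched variants), plus a trailing comma.
FILLER = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)


def clean_captions(raw: str) -> str:
    """Strip timestamps, caption artefacts, and basic fillers; join into one paragraph."""
    pieces = []
    for line in raw.splitlines():
        line = TIMESTAMP.sub("", line)
        line = ARTIFACT.sub("", line)
        line = FILLER.sub("", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if line:
            pieces.append(line)
    return " ".join(pieces)
```

One known limitation: a spoken line that happens to begin with a clock time (for example "10:30 is when we start") would lose that time, so skim the result before relying on it.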

Method 3: yt-dlp + Whisper Locally

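For full control, you can run Whisper yourself on the audio. The -o template names the downloaded file after the video ID so the second command can find it; without it, yt-dlp names the file after the video title. Extracting audio with -x also requires ffmpeg to be installed.

yt-dlp -x --audio-format mp3 -o "%(id)s.%(ext)s" "https://youtube.com/watch?v=VIDEO_ID"
whisper VIDEO_ID.mp3 --model medium --output_format txt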

Best for: Engineering teams transcribing at scale, or anyone running Whisper offline for privacy. Requires a Python environment, a few GB of disk for the model, and either a GPU or patience.

Problems with this approach:

No summary, no key points, no clean structure --- just raw transcript text
Speaker diarization needs a separate model (pyannote.audio or similar)
Chapter markers from the YouTube page aren’t recovered
Cleanup pass (punctuation, paragraphs, filler removal) is a separate step

This is the right method if you’re building a pipeline. It’s overkill for one video.
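If you are building that pipeline, the first cleanup step is usually turning Whisper's segments into readable, timestamped paragraphs. The sketch below assumes you ran the `whisper` command with `--output_format json` instead of `txt`, which writes a JSON file containing a `segments` list whose entries have `start` (seconds) and `text` fields. The function names and the two-minute paragraph interval are illustrative choices, not part of any tool.

```python
import json
from pathlib import Path


def load_whisper_segments(path: str) -> list[dict]:
    """Read the `segments` list from a Whisper JSON output file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["segments"]


def segments_to_markdown(segments: list[dict], every_seconds: float = 120.0) -> str:
    """Group segments into paragraphs, starting a new one every `every_seconds`."""
    paragraphs: list[str] = []
    current: list[str] = []
    block_start = 0.0
    for seg in segments:
        # Close the running paragraph once it spans the chosen interval.
        if current and seg["start"] - block_start >= every_seconds:
            paragraphs.append(_render(block_start, current))
            current = []
        if not current:
            block_start = seg["start"]
        current.append(seg["text"].strip())
    if current:
        paragraphs.append(_render(block_start, current))
    return "\n\n".join(paragraphs) + "\n"


def _render(start: float, texts: list[str]) -> str:
    # Minutes are not wrapped at 60, so a 75-minute offset prints as [75:00].
    minutes, seconds = divmod(int(start), 60)
    return f"[{minutes:02d}:{seconds:02d}] " + " ".join(texts)
```

Calling `segments_to_markdown(load_whisper_segments("VIDEO_ID.json"))` would give you the transcript body; the summary, key points, and speaker labels still need their own steps.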

Method 4: Third-Party Transcription Services

Tools like Descript, Otter.ai, and Sonix can ingest a YouTube URL and produce a transcript.

Best for: Podcasters and content teams who also need editing, speaker identification, and team collaboration on the transcript.

Problems for the Markdown use case:

Output is usually in a proprietary format (Descript project, Otter notes), not clean Markdown
Most are paid services with per-minute fees that add up fast
The transcript is rarely structured into summary + key points + body
Designed for video editing workflows, not for feeding AI models

Which Method Should You Use?

| Scenario | Best Method |
| --- | --- |
| Paste a video into Claude or ChatGPT | Save --- one click, structured output |
| Save a podcast to read later | Save --- summary makes long content scannable |
| Quote a specific moment in a 2-hour talk | Save --- timestamps preserved |
| Build an internal transcription pipeline | yt-dlp + Whisper --- programmatic and offline |
| Transcribe for video editing | Descript or Otter --- designed for that workflow |
| Get a quick rough transcript of a 3-min clip | YouTube CC --- free, fast, messy |

For most people --- especially anyone using YouTube content as AI context --- Save is the answer. It produces the cleanest Markdown with zero setup, and a multi-hour video goes through the same one-click flow as a tweet.

Edge Cases Save Handles

Long videos (2 to 4 hours). Save splits the audio into chunks and re-stitches the transcript with continuous timestamps. The summary at the top is the key piece. Without it, no one’s reading 30,000 words.
Multiple speakers. Whisper-class models transcribe speech but don’t identify who is speaking, so Save adds speaker labels on top when there’s more than one voice. The labels aren’t always perfect on interview shows with rapid back-and-forth, but they’re usually right on podcasts and conference panels.
Multilingual videos. If the audio is French, the transcript stays in French. No forced translation. If you want it in English, ask Claude to translate after.
Auto-captions disabled. Doesn’t matter. Save transcribes the audio directly and doesn’t depend on YouTube’s CC track.
Shorts. Same pipeline, just faster. Output is shorter but still has the metadata frontmatter and a summary.
Restricted or member-only videos. Save sees what your logged-in browser sees. If you can watch it, Save can transcribe it.
Live streams (after they end). Works on the archived VOD once YouTube finishes processing it. Live streams in progress aren’t supported.
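The long-video case above relies on one idea worth spelling out: each chunk is transcribed with timestamps that start at zero, so the chunk's position in the full recording has to be added back. The sketch below shows that offset step only. It assumes fixed-length, non-overlapping chunks and Whisper-style segment dictionaries with `start`, `end`, and `text` keys; `stitch_chunks` is a hypothetical name, and how Save actually splits and merges audio is not documented here.

```python
def stitch_chunks(chunks: list[list[dict]], chunk_seconds: float) -> list[dict]:
    """Merge per-chunk segments into one list with continuous timestamps.

    `chunks[i]` holds the segments for the i-th chunk, each timed relative
    to the start of that chunk. Assumes every chunk is `chunk_seconds` long
    and chunks do not overlap.
    """
    stitched: list[dict] = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds  # where this chunk begins in the full video
        for seg in segments:
            stitched.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return stitched
```

Real pipelines often cut at silences or overlap chunks slightly to avoid splitting a word, in which case the offset would come from the actual cut points rather than a fixed length.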

Pair It With Your Workflow

The Markdown output works wherever you need it:

Claude / ChatGPT / Gemini --- paste the file in, ask follow-up questions about the video
Obsidian --- drop it in your vault, link it to related notes, search across all your saved talks
Notion --- paste directly, headings and code blocks render correctly
Apple Notes --- clean import via the Markdown share extension
Save Vault --- if you’ve connected one, every YouTube save lands there automatically with backlinks and tags

FAQ

Does Save work on the YouTube mobile site or app? The extension is desktop Chrome only for now. On mobile, copy the URL and open it on desktop, or paste it into a Save Vault on Mac (which has a URL handler).

What about YouTube Music or playlists? Single videos only. Playlists aren’t crawled as one document. Music videos work, but the transcript is just the lyrics if there are any.

Can I get just the summary, without the full transcript? Yes. The extension lets you pick: transcript only, summary only, or both. Default is both, because both are short on most videos.

Does it preserve chapters? If the video has chapter markers, Save uses them as section headings in the transcript. Long videos become much easier to navigate.
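To make the chapter behaviour concrete, here is a minimal sketch of one way to place transcript segments under chapter headings. It is an illustration, not Save's implementation: `group_by_chapter` is a hypothetical function, chapters are assumed to be a non-empty list of `(start_seconds, title)` pairs sorted by start time, and the `###` heading level is an arbitrary choice for this example.

```python
from bisect import bisect_right


def group_by_chapter(chapters: list[tuple[float, str]], segments: list[dict]) -> str:
    """Emit a heading whenever a segment falls into a new chapter.

    `chapters` must be non-empty and sorted by start time; each segment
    needs `start` (seconds) and `text` keys.
    """
    starts = [start for start, _ in chapters]
    out: list[str] = []
    last_index = None
    for seg in segments:
        # Last chapter whose start is <= the segment start (clamped to the first).
        index = max(bisect_right(starts, seg["start"]) - 1, 0)
        if index != last_index:
            out.append(f"### {chapters[index][1]}")
            last_index = index
        out.append(seg["text"].strip())
    return "\n\n".join(out)
```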

Does the transcript include filler words? The cleanup pass removes most “um”, “uh”, and false starts. It keeps the speaker’s voice and tone, just stripped of the verbal noise that makes raw transcripts hard to read.

Is the transcript accurate enough to quote? For normal-paced speech, yes. For very technical content with rare proper nouns, double-check the spelling against the video. Save uses a Whisper-class model, which is state of the art for English and very good for most major languages.

How much does it cost? Save has a free tier so you can try it on a few videos. After that, a small subscription covers the transcription costs.

Related Save Guides

Save Reddit Threads as Markdown --- threads with the comment nesting preserved
Save ChatGPT Conversations as Markdown --- every turn, with code blocks intact
Save GitHub Repos and Issues as Markdown --- README, issues, PR discussions, all as one file
Save Notion Pages as Markdown --- toggles expanded, databases as tables
Save Twitter / X Threads as Markdown --- every tweet, in order, with attribution

## Continue reading

How to Save a ChatGPT Conversation as Markdown (Every Turn, Code Blocks Intact)

How to Save a Reddit Thread as Markdown (With Comments and Context)

How to Save a Claude Conversation as Markdown (Artifacts, Citations, Projects)

How to Save a Substack Post as Markdown (Paywall-Aware, No Cross-Promo)

Jean-Sébastien Wallez
