How to Save a YouTube Video as Markdown (Transcript, Summary, Timestamps)

YouTube doesn’t want you to leave with the content. There’s no export button, no transcript download, no “copy to notes” option. The closed-captions sidebar gives you raw subtitle blobs with no punctuation. If you’ve ever tried to drop a YouTube video into Claude or ChatGPT as context, you know the problem --- pasting the URL gives the model nothing, because the model can’t watch.

This guide covers every method to convert a YouTube video to clean Markdown --- from a single talk to a multi-hour podcast.

Why Save YouTube Videos as Markdown?

Markdown is the format that works wherever a transcript needs to go:

  • Feed it to an LLM --- Claude, ChatGPT, Gemini, and local models all read Markdown natively as context
  • Drop it into Obsidian or Notion --- one file, fully searchable, properly headed
  • Quote a specific timestamp --- jumping back to “minute 34” in a 2-hour talk is one search away
  • Archive a talk before it gets pulled --- channels get removed, videos get privated, your notes shouldn’t depend on YouTube’s uptime
  • Translate a foreign-language video --- once it’s text, any translation tool works on it

The use case driving most YouTube-to-Markdown traffic in 2026 is the first one: people want to ask an LLM questions about a video they just watched, and pasting the URL doesn’t work.

Method 1: Save (Fastest, One Click)

Save is a Chrome extension that turns any YouTube page into a Markdown file with one click. It transcribes the audio with a Whisper-class model, runs a short cleanup pass, and produces something that actually reads like prose, not raw captions.

How it works:

  1. Open the YouTube video in Chrome
  2. Click the Save extension icon in your toolbar
  3. A .md file downloads instantly (or lands in your Save Vault if connected)

What you get:

  • AI-generated summary at the top so you can scan before reading
  • Key points as a bullet list
  • Full transcript with timestamps every few minutes
  • Chapter headings when the video has them
  • Frontmatter with title, channel, publish date, duration, and URL
  • Speaker labels when there’s more than one voice

What gets removed:

  • Recommended videos sidebar and YouTube nav chrome
  • Ad breaks and sponsor segments inside the transcript
  • Comments (unless explicitly opted in)
  • Repeated caption artefacts from auto-generated subtitles

Best for: Researchers, AI users, students, podcast listeners. If you need a clean transcript that you’ll paste into Claude or read in Obsidian, this is the cleanest path.

Example Output

Saving a 60-minute Karpathy talk produces:

---
title: "[1hr Talk] Intro to Large Language Models"
channel: Andrej Karpathy
url: https://youtube.com/watch?v=zjkBMFhNj_g
duration: 60m
date: 2024-01-15
---

## Summary

Karpathy walks through what an LLM is at the level of bytes on a hard drive,
how training works in practice, and where the discipline is heading. The
core framing: LLMs are file compressors with a thinking layer on top, the
training stack is straightforward but the data work is brutal, and prompt
engineering is becoming software engineering.

## Key Points

- An LLM at rest is two files (parameters and run.c)
- Training is next-token prediction on the internet
- Fine-tuning is what makes models useful for a task
- Scaling laws still hold, but data quality matters more now
- Tool use is the next leap

## Full Transcript

[00:00] Hi everyone, so I've been wanting to do this talk for a while.
We have a lot of really exciting topics to cover...

[02:34] So let's start with what an LLM actually is, at the level of
bytes on a hard drive...

That file is one paste away from being usable Claude context, one keystroke away from being a permanent Obsidian note.
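
For readers who want to see how a file shaped like the example above could be assembled, here is a minimal Python sketch. It is not Save's implementation: `build_markdown`, `yaml_scalar`, and `format_timestamp` are hypothetical helper names, and the metadata, summary, key points, and transcript are assumed to be available already.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS past the one-hour mark."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def yaml_scalar(value) -> str:
    """Quote a frontmatter value only when plain YAML could misread it."""
    text = str(value)
    risky = (
        not text
        or text != text.strip()
        or ": " in text
        or any(ch in text for ch in "#[]{}\"'")
    )
    # A JSON string is also a valid double-quoted YAML scalar.
    return json.dumps(text, ensure_ascii=False) if risky else text


def build_markdown(meta: dict, summary: str, key_points: list[str],
                   transcript: list[tuple[float, str]]) -> str:
    """Assemble frontmatter, summary, key points, and transcript."""
    lines = ["---"]
    lines += [f"{key}: {yaml_scalar(value)}" for key, value in meta.items()]
    lines += ["---", "", "## Summary", "", summary, "", "## Key Points", ""]
    lines += [f"- {point}" for point in key_points]
    lines += ["", "## Full Transcript", ""]
    for start, text in transcript:
        lines += [f"[{format_timestamp(start)}] {text}", ""]
    return "\n".join(lines).rstrip() + "\n"
```

Titles containing brackets or quotes are wrapped in double quotes, which is why the title line in the example is quoted while the URL is not.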

Method 2: YouTube’s Closed Captions (Free, Messy)

YouTube exposes captions, including auto-generated ones, through its transcript panel. You can extract them and reformat manually.

Steps:

  1. Open the video, click the ... menu (or expand the description), and choose Show transcript
  2. Copy the timestamped lines into a text editor
  3. Strip the timestamps, add punctuation, fix the speaker boundaries by hand

Problems with this approach:

  • Auto-captions have no punctuation and no sentence boundaries
  • Speaker changes aren’t marked at all
  • Music, applause, and silence get represented as [Music] / [Applause] artefacts
  • Long pauses and filler words (“um”, “uh”, “like”) aren’t stripped
  • The output is rarely usable as LLM context without 30 minutes of cleanup

Workable for a 3-minute clip. Falls apart on anything longer.
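
Part of the manual cleanup can be scripted. The sketch below is one possible approach, assuming the copied text has timestamps either on their own lines or at the start of a line; `clean_captions` and the three patterns are illustrative names, not part of any tool mentioned here.

```python
import re

# A leading 0:05, 12:34, or 1:02:03 style timestamp.
TIMESTAMP = re.compile(r"^\s*(?:\d{1,2}:)?\d{1,2}:\d{2}\s*")
# Bracketed sound markers that auto-captions insert.
ARTIFACT = re.compile(r"\[(?:Music|Applause|Laughter)\]", re.IGNORECASE)
# Standalone "um" / "uh". "like" is left alone because it is usually a real word.
FILLER = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)


def clean_captions(raw: str) -> str:
    """Turn copied transcript-panel text into one run of plain text."""
    pieces = []
    for line in raw.splitlines():
        line = TIMESTAMP.sub("", line)
        line = ARTIFACT.sub("", line)
        line = FILLER.sub("", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if line:
            pieces.append(line)
    return " ".join(pieces)
```

This only removes noise. Punctuation, sentence boundaries, and speaker changes still have to be added by hand or by a model, which is where most of the cleanup time goes.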

Method 3: yt-dlp + Whisper Locally

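For full control, you can run Whisper yourself on the audio. The -o template names the download after the video ID so the second command can find the file.

yt-dlp -x --audio-format mp3 -o "%(id)s.%(ext)s" "https://youtube.com/watch?v=VIDEO_ID"
whisper VIDEO_ID.mp3 --model medium --output_format txt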

Best for: Engineering teams transcribing at scale, or anyone running Whisper offline for privacy. Requires a Python environment, ffmpeg (both yt-dlp’s audio extraction and Whisper depend on it), a few GB of disk for the model, and either a GPU or patience.

Problems with this approach:

  • No summary, no key points, no clean structure --- just raw transcript text
  • Speaker diarization needs a separate model (pyannote.audio or similar)
  • Chapter markers from the YouTube page aren’t recovered
  • Cleanup pass (punctuation, paragraphs, filler removal) is a separate step

This is the right method if you’re building a pipeline. It’s overkill for one video.
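
If you do build a pipeline, the structuring step is small. The following sketch groups transcript segments into paragraphs stamped every couple of minutes. It assumes segments shaped like the `segments` list Whisper writes when run with `--output_format json` (dicts with `start` in seconds and `text`); `segments_to_markdown` and `every_seconds` are hypothetical names.

```python
def segments_to_markdown(segments: list[dict], every_seconds: int = 120) -> str:
    """Group Whisper-style segments into paragraphs stamped with their start time."""
    paragraphs = []
    current, block_start = [], None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        if block_start is None:
            block_start = seg["start"]
        elif seg["start"] - block_start >= every_seconds:
            # The current paragraph is long enough: close it, start a new one.
            paragraphs.append((block_start, " ".join(current)))
            current, block_start = [], seg["start"]
        current.append(text)
    if current:
        paragraphs.append((block_start, " ".join(current)))

    out = []
    for start, text in paragraphs:
        # Minutes keep counting past 59 so long videos stay sortable.
        minutes, secs = divmod(int(start), 60)
        out.append(f"[{minutes:02d}:{secs:02d}] {text}")
    return "\n\n".join(out) + "\n"
```

To feed it real data, load the JSON file Whisper produced and pass in its `"segments"` value. Summaries, key points, and filler removal would still be separate steps.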

Method 4: Third-Party Transcription Services

Tools like Descript, Otter.ai, and Sonix can ingest a YouTube URL and produce a transcript.

Best for: Podcasters and content teams who also need editing, speaker identification, and team collaboration on the transcript.

Problems for the Markdown use case:

  • Output is usually proprietary format (Descript project, Otter notes), not clean Markdown
  • Most are paid services with per-minute fees that add up fast
  • The transcript is rarely structured into summary + key points + body
  • Designed for video editing workflows, not for feeding AI models

Which Method Should You Use?

| Scenario | Best Method |
| --- | --- |
| Paste a video into Claude or ChatGPT | Save --- one click, structured output |
| Save a podcast to read later | Save --- summary makes long content scannable |
| Quote a specific moment in a 2-hour talk | Save --- timestamps preserved |
| Build an internal transcription pipeline | yt-dlp + Whisper --- programmatic and offline |
| Transcribe for video editing | Descript or Otter --- designed for that workflow |
| Get a quick rough transcript of a 3-min clip | YouTube CC --- free, fast, messy |

For most people --- especially anyone using YouTube content as AI context --- Save is the answer. It produces the cleanest Markdown with zero setup, and long-form video takes the same single click as a tweet.

Edge Cases Save Handles

  • Long videos (2 to 4 hours). Save splits the audio into chunks and re-stitches the transcript with continuous timestamps. The summary at the top is the key piece. Without it, no one’s reading 30,000 words.
  • Multiple speakers. Whisper on its own does not separate speakers (see Method 3); Save adds speaker labels when there’s more than one voice. Not always perfect on interview shows with rapid back-and-forth, but usually right on podcasts and conference panels.
  • Multilingual videos. If the audio is French, the transcript stays in French. No forced translation. If you want it in English, ask Claude to translate after.
  • Auto-captions disabled. Doesn’t matter. Save transcribes the audio directly and doesn’t depend on YouTube’s CC track.
  • Shorts. Same pipeline, just faster. Output is shorter but still has the metadata frontmatter and a summary.
  • Restricted or member-only videos. Save sees what your logged-in browser sees. If you can watch it, Save can transcribe it.
  • Live streams (after they end). Works on the archived VOD once YouTube finishes processing it. Live streams in progress aren’t supported.
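
The chunk-and-stitch idea for long videos can be illustrated in a few lines of Python. This is a sketch of the general technique, not Save's actual code: `stitch_chunks` is a hypothetical name, and each chunk is assumed to arrive as its offset in the full video plus segments whose times are relative to that chunk.

```python
def stitch_chunks(chunks: list[tuple[float, list[dict]]]) -> list[dict]:
    """Merge per-chunk segments into one list timed against the full video.

    Each chunk is (offset_seconds, segments); segment times inside a chunk
    start again from zero, so the offset is added back here.
    """
    stitched = []
    # Sort by offset so chunks transcribed out of order still line up.
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            stitched.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return stitched
```

A real pipeline would also need to avoid cutting chunks mid-word, for example by splitting on silence or overlapping chunks and de-duplicating the seam. That part is left out here.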

Pair It With Your Workflow

The Markdown output works wherever you need it:

  • Claude / ChatGPT / Gemini --- paste the file in, ask follow-up questions about the video
  • Obsidian --- drop it in your vault, link it to related notes, search across all your saved talks
  • Notion --- paste directly, headings and code blocks render correctly
  • Apple Notes --- clean import via the Markdown share extension
  • Save Vault --- if you’ve connected one, every YouTube save lands there automatically with backlinks and tags

FAQ

Does Save work on the YouTube mobile site or app? The extension is desktop Chrome only for now. On mobile, copy the URL and open it on desktop, or paste it into a Save Vault on Mac (which has a URL handler).

What about YouTube Music or playlists? Single videos only. Playlists aren’t crawled as one document. Music videos work, but the transcript is just the lyrics if there are any.

Can I get just the summary, without the full transcript? Yes. The extension lets you pick: transcript only, summary only, or both. Default is both, because both are short on most videos.

Does it preserve chapters? If the video has chapter markers, Save uses them as section headings in the transcript. Long videos become much easier to navigate.
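
As an illustration of the idea, chapter markers can be merged into a transcript by comparing start times. The sketch below assumes chapters and paragraphs both come as (start_seconds, text) pairs; `add_chapter_headings` is a hypothetical helper, not Save's API.

```python
def add_chapter_headings(chapters: list[tuple[float, str]],
                         paragraphs: list[tuple[float, str]]) -> str:
    """Interleave chapter titles as ## headings ahead of the paragraphs they cover."""
    chapters = sorted(chapters)
    lines, next_chapter = [], 0
    for start, text in sorted(paragraphs):
        # Emit every chapter that has begun by the time this paragraph starts.
        while next_chapter < len(chapters) and chapters[next_chapter][0] <= start:
            lines += [f"## {chapters[next_chapter][1]}", ""]
            next_chapter += 1
        lines += [text, ""]
    return "\n".join(lines).rstrip() + "\n"
```

For a do-it-yourself pipeline, yt-dlp's JSON metadata includes a `chapters` list with `start_time` and `title` fields when the video defines chapters, which is one place the first argument could come from.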

Does the transcript include filler words? The cleanup pass removes most “um”, “uh”, and false starts. It keeps the speaker’s voice and tone, just stripped of the verbal noise that makes raw transcripts hard to read.

Is the transcript accurate enough to quote? For normal-paced speech, yes. For very technical content with rare proper nouns, double-check the spelling against the video. Save uses a Whisper-class model, which is state of the art for English and very good for most major languages.

How much does it cost? Save has a free tier so you can try it on a few videos. After that, a small subscription covers the transcription costs.

Written by

Jean-Sébastien Wallez

I've been making internet products for 10+ years. Built Save on weekends because I wanted my own reading library in clean markdown for Claude and Obsidian. Write here about web clipping, AI workflows, and the small things that make a personal knowledge base actually useful.