Recently, I was working on a similar project and found that grabbing transcripts in quick succession gets your IP blocked from fetching transcripts.
I ended up doing the same as this person, downloading the MP4s and then transcribing myself. I was assuming it was some sort of anti LLM scraper feature they put in place.
Has anyone used this --write-auto-subs flag and not been flagged after doing 20 or so videos?
hamiecod 5 hours ago [-]
--write-auto-subs gets your IP banned for 12 to 24 hours if you download video subtitles in bulk, but if the subtitles are downloaded with a sufficient time gap in between, the ban is not triggered.
My startup has to use YouTube transcriptions, so we just subscribe to a YouTube transcript API hosted on RapidAPI that downloads subtitles. $1 per 1000 reqs. Pretty cheap
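The "sufficient time gap" idea above can be sketched in Python as a loop that runs yt-dlp once per video and pauses between runs. This is only a sketch: the function names, the `run` and `sleep` parameters, and the 30 to 90 second gap are assumptions for illustration, since nobody in the thread states how long a gap is actually enough.

```python
import random
import subprocess
import time


def subtitle_command(url, proxy=None):
    """Build a yt-dlp command that fetches subtitles only (no video)."""
    cmd = ["yt-dlp", "--write-subs", "--write-auto-subs", "--skip-download"]
    if proxy:
        cmd += ["--proxy", proxy]
    cmd.append(url)
    return cmd


def fetch_subtitles(urls, min_gap=30.0, max_gap=90.0,
                    run=subprocess.run, sleep=time.sleep):
    """Fetch subtitles one video at a time, pausing between requests.

    min_gap/max_gap are guesses in seconds, not known-safe values.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Randomize the pause so requests are not evenly spaced.
            sleep(random.uniform(min_gap, max_gap))
        completed = run(subtitle_command(url), check=False)
        results.append((url, completed.returncode))
    return results
```

Calling `fetch_subtitles(list_of_urls)` returns one `(url, returncode)` pair per video, so failed downloads can be retried later instead of immediately.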
MysticOracle 3 hours ago [-]
Yep, this happened to me & got IP banned for a day.
thangalin 1 hours ago [-]
systemctl start tor
yt-dlp --proxy socks5://127.0.0.1:9050 --write-subs --write-auto-subs --skip-download [URL]
Unless you fetch directly from your browser. It works by getting the YouTube JSON, which includes the caption tracks, and then reading a track's baseUrl to download the XML.
I wrote this webapp that uses this method: it calls Gemini in the background to polish the raw transcript and produce a much better version with punctuation and paragraphs.
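A rough Python sketch of the two steps described above (find the caption tracks in the JSON, then parse the XML behind a baseUrl). The key names `captions`, `playerCaptionsTracklistRenderer`, `captionTracks`, `languageCode` and `kind`, and the `<text start=… dur=…>` XML layout, are assumptions about YouTube's undocumented responses and may change; only `baseUrl` is named in the comment. The actual HTTP fetch is left out.

```python
import html
import xml.etree.ElementTree as ET


def caption_tracks(player_response):
    """Return (language, kind, baseUrl) for each caption track in the JSON.

    The nesting below is an assumed layout, not a documented API.
    """
    renderer = (player_response.get("captions", {})
                .get("playerCaptionsTracklistRenderer", {}))
    found = []
    for track in renderer.get("captionTracks", []):
        if "baseUrl" in track:
            found.append((track.get("languageCode", ""),
                          track.get("kind", ""),
                          track["baseUrl"]))
    return found


def parse_caption_xml(xml_text):
    """Parse caption XML into (start_seconds, duration_seconds, text) cues."""
    cues = []
    for node in ET.fromstring(xml_text).iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # Caption text is often entity-escaped a second time.
        cues.append((start, dur, html.unescape(node.text or "")))
    return cues
```

The cue list is what you would hand to an LLM for the punctuation and paragraph cleanup mentioned above.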
It's a good call out. I leverage yt-dlp as a library for downstream tooling (archival of media to long term storage repositories), and always recommend folks rely on yt-dlp whenever possible due to the ecosystem of folks grinding to keep their extractors current. Their maintainers are both helpful and responsive.
(with that said, I do not want to diminish OP's work in any way; great job! "What I cannot build, I do not understand" - Feynman)
paulirish 8 hours ago [-]
Same, yup. OP is indeed already using yt-dlp for the video download. (Then Whisper for transcribing, Ollama/lmstudio/OpenAI for summarizing)
hiAndrewQuinn 5 hours ago [-]
Minus the summarization, that is the same pipeline I use in [1] for generating listening practice Anki flashcards for foreign language students. It surprised me that nobody had really built out a program I could find around yt-dlp and Whisper for this kind of use case even a few years after it came out.
I've found the YT transcripts to be severely lacking sometimes, in accuracy and features. Especially speaker identification is really useful if you want to e.g. summarize podcasts or interviews, so if this project here delivers on that then it's definitely better than the YT transcripts.
paulirish 7 hours ago [-]
An approach I've been using recently is to rely on pyannote/tinydiarize only for the speaker_turn timestamps, but prefer the larger model (or in this case YT's autotranscript) for the actual text.
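One way to combine the two sources, sketched in Python: keep the text segments from the better transcript and label each with whichever speaker turn overlaps it most in time. The tuple layouts and the function name are hypothetical; real pyannote or tinydiarize output would need to be converted into this shape first.

```python
def assign_speakers(segments, turns):
    """Label each text segment with the speaker whose turn overlaps it most.

    segments: list of (start, end, text) from the transcript source
    turns:    list of (start, end, speaker) from the diarizer
    Returns a list of (speaker_or_None, text).
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by segment and turn.
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

Segments that overlap no turn at all come back with `None` as the speaker, which makes gaps in the diarization easy to spot.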
I’ve had some success with running them through another LLM to have it clean up the transcription errors based on the context. But this obviously does nothing for speaker identification.
rpastuszak 7 hours ago [-]
IIRC YT also has a "private" API you can call directly (or via an npm package: youtube-transcribe).
YouTube already offers AI transcriptions on their site. As another commenter points out, you can grab them with yt-dlp.
And unlike how your tool will be supported in the future, thousands of users make sure yt-dlp keeps working as Google keeps changing the site (currently 1459 contributors).
swyx 7 hours ago [-]
if you used this in earnest, you'd know yt default transcripts are not good enough, because youtube often (ok, say 5% of the time) fails to transcribe videos, particularly livestreams and videos shortly after release.
retranscribing is a necessary and important part of the creator toolset.
passivegains 7 hours ago [-]
the volunteer open source effort behind youtube-dl and its forks/descendants is so impressive in large part because of how many features they provide and thus have to maintain:
https://github.com/yt-dlp/yt-dlp#usage-and-options
this tool won't provide the list of available thumbnails or settings for HTTP buffer size, but I think that's a pretty reasonable tradeoff.
I'd be really curious to see some sort of benchmark / evaluation of these context resources against the same coding tasks. Right now, the instructions all sound so prescriptive and authoritative, yet it is really hard to evaluate their effectiveness.
eigenvalue 6 hours ago [-]
I made a tool like this a while ago which was useful for transcribing a whole playlist automatically using whisper:
Many channels I follow, such as Vlad Vexler, have taken measures so you can't download the transcript with yt-dlp. Furthermore, they don't provide a transcript option on their videos. I assume this is to prevent people from just reading AI summaries, which is annoying in Vexler's case because he talks slowly and meanders around. If I really want to hear his point but don't want to listen to that, then I download the video with yt-dlp and use Whisper to transcribe it.
Bluestein 3 hours ago [-]
... the ... slower ... the guy the ... less ... content ... and ... more ... advertising.-
How did you get around YouTube blocking cloud IP ranges? Are you using residential proxies?
93po 5 hours ago [-]
bookmarked, thanks, the top google search results always require sign-up. frustrating state of the internet
cmaury 8 hours ago [-]
Thanks for sharing. This is exactly the type of utility that vibecoding is for. It takes 5 seconds to ask GPT to write a script to do this tailored to your specific use case. It's way faster than trying to get someone else's repo up and running.
On this note, is YouTube also the best transcriber of foreign languages or is there something better?
mikeve 8 hours ago [-]
Interesting project! I've been working on a project in this space myself (WaveMemo)
I must say, speaker diarization is surprisingly tricky to do. The most common approach seems to be to use pyannote, but the quality is not amazing...
ethan_smith 7 hours ago [-]
For better diarization quality than pyannote, check out Whisper-DiarizationX which combines Whisper with ECAPA-TDNN speaker embeddings and spectral clustering.
lpeancovschi 5 hours ago [-]
Youtube's T&C don't allow downloading youtube audio/video. How do other services get away with it?
nadermx 3 hours ago [-]
"The court held that merely clicking on a download button does not show consent with license terms, if those terms were not conspicuous and if it was not explicit to the consumer that clicking meant agreeing to the license."
I'm not a lawyer, but I think that even if you shift the legal responsibility to the user by showing them a copyright prompt, it's still illegal to download YouTube videos.
nadermx 2 hours ago [-]
United States v. Auernheimer, 748 F.3d 525 (3d Cir. 2014). Specifically, on page 12, footnote 5, the court states:
“We also note that in order to be guilty of accessing ‘without authorization, or in excess of authorization’ under New Jersey law, the Government needed to prove that Auernheimer or Spitler circumvented a code- or password-based barrier to access... The account slurper simply accessed the publicly facing portion of the login screen and scraped information that AT&T unintentionally published.”
MysticOracle 3 hours ago [-]
I think they use rotating IP/Proxy services
lpeancovschi 3 hours ago [-]
Might be, but I think google would still be able to chase them down.
manishsharan 4 hours ago [-]
Will this make Google mad at me and cancel/freeze all my Google services ?
yt-dlp --write-auto-subs --skip-download "https://www.youtube.com/watch?v=7xTGNNLPyMI"
https://www.appblit.com/scribe
Open source with code to see how to fetch from YouTube servers from the browser https://ldenoue.github.io/readabletranscripts/
[1]: https://github.com/hiAndrewQuinn/audio2anki
(I'm using it in https://butter.sonnet.io)
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2
For Apple Silicon (MLX) https://huggingface.co/senstella/parakeet-tdt-0.6b-v2-mlx
youtube also blocks transcript exports for some things like https://youtubetranscript.com/
https://github.com/Dicklesworthstone/bulk_transcribe_youtube...
I ended up turning it into a beefed-up version that makes polished written documents from the raw transcript; you can try it at
https://youtubetranscriptoptimizer.com/
https://old.reddit.com/r/ChatGPTCoding/comments/1lusr07/self...
Gonna be lots of posts of selfware like that soon.
And, yes, indeed, AI-coding is order-of-magnitude having an effect along the lines that "low-code" was treading ...
... also, for less-capable coders or "borderline" coders the effort/benefit equation has radically shifted.-
- This python one is more amenable to modding into your own custom tool: https://hw.leftium.com/#/item/44353447
- Another bash script: https://hw.leftium.com/#/item/41473379
---
They all seem to be built on top of:
- yt-dlp to download video
- whisper for transcription
- ffmpeg for audio/video extraction/processing
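As a rough illustration of how those three tools chain together, here is a Python sketch that only builds the command lines (it does not run them). The function name, the file names under `workdir`, and the default `base` model are hypothetical choices; the flags shown are ones I believe yt-dlp, ffmpeg and the openai-whisper CLI accept, but check each tool's `--help` before relying on them.

```python
from pathlib import Path


def pipeline_commands(url, workdir="work", whisper_model="base"):
    """Return the three commands: download audio, resample, transcribe."""
    work = Path(workdir)
    mp3 = work / "audio.mp3"
    wav = work / "audio.wav"
    return [
        # 1. yt-dlp: download and extract the audio track as MP3.
        ["yt-dlp", "-x", "--audio-format", "mp3",
         "-o", str(work / "audio.%(ext)s"), url],
        # 2. ffmpeg: convert to 16 kHz mono WAV, a common input for speech models.
        ["ffmpeg", "-y", "-i", str(mp3), "-ar", "16000", "-ac", "1", str(wav)],
        # 3. whisper: transcribe the WAV and write a plain-text transcript.
        ["whisper", str(wav), "--model", whisper_model,
         "--output_format", "txt", "--output_dir", str(work)],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, so a failure in one step stops the pipeline before the next.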
https://en.m.wikipedia.org/wiki/Specht_v._Netscape_Communica...