In playing with the exact same use case, I was blown away by how well Gemini (2.5 Flash, IIRC) transcribed podcasts with speaker identification and handled common "overlaps" in conversations. I can't remember which local Ollama models I played with, but I was not very impressed.
MikeLuLu 4 days ago [-]
Yeah, Gemini is really strong at speaker separation and handling overlaps.
I’m taking a local-first approach (privacy, offline, no cost), using Faster-Whisper
BloodAndCode 5 days ago [-]
[dead]