I built SpeechDock, a macOS menu bar app for speech-to-text (STT) and text-to-speech (TTS). There are many such apps, but it frustrated me that most of them only transcribe microphone input.
SpeechDock can capture and transcribe system-wide audio or audio from a specific app — video calls, online lectures, podcasts, anything your Mac can play. It also does the reverse: select any text on screen (or capture it via OCR) and have it read aloud.
Key points:
- Works out of the box with macOS native STT/TTS, no API keys needed
- Optionally connect OpenAI, Gemini, ElevenLabs, or Grok for higher accuracy
- Real-time subtitle overlay with translation (80+ languages)
- Global hotkeys: use from anywhere without switching apps
- AppleScript support for automation
- Open source (Apache 2.0)
Requires macOS 14+. Built with Swift/SwiftUI.
GitHub: https://github.com/yohasebe/speechdock
Docs: https://yohasebe.github.io/speechdock/