We’ve been building speech-swift, an open-source Swift library for on-device speech AI, and just published benchmarks that surprised us. Two architectures beat Whisper Large v3 (FP16) on LibriSpeech test-clean, for completely different reasons:

- Qwen3-ASR (audio language model, using a Qwen3 LLM as the ASR decoder): 2.35% WER at 1.7B parameters in 8-bit, running on MLX at 40x real-time
- Parakeet TDT (non-autoregressive transducer): 2.74% WER in a 634 MB CoreML model on the Neural Engine

No API. No Python. No audio leaves your Mac. Native Swift async/await.

Full article with architecture breakdown, multilingual benchmarks, and how to reproduce: https://blog.ivan.digital/we-beat-whisper-large-v3-with-a-600m-model-running-entirely-on-your-mac-20e6ce191174

Library: github.com/soniqo/speech-swift

submitted by /u/ivan_digital
Originally posted by u/ivan_digital on r/ArtificialInteligence
