
Vordi
Voice to text that remembers.
What it is
A free, open-source macOS voice-typing app, my answer to WisprFlow and Superwhisper. Hold fn and speak into any app: your speech streams through Groq, OpenAI, or a local Whisper model, gets cleaned and rewritten, and is injected as text, while a Dynamic Notch shows what is happening. Pure Swift and SwiftUI, no Electron. It also keeps a private, on-device memory of everything you have said. I have dictated 142,000 words through it.
System design
Local-first by default. Speech is routed to the right engine per language: Groq for fast English, OpenAI for Hinglish and multilingual speech, or a fully local Whisper model. An LLM cleanup and rewrite pass follows transcription. A private, on-device memory built on SQLite FTS5 plus local embeddings powers a small RAG layer, so you can ask questions over everything you have ever dictated and get answers with cited sources. It is pure Swift and SwiftUI, no Electron; cloud keys are yours to bring, and local LLMs through LM Studio or Ollama are a first-class fallback rather than an afterthought.
- Swift
- SwiftUI
- Whisper / Groq
- OpenAI
- CoreML
- SQLite FTS5
- Local LLMs
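The per-language routing described in the system design can be sketched roughly as follows. This is a minimal illustration, not Vordi's actual code: the `Engine` enum, the `pickEngine` function, and the `localOnly` flag are hypothetical names, and the rule "English goes to Groq, everything else goes to OpenAI" is a simplification of the routing described above.

```swift
// Hypothetical names: Engine, pickEngine, and localOnly are illustrative
// and are not taken from Vordi's source.
enum Engine: Equatable {
    case groq          // fast English
    case openAI        // Hinglish and multilingual
    case localWhisper  // fully on-device
}

/// Picks a transcription engine from a language tag such as "en", "en-US", or "hi".
/// Assumption: only the primary subtag (the part before the first "-") matters.
func pickEngine(languageTag: String, localOnly: Bool = false) -> Engine {
    // A local-only preference overrides any language-based routing.
    if localOnly { return .localWhisper }

    let primary = languageTag
        .lowercased()
        .split(separator: "-")
        .first
        .map { String($0) } ?? ""

    // English takes the fast path; every other language takes the multilingual path.
    return primary == "en" ? .groq : .openAI
}
```

Under these assumptions, `pickEngine(languageTag: "en-US")` selects Groq, `pickEngine(languageTag: "hi")` selects OpenAI, and passing `localOnly: true` always selects the local Whisper path.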
What I got wrong, then fixed.
01 · the problem
Whisper is fast, but it was effectively English-only for my use, and it hallucinates ghost phrases like 'thank you for watching' into silent audio. One transcription model could not cover Hinglish, fast English, and clean output at once.
what I did
Route fast English to Groq and multilingual and Hinglish speech to OpenAI, then run a hallucination guard that strips known phantom text before any cleanup pass touches it.
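One way a hallucination guard of this kind could work is to drop any sentence that, once normalized, exactly matches a known phantom phrase. The sketch below assumes that approach; `phantomPhrases`, `normalized`, and `stripPhantoms` are hypothetical names, and only 'thank you for watching' comes from the text above (the other list entries are placeholders for illustration).

```swift
import Foundation

// Assumption: an illustrative phrase list. Only "thank you for watching"
// appears in the write-up; the other entries are placeholders.
let phantomPhrases: Set<String> = [
    "thank you for watching",
    "thanks for watching",
    "please subscribe",
]

/// Lowercases, drops punctuation, and collapses whitespace,
/// so "Thank you for watching!" matches the list entry.
func normalized(_ text: String) -> String {
    let kept = text.lowercased().filter { $0.isLetter || $0.isNumber || $0.isWhitespace }
    return kept.split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

/// Removes sentences that consist entirely of a known phantom phrase.
/// Whole-sentence matching keeps real speech that merely contains the phrase.
func stripPhantoms(_ transcript: String) -> String {
    let chars = Array(transcript)
    var segments: [String] = []
    var current = ""
    for (i, ch) in chars.enumerated() {
        current.append(ch)
        // Only split when the terminator ends the text or precedes whitespace,
        // so decimals such as "3.5" stay in one piece.
        let atBoundary = i + 1 == chars.count || chars[i + 1].isWhitespace
        if ".!?".contains(ch) && atBoundary {
            segments.append(current)
            current = ""
        }
    }
    if !current.isEmpty { segments.append(current) }

    return segments
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty && !phantomPhrases.contains(normalized($0)) }
        .joined(separator: " ")
}
```

Running the guard before the LLM cleanup pass matters because a rewrite model may otherwise treat the phantom sentence as real content and polish it into the output.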
02 · the problem
Good cleanup needs context, but a dictation widget that steals keyboard focus or demands Accessibility permission up front is dead on arrival.
what I did
fn+key is the primary hotkey, so it works without Accessibility permission; the Dynamic Notch morphs through listening and thinking states without taking focus; and every capture is written to an auditable run log.
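An auditable run log can be as simple as one JSON object per capture, appended line by line. The sketch below assumes that format; `RunLogEntry` and its fields are hypothetical and only illustrate the idea, not Vordi's actual log schema.

```swift
import Foundation

// Hypothetical schema: the field names below are assumptions for illustration.
struct RunLogEntry: Codable, Equatable {
    let timestamp: Date
    let engine: String         // which transcription engine handled the capture
    let rawTranscript: String  // text as returned by the engine
    let cleanedText: String    // text after the cleanup pass
}

/// Encodes one entry as a single JSON line with stable key order,
/// which keeps the log easy to diff and grep.
func encodeLogLine(_ entry: RunLogEntry) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    encoder.dateEncodingStrategy = .iso8601
    let data = try encoder.encode(entry)
    return String(decoding: data, as: UTF8.self)
}

/// Decodes a line produced by encodeLogLine.
func decodeLogLine(_ line: String) throws -> RunLogEntry {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    return try decoder.decode(RunLogEntry.self, from: Data(line.utf8))
}
```

Keeping both the raw transcript and the cleaned text in each entry is what makes the log auditable: any rewrite can be compared against what the engine actually returned.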
03 · the problem
Memory is useless if you can only find things by exact words. Keyword search misses what you meant.
what I did
Hybrid retrieval on-device: SQLite FTS5 for keywords plus local embeddings for meaning, fused into a small RAG layer that answers questions over your own transcripts and cites the source.
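The text does not say how the keyword and embedding results are fused, so the sketch below swaps in reciprocal rank fusion (RRF), a common choice for hybrid search, together with cosine similarity for comparing embeddings. The function names, the integer transcript IDs, and the constant `k = 60` are assumptions for illustration only.

```swift
/// Cosine similarity between two embedding vectors of equal dimension.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    let dot = zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

/// Reciprocal rank fusion: every ranked list adds 1 / (k + rank) to a
/// document's score, with rank starting at 1. Documents that rank well in
/// both the keyword list and the semantic list rise to the top.
func fuseRanks(_ rankings: [[Int]], k: Double = 60) -> [Int] {
    var scores: [Int: Double] = [:]
    for ranking in rankings {
        for (index, id) in ranking.enumerated() {
            scores[id, default: 0] += 1.0 / (k + Double(index + 1))
        }
    }
    // Sort by score, highest first; break ties by ID so the order is stable.
    return scores
        .sorted { a, b in a.value != b.value ? a.value > b.value : a.key < b.key }
        .map { $0.key }
}
```

For example, with a keyword ranking of `[3, 1, 2]` (as FTS5 might return) and a semantic ranking of `[1, 4, 3]`, transcript 1 comes out first because it sits near the top of both lists. RRF only needs ranks, so it avoids having to put BM25 scores and cosine similarities on a common scale.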
04 · the problem
Cloud transcription means handing your voice to someone else. Local models mean slower results.
what I did
You bring your own keys for the cloud engines, everything personal stays on-device by default, and local LLMs through LM Studio or Ollama are a first-class fallback, not an afterthought.
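The fallback order described here could be expressed as a small selection function. This is a sketch under stated assumptions: `CleanupBackend`, `BackendConfig`, `chooseCleanupBackend`, the provider and runtime strings, and the final "pass the transcript through unchanged" case are all hypothetical and not confirmed by the source.

```swift
// Hypothetical types: none of these names come from Vordi's source.
enum CleanupBackend: Equatable {
    case cloud(provider: String)  // uses a key the user supplied
    case local(runtime: String)   // e.g. an LM Studio or Ollama server
    case passthrough              // assumption: no LLM available, keep raw text
}

struct BackendConfig {
    var apiKeys: [String: String] = [:]      // provider name -> user-supplied key
    var runningLocalRuntimes: [String] = []  // local runtimes detected as available
    var cloudEnabled: Bool = false           // off by default, matching local-first
}

/// Chooses where the cleanup pass runs: cloud only when the user has opted in
/// and supplied a key, otherwise a local runtime, otherwise no cleanup at all.
func chooseCleanupBackend(_ config: BackendConfig,
                          cloudOrder: [String] = ["groq", "openai"]) -> CleanupBackend {
    if config.cloudEnabled {
        for provider in cloudOrder {
            if let key = config.apiKeys[provider], !key.isEmpty {
                return .cloud(provider: provider)
            }
        }
    }
    if let runtime = config.runningLocalRuntimes.first {
        return .local(runtime: runtime)
    }
    return .passthrough
}
```

Defaulting `cloudEnabled` to `false` means a stored key alone never sends data off-device; the user has to opt in explicitly.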