Deep Dive · Jan 2, 2026 · 2 min read

The Architecture Behind Whiskers' On-Device Speech Engine

A technical look at how we deliver fast, accurate transcription without touching the cloud.

Whiskers runs entirely on your Mac. No cloud, no API calls, no network dependency. Here's how we built a speech engine that's fast enough for real-time dictation while running locally on Apple Silicon.

The model: Parakeet

At the core of Whiskers is a Parakeet-based speech recognition model, optimized for Apple's Neural Engine via CoreML. Parakeet is an encoder-decoder architecture that converts mel spectrograms into text tokens. We chose it for its balance of accuracy and speed on consumer hardware.
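For readers unfamiliar with the "mel" in mel spectrogram: the mel scale warps frequency so that equal steps sound roughly equally spaced to a listener. The sketch below uses the common HTK-style formula. Which mel variant the Whiskers front end actually uses is not stated here, so treat the formula and the function names as illustrative assumptions.

```python
import math

def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, assumed variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(low_hz: float, high_hz: float, count: int) -> list:
    """Return `count` frequencies (count >= 2) evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]

# With 16 kHz audio the usable band tops out at 8 kHz (the Nyquist limit).
edges = mel_spaced_frequencies(0.0, 8000.0, 5)
print([round(f) for f in edges])
```

The printed frequencies crowd together at the low end and spread out toward 8 kHz, which is the point of the scale: more resolution where speech carries most of its information.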

The model runs in 16-bit precision on the Neural Engine, achieving faster-than-real-time processing on all Apple Silicon Macs. An M1 MacBook Air processes audio at roughly 3x real-time speed. Newer chips are faster still.
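To make the "3x real-time" figure concrete, here is a small arithmetic sketch. The function name is hypothetical, and the 3.0 factor is simply the approximate M1 number quoted above.

```python
def processing_time_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Time needed to transcribe a clip when the engine runs at
    `speed_factor` times real-time (3.0 means three times faster than playback)."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor

# A 60-second clip at roughly 3x real-time takes about 20 seconds to process.
print(processing_time_seconds(60.0, 3.0))  # 20.0
```

Anything above 1x leaves headroom: the engine finishes each chunk before the next one has been spoken, which is what makes live dictation possible.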

The pipeline

Audio flows through four stages:

  1. Capture: CoreAudio captures microphone input at 16 kHz mono. A ring buffer holds the last few seconds of audio for processing.
  2. VAD: Voice Activity Detection filters out silence and background noise before they reach the model. This saves processing time and reduces hallucinations on quiet segments.
  3. Transcription: The Parakeet model processes audio chunks and produces partial transcripts in real time. Final transcripts are assembled from these partials with beam search refinement.
  4. Enhancement: Optionally, the raw transcript is sent to a local or remote LLM for grammar correction, punctuation, and formatting.
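The first two stages can be sketched in a few lines. This is a simplified illustration, not the Whiskers implementation: the buffer length, frame size, and threshold are assumed values, and the energy check stands in for whatever VAD the app really uses.

```python
from collections import deque
import math

SAMPLE_RATE = 16_000      # 16 kHz mono, as in the capture stage
BUFFER_SECONDS = 3        # assumed; the article only says "the last few seconds"
FRAME_SAMPLES = 320       # assumed frame size: 20 ms at 16 kHz
RMS_THRESHOLD = 0.01      # assumed energy level above which a frame counts as speech

class RingBuffer:
    """Fixed-capacity sample buffer; the oldest samples fall off when it is full."""

    def __init__(self, capacity: int) -> None:
        self._samples = deque(maxlen=capacity)

    def write(self, samples) -> None:
        self._samples.extend(samples)

    def snapshot(self) -> list:
        return list(self._samples)

def rms(frame) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def voiced_frames(samples, frame_samples=FRAME_SAMPLES, threshold=RMS_THRESHOLD):
    """Yield only the full frames whose RMS energy exceeds the threshold."""
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        frame = samples[start:start + frame_samples]
        if rms(frame) > threshold:
            yield frame

# One second of silence followed by one second of a 440 Hz tone.
silence = [0.0] * SAMPLE_RATE
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

ring = RingBuffer(SAMPLE_RATE * BUFFER_SECONDS)
ring.write(silence)
ring.write(tone)

kept = list(voiced_frames(ring.snapshot()))
print(len(kept))  # 50: only the frames from the tone pass the energy gate
```

A plain energy threshold is the simplest possible gate and will pass loud non-speech noise. Production VADs are usually model-based, but the effect on the pipeline is the same: the silent frames never reach the transcription stage.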

Memory management

Running a 600 MB model in memory while keeping the rest of the system responsive requires careful resource management. We use lazy loading (the model isn't loaded until the first transcription starts) and aggressive memory reclamation between sessions.
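The lazy-loading pattern looks roughly like this. It is a minimal sketch with hypothetical names: `LazyResource` and `load_fake_model` are illustrative, and a small byte array stands in for the real model weights.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

class LazyResource(Generic[T]):
    """Holds a loader and defers calling it until the resource is first needed."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None

    @property
    def is_loaded(self) -> bool:
        return self._value is not None

    def get(self) -> T:
        # Load on first use; later calls reuse the cached value.
        if self._value is None:
            self._value = self._loader()
        return self._value

    def release(self) -> None:
        """Drop the reference so the memory can be reclaimed between sessions."""
        self._value = None

load_calls = []  # records each real load so the behaviour is observable

def load_fake_model() -> bytearray:
    load_calls.append("load")
    return bytearray(1024)  # stand-in for the model weights

model = LazyResource(load_fake_model)
print(model.is_loaded)   # False: nothing is loaded at startup
model.get()
model.get()
print(len(load_calls))   # 1: the second get() reused the loaded model
model.release()
print(model.is_loaded)   # False: memory can be reclaimed until the next session
```

The sketch is not thread-safe. A real app would guard `get()` and `release()` with a lock or confine them to one queue, and would have to decide how long to wait after a session before releasing, since reloading costs startup latency on the next dictation.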

On machines with 8 GB of RAM, Whiskers typically uses 400–800 MB during active transcription and drops to under 50 MB when idle.

What's next

We're exploring smaller, faster models that maintain accuracy while reducing memory footprint. The goal is to make Whiskers feel instant on every Mac, not just the high-end ones.

Christian Eckenrode, Founder & Lead Engineer