How Interview Copilots Work

Four jobs, four hard problems

Strip away marketing and an interview copilot is four pipelines that have to run in lockstep with sub-second latency: capture, transcribe, route, and render. Each one has a non-obvious failure mode.

1. Capture

You need both sides of the call. On macOS that means:

System audio: what the interviewer is saying, coming out of your speakers or headphones. Captured via CoreAudio tap (macOS 14.4+) or ScreenCaptureKit audio.
Microphone: your own voice, for context and for marking when the interviewer is speaking vs you.
Screen: the editor / question / slides currently visible.

Failure mode: Electron-based overlays often go through web audio APIs that fight with the meeting tool. Cloak uses the Rust cpal crate for microphone and a native CoreAudio tap for system audio, so neither competes with Zoom for the device.

2. Transcribe

Latency budget for live transcription is ~250 ms per segment. Anything slower and the answer arrives after you've started talking.

Real-world choices:

OpenAI Whisper hosted via OpenAI — accurate, ~400–600 ms latency on short clips. Default in Cloak.
Groq-hosted Whisper — same accuracy, ~150 ms latency. Best option when available.
ElevenLabs Scribe — strong diarization, ~250 ms.
Local whisper.cpp — slower (1–2s) but offline.

Cloak's STT layer is pluggable so you can pick per-task. The transcript pipeline also runs a rolling de-duplication pass (the kind of thing where "I think I think" becomes "I think") because real-world STT stutters on word boundaries.

3. Route

"Route" is the unsexy heart of an interview copilot. The user pressed a hotkey. What does the system send to the model?

The last N seconds of transcript (typically 30–60).
An optional screenshot encoded as base64.
A system prompt corresponding to the active persona.
Resume + JD as a tool-injected context block (Pro feature).
An intent classifier output: was that a question, a follow-up, or thinking aloud?

Get the routing wrong and the model answers the wrong question. Cloak's intent classifier is a fast small-model call that runs in parallel with the main model call. If intent is "thinking aloud" the main call gets cancelled.

4. Render

The overlay window has to be:

Always on top.
Non-capturable by screen sharing (this is the whole point).
Streaming tokens at ~30 fps so reading feels live.
Resizable based on content without flicker or scroll-jump.

On macOS the non-capturable property is a real OS-level guarantee. You set sharingType: .none on an NSPanel via tauri-nspanel and the window server itself filters the surface out of every capture API. There is no CSS / JavaScript way to fake this on Windows or Linux — which is why Cloak is macOS only on purpose.

The streaming token render is harder than it looks. Markdown reflow during stream causes layout thrash; framer-motion exit animations during the same period cause the window to "remember" a stale measured size. Cloak's useWindow hook measures the live content root with a ResizeObserver and dispatches Tauri IPC resize calls clamped to OVERLAY_MAX_HEIGHT so the overlay never grows past the user's screen.

The thing you can't engineer around

All this delivers you an answer. It cannot deliver you composure. Interview copilots that promise "auto-answer" are lying — there is no way for an external tool to inhabit your voice, your body language, or your willingness to admit you don't know something. The best overlays keep you sharp and let your real ability come through faster, not pretend to replace it.

Built into Cloak

Every architectural choice above is in Cloak's source on GitHub. If you want to see exactly how an interview copilot is built — read the source. If you want to use one — download Cloak.

How Interview Copilots Work

Four jobs, four hard problems

1. Capture

2. Transcribe

3. Route

4. Render

The thing you can't engineer around

Built into Cloak

How to install Cloak

Extract the ZIP

Move to Applications

macOS security check

One-line fix (if blocked)

On Windows or Linux?

Windows

Linux

All versions & platforms

macOS

Linux