"Local AI" is a spectrum, not a checkbox
A vendor calling itself a "local AI assistant" can mean any of:
- The app runs locally, but every prompt is shipped to a hosted model.
- The app runs locally and uses a small on-device model for fast tasks, falling back to cloud for hard ones.
- Everything — including the model — runs on your machine. No network at all.
Each tier is a different product. Treating them as the same is how marketing pages get away with claims like "your data never leaves your device" while routing prompts to a cloud LLM.
Where Cloak sits
Cloak is tier 1 by default, with strong opt-in tier 2 paths:
- Local always: audio capture, screenshot capture, transcript storage, conversation history, settings, license keys.
- Network only when you ask: the actual LLM call goes to the provider you configure. Bring-Your-Own-Key sends your prompt direct to OpenAI / Anthropic / Google / Groq. Managed-tier sends through our Cloudflare Worker.
- Local speech-to-text option: local Whisper is a supported STT backend. With it enabled and BYOK off, you can run a meeting with zero outbound network traffic for transcription. Outbound only happens when you press the answer hotkey.
What "fully local" requires
Honest tier-3 local AI on consumer hardware in 2026 needs:
- A model small enough to fit in your GPU / unified memory (typically 4–32B parameters).
- Inference framework like Ollama, llama.cpp, or MLX.
- Acceptance that quality on the hardest tasks lags the frontier hosted models by ~12 months.
Cloak doesn't bundle its own local inference engine — that's a non-trivial product on its own —
but it does support local Whisper for transcription and is fully compatible with local model
servers that expose an OpenAI-compatible endpoint (Ollama's /v1/chat/completions,
LM Studio, vLLM). Point Cloak's "Custom" provider at http://localhost:11434/v1 and
you have a fully local pipeline.
The trade-off
Local inference is private and offline-capable, but slower and weaker. Hosted inference is fast and strong, but every prompt is a network call. The honest answer is: pick per task.
Cloak's Settings → Models pane lets you assign different providers to different intents — for example, local Whisper + local Llama for transcription and rough summaries, cloud GPT-5.x for the actual interview answer where latency and quality matter most.
What we don't claim
We don't claim Cloak is "fully local" or "offline AI". When you press the answer hotkey and you've configured a cloud provider, a network call happens. We make that explicit in Settings and in the Privacy Policy.
Get the local pieces
Download Cloak from the home page. The default settings will get you running against a cloud provider in two clicks; switch to local Whisper + a local model server in Settings → STT → Local and Settings → Models → Custom.