Resource

Local AI Assistant

In 2026 'local AI' means different things to different vendors. Here is how Cloak defines it — and what we don't pretend to be.

"Local AI" is a spectrum, not a checkbox

A vendor calling itself a "local AI assistant" can mean any of:

  1. The app runs locally, but every prompt is shipped to a hosted model.
  2. The app runs locally and uses a small on-device model for fast tasks, falling back to cloud for hard ones.
  3. Everything — including the model — runs on your machine. No network at all.

Each tier is a different product. Treating them as the same is how marketing pages get away with claims like "your data never leaves your device" while routing prompts to a cloud LLM.

Where Cloak sits

Cloak is tier 1 by default, with strong opt-in tier 2 paths:

  • Local always: audio capture, screenshot capture, transcript storage, conversation history, settings, license keys.
  • Network only when you ask: the actual LLM call goes to the provider you configure. Bring-Your-Own-Key sends your prompt direct to OpenAI / Anthropic / Google / Groq. Managed-tier sends through our Cloudflare Worker.
  • Local speech-to-text option: local Whisper is a supported STT backend. With it enabled and BYOK off, you can run a meeting with zero outbound network traffic for transcription. Outbound only happens when you press the answer hotkey.

What "fully local" requires

Honest tier-3 local AI on consumer hardware in 2026 needs:

  • A model small enough to fit in your GPU / unified memory (typically 4–32B parameters).
  • Inference framework like Ollama, llama.cpp, or MLX.
  • Acceptance that quality on the hardest tasks lags the frontier hosted models by ~12 months.

Cloak doesn't bundle its own local inference engine — that's a non-trivial product on its own — but it does support local Whisper for transcription and is fully compatible with local model servers that expose an OpenAI-compatible endpoint (Ollama's /v1/chat/completions, LM Studio, vLLM). Point Cloak's "Custom" provider at http://localhost:11434/v1 and you have a fully local pipeline.

The trade-off

Local inference is private and offline-capable, but slower and weaker. Hosted inference is fast and strong, but every prompt is a network call. The honest answer is: pick per task.

Cloak's Settings → Models pane lets you assign different providers to different intents — for example, local Whisper + local Llama for transcription and rough summaries, cloud GPT-5.x for the actual interview answer where latency and quality matter most.

What we don't claim

We don't claim Cloak is "fully local" or "offline AI". When you press the answer hotkey and you've configured a cloud provider, a network call happens. We make that explicit in Settings and in the Privacy Policy.

Get the local pieces

Download Cloak from the home page. The default settings will get you running against a cloud provider in two clicks; switch to local Whisper + a local model server in Settings → STT → Local and Settings → Models → Custom.

How to install Cloak

macOS · 4 quick steps

  1. 1

    Extract the ZIP

    Open Cloak.zip from your Downloads folder. Double-clicking it will extract automatically.

  2. 2

    Move to Applications

    Drag Cloak.app into your /Applications folder.

  3. 3

    macOS security check

    macOS may warn that it can't verify the developer. This is normal for unsigned indie apps — it's not malware.

    "Cloak.app" can't be opened

    Apple cannot check it for malicious software.
    This item is on the disk image.

    Cancel
    OK

    If you see this, use the fix in Step 4 below — it removes the quarantine flag instantly.

  4. 4

    One-line fix (if blocked)

    Open Terminal (press ⌘ Space, type "Terminal"), paste this command and hit Return:

    Terminal — zsh
    $ xattr -cr /Applications/Cloak.app

    This removes the quarantine attribute macOS attaches to downloaded files. Cloak's source is open source — inspect it any time.

Need help? Open an issue on GitHub →