How to think about the stack
"AI tools" is too broad to be useful as a category. A more honest cut is to split by what they sit on top of: code, conversation, screen, voice, data, or judgment. Most production AI workflows in 2026 use one tool from each.
Coding copilots (IDE-resident)
The mature category. Live inline completion, refactor, chat, agent loop, multi-file edit.
- Cursor — the agent-IDE leader. Strong for new code and multi-file refactors.
- GitHub Copilot — enterprise default; deeply integrated with GitHub PR flow.
- Cody (Sourcegraph) — best for very large monorepos with cross-repo context.
- Continue.dev — open-source, supports local models out of the box.
Conversational copilots (overlay-resident)
The newer category Cloak lives in. Lives outside any single app, listens to meetings, reads your screen, answers on demand.
- Cloak — open-source, macOS, screenshot-proof, local-first capture, BYOK or managed tier.
- Cluely — proprietary, cross-platform, focus on interviews and sales.
- Granola / Fireflies / Otter — meeting transcription + summary, not live answer.
Voice and STT
- OpenAI Whisper — the most-used STT model. Runs locally via
whisper.cppor hosted via the OpenAI API. - ElevenLabs Scribe — fast hosted STT with strong speaker diarization.
- Groq Whisper — Whisper on Groq's LPU, often the fastest hosted option.
- Google Cloud Speech-to-Text — the enterprise default; best for long-tail languages.
Agent frameworks and orchestration
- OpenAI Agents SDK — the simplest production agent loop for OpenAI models.
- LangGraph — graph-based orchestration with state and replay.
- Vercel AI SDK + Workflow DevKit — agent loops with durable execution for web apps.
- Anthropic's MCP — Model Context Protocol; the rising standard for letting models call external tools.
Foundation models worth knowing
The mid-2026 frontier — pick based on the task profile, not loyalty:
- OpenAI GPT-5 series — strong general reasoning; the codex variants lead coding benchmarks.
- Anthropic Claude 4.5/4.6 Sonnet & Opus — strong long-context reasoning and writing; best for structured generation.
- Google Gemini 3.x Pro — best in class for multimodal (vision + audio).
- Groq-hosted open weights (Llama, Qwen) — the speed tier when latency matters more than capability.
- Local Qwen-Coder, DeepSeek-Coder, Llama-3-Code — viable for offline / private workloads on M-series Macs.
Evaluation and observability
- Braintrust, Langfuse, Arize Phoenix — trace, eval, and regression test LLM pipelines.
- PromptLayer — version control for prompts.
- Vercel Agent — anomaly investigation and PR review.
Where Cloak fits in this stack
Cloak is the overlay layer — the always-on conversational copilot that sits across every app and call. It is complementary to, not a replacement for:
- Your IDE copilot (Cursor / Copilot) — they edit code; Cloak helps you talk about it.
- Your meeting note-taker (Granola / Fireflies) — they summarize after; Cloak answers during.
- Your foundation model — Cloak is BYOM; it routes to whichever model fits.
Try the overlay layer
Download Cloak from the home page and plug it into the rest of your stack.