Prism MCP
This server is an AI-driven MCP platform that combines real-time web search, enterprise AI services, data transformation, and optional persistent session memory.
Search & Discovery
Web search (brave_web_search): Real-time searches via Brave Search API with pagination, filtering, and up to 20 results per request
Local business search (brave_local_search): Find nearby businesses with addresses, ratings, phone numbers, and hours; auto-falls back to web search if there are no local results
AI-grounded answers (brave_answers): Concise, direct answers grounded in live web results via Brave's AI Grounding
Enterprise search: Domain-specific document retrieval via Vertex AI Discovery Engine, with a hybrid pipeline that combines and deduplicates web + curated results
Data Transformation
Code-mode search variants (brave_web_search_code_mode, brave_local_search_code_mode): Run a search and immediately apply a custom JavaScript script (in a secure QuickJS sandbox) to extract only the needed fields, reducing context window usage by 85–95%
Universal transformer (code_mode_transform): Apply custom JavaScript extraction to raw output from any MCP tool; useful for GitHub issues, DOM snapshots, transcripts, and more
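To make the code-mode idea concrete, here is a minimal TypeScript sketch of the kind of extraction such a script performs. It keeps two fields per result and compares serialized sizes before and after. The result shape (`web.results` with `title`, `url`, `description`, `extra_snippets`) and the helper names are assumptions for illustration. The actual script contract inside the QuickJS sandbox is not documented here, and character counts are only a rough proxy for tokens.

```typescript
// Assumed shape of one raw search result; the real tool output may differ.
interface RawResult {
  title: string;
  url: string;
  description: string;
  extra_snippets?: string[];
}

interface RawResponse {
  web: { results: RawResult[] };
}

interface Slim {
  title: string;
  url: string;
}

// Keep only the fields the agent needs and drop everything else.
function extractTitlesAndUrls(raw: RawResponse): Slim[] {
  return raw.web.results.map((r) => ({ title: r.title, url: r.url }));
}

// Percentage of serialized characters saved (a rough proxy for tokens).
function reductionPercent(raw: unknown, slim: unknown): number {
  const before = JSON.stringify(raw).length;
  const after = JSON.stringify(slim).length;
  return before === 0 ? 0 : (100 * (before - after)) / before;
}

// Hypothetical input with a long description and snippets, as real results often have.
const sample: RawResponse = {
  web: {
    results: [
      {
        title: "Prism MCP",
        url: "https://example.com/prism",
        description:
          "A long description of the page that the agent does not need to read in full, " +
          "repeated context, marketing copy, and other text that only costs tokens.",
        extra_snippets: [
          "Snippet one with more detail than the agent needs for this question.",
          "Snippet two, equally verbose and equally unnecessary for the task.",
        ],
      },
    ],
  },
};

const slim = extractTitlesAndUrls(sample);
console.log(slim, reductionPercent(sample, slim).toFixed(1) + "% smaller");
```

The savings depend entirely on how verbose the raw payload is. The sketch shows the mechanism, not the 85–95% figure quoted above.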
AI Analysis & Orchestration
Research paper analysis (gemini_research_paper_analysis): Deep academic analysis using Gemini 2.0 Flash, including summaries, critiques, key findings, literature reviews, or comprehensive analysis
Multi-model orchestration: Supports Google Gemini and Claude via Vertex AI infrastructure with secure Application Default Credentials (ADC)
Session Memory (optional, requires Supabase)
Save immutable session logs, update project state for continuity, progressively load prior context, search accumulated knowledge, and prune old memories
Integrations: Brave Search, Google Gemini, Vertex AI, Gmail, Chrome DevTools Protocol, and Supabase
Brave Search: Provides real-time web and local search capabilities, including AI-powered answers, to enhance model context.
Gmail: Facilitates data extraction and automated pipeline processing through Gmail OAuth integration.
Google: Orchestrates various Google ecosystem services, including Gemini and Gmail, for cross-platform data retrieval.
Vertex AI: Leverages Vertex AI infrastructure, specifically Discovery Engine for enterprise search and managed generative model deployment.
Google Gemini: Enables deep research paper analysis and structured data synthesis using the Google Gemini API.
Supabase: Provides a session memory layer for progressive context loading, work ledgers, and persistent state handoffs via Supabase REST APIs.
Prism Coder
Give your AI agent memory that lasts. Persistent sessions, knowledge graphs, and offline tool-routing — fully local and free.
Prism Coder is an MCP server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight prism-coder model fleet (2B–32B) for fast, offline tool-routing — no cloud required.
No account needed. No API keys. Runs on your machine.
A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.
Quickstart
The free tier needs no account, no API key, and no cloud. Add the server to your MCP client:
{
  "mcpServers": {
    "prism": {
      "command": "npx",
      "args": ["-y", "prism-mcp-server"]
    }
  }
}
Open Claude Desktop or Cursor and your agent now has memory backed by a local SQLite database (~/.prism-mcp/data.db).
Optional — local model fleet for offline tool-routing. Pull whichever fits your hardware:
ollama pull dcostenco/prism-coder:2b # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
ollama pull dcostenco/prism-coder:4b # 3.4 GB · balanced (100% routing accuracy)
ollama pull dcostenco/prism-coder:14b # 8.4 GB · Mac default (100% routing accuracy)
ollama pull dcostenco/prism-coder:32b # 16 GB · complex tasks (100% routing accuracy)
Prism detects both the namespaced (dcostenco/prism-coder:14b) and bare (prism-coder:14b) Ollama tags automatically.
What it does
Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.
Mind Palace — persistent memory that survives across sessions
Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.
The dashboard shows your current project state, pending TODOs, intent health, and a neural knowledge graph — all built automatically from your agent sessions.
Knowledge Graph — semantic + keyword + graph search
Ask "what did I decide about the auth flow last month?" and get an answer with citations, combining vector similarity, full-text search, and graph traversal.
Session History — immutable audit trail
Every session is logged with files changed, decisions made, and TODOs. Search, filter, and replay any past session.
Session Drift Detection
Long agent sessions can wander from their original goal. session_detect_drift compares current work against the stated goal and returns on_track / minor_drift / major_drift so the agent can self-correct.
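As an illustration of the three-way verdict, the sketch below classifies drift by how many goal words still appear in a summary of recent work. This is a hypothetical stand-in, not Prism's method. The word-overlap metric, the 0.6 and 0.3 thresholds, and the function names are all assumptions. A production detector would more likely compare embeddings or ask a model to judge.

```typescript
type DriftVerdict = "on_track" | "minor_drift" | "major_drift";

// Lowercased words longer than two characters (drops "a", "in", "to", ...).
function words(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((w) => w.length > 2);
}

function unique(xs: string[]): string[] {
  return xs.filter((x, i) => xs.indexOf(x) === i);
}

// Fraction of distinct goal words that also appear in the recent-work summary.
function goalCoverage(goal: string, recentWork: string): number {
  const goalWords = unique(words(goal));
  if (goalWords.length === 0) return 1; // no stated goal: nothing to drift from
  const workWords = words(recentWork);
  const hits = goalWords.filter((w) => workWords.indexOf(w) !== -1).length;
  return hits / goalWords.length;
}

// Hypothetical thresholds mapping coverage to the three verdicts.
function classifyDrift(goal: string, recentWork: string): DriftVerdict {
  const coverage = goalCoverage(goal, recentWork);
  if (coverage >= 0.6) return "on_track";
  if (coverage >= 0.3) return "minor_drift";
  return "major_drift";
}

const goal = "fix the login bug in the auth service";
console.log(classifyDrift(goal, "fixed login bug in auth service and added tests")); // on_track
console.log(classifyDrift(goal, "redesigned the marketing landing page colors")); // major_drift
```

Exact word matching is deliberately naive. "fixed" does not match "fix", for example, which is one reason a real detector would use semantic similarity.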
Behavioral Verification — catch bad edits before they happen
AI agents often apply patterns from checklists without understanding their real-world impact. The verify_behavior tool challenges the agent with a scenario it must answer before editing, forcing it to think through what the end user will experience.
Agent: "I'll revert this kitchen display change"
Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
What should the cook see after the void?"
Agent: "The ticket stays visible with the remaining 2 items."
Prism: "Correct. Your revert would hide the ticket entirely."
17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed; works in any MCP client.
Time Travel
Roll back to any previous session state. Compare diffs between versions. Restore a known-good state with one click.
Cognitive Routing
Three memory types, automatically sorted: episodic (what happened — session logs, decisions), semantic (what's true — facts, architecture), and procedural (how to do X — workflows, patterns). When you search, the router picks the right store instead of dumping everything.
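A toy version of this routing decision is sketched below. It scores the query against cue words for each store and picks the best match, defaulting to semantic. The cue lists and the `routeQuery` name are hypothetical; Prism's actual router is not described in this document.

```typescript
type MemoryStore = "episodic" | "semantic" | "procedural";

// Hypothetical cue words per store; a real router would likely use a model.
const CUES: Record<MemoryStore, string[]> = {
  episodic: ["when", "last", "yesterday", "decide", "decided", "session", "happened"],
  procedural: ["how", "steps", "workflow", "deploy", "run", "setup", "install"],
  semantic: ["what", "which", "architecture", "schema", "uses", "define"],
};

// Score each store by cue hits. Ties keep the earlier store in `order`,
// and a query with no hits falls back to the semantic store.
function routeQuery(query: string): MemoryStore {
  const tokens = query.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  const order: MemoryStore[] = ["episodic", "procedural", "semantic"];
  let best: MemoryStore = "semantic";
  let bestScore = 0;
  for (const store of order) {
    const score = tokens.filter((t) => CUES[store].indexOf(t) !== -1).length;
    if (score > bestScore) {
      best = store;
      bestScore = score;
    }
  }
  return best;
}

console.log(routeQuery("What did I decide about the auth flow last month?")); // episodic
console.log(routeQuery("How do I deploy the API?")); // procedural
```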
Multi-Agent Hivemind
Coordinate multiple AI agents working on the same project. Each agent has its own session, but they share memory through the knowledge graph. The Hivemind Radar shows real-time agent status, tasks, and activity.
Neural Search
Search across all memories with highlighted results, knowledge graph editing, and memory density metrics.
Local-first and privacy
The free tier runs entirely on your machine. Paid tiers add cloud sync through the Synalux portal, which is what enables cross-device memory and team sharing.
Aspect | Local tier (free) | Cloud tier (paid) |
Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
Inference | Local Ollama models | Local models + cloud fallback |
API keys required | None | Synalux subscription key |
Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
Works offline | ✅ | Local features yes; sync/cloud no |
Handling sensitive data. All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the local tier for full air-gap, or use Enterprise which includes a HIPAA Business Associate Agreement.
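To show what pattern-based redaction before transit can look like, here is a minimal sketch covering four of the listed categories. The regular expressions are illustrative assumptions: US-style SSN and phone formats, with any numeric date treated as a possible date of birth. They would both miss and over-match real data. Medical record numbers and clinical identifiers are omitted because their formats vary by system. This is not Prism's rule set.

```typescript
// Hypothetical redaction rules, applied in order; not Prism's actual patterns.
const RULES: { label: string; pattern: RegExp }[] = [
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "EMAIL", pattern: /\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b/g },
  { label: "PHONE", pattern: /\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b/g },
  { label: "DOB", pattern: /\b\d{1,2}\/\d{1,2}\/\d{4}\b/g }, // any m/d/yyyy date
];

// Replace every match with a bracketed label before the text leaves the machine.
function redact(text: string): string {
  return RULES.reduce((out, rule) => out.replace(rule.pattern, "[" + rule.label + "]"), text);
}

console.log(redact("SSN 123-45-6789, email jane.doe@example.com, phone (555) 123-4567, DOB 04/12/1987"));
// SSN [SSN], email [EMAIL], phone [PHONE], DOB [DOB]
```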
Models
The prism-coder fleet uses Qwen3.5 for MCP tool-routing. The 14B and 32B are fine-tuned from Qwen3; the 2B and 4B slots use stock Qwen3.5-4B with prompt engineering at different quantization levels (100% routing accuracy without fine-tuning). They are not general-purpose chat models — they route reliably and run offline; Claude and other frontier models remain better at reasoning, coding, and open-domain work. The intended pattern is local routing with an optional cloud fallback for hard cases.
Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
Qwen3.5-4B Q3_K_M | dcostenco/prism-coder:2b | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
Qwen3.5-4B Q4_K_M | dcostenco/prism-coder:4b | 3.4 GB | 100% × 3 seeds | Verifier + 8 GB+ devices | Free |
prism-coder:14b | dcostenco/prism-coder:14b | 8.4 GB | 100% × 3 seeds | Default router | Standard+ |
prism-coder:32b | dcostenco/prism-coder:32b | 16 GB | 100% × 3 seeds | Complex tasks | Advanced+ |
Weights: huggingface.co/dcostenco (public GGUF). Latency depends on model size and hardware — see Benchmarks to measure it on your own machine rather than trusting a printed number.
Cascade
query → prism-coder:14b (local router, Mac default)
→ qwen3.5:4b (grounding verifier)
→ prism-coder:2b (iPhone / mobile, auto-selected by RAM)
→ prism-coder:32b (complex tasks, on demand)
→ cloud fallback (paid tiers, for max quality)
Benchmarks
Reproduce every number yourself. All evals are open-source and self-contained:
git clone https://github.com/dcostenco/prism-coder && cd prism-coder
pip install anthropic requests
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 14b 32b
Routing eval (115 cases, 12 categories, 3-seed mean). On this narrow tool-routing task, all fleet models achieve near-perfect accuracy. Keep in mind what that means: the eval is near-saturated for this taxonomy. It measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is reliable offline routing at zero cost, not that a 2.3 GB model rivals a frontier model in general.
Model | Routing accuracy | Notes |
prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
prism-coder:4b / 14b / 32b | 100% × 3 seeds | Perfect on all 115 cases |
Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
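The reported percentages follow from simple arithmetic over the 115 cases. One miss per seed gives 114/115 ≈ 99.13%. That matches the 2b row if the same case fails in each of the three seeds, which is an assumption, since the table lists a single failure. One miss across all three seeds would instead give 344/345 ≈ 99.71%. The function names below are illustrative.

```typescript
// Accuracy for one seed, as a percentage of cases routed correctly.
function accuracyPercent(correct: number, total: number): number {
  return (100 * correct) / total;
}

// Mean of per-seed accuracies, as in the "× 3 seeds" figures.
function meanAccuracy(correctPerSeed: number[], total: number): number {
  const sum = correctPerSeed.reduce((acc, c) => acc + accuracyPercent(c, total), 0);
  return sum / correctPerSeed.length;
}

const CASES = 115;
console.log(meanAccuracy([114, 114, 114], CASES).toFixed(1)); // "99.1": one miss in every seed
console.log(meanAccuracy([115, 115, 114], CASES).toFixed(2)); // "99.71": one miss across all seeds
```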
Memory uplift (LoCoMo-Plus, self-published). A separate long-context dialogue benchmark (dcostenco/Locomo-Plus) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project — treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
Why Prism Coder
vs AI coding assistants
These tables are the maintainer's assessment as of June 2026. Verify claims that matter to you — products change fast.
Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
Local inference (open-weight) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Works fully offline | ✅ (free tier) | ❌ | ❌ | ❌ | ❌ | ❌ |
Persistent cross-session memory | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
L3 grounding verifier | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Behavioral verification (pre-edit) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
MCP server (tools + memory) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Web IDE | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
VS Code extension | ✅ | ✅ | — | — | ✅ | ❌ |
Flat-rate team pricing | ✅ | ❌ (per-seat) | ❌ (per-seat) | ❌ | ❌ | ❌ |
HIPAA BAA available | ✅ (Enterprise) | ❌ | ❌ | ❌ | ❌ | ❌ |
vs local AI / memory tools
Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
Local inference cascade | ✅ | ✅ | ✅ | ❌ | ❌ |
Cloud fallback | ✅ | ❌ | ❌ | ❌ | ❌ |
Persistent cross-session memory | ✅ | ❌ | ❌ | ✅ | ✅ |
Knowledge ingestion (MCP + webhook) | ✅ | ❌ | ❌ | ❌ | ❌ |
Cognitive routing (3-store) | ✅ | ❌ | ❌ | ❌ | ❌ |
Session drift detection | ✅ | ❌ | ❌ | ❌ | ❌ |
Native MCP server | ✅ | ❌ | ❌ | ❌ | ❌ |
Web IDE + VS Code extension | ✅ | ❌ | ❌ | ❌ | ❌ |
Pricing — flat-rate, not per-seat
Plan | Prism Coder | GitHub Copilot | Cursor | Amazon Q |
Individual | $19/mo | $10/mo | $20/mo | $19/mo |
Team (5 devs) | $49/mo flat | $95/mo | $200/mo | $95/mo |
Enterprise (25 devs) | $99/mo flat | $195/mo | $1,000/mo | Custom |
Plans
All on-device models are free to run locally via Ollama on every tier. A subscription gates cloud features, higher model ceilings, and increased limits. Local model ceilings are advisory — on-device models run on your Ollama regardless of plan; the ceiling gates cloud inference and prism_infer routing.
Feature | Free | Standard $19/mo | Advanced $49/mo | Enterprise $99/mo |
Seats | 1 | 1 | up to 5 | up to 25 |
Local model ceiling | up to 4b | up to 14b | up to 32b | up to 32b |
Daily cloud inference | -- | 200 | 2,000 | 100,000 |
Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
Cloud search | -- | 50/day | 500/day | 100,000/day |
Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
Cloud fallback | -- | Claude Sonnet 4 | Claude Sonnet 4 | Priority + Sonnet 4 |
Grounding verifier (fact-check AI output) | -- | ✅ | ✅ | ✅ |
Memory sync (cloud) | -- | ✅ | ✅ | ✅ |
Knowledge / session memory | limited | unlimited | unlimited | unlimited |
Analytics dashboard | -- | ✅ | ✅ | ✅ |
HIPAA BAA | -- | -- | -- | ✅ |
14-day free trial on paid plans. For 25+ seats, contact sales.
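The advisory ceiling can be read as a simple size check that applies only to the plan-gated path (cloud inference and prism_infer routing). The sketch below encodes the ceilings from the plan table. Parsing the parameter count from the tag suffix and the function names are assumptions, not Prism's actual gating code.

```typescript
type Plan = "free" | "standard" | "advanced" | "enterprise";

// Local model ceilings from the plan table, in billions of parameters.
const CEILING_B: Record<Plan, number> = {
  free: 4,
  standard: 14,
  advanced: 32,
  enterprise: 32,
};

// Read the size from a tag suffix such as ":14b"; works for namespaced and bare tags.
function modelSizeB(tag: string): number | null {
  const m = /:(\d+)b$/i.exec(tag);
  return m && m[1] ? parseInt(m[1], 10) : null;
}

// True if the plan may route this model through the gated path (cloud / prism_infer).
// Running the model directly on local Ollama is not restricted by this check.
function withinCeiling(plan: Plan, tag: string): boolean {
  const size = modelSizeB(tag);
  return size !== null && size <= CEILING_B[plan];
}

console.log(withinCeiling("free", "dcostenco/prism-coder:14b")); // false
console.log(withinCeiling("standard", "prism-coder:14b")); // true
```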
How agents use it
Prism exposes 40+ MCP tools. The core memory loop:
Tool | What it does |
session_load_context | Recover the prior session's state on boot |
| Append an immutable session log entry |
| Save live state for the next session |
| Semantic + keyword search over all memories |
| Natural-language Q&A over the memory store |
session_detect_drift | Detect when a session has drifted from its goal |
verify_behavior | Pre-edit scenario challenge: catch bad changes before they happen |
| Teach Prism a codebase or document |
Full TypeScript signatures live in src/tools/; architecture in docs/ARCHITECTURE.md.
The LLM context window is treated as ephemeral scratch space; durable state lives in the persistent store (SQLite locally, the portal in the cloud). Every session begins with a mandatory session_load_context call, so the agent is oriented before it writes a response. When a project exceeds a threshold (default 50 entries), session_compact_ledger summarizes old entries into a rollup, soft-archives the originals, and links them in the graph. See docs/COMPACTION.md.
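The compaction step described above can be sketched as follows. Only the default threshold of 50 comes from this document. The `keepRecent` window, the record shapes, and the joined-string summary (a placeholder for a model-written rollup) are hypothetical.

```typescript
interface LedgerEntry {
  id: string;
  project: string;
  createdAt: number; // epoch millis
  summary: string;
  archived: boolean;
}

interface Rollup {
  id: string;
  project: string;
  summary: string;
  sourceIds: string[]; // graph links back to the archived originals
}

// Fold the oldest live entries into one rollup once the live count exceeds
// `threshold`, soft-archiving the originals and keeping `keepRecent` entries live.
function compactLedger(
  entries: LedgerEntry[],
  threshold = 50,
  keepRecent = 10, // hypothetical window, not documented in the README
): { entries: LedgerEntry[]; rollup: Rollup | null } {
  const live = entries
    .filter((e) => !e.archived)
    .sort((a, b) => a.createdAt - b.createdAt);
  const foldCount = Math.max(0, live.length - keepRecent);
  if (live.length <= threshold || foldCount === 0) return { entries, rollup: null };

  const folded = live.slice(0, foldCount);
  const foldedIds = folded.map((e) => e.id);
  const rollup: Rollup = {
    id: "rollup-" + foldedIds[0] + "-" + foldedIds[foldedIds.length - 1],
    project: folded[0].project,
    // Placeholder: a real compactor would have a model write this summary.
    summary: folded.map((e) => e.summary).join(" / "),
    sourceIds: foldedIds,
  };
  // Soft-archive: originals are flagged, not deleted.
  const updated = entries.map((e) =>
    foldedIds.indexOf(e.id) !== -1 ? { ...e, archived: true } : e,
  );
  return { entries: updated, rollup };
}

const ledger: LedgerEntry[] = [];
for (let i = 0; i < 55; i++) {
  ledger.push({ id: "e" + i, project: "demo", createdAt: i, summary: "entry " + i, archived: false });
}
const compacted = compactLedger(ledger);
console.log(compacted.rollup ? compacted.rollup.sourceIds.length : 0); // 45
```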
CLI
prism load <project> # load session context
prism save # save ledger + handoff
prism search <query> # search code across repos (exact / regex / symbol / semantic)
prism review <files...> # AI code review — security, performance, style
prism scan <files...> # security scan — secrets, licenses, Dockerfile
prism push # push local SQLite to the cloud backend
prism register-models # alias dcostenco/prism-coder:* -> prism-coder:*
prism search: semantic code search
prism review: AI code review with HIPAA checks
prism scan: security scanner for secrets, Dockerfiles, licenses
Companions
Prism works alongside these tools — use whichever fits your workflow.
Web IDE — Prism Coder
A browser-based IDE at synalux.ai/coder. Import any GitHub repo and get:
Monaco editor with multi-tab, split view, syntax highlighting, and VS Code keybindings
In-browser Node.js via WebContainer (your code runs in the browser sandbox, not on a server)
Integrated terminal — WebContainer shell in-browser; optional server PTY via WebSocket when connected to a dev server
AI Agent Mode — describe a task and the agent creates files, runs type-checks, and verifies
Source control — commit, branch, push/pull, stash, blame, tag management
Live Share — real-time collaborative editing with session links
Node.js debugger via Chrome DevTools Protocol
Tasks runner (VS Code tasks.json compatible), Problems panel (Monaco diagnostics)
12-language i18n: full UI localization
Standard+ plans get cloud AI and higher rate limits. Free tier works with local Ollama. Code execution uses the in-browser WebContainer by default; Live Share and the optional PTY terminal connect to external servers when explicitly enabled.
VS Code Extension — Synalux
Memory-augmented AI inside VS Code with clinical practice management features. Install from the marketplace:
code --install-extension synalux-ai.synalux
AI chat, voice input, SOAP note generator, team collaboration, and video calls, all inside VS Code. Routes through local Ollama by default; cloud on paid tiers.
AI: Chat participant (@synalux), multi-agent pipeline, voice input, model switching, 10 tones
Clinical: SOAP note generator, role-based access, document signing, patient board
Collaboration: Team chat, DMs, video calls, customer board, visual builder, DevContainers
Privacy: Local Ollama by default. preferLocal=true tries local first. Enterprise BAA available.
Prism AAC
Communication app for non-speaking users, powered by the on-device prism-coder fleet for phrase prediction. macOS / iOS / web.
See github.com/dcostenco/prism-aac
Self-hosting (Enterprise)
Run the full model stack on your own hardware — no cloud, full data sovereignty.
Requirements: Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus Ollama.
ollama pull dcostenco/prism-coder:14b # default router
export LOCAL_LLM_URL=http://localhost:11434
Routing is automatic: 14b → 4b → cloud fallback on desktop/server, and 2b → cloud fallback on mobile/iPhone. For iOS or another machine on the same network, run OLLAMA_HOST=0.0.0.0 ollama serve and point LOCAL_LLM_URL at the host's IP.
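The device-dependent fallback order and the LOCAL_LLM_URL setting can be sketched together in TypeScript. The chain contents follow the routing text above. The `ask` callback, the function names, and the default of `http://localhost:11434` when the variable is unset are assumptions. The request body uses Ollama's documented `/api/generate` fields (`model`, `prompt`, `stream`), and the code only builds the request; it does not send it.

```typescript
type DeviceClass = "desktop" | "mobile";
type Env = { [key: string]: string | undefined };

// Fallback order per device, as described above; "cloud" stands for the paid-tier fallback.
function routingChain(device: DeviceClass): string[] {
  return device === "mobile"
    ? ["prism-coder:2b", "cloud"]
    : ["prism-coder:14b", "prism-coder:4b", "cloud"];
}

// Resolve the Ollama base URL; the localhost default is an assumption.
function ollamaBaseUrl(env: Env): string {
  const raw = env["LOCAL_LLM_URL"] || "http://localhost:11434";
  return raw.replace(/\/+$/, ""); // drop trailing slashes
}

// Build (but do not send) a non-streaming request for Ollama's /api/generate endpoint.
function buildGenerateRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: baseUrl + "/api/generate",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: model, prompt: prompt, stream: false }),
  };
}

// Walk the chain and stop at the first model that answers (null means "defer").
// Synchronous for brevity; real model calls would be asynchronous.
function runCascade(
  chain: string[],
  ask: (model: string) => string | null,
): { model: string; answer: string } | null {
  for (const model of chain) {
    let answer: string | null = null;
    try {
      answer = ask(model);
    } catch (_err) {
      answer = null; // e.g. model not pulled or endpoint unreachable
    }
    if (answer !== null) return { model: model, answer: answer };
  }
  return null;
}

const req = buildGenerateRequest(ollamaBaseUrl({}), "prism-coder:14b", "Which tool saves a session?");
console.log(req.url); // http://localhost:11434/api/generate

const picked = runCascade(routingChain("desktop"), (model) => {
  if (model === "prism-coder:14b") throw new Error("model not pulled");
  return "answer from " + model;
});
console.log(picked ? picked.model : "no answer"); // prism-coder:4b
```

To call the endpoint for real from Node 18 or later, an asynchronous version of `ask` could pass the built request to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` and await each step of the cascade.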
Configuration reference
Variable | Purpose | Default |
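PRISM_SYNALUX_API_KEY | Paid-tier portal key (synalux_sk_...) | -- (local if unset) |
LOCAL_LLM_URL | Ollama endpoint | http://localhost:11434 |
PRISM_STORAGE | Storage backend (auto or synalux); a local setting forces local SQLite regardless of credentials | auto |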
With no variables set, Prism runs fully local. Set PRISM_SYNALUX_API_KEY (and leave PRISM_STORAGE=auto) to use the cloud backend.
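The selection rule in the previous paragraph can be expressed as a small function. It is a sketch inferred from this README. Treating `PRISM_STORAGE=local` as a force-local switch follows the configuration table's "Force local SQLite regardless of credentials" description, but the exact value is an assumption, as are the function and type names.

```typescript
type Backend = "sqlite" | "synalux";
type Env = { [key: string]: string | undefined };

function resolveBackend(env: Env): Backend {
  const mode = (env["PRISM_STORAGE"] || "auto").toLowerCase();
  const hasKey = Boolean(env["PRISM_SYNALUX_API_KEY"]);
  if (mode === "local") return "sqlite"; // assumed force-local value
  // "auto" (and, by assumption, "synalux") uses the portal only when a key is present.
  return hasKey ? "synalux" : "sqlite";
}

console.log(resolveBackend({})); // sqlite
console.log(resolveBackend({ PRISM_SYNALUX_API_KEY: "synalux_sk_example" })); // synalux
```

Falling back to SQLite when `PRISM_STORAGE=synalux` is set without a key is a design choice made for the sketch. Failing loudly would be an equally reasonable alternative.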
Testing
npm test # full suite (vitest)
npm test -- --coverage # coverage report
Coverage spans HRR retrieval, knowledge ingestion, the inference cascade and grounding verifier, compaction, the model picker, and storage round-trips.
Migration: local to cloud
To move free-tier history into the paid portal:
node scripts/migrate-local-to-portal.mjs --dry-run # preview, no network
PRISM_SYNALUX_API_KEY=synalux_sk_... \
  node scripts/migrate-local-to-portal.mjs # push ledger + handoffs
It reads ~/.prism-mcp/data.db and POSTs entries to the portal. Ledger entries are append-only and de-duped server-side; handoffs use last-write-wins per project. Re-running on the same DB is safe. This is a one-shot migration, not a sync daemon; afterwards, set PRISM_STORAGE=synalux (or leave it on auto).
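The two merge rules above explain why a re-run is safe. Here is a sketch of how they might look on the receiving side. The de-duplication key (`id`), the `updatedAt` timestamp, and the in-memory `PortalState` shape are assumptions; this is not the portal's code.

```typescript
interface LedgerRow {
  id: string;
  project: string;
  text: string;
}

interface Handoff {
  project: string;
  state: string;
  updatedAt: number; // epoch millis
}

interface PortalState {
  ledger: LedgerRow[];
  handoffs: { [project: string]: Handoff };
}

// Append-only and de-duplicated: rows whose id is already stored are skipped.
function appendLedger(portal: PortalState, rows: LedgerRow[]): number {
  let added = 0;
  for (const row of rows) {
    if (portal.ledger.some((r) => r.id === row.id)) continue;
    portal.ledger.push(row);
    added++;
  }
  return added;
}

// Last-write-wins per project: an older handoff never overwrites a newer one.
function upsertHandoff(portal: PortalState, h: Handoff): void {
  const current = portal.handoffs[h.project];
  if (!current || h.updatedAt >= current.updatedAt) portal.handoffs[h.project] = h;
}

const portal: PortalState = { ledger: [], handoffs: {} };
const rows: LedgerRow[] = [
  { id: "a1", project: "demo", text: "Set up auth" },
  { id: "a2", project: "demo", text: "Fixed login bug" },
];
console.log(appendLedger(portal, rows)); // 2
console.log(appendLedger(portal, rows)); // 0: re-running adds nothing
```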
License
Product | License |
prism-mcp-server (this repo) | AGPL-3.0 |
VS Code extension (synalux-ai.synalux) | BSL-1.1 |
Web IDE (synalux.ai/coder) | Synalux Terms of Service |
Prism AAC | AGPL-3.0 |
The AGPL-3.0 license covers the MCP server and its source code. The VS Code extension and Web IDE are separate products with their own licenses. Commercial hosted/managed deployment of the MCP server is available via the Synalux subscription.