Prism MCP
The BCBA server is an AI-driven MCP platform combining real-time web search, enterprise AI services, data transformation, and optional persistent session memory.
Search & Discovery
Web search (brave_web_search): Real-time searches via Brave Search API with pagination, filtering, and up to 20 results per request
Local business search (brave_local_search): Find nearby businesses with addresses, ratings, phone numbers, and hours; auto-falls back to web search if no local results
AI-grounded answers (brave_answers): Concise, direct answers grounded in live web results via Brave's AI Grounding
Enterprise search: Domain-specific document retrieval via Vertex AI Discovery Engine, with a hybrid pipeline that combines and deduplicates web + curated results
Data Transformation
Code-mode search variants (brave_web_search_code_mode, brave_local_search_code_mode): Run a search and immediately apply a custom JavaScript script (in a secure QuickJS sandbox) to extract only needed fields, reducing context window usage by 85–95%
Universal transformer (code_mode_transform): Apply custom JavaScript extraction to raw output from any MCP tool; useful for GitHub issues, DOM snapshots, transcripts, and more
AI Analysis & Orchestration
Research paper analysis (gemini_research_paper_analysis): Deep academic analysis using Gemini 2.0 Flash (summaries, critiques, key findings, literature reviews, or comprehensive analysis)
Multi-model orchestration: Supports Google Gemini and Claude via Vertex AI infrastructure with secure Application Default Credentials (ADC)
Session Memory (optional, requires Supabase)
Save immutable session logs, update project state for continuity, progressively load prior context, search accumulated knowledge, and prune old memories
Integrations: Brave Search, Google Gemini, Vertex AI, Gmail, Chrome DevTools Protocol, and Supabase
Provides real-time web and local search capabilities, including AI-powered answers, to enhance model context.
Facilitates data extraction and automated pipeline processing through Gmail OAuth integration.
Orchestrates various Google ecosystem services, including Gemini and Gmail, for cross-platform data retrieval.
Leverages Vertex AI infrastructure, specifically Discovery Engine for enterprise search and managed generative model deployment.
Enables deep research paper analysis and structured data synthesis using the Google Gemini API.
Provides a session memory layer for progressive context loading, work ledgers, and persistent state handoffs via Supabase REST APIs.
🧠 Prism Coder
Persistent memory + tool-calling intelligence for AI agents. (formerly Prism MCP)
A Model Context Protocol server that gives Claude, Cursor, and other AI tools a Mind Palace: long-term memory that survives across sessions, with semantic search, cognitive routing, a visual dashboard, and the open-weights prism-coder:7b / prism-coder:14b LLM fleet for offline tool-calling (BFCL Gold Certified, 100% JSON validity).
Renamed in v14.0.0: the project is now Prism Coder to cover both the Mind Palace memory server and the prism-coder:7b / prism-coder:14b LLM fleet on HuggingFace + Ollama. The npm package stays prism-mcp-server so existing install URLs and mcp.json entries keep working; the prism-coder binary has been the canonical entry point since v12.
What Prism Coder does
💾 Your AI remembers across sessions
Every conversation feeds the Mind Palace. Next session, your AI agent loads the right context automatically — no re-explaining.
🔍 Semantic search over your history
Ask "what did I decide about the auth flow last month?" and get the answer with citations. Vector search + keyword + graph traversal.
🧬 Cognitive routing
Different memory types live in different stores: episodic (what happened), semantic (what's true), procedural (how to do X). The router picks where to store and where to retrieve.
🔄 Proactive session drift detection (new in v15)
Your AI agent can now detect when it has drifted from your original goals — mid-session, automatically — and self-correct before you notice the problem.
Three direct Prism calls:
session_save_ledger: snapshot current state
session_cognitive_route: compare current work against original goals, returns on_track / minor_drift / major_drift
session_compact_ledger: if drifted, compress and reload only what matters
When major drift is detected, the alert routes to the Synalux portal so it's visible across sessions and devices — not just in the current conversation.
Real example it caught: A training session promised BFCL ≥90% for three AI models. The agent spent 3 hours debugging audio bugs instead. The drift check surfaced: "Training goal unmet. Layer3 corpus missing from all training sets. 0 BFCL scores measured." The session immediately re-aligned.
No scripts. No cron. No hooks. Three tool calls, Prism handles the rest.
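The three-call flow above can be sketched as follows. This is a minimal illustration and not Prism's implementation: the `ToolCaller` type, the argument shapes, and the `verdict` response field are assumptions made for the example. Only the three tool names and the three verdict values come from the text above.

```typescript
// Verdicts the README lists for session_cognitive_route.
type DriftVerdict = "on_track" | "minor_drift" | "major_drift";

// Hypothetical transport: any function that invokes an MCP tool by name.
// Argument and result shapes are assumptions for illustration only.
type ToolCaller = (tool: string, args: Record<string, unknown>) => Record<string, unknown>;

function checkDrift(callTool: ToolCaller, project: string, summary: string): DriftVerdict {
  // 1. Snapshot the current state to the ledger.
  callTool("session_save_ledger", { project, summary });

  // 2. Compare current work against the original goals.
  const route = callTool("session_cognitive_route", { project });
  const verdict = route["verdict"] as DriftVerdict; // assumed field name

  // 3. Compact and reload only when some drift was reported.
  if (verdict !== "on_track") {
    callTool("session_compact_ledger", { project });
  }
  return verdict;
}

// Stub transport for a dry run: records each call and reports major drift.
const driftCalls: string[] = [];
const driftStub: ToolCaller = (tool) => {
  driftCalls.push(tool);
  return tool === "session_cognitive_route" ? { verdict: "major_drift" } : {};
};

const driftResult = checkDrift(driftStub, "demo-project", "debugged audio pipeline");
console.log(driftResult, driftCalls);
```

In a real client the stub would be replaced by whatever function your MCP client exposes for calling tools.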
🛡 Local-first
Free tier runs entirely on your machine — SQLite, local embedding model, no API keys, no cloud. Paid tier adds cloud sync via Synalux portal.
⚡ Zero-search retrieval
Holographic Reduced Representations (HRR) for instant similarity lookups without an index. ~5ms over 100K memories.
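For readers unfamiliar with HRR, the sketch below shows the general technique: key/value pairs are bound with circular convolution, superposed into one fixed-size trace, and recalled with circular correlation plus a cleanup comparison. It is a textbook-style illustration under stated assumptions, not Prism's code. The dimension, the seeded generator, and every function name are hypothetical, and the README does not describe how Prism applies HRR internally.

```typescript
type Vec = number[];

// Small linear congruential generator so the demo is reproducible.
function lcg(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Random vector with zero-mean elements, scaled to unit length.
function randomUnitVec(n: number, rand: () => number): Vec {
  const v: Vec = [];
  for (let i = 0; i < n; i++) v.push(rand() * 2 - 1);
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map((x) => x / norm);
}

// Binding: circular convolution, out[k] = sum_i a[i] * b[(k - i) mod n].
function bind(a: Vec, b: Vec): Vec {
  const n = a.length;
  const out: Vec = [];
  for (let k = 0; k < n; k++) {
    let s = 0;
    for (let i = 0; i < n; i++) s += a[i] * b[(k - i + n) % n];
    out.push(s);
  }
  return out;
}

// Unbinding: circular correlation, out[k] = sum_i a[i] * t[(k + i) mod n].
// Approximate inverse of bind: yields the bound partner plus noise.
function unbind(a: Vec, t: Vec): Vec {
  const n = a.length;
  const out: Vec = [];
  for (let k = 0; k < n; k++) {
    let s = 0;
    for (let i = 0; i < n; i++) s += a[i] * t[(k + i) % n];
    out.push(s);
  }
  return out;
}

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

const DIM = 512; // hypothetical dimension
const hrrRand = lcg(42);
const hrrKeys = [0, 1, 2].map(() => randomUnitVec(DIM, hrrRand));
const hrrValues = [0, 1, 2].map(() => randomUnitVec(DIM, hrrRand));

// Superpose three bound key/value pairs into one fixed-size trace.
let hrrTrace: Vec = [];
for (let i = 0; i < DIM; i++) hrrTrace.push(0);
hrrKeys.forEach((key, i) => {
  const bound = bind(key, hrrValues[i]);
  hrrTrace = hrrTrace.map((x, j) => x + bound[j]);
});

// Recall: unbind with a key, then pick the closest known value (cleanup).
function recall(keyIndex: number): number {
  const noisy = unbind(hrrKeys[keyIndex], hrrTrace);
  let best = 0;
  for (let i = 1; i < hrrValues.length; i++) {
    if (cosine(noisy, hrrValues[i]) > cosine(noisy, hrrValues[best])) best = i;
  }
  return best;
}

console.log([0, 1, 2].map(recall));
```

The trace stays the same size no matter how many pairs are added, which is the property that makes lookups independent of an index. The cost is noise that grows with the number of superposed pairs, so the cleanup step here compares against a small set of known values.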
🌐 Multi-agent Hivemind
Multiple AI agents share the same Mind Palace. Each agent has a role (dev / qa / pm / etc.) and sees scoped context. Heartbeat + roster for coordination.
Get started
# Install globally
npm install -g prism-mcp-server
# Or use npx (no install)
npx prism-mcp-server
Add to Claude Desktop / Cursor config:
{
"mcpServers": {
"prism": {
"command": "npx",
"args": ["-y", "prism-mcp-server"]
}
}
}
That's it. Open Claude / Cursor and your AI now has memory.
More setup details in docs/SETUP_GEMINI.md.
How AI agents use it
Tool | What it does |
| Recover prior session's state on boot |
| Append immutable session log entry |
| Save live state for the next session |
| Semantic + keyword search over all memories |
| Natural-language Q&A over your Mind Palace |
| Pull people / projects / decisions from text |
| Auto-link related memories into a graph |
(35+ tools total — full TypeScript signatures in src/tools/. Architecture overview in docs/ARCHITECTURE.md.)
The LLM context window is treated as ephemeral scratch space. All durable state lives in Prism's persistent store (SQLite / Supabase). Context compaction is a non-event.
Boot protocol — every session (including post-compaction) begins with a mandatory session_load_context call, enforced via CLAUDE.md. The agent is fully oriented before writing a single byte of response.
Two persistent stores:
session_save_ledger: immutable append-only work log (decisions, files changed, summaries)
session_save_handoff: versioned live-state snapshot (current task, TODOs, open context)
Ledger compaction (session_compact_ledger) — when a project exceeds a threshold (default: 50 entries), Prism summarizes old entries via LLM into a rollup row, soft-archives originals, and links them via spawned_from graph edges. Runs on a 12-hour background scheduler.
→ Full details: docs/COMPACTION.md
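The ledger compaction described above can be sketched roughly as below. This is an illustration only: the `LedgerEntry` and `GraphEdge` shapes, the `keepRecent` parameter, the edge direction, and the injected `summarize` function (standing in for the LLM call) are all assumptions. The default threshold of 50, the soft-archive behaviour, and the `spawned_from` edge name come from the text.

```typescript
interface LedgerEntry {
  id: number;
  summary: string;
  archived: boolean; // soft-archive flag: the row is kept, not deleted
  isRollup: boolean;
}

interface GraphEdge {
  from: number; // rollup id (edge direction is an assumption)
  to: number;   // archived original id
  kind: "spawned_from";
}

function compactLedger(
  entries: LedgerEntry[],
  summarize: (old: LedgerEntry[]) => string, // stands in for the LLM summarizer
  threshold = 50, // default from the README
  keepRecent = 10 // hypothetical: recent entries left uncompacted
): { entries: LedgerEntry[]; edges: GraphEdge[] } {
  const active = entries.filter((e) => !e.archived);
  if (active.length <= threshold) return { entries, edges: [] };

  // Everything except the most recent entries is folded into one rollup row.
  const old = active.slice(0, active.length - keepRecent);
  const oldIds = old.map((e) => e.id);
  const rollupId = entries.reduce((m, e) => Math.max(m, e.id), 0) + 1;
  const rollup: LedgerEntry = {
    id: rollupId,
    summary: summarize(old),
    archived: false,
    isRollup: true,
  };

  // Soft-archive the originals instead of deleting them.
  const updated = entries.map((e) =>
    oldIds.indexOf(e.id) >= 0
      ? { id: e.id, summary: e.summary, archived: true, isRollup: e.isRollup }
      : e
  );

  // Link the rollup to each original it replaces.
  const edges = oldIds.map((id): GraphEdge => ({ from: rollupId, to: id, kind: "spawned_from" }));
  return { entries: updated.concat([rollup]), edges };
}

// Demo: 60 entries exceed the default threshold of 50.
const demoLedger: LedgerEntry[] = [];
for (let i = 1; i <= 60; i++) {
  demoLedger.push({ id: i, summary: "entry " + i, archived: false, isRollup: false });
}
const compacted = compactLedger(demoLedger, (old) => "rollup of " + old.length + " entries");
console.log(compacted.entries.filter((e) => !e.archived).length, compacted.edges.length);
```

Because originals are only flagged as archived, the full history stays queryable through the graph edges while the active ledger shrinks.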
Models
All Prism Coder inference uses only fine-tuned Prism Coder models — no Claude, no Gemini, no OpenRouter fallbacks. Models are exclusively accessible through the Synalux router (authentication + subscription required).
Model | Where | Tier | Latency |
Qwen3-1.7B (fine-tuned) | On-device — iOS CoreML / Android ONNX | Free | ~50ms offline |
Qwen3-14B (fine-tuned) | RunPod A100 via Synalux | Standard+ | ~200ms |
QwQ-32B (fine-tuned) | RunPod A100 80GB via Synalux | Pro/Enterprise | ~3–5s |
Qwen3-30B-A3B (fine-tuned MoE) | RunPod via Synalux | Enterprise | ~2–3s |
Fine-tuned on the 3-layer corpus: AAC + BFCL tool-calling + clinical workflows. BFCL gate: ≥ 90% on all tiers before production promotion. Adapters stored at dcostenco/prism-coder-* (private HuggingFace).
Eval results (private, May 2026):
Model | Tool-call accuracy | Gate | Status |
Qwen3-1.7B (on-device) | 90.0% (27/30) | ≥90% | ✅ Passed |
Qwen3-14B (cloud) | pending | ≥90% | ⏳ Training |
QwQ-32B (reasoning) | pending | ≥90% | ⏳ Training |
Qwen3-30B-A3B (MoE) | pending | ≥90% | ⏳ Training |
Eval methodology: 30 natural-language tool-call prompts, exact tool name + valid JSON argument structure required. Private eval — model weights never leave Synalux infrastructure.
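A scoring rule matching that description might look like the sketch below. The output shape (`name` plus `arguments`), the `ExpectedCall` type, and the example argument names are assumptions. The README states only that the exact tool name and a valid JSON argument structure are required, and that the gate is ≥ 90%.

```typescript
interface ExpectedCall {
  tool: string;
  requiredArgs: string[]; // argument names that must be present (hypothetical)
}

// Assumed model output shape: {"name": "<tool>", "arguments": { ... }}.
function toolCallPasses(rawOutput: string, expected: ExpectedCall): boolean {
  let parsed: any;
  try {
    parsed = JSON.parse(rawOutput);
  } catch (e) {
    return false; // not valid JSON at all
  }
  if (parsed === null || typeof parsed !== "object") return false;
  if (parsed.name !== expected.tool) return false; // exact tool name required
  const args = parsed.arguments;
  if (args === null || typeof args !== "object" || Array.isArray(args)) return false;
  return expected.requiredArgs.every((k) => Object.prototype.hasOwnProperty.call(args, k));
}

// Gate check: fraction of passing prompts must reach the threshold.
function meetsGate(passed: number, total: number, gate = 0.9): boolean {
  return total > 0 && passed / total >= gate;
}

const evalExpected: ExpectedCall = {
  tool: "session_save_ledger",
  requiredArgs: ["project", "summary"], // hypothetical argument names
};
const goodOutput = '{"name":"session_save_ledger","arguments":{"project":"p","summary":"s"}}';
const wrongTool = '{"name":"session_save_handoff","arguments":{"project":"p","summary":"s"}}';

console.log(toolCallPasses(goodOutput, evalExpected), toolCallPasses(wrongTool, evalExpected));
console.log(meetsGate(27, 30)); // the 27/30 result reported in the table above
```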
Plans
Feature | Free | Standard $19/mo | Pro $49/mo | Enterprise $99/mo |
Qwen3-1.7B on-device | ✅ unlimited | ✅ | ✅ | ✅ |
Qwen3-14B cloud | — | ✅ 200 req/day | ✅ 2K req/day | ✅ unlimited |
QwQ-32B reasoning | — | — | ✅ | ✅ priority |
Qwen3-30B-A3B MoE | — | — | — | ✅ |
Custom fine-tuning | — | — | — | ✅ |
HIPAA BAA | — | — | — | ✅ |
What you can build with it
Persistent coding assistant that remembers your codebase, your decisions, your team's conventions
Research agent that builds knowledge over time — Auto-Scholar pipeline ingests papers / docs and synthesizes
Clinical scribe that retains patient context across visits (HIPAA-compliant cloud + local)
Customer support agent that learns from every ticket
Writing assistant that knows your voice, your prior drafts, and what you've already published
Companions
Synalux — VS Code Extension
Memory-augmented AI inside VS Code, backed by Prism. 20 multimodal tools, multi-agent orchestration, 12-language support. Works offline (Ollama) or cloud (OpenRouter). HIPAA-compliant healthcare workflows.
# Install from terminal
code --install-extension synalux-ai.synalux
Or open VS Code → Extensions (⇧⌘X) → search "Synalux" → Install.
PrismAAC
AAC communication app for non-speaking users. Powered by Prism's spreading-activation phrase ranking + on-device 7B model. macOS / iOS / Android via web. → github.com/dcostenco/prism-aac
🆕 Prism as Foundation (v14.0.0)
As of v14.0.0, Prism's algorithm exports are a stable public contract under SemVer. External systems can port actrActivation.ts (ACT-R cognitive decay), spreadingActivation.ts (the 0.7 similarity + 0.3 activation hybrid score), routerExperience.ts (experience bias with MIN_SAMPLES=5 cold-start gate), compactionHandler.ts (the 25KB prompt-budget cap), and graphMetrics.ts (warning ratios) with citations and pin a Prism version.
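To make the named constants concrete, here is a small sketch of how these pieces could fit together. It does not reproduce the exported modules. The base-level formula is the standard ACT-R equation, while the logistic normalization, the shape of the experience bias, and all function names are assumptions. Only the 0.7 / 0.3 weights and `MIN_SAMPLES = 5` are taken from the text.

```typescript
// ACT-R base-level activation: B = ln( sum_j t_j^(-d) ), where t_j is the
// time since the j-th access and d is the decay rate (0.5 is the usual default).
function baseLevelActivation(agesSeconds: number[], d = 0.5): number {
  let sum = 0;
  for (const t of agesSeconds) sum += Math.pow(Math.max(t, 1e-9), -d);
  return Math.log(sum);
}

// Assumption: squash activation into (0, 1) with a logistic so it can be
// mixed with a similarity. The README does not say how Prism normalizes it.
function squash(activation: number): number {
  return 1 / (1 + Math.exp(-activation));
}

// Hybrid score weights from the README: 0.7 similarity + 0.3 activation.
function hybridScore(similarity: number, normalizedActivation: number): number {
  return 0.7 * similarity + 0.3 * normalizedActivation;
}

const MIN_SAMPLES = 5; // cold-start gate named in the README

// Hypothetical bias form: success rate centered on zero, ignored until
// at least MIN_SAMPLES outcomes have been observed.
function experienceBias(successes: number, samples: number): number {
  if (samples < MIN_SAMPLES) return 0;
  return successes / samples - 0.5;
}

// Two memories with equal similarity: one touched recently, one a month ago.
const recentActivation = baseLevelActivation([60, 3600]);
const staleActivation = baseLevelActivation([86400 * 30]);
console.log(
  hybridScore(0.8, squash(recentActivation)) > hybridScore(0.8, squash(staleActivation))
);
```

With equal similarity, the recently accessed memory ranks higher because its decayed activation is larger. That is the effect the 0.3 activation weight is meant to capture.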
Reference consumers
Consumer | What it uses from Prism |
ACT-R decay ( | |
Spreading-activation phrase ranking (recency × frequency × per-user history). Caregiver corrections auto-harvest into the personalization corpus via the audit-hooks postflight harvester. The on-device 7B model + this algorithm stack is what makes PrismAAC defensible. | |
Synalux portal | Tier-aware model routing using experience bias on prior outcomes per fingerprint. HIPAA-compliant clinical scribe with on-device-first privacy guarantees. |
Production Infrastructure (v16)
Architecture
CLIENTS
┌─────────────────────┐ ┌─────────────────────────────┐
│ prism-aac (iOS/web)│ │ Claude Code · Cursor · IDE │
│ Vercel │ │ MCP config → Railway URL │
└──────────┬──────────┘ └─────────────┬───────────────┘
│ inference │ memory
▼ ▼
┌──────────────────────┐ ┌─────────────────────────────┐
│ SYNALUX ROUTER │ │ prism-mcp SERVER │
│ Vercel │ │ │
│ │ │ Primary — Railway │
│ • JWT auth │ │ Standby — Fly.io │
│ • complexity route │ │ Fallback — Supabase REST │
│ • tier enforcement │ │ │
│ • proxy to RunPod │ │ auto-failover chain │
└──────────┬───────────┘ └─────────────┬───────────────┘
│ │
▼ ▼
┌──────────────────────┐ ┌─────────────────────────────┐
│ RUNPOD SERVERLESS │ │ SUPABASE │
│ │ │ session ledgers │
│ Qwen3-14B ~200ms │ │ knowledge graph │
│ Qwen3-30B ~500ms │ │ handoffs & todos │
│ QwQ-32B ~3-5s │ │ │
│ │ │ source of truth │
└──────────┬───────────┘ └─────────────────────────────┘
│
▼
┌──────────────────────┐
│ ON-DEVICE │
│ Qwen3-1.7B Q4_K_M │
│ iOS CoreML/Android │
│ ~50ms · offline │
└──────────────────────┘
Synalux Inference Router: Architecture (v16)
All Prism AAC model inference is protected behind Synalux as a mandatory router. Models are never accessible directly — all traffic goes through Synalux for auth, billing, and rate limiting.
┌─────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ prism-aac (iOS/web) │ Synalux Portal │
└──────────────┬──────────────────────────────────────────────┘
│ POST /api/v1/prism-aac/inference
│ Authorization: Bearer <user-JWT>
▼
┌─────────────────────────────────────────────────────────────┐
│ SYNALUX ROUTER │
│ 1. Verify JWT (no anonymous access) │
│ 2. Check subscription tier │
│ 3. Enforce rate limit (50–2000 req/day by plan) │
│ 4. Route to model tier by complexity │
│ 5. Proxy → RunPod with SECRET key (never sent to client) │
│ 6. Log → aac_inference_log (billing audit trail) │
└──────────┬─────────────────────────────────────┬────────────┘
│ tier=fast │ tier=reason
▼ ▼
┌──────────────────┐ ┌───────────────────────┐
│ Qwen3-14B │ │ QwQ-32B │
│ RunPod A100 40G │ │ RunPod A100 80G │
│ ~200ms │ │ ~3–5s (reasoning) │
│ standard/pro │ │ pro/enterprise only │
└──────────────────┘ └───────────────────────┘
│ │
└────────────────┬─────────────────────┘
▼
HuggingFace dcostenco/prism-coder-* (private)
RunPod pulls at pod start with server-side token
On-device (free, zero latency, offline):
Qwen3-1.7B GGUF Q4_K_M → iOS CoreML / Android ONNX
Plan | Cloud model | Daily limit | On-device |
Free | — | unlimited local | Qwen3-1.7B |
Standard $5/mo | Qwen3-14B | 200 req | + cloud |
Pro $15/mo | QwQ-32B | 2,000 req | + reasoning |
Enterprise | QwQ-32B priority | unlimited | full stack |
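The router's decision steps (auth, tier check, rate limit, complexity routing) can be sketched as a pure function. This is a simplified illustration, not the Synalux router. JWT verification is reduced to a null check, and the status codes, the `complexity` score, and the 0.7 threshold are assumptions. The daily limits and model names follow the plan table above.

```typescript
type Plan = "free" | "standard" | "pro" | "enterprise";

interface InferenceRequest {
  userId: string | null; // null stands for a missing or invalid JWT
  plan: Plan;
  usedToday: number;
  complexity: number; // hypothetical score in [0, 1]
}

interface RouteDecision {
  status: number; // HTTP-style status (an assumption)
  model: string | null;
  reason: string;
}

// Daily cloud request limits, taken from the plan table above.
const DAILY_LIMIT: Record<Plan, number> = {
  free: 0,
  standard: 200,
  pro: 2000,
  enterprise: Infinity,
};

const REASONING_THRESHOLD = 0.7; // hypothetical cut-off for tier=reason

function routeInference(req: InferenceRequest): RouteDecision {
  // 1. No anonymous access.
  if (req.userId === null) {
    return { status: 401, model: null, reason: "no valid JWT" };
  }
  // 2. Free plans run on-device only.
  if (req.plan === "free") {
    return { status: 403, model: null, reason: "cloud models need a paid plan" };
  }
  // 3. Enforce the per-plan daily limit.
  if (req.usedToday >= DAILY_LIMIT[req.plan]) {
    return { status: 429, model: null, reason: "daily limit reached" };
  }
  // 4. Route by complexity; the reasoning model is pro/enterprise only.
  const canReason = req.plan === "pro" || req.plan === "enterprise";
  const model = canReason && req.complexity >= REASONING_THRESHOLD ? "QwQ-32B" : "Qwen3-14B";
  return { status: 200, model, reason: "routed" };
}

console.log(routeInference({ userId: "u1", plan: "pro", usedToday: 10, complexity: 0.9 }));
```

Proxying to RunPod with the server-side key and writing to `aac_inference_log` would follow a successful decision and are left out of the sketch.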
See docs/WOW_FEATURES.md for the algorithm catalogue. Release notes in docs/releases/v14.0.0-prism-as-foundation.md.
Detailed docs in this repo:
docs/ARCHITECTURE.md: system architecture, memory routing, HRR
docs/COMPACTION.md: how Prism handles LLM context compaction and ledger compaction
docs/SETUP_GEMINI.md: Gemini configuration
docs/self-improving-agent.md: adversarial eval / anti-sycophancy
docs/rfcs/: design RFCs
docs/releases/: per-version release notes
CHANGELOG.md: version history (v12.5 Unified Billing, v11.6 Hivemind, v11.5.1 Auto-Scholar, etc.)
CONTRIBUTING.md: contributor guide
The original 1933-line README is preserved in git history. To browse the prior version (full feature catalog, Cognitive Architecture v7.8, Autonomous Cognitive OS v9.0, HRR Zero-Search, Adversarial Evaluation walkthroughs, Universal Import patterns, competitive analysis vs LangMem/MemGPT/Letta/Zep, v12.5 Unified Billing details, v11.6 Hivemind, v11.5.1 Auto-Scholar): git show HEAD~1:README.md.
License
BUSL-1.1 — Business Source License. Free for non-production use. Production use requires a Synalux subscription or commercial license. After 2 years, converts to MIT.