gnosis-mcp

README.md•19.5 KiB

<div align="center"> <h1>Gnosis MCP</h1> <p><strong>Turn your docs into a searchable knowledge base for AI agents.<br>pip install, ingest, serve.</strong></p> <p> <a href="https://pypi.org/project/gnosis-mcp/"><img src="https://img.shields.io/pypi/v/gnosis-mcp?color=blue" alt="PyPI"></a> <a href="https://pypi.org/project/gnosis-mcp/"><img src="https://img.shields.io/pypi/dm/gnosis-mcp?color=green" alt="Downloads"></a> <a href="https://pypi.org/project/gnosis-mcp/"><img src="https://img.shields.io/pypi/pyversions/gnosis-mcp" alt="Python"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a> <a href="https://github.com/nicholasglazer/gnosis-mcp/actions"><img src="https://github.com/nicholasglazer/gnosis-mcp/actions/workflows/publish.yml/badge.svg" alt="CI"></a> </p> <p> <a href="#quick-start">Quick Start</a> · <a href="#git-history">Git History</a> · <a href="#web-crawl">Web Crawl</a> · <a href="#backends">Backends</a> · <a href="#editor-integrations">Editors</a> · <a href="#tools--resources">Tools</a> · <a href="#embeddings">Embeddings</a> · <a href="llms-full.txt">Full Reference</a> </p> <a href="#quick-start"><img src="https://raw.githubusercontent.com/nicholasglazer/gnosis-mcp/main/demo-hero.gif" alt="Gnosis MCP — ingest docs, search, view stats, serve" width="700"></a> <br> <sub>Ingest docs → Search with highlights → Stats overview → Serve to AI agents</sub> </div> --- ### Without a docs server - LLMs hallucinate API signatures that don't exist - Entire files dumped into context — 3,000 to 8,000+ tokens each - Architecture decisions buried across dozens of files ### With Gnosis MCP - `search_docs` returns ranked, highlighted excerpts (~600 tokens) - Real answers grounded in your actual documentation - Works across hundreds of docs instantly --- ## Features - **Zero config** — SQLite by default, `pip install` and go - **Hybrid search** — keyword (BM25) + semantic (local ONNX embeddings, no API key) - **Git history** — ingest commit messages as searchable context (`ingest-git`) - **Web crawl** — ingest documentation from any website via sitemap or link crawl - **Multi-format** — `.md` `.txt` `.ipynb` `.toml` `.csv` `.json` + optional `.rst` `.pdf` - **Auto-linking** — `relates_to` frontmatter creates a navigable document graph - **Watch mode** — auto-re-ingest on file changes - **PostgreSQL ready** — pgvector + tsvector when you need scale ## Quick Start ```bash pip install gnosis-mcp gnosis-mcp ingest ./docs/ # loads docs into SQLite (auto-created) gnosis-mcp serve # starts MCP server ``` That's it. Your AI agent can now search your docs. **Want semantic search?** Add local embeddings — no API key needed: ```bash pip install gnosis-mcp[embeddings] gnosis-mcp ingest ./docs/ --embed # ingest + embed in one step gnosis-mcp serve # hybrid search auto-activated ``` Test it before connecting to an editor: ```bash gnosis-mcp search "getting started" # keyword search gnosis-mcp search "how does auth work" --embed # hybrid semantic+keyword gnosis-mcp stats # see what was indexed ``` <details> <summary>Try without installing (uvx)</summary> ```bash uvx gnosis-mcp ingest ./docs/ uvx gnosis-mcp serve ``` </details> ## Web Crawl <div align="center"> <img src="https://raw.githubusercontent.com/nicholasglazer/gnosis-mcp/main/demo-crawl.gif" alt="Gnosis MCP — crawl docs with dry-run, fetch, search, SSRF protection" width="700"> <br> <sub>Dry-run discovery → Crawl & ingest → Search crawled docs → SSRF protection</sub> </div> <br> Ingest docs from any website — no local files needed: ```bash pip install gnosis-mcp[web] # Crawl via sitemap (best for large doc sites) gnosis-mcp crawl https://docs.stripe.com/ --sitemap # Depth-limited link crawl with URL filter gnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2 --include "/tutorial/*" # Preview what would be crawled gnosis-mcp crawl https://docs.python.org/ --dry-run # Force re-crawl + embed for semantic search gnosis-mcp crawl https://docs.sveltekit.dev/ --sitemap --force --embed ``` Respects `robots.txt`, caches with ETag/Last-Modified for incremental re-crawl, and rate-limits requests (5 concurrent, 0.2s delay). Crawled pages use the URL as the document path and hostname as the category — searchable like any other doc. ## Git History Turn commit messages into searchable context — your agent learns *why* things were built, not just *what* exists: ```bash gnosis-mcp ingest-git . # current repo, all files gnosis-mcp ingest-git /path/to/repo --since 6m # last 6 months only gnosis-mcp ingest-git . --include "src/*" --max-commits 5 # filtered + limited gnosis-mcp ingest-git . --dry-run # preview without ingesting gnosis-mcp ingest-git . --embed # embed for semantic search ``` Each file's commit history becomes a searchable markdown document stored as `git-history/<file-path>`. The agent finds it via `search_docs` like any other doc — no new tools needed. Incremental re-ingest skips files with unchanged history. ## Editor Integrations Add the server config to your editor — your AI agent gets `search_docs`, `get_doc`, and `get_related` tools automatically: ```json { "mcpServers": { "docs": { "command": "gnosis-mcp", "args": ["serve"] } } } ``` | Editor | Config file | |--------|------------| | **Claude Code** | `.claude/mcp.json` (or [install as plugin](#claude-code-plugin)) | | **Cursor** | `.cursor/mcp.json` | | **Windsurf** | `~/.codeium/windsurf/mcp_config.json` | | **JetBrains** | Settings > Tools > AI Assistant > MCP Servers | | **Cline** | Cline MCP settings panel | <details> <summary>VS Code (GitHub Copilot) — slightly different key</summary> Add to `.vscode/mcp.json` (note: `"servers"` not `"mcpServers"`): ```json { "servers": { "docs": { "command": "gnosis-mcp", "args": ["serve"] } } } ``` Also discoverable via the VS Code MCP gallery — search `@mcp gnosis` in the Extensions view. </details> For remote deployment, use Streamable HTTP: ```bash gnosis-mcp serve --transport streamable-http --host 0.0.0.0 --port 8000 ``` ## REST API > v0.10.0+ — Enable native HTTP endpoints alongside MCP on the same port. ```bash gnosis-mcp serve --transport streamable-http --rest ``` Web apps can now query your docs over plain HTTP — no MCP protocol required. | Endpoint | Description | |----------|-------------| | `GET /health` | Server status, version, doc count | | `GET /api/search?q=&limit=&category=` | Search docs (auto-embeds with local provider) | | `GET /api/docs/{path}` | Get document by file path | | `GET /api/docs/{path}/related` | Get related documents | | `GET /api/categories` | List categories with counts | **Environment variables:** | Variable | Description | |----------|-------------| | `GNOSIS_MCP_REST=true` | Enable REST API (same as `--rest`) | | `GNOSIS_MCP_CORS_ORIGINS` | CORS allowed origins: `*` or comma-separated list | | `GNOSIS_MCP_API_KEY` | Optional Bearer token auth | **Examples:** ```bash # Health check curl http://127.0.0.1:8000/health # Search curl "http://127.0.0.1:8000/api/search?q=authentication&limit=5" # With API key curl -H "Authorization: Bearer sk-secret" "http://127.0.0.1:8000/api/search?q=setup" ``` ## Backends | | SQLite (default) | SQLite + embeddings | PostgreSQL | |---|---|---|---| | **Install** | `pip install gnosis-mcp` | `pip install gnosis-mcp[embeddings]` | `pip install gnosis-mcp[postgres]` | | **Config** | Nothing | Nothing | Set `GNOSIS_MCP_DATABASE_URL` | | **Search** | FTS5 keyword (BM25) | Hybrid keyword + semantic (RRF) | tsvector + pgvector hybrid | | **Embeddings** | None | Local ONNX (23MB, no API key) | Any provider + HNSW index | | **Multi-table** | No | No | Yes (`UNION ALL`) | | **Best for** | Quick start, keyword-only | Semantic search without a server | Production, large doc sets | **Auto-detection:** Set `GNOSIS_MCP_DATABASE_URL` to `postgresql://...` and it uses PostgreSQL. Don't set it and it uses SQLite. Override with `GNOSIS_MCP_BACKEND=sqlite|postgres`. <details> <summary>PostgreSQL setup</summary> ```bash pip install gnosis-mcp[postgres] export GNOSIS_MCP_DATABASE_URL="postgresql://user:pass@localhost:5432/mydb" gnosis-mcp init-db # create tables + indexes gnosis-mcp ingest ./docs/ # load your markdown gnosis-mcp serve ``` For hybrid semantic+keyword search, also enable pgvector: ```sql CREATE EXTENSION IF NOT EXISTS vector; ``` Then backfill embeddings: ```bash gnosis-mcp embed # via OpenAI (default) gnosis-mcp embed --provider ollama # or use local Ollama ``` </details> ## Claude Code Plugin For Claude Code users, install as a plugin to get the MCP server plus slash commands: ```bash claude plugin marketplace add nicholasglazer/gnosis-mcp claude plugin install gnosis ``` This gives you: | Component | What you get | |-----------|-------------| | **MCP server** | `gnosis-mcp serve` — auto-configured | | **`/gnosis:search`** | Search docs with keyword or `--semantic` hybrid mode | | **`/gnosis:status`** | Health check — connectivity, doc stats, troubleshooting | | **`/gnosis:manage`** | CRUD — add, delete, update metadata, bulk embed | The plugin works with both SQLite and PostgreSQL backends. <details> <summary>Manual setup (without plugin)</summary> Add to `.claude/mcp.json`: ```json { "mcpServers": { "gnosis": { "command": "gnosis-mcp", "args": ["serve"] } } } ``` For PostgreSQL, add `"env": {"GNOSIS_MCP_DATABASE_URL": "postgresql://..."}`. </details> ## Tools & Resources Gnosis MCP exposes 6 tools and 3 resources over [MCP](https://modelcontextprotocol.io/). Your AI agent calls these automatically when it needs information from your docs. | Tool | What it does | Mode | |------|-------------|------| | `search_docs` | Search by keyword or hybrid semantic+keyword | Read | | `get_doc` | Retrieve a full document by path | Read | | `get_related` | Find linked/related documents | Read | | `upsert_doc` | Create or replace a document | Write | | `delete_doc` | Remove a document and its chunks | Write | | `update_metadata` | Change title, category, tags | Write | Read tools are always available. Write tools require `GNOSIS_MCP_WRITABLE=true`. | Resource URI | Returns | |-----|---------| | `gnosis://docs` | All documents — path, title, category, chunk count | | `gnosis://docs/{path}` | Full document content | | `gnosis://categories` | Categories with document counts | ### How search works ```bash # Keyword search — works on both SQLite and PostgreSQL gnosis-mcp search "stripe webhook" # Hybrid search — keyword + semantic (requires [embeddings] or pgvector) gnosis-mcp search "how does billing work" --embed # Filtered — narrow results to a specific category gnosis-mcp search "auth" -c guides ``` When called via MCP, the agent passes a `query` string for keyword search. With embeddings configured, search automatically combines keyword and semantic results using Reciprocal Rank Fusion. Results include a `highlight` field with matched terms in `<mark>` tags. ## Embeddings Embeddings enable semantic search — finding docs by meaning, not just keywords. **Local ONNX (recommended)** — zero-config, no API key: ```bash pip install gnosis-mcp[embeddings] gnosis-mcp ingest ./docs/ --embed # ingest + embed in one step gnosis-mcp embed # or embed existing chunks separately ``` Uses [MongoDB/mdbr-leaf-ir](https://huggingface.co/MongoDB/mdbr-leaf-ir) (~23MB quantized, Apache 2.0). Auto-downloads on first run. **Remote providers** — OpenAI, Ollama, or any OpenAI-compatible endpoint: ```bash gnosis-mcp embed --provider openai # requires GNOSIS_MCP_EMBED_API_KEY gnosis-mcp embed --provider ollama # uses local Ollama server ``` **Pre-computed vectors** — pass `embeddings` to `upsert_doc` or `query_embedding` to `search_docs` from your own pipeline. ## Configuration All settings via environment variables. Nothing required for SQLite — it works with zero config. | Variable | Default | Description | |----------|---------|-------------| | `GNOSIS_MCP_DATABASE_URL` | SQLite auto | PostgreSQL URL or SQLite file path | | `GNOSIS_MCP_BACKEND` | `auto` | Force `sqlite` or `postgres` | | `GNOSIS_MCP_WRITABLE` | `false` | Enable write tools | | `GNOSIS_MCP_TRANSPORT` | `stdio` | Transport: `stdio`, `sse`, or `streamable-http` | | `GNOSIS_MCP_EMBEDDING_DIM` | `1536` | Vector dimension for init-db | <details> <summary>All configuration variables</summary> **Database:** `GNOSIS_MCP_SCHEMA` (public), `GNOSIS_MCP_CHUNKS_TABLE` (documentation_chunks), `GNOSIS_MCP_LINKS_TABLE` (documentation_links), `GNOSIS_MCP_SEARCH_FUNCTION` (custom search on PG). **Search & chunking:** `GNOSIS_MCP_CONTENT_PREVIEW_CHARS` (200), `GNOSIS_MCP_CHUNK_SIZE` (4000), `GNOSIS_MCP_SEARCH_LIMIT_MAX` (20). **Connection pool (PostgreSQL):** `GNOSIS_MCP_POOL_MIN` (1), `GNOSIS_MCP_POOL_MAX` (3). **Webhooks:** `GNOSIS_MCP_WEBHOOK_URL`, `GNOSIS_MCP_WEBHOOK_TIMEOUT` (5s). **Embeddings:** `GNOSIS_MCP_EMBED_PROVIDER` (openai/ollama/custom/local), `GNOSIS_MCP_EMBED_MODEL`, `GNOSIS_MCP_EMBED_DIM` (384), `GNOSIS_MCP_EMBED_API_KEY`, `GNOSIS_MCP_EMBED_URL`, `GNOSIS_MCP_EMBED_BATCH_SIZE` (50). **Column overrides:** `GNOSIS_MCP_COL_FILE_PATH`, `GNOSIS_MCP_COL_TITLE`, `GNOSIS_MCP_COL_CONTENT`, `GNOSIS_MCP_COL_CHUNK_INDEX`, `GNOSIS_MCP_COL_CATEGORY`, `GNOSIS_MCP_COL_AUDIENCE`, `GNOSIS_MCP_COL_TAGS`, `GNOSIS_MCP_COL_EMBEDDING`, `GNOSIS_MCP_COL_TSV`, `GNOSIS_MCP_COL_SOURCE_PATH`, `GNOSIS_MCP_COL_TARGET_PATH`, `GNOSIS_MCP_COL_RELATION_TYPE`. **Logging:** `GNOSIS_MCP_LOG_LEVEL` (INFO). </details> <details> <summary>Custom search function (PostgreSQL)</summary> Delegate search to your own PostgreSQL function for custom ranking: ```sql CREATE FUNCTION my_schema.my_search( p_query_text text, p_categories text[], p_limit integer ) RETURNS TABLE ( file_path text, title text, content text, category text, combined_score double precision ) ... ``` ```bash GNOSIS_MCP_SEARCH_FUNCTION=my_schema.my_search ``` </details> <details> <summary>Multi-table mode (PostgreSQL)</summary> Query across multiple doc tables: ```bash GNOSIS_MCP_CHUNKS_TABLE=documentation_chunks,api_docs,tutorial_chunks ``` All tables must share the same schema. Reads use `UNION ALL`. Writes target the first table. </details> <details> <summary>CLI reference</summary> ``` gnosis-mcp ingest <path> [--dry-run] [--force] [--embed] Load files into database gnosis-mcp ingest-git <repo> [--since] [--max-commits] [--include] [--exclude] [--dry-run] [--embed] gnosis-mcp crawl <url> [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed] gnosis-mcp serve [--transport stdio|sse|streamable-http] [--ingest PATH] [--watch PATH] gnosis-mcp search <query> [-n LIMIT] [-c CAT] [--embed] Search docs gnosis-mcp stats Document, chunk, and embedding counts gnosis-mcp check Verify DB connection + sqlite-vec gnosis-mcp embed [--provider P] [--model M] [--dry-run] Backfill embeddings gnosis-mcp init-db [--dry-run] Create tables + indexes gnosis-mcp export [-f json|markdown|csv] [-c CAT] Export documents gnosis-mcp diff <path> Preview changes on re-ingest ``` </details> <details> <summary>How ingestion works</summary> `gnosis-mcp ingest` scans a directory for supported files and loads them into the database: - **Multi-format** — Markdown native; `.txt`, `.ipynb`, `.toml`, `.csv`, `.json` auto-converted. Optional: `.rst` (`[rst]` extra), `.pdf` (`[pdf]` extra) - **Smart chunking** — splits by H2 headings (H3/H4 for oversized sections), never splits inside code blocks or tables - **Frontmatter** — extracts `title`, `category`, `audience`, `tags` from YAML frontmatter - **Auto-linking** — `relates_to` in frontmatter creates bidirectional links for `get_related` - **Auto-categorization** — infers category from parent directory name - **Incremental** — content hashing skips unchanged files (`--force` to override) - **Watch mode** — `gnosis-mcp serve --watch ./docs/` auto-re-ingests on changes </details> <details> <summary>Architecture</summary> ``` src/gnosis_mcp/ ├── backend.py DocBackend protocol + create_backend() factory ├── pg_backend.py PostgreSQL — asyncpg, tsvector, pgvector ├── sqlite_backend.py SQLite — aiosqlite, FTS5, sqlite-vec hybrid search (RRF) ├── sqlite_schema.py SQLite DDL — tables, FTS5, triggers, vec0 virtual table ├── config.py Config from env vars, backend auto-detection ├── db.py Backend lifecycle + FastMCP lifespan ├── server.py FastMCP server — 6 tools, 3 resources, auto-embed queries ├── ingest.py File scanner + converters — multi-format, smart chunking ├── crawl.py Web crawler — sitemap/BFS, robots.txt, ETag caching ├── parsers/ Non-file ingest sources (git history, future: schemas) │ └── git_history.py Git log → markdown documents per file ├── watch.py File watcher — mtime polling, auto-re-ingest ├── schema.py PostgreSQL DDL — tables, indexes, search functions ├── embed.py Embedding providers — OpenAI, Ollama, custom, local ONNX ├── local_embed.py Local ONNX embedding engine — HuggingFace model download └── cli.py CLI — serve, ingest, crawl, search, embed, stats, check ``` </details> ## Available On [MCP Registry](https://registry.modelcontextprotocol.io) (feeds VS Code MCP gallery and GitHub Copilot) · [PyPI](https://pypi.org/project/gnosis-mcp/) · [mcp.so](https://mcp.so) · [Glama](https://glama.ai) · [cursor.directory](https://cursor.directory) ## AI-Friendly Docs | File | Purpose | |------|---------| | [`llms.txt`](llms.txt) | Quick overview — what it does, tools, config | | [`llms-full.txt`](llms-full.txt) | Complete reference in one file | | [`llms-install.md`](llms-install.md) | Step-by-step installation guide | ## Performance Benchmarked on SQLite (in-memory) with keyword search (FTS5 + BM25): | Corpus | QPS | p50 | p95 | p99 | Hit Rate | |--------|-----|-----|-----|-----|----------| | 100 docs / 300 chunks | ~9,800 | 0.09ms | 0.16ms | 0.18ms | 100% | | 500 docs / 1,500 chunks | ~3,500 | 0.24ms | 0.51ms | 0.82ms | 100% | Install size: ~23MB with `[embeddings]` (ONNX model). Base install is ~5MB. Run the benchmark yourself: ```bash python tests/bench/bench_search.py # 100 docs, 1000 queries python tests/bench/bench_search.py --docs 500 # larger corpus python tests/bench/bench_search.py --json # machine-readable output ``` 550+ tests, 10 eval cases (90% hit rate, 0.85 MRR on sample corpus). All tests run without a database. ## Development ```bash git clone https://github.com/nicholasglazer/gnosis-mcp.git cd gnosis-mcp python -m venv .venv && source .venv/bin/activate pip install -e ".[dev]" pytest # 550+ tests, no database needed ruff check src/ tests/ ``` All tests run without a database. Keep it that way. Good first contributions: new embedding providers, export formats, ingestion for new file types (via optional extras). Open an issue first for larger changes. ## Sponsors If Gnosis MCP saves you time, consider [sponsoring the project](https://github.com/sponsors/nicholasglazer). ## License [MIT](LICENSE)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/nicholasglazer/gnosis-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•19.5 KiB