Provides a CLI and HTTP server that Hermes agents can use to ingest documents and recall information from a typed knowledge graph.

Cognify MCP Server

by S3YED

Cognify

A lightweight document-ingestion and typed knowledge-graph engine you can hand to an agent. Drop in raw documents, get back a queryable graph of typed entities and relations plus hybrid (vector + graph) retrieval.

Two interchangeable backends behind one API:

| backend | vectors | graph | needs | use |
| --- | --- | --- | --- | --- |
| `local` (default) | ChromaDB (ONNX MiniLM) | networkx | nothing external, no torch | drop into an agent box |
| `neo4j` | TurboVec | Neo4j | a Neo4j instance | shared/server graph |

Same 384d embedding space on both, so retrieval behaves identically.

Why not plain RAG

Plain RAG embeds chunks and does similarity search. Cognify also asks a cheap LLM to extract typed entities (Person, Project, Technology, ...) and typed relations (USES, WORKS_AT, BUILT, ...) from every chunk, builds a graph, and expands that graph around your search hits. You get the facts and how they connect, which is what makes multi-hop questions work.
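To make the "expand that graph around your search hits" step concrete, here is a minimal sketch of hop-limited expansion over typed triples. It is not Cognify's implementation: the triples, `build_adjacency`, and `expand` are hypothetical names invented for illustration, and a plain dict stands in for the graph store.

```python
from collections import deque

# Hypothetical typed triples, as an extractor might emit them: (subject, relation, object).
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Alice", "BUILT", "Pathfinder"),
    ("Pathfinder", "USES", "Postgres"),
    ("Bob", "WORKS_AT", "Acme"),
]

def build_adjacency(triples):
    """Index triples by both endpoints so expansion can walk edges in either direction."""
    adj = {}
    for s, r, o in triples:
        adj.setdefault(s, []).append((s, r, o))
        adj.setdefault(o, []).append((s, r, o))
    return adj

def expand(adj, seeds, hops=1):
    """Breadth-first expansion: collect every triple within `hops` edges of the seed entities."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    subgraph = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget spent for this branch
        for s, r, o in adj.get(node, []):
            subgraph.add((s, r, o))
            neighbor = o if s == node else s
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return subgraph

adj = build_adjacency(TRIPLES)
one_hop = expand(adj, ["Pathfinder"], hops=1)  # what it uses, who built it
two_hop = expand(adj, ["Pathfinder"], hops=2)  # also where the builder works
```

With one hop, a hit on "Pathfinder" returns only its direct edges; the second hop pulls in Alice's employer, which is the kind of connection a multi-hop question needs and a similarity search alone would not surface.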

How it compares

| | Cognify | Cognee | Mem0 | Graphiti / Zep | LightRAG | plain RAG |
| --- | --- | --- | --- | --- | --- | --- |
| Typed entity+relation graph | ✅ | ✅ | partial (dropped graph) | ✅ (temporal) | ✅ | ❌ |
| Runs with zero external services | ✅ (ChromaDB+networkx) | ❌ (Kuzu file-lock; Neo4j for multi-agent) | ❌ (hosted/Qdrant) | ❌ (Neo4j) | ⚠️ | ✅ |
| Torch-free local install | ✅ (ONNX embedder) | ❌ | ❌ | ❌ | ❌ | varies |
| Same API, swap local ↔ server | ✅ | ⚠️ | ❌ | ❌ | ❌ | n/a |
| Built-in multi-tenancy | ✅ (every node) | ⚠️ | ✅ | ✅ | ❌ | ❌ |
| MCP server for Claude | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Lines of core code | ~1k, readable | large | large | large | medium | tiny |
| Reconstruction spec for agents | ✅ `BLUEPRINT.md` | ❌ | ❌ | ❌ | ❌ | ❌ |

Where each wins, honestly. Graphiti/Zep is the choice if you need temporal fact-tracking and SOC2/HIPAA compliance. Cognee has more managed connectors and a cloud tier. Mem0 is simplest for pure conversational memory. Cognify wins when you want a real typed graph that an agent can run anywhere — a laptop, an isolated box, or a shared server — with one dependency-light install, one API across backends, and code small enough to read in a sitting. It is the embed-it-in-your-agent option, not the managed-platform option.

Why it's so lightweight

- Default backend needs nothing external: ChromaDB (embedded) + a networkx graph in a JSON file. No database server, no Docker, no cloud.
- No PyTorch: embeddings come from ChromaDB's bundled ONNX MiniLM. The whole default install is small and CPU-only.
- The LLM is the only heavy lift, and it's remote: entity/relation extraction is one cheap API call per chunk; nothing large runs locally.
- ~1k lines of pure-function code, src layout, one file per concern. The backend protocol is four methods; adding a store is one file.
- Scales by swapping a backend, not rewriting: move to TurboVec + Neo4j for a shared graph by changing one env var; the same embeddings and API carry over.

Pipeline (ECL)

ingest(doc) ->  Extract: file/text -> heading-aware ~512-token chunks
                Cognify: per chunk, cheap LLM -> typed entities + relations
                Load:    embed chunks (384d) -> vectors ; write graph
recall(q)   ->  vector search (tenant-scoped) -> expand graph (hops=1..3)
                -> chunks + subgraph
forget(doc) ->  delete a document + its chunks/vectors, prune entities no
                longer mentioned anywhere (CLI `cognify forget`, HTTP DELETE /doc)
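The Extract step above can be pictured with a small heading-aware chunker. This is a sketch under stated assumptions, not the project's code: `split_sections` and `chunk` are hypothetical helpers, headings are assumed to be markdown-style `#` lines, and whitespace-separated words stand in for tokens.

```python
import re

# Markdown-style heading: one to six '#' characters followed by text.
HEADING = re.compile(r"^#{1,6}\s+\S")

def split_sections(text):
    """Split text into sections, starting a new one at every heading line."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    joined = ("\n".join(s).strip() for s in sections)
    return [s for s in joined if s]

def chunk(text, max_tokens=512):
    """Return chunks that never cross a heading and hold at most `max_tokens` words.

    Words approximate tokens here; a real tokenizer would count differently.
    Line breaks inside a chunk are collapsed to single spaces.
    """
    chunks = []
    for section in split_sections(text):
        words = section.split()
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks

doc = "# Intro\nPathfinder is a routing service.\n\n## Owners\nAlice owns Pathfinder.\n"
parts = chunk(doc, max_tokens=512)   # one chunk per section
small = chunk(doc, max_tokens=3)     # long sections are split further
```

Splitting on headings first keeps each chunk topically coherent, so the entities and relations extracted from it are more likely to belong together.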

Quickstart

./setup.sh local            # venv + deps + .env
source .venv/bin/activate
echo 'OPENROUTER_API_KEY=sk-or-...' >> .env
set -a && . ./.env && set +a

cognify ingest examples/sample_docs/acme.md --tenant demo
cognify recall "what does Pathfinder run on and who owns it?" --tenant demo
cognify stats --tenant demo

Python:

import cognify
be = cognify.get_backend("local")
cognify.ingest(be, "handbook.pdf", tenant="acme", namespace="hr")
res = cognify.recall(be, "who owns onboarding?", tenant="acme")
print(res.entities, res.relations)

Use with Claude

Claude as the extractor — just set the key (auto-detected):

pip install 'cognify-kg[local]'
export ANTHROPIC_API_KEY=sk-ant-...
cognify ingest notes.md --tenant demo && cognify recall "what connects to X?" --tenant demo

Cognify as MCP tools in Claude Code / Desktop:

pip install 'cognify-kg[local,claude]'
claude mcp add cognify -- cognify-mcp

Claude then has cognify_ingest, cognify_recall, cognify_stats. Details in integrations/claude/.

Use with Hermes (and any agent runtime)

The cognify CLI works as-is — a Hermes agent shells out to it. Drop integrations/hermes/SKILL.md into the agent's skills. Or run the HTTP server for a shared/long-running graph:

pip install 'cognify-kg[serve]'
cognify-serve                      # 127.0.0.1:8799
curl -s localhost:8799/recall -d '{"query":"refund policy?","tenant":"acme"}' -H 'content-type: application/json'

For a fleet-shared server, bind loopback + your VPN/tailnet IP (comma-separated; never 0.0.0.0), gate it with an API key, and go torch-free with the ONNX embedder:

pip install 'cognify-kg[neo4j,serve,fastembed]'
export COGNIFY_BACKEND=neo4j COGNIFY_EMBED_PROVIDER=fastembed
export COGNIFY_HOST=127.0.0.1,100.x.y.z COGNIFY_API_KEY=$(openssl rand -hex 24)
cognify-serve   # /health stays open; everything else needs x-api-key
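For agents that prefer Python over curl, the same call can be built with the standard library. This sketch only mirrors what the examples above show (POST `/recall`, a JSON body with `query` and `tenant`, and the `x-api-key` header); `build_recall_request` is a hypothetical helper, and the response format is not documented here, so the send step is left commented out.

```python
import json
import urllib.request

def build_recall_request(query, tenant, base_url="http://127.0.0.1:8799", api_key=None):
    """Build (but do not send) a POST /recall request mirroring the curl example."""
    body = json.dumps({"query": query, "tenant": tenant}).encode("utf-8")
    headers = {"content-type": "application/json"}
    if api_key:
        # Only needed when the server was started with COGNIFY_API_KEY set.
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{base_url}/recall", data=body, headers=headers, method="POST"
    )

req = build_recall_request("refund policy?", "acme", api_key="example-key")

# To send it against a running cognify-serve (assumes the reply is JSON):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read()))
```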

Bulk ingest is network-bound on the extractor; parallelize it with --workers 8 (or COGNIFY_EXTRACT_WORKERS) and use --cache for cheap re-runs.
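Why workers and a cache help can be shown with a short sketch. It assumes nothing about Cognify's internals: `fake_extract` is a stand-in for the remote LLM call, `extract_all` is a hypothetical helper, and the cache is a plain dict keyed by a hash of the chunk text.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def fake_extract(chunk):
    """Stand-in for the remote LLM call; a real extractor would do network I/O here."""
    names = {w.strip(".,") for w in chunk.split() if w[:1].isupper()}
    return {"entities": sorted(names)}

def extract_all(chunks, extract=fake_extract, workers=8, cache=None):
    """Extract every chunk in parallel, skipping chunks whose content hash is cached."""
    cache = {} if cache is None else cache
    keys = [hashlib.sha256(c.encode("utf-8")).hexdigest() for c in chunks]
    todo = {k: c for k, c in zip(keys, chunks) if k not in cache}
    # Threads suit this workload because each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for key, result in zip(todo, pool.map(extract, todo.values())):
            cache[key] = result
    return [cache[k] for k in keys]

cache = {}
chunks = ["Alice owns Pathfinder.", "Bob joined Acme."]
first = extract_all(chunks, cache=cache)    # both chunks go to the extractor
second = extract_all(chunks, cache=cache)   # served entirely from the cache
```

Because the cache key depends only on chunk content, re-ingesting an unchanged document costs no extractor calls, which is what makes re-runs cheap.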

Multi-tenancy

Every node carries a tenant (and namespace). Pass a different tenant per client/agent and their data stays isolated: the local backend keeps each tenant in a separate store, and the neo4j backend filters every query by tenant. This is what makes it safe to run one engine across many agents.
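The filtering style of isolation can be sketched in a few lines. This is an illustration only: `Node`, `scoped`, and the `"default"` namespace are hypothetical and not taken from the Cognify source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    type: str
    tenant: str
    namespace: str = "default"  # assumed fallback, for illustration only

NODES = [
    Node("Alice", "Person", tenant="acme", namespace="hr"),
    Node("Pathfinder", "Project", tenant="acme"),
    Node("Alice", "Person", tenant="globex"),
]

def scoped(nodes, tenant, namespace=None):
    """Return only the nodes visible to `tenant`, optionally narrowed to one namespace."""
    return [
        n for n in nodes
        if n.tenant == tenant and (namespace is None or n.namespace == namespace)
    ]

acme_nodes = scoped(NODES, "acme")                 # both acme nodes, no globex data
acme_hr = scoped(NODES, "acme", namespace="hr")    # only the HR-scoped node
```

Two tenants can hold an entity with the same name ("Alice" above) without ever seeing each other's copy, as long as every read goes through the tenant filter.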

Recommended models (extraction)

Extraction is one cheap LLM call per chunk; pick by cost vs throughput. Numbers below are from a real single-chunk extraction test, not vendor specs.

| Model | Via | Cost (rough) | Notes |
| --- | --- | --- | --- |
| `openai/gpt-4o-mini` | OpenRouter / OpenAI | ~$0.15/$0.60 per M | Recommended default. Fast (~6s/chunk), reliable JSON. A 40-doc KB cost ~$0.20. |
| `google/gemini-2.0-flash` | OpenRouter / Google | ~$0.10/$0.40 per M | Cheapest solid cloud option; big context. Google free tier rate-limits (429); use a paid key for bulk. |
| `deepseek/deepseek-chat` | OpenRouter | ~$0.14/$0.28 per M | Same quality as gpt-4o-mini but ~3× slower (~17s/chunk). Fine for small batches. |
| local Qwen / Llama 3.3 / Gemma | Ollama / vLLM | free | The real free path for bulk. Run on your own GPU; point `COGNIFY_LLM_BASE` at it. |
| Claude Haiku | Anthropic (native) | cheap | Set `ANTHROPIC_API_KEY`; auto-detected. Highest extraction quality of the cheap tier. |

Avoid OpenRouter's :free model variants for bulk — they are heavily rate-limited (429) or very slow (one free model measured ~77s/chunk). Free is only practical on local inference.
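To turn the table's prices and per-chunk timings into a budget, a back-of-the-envelope calculation is enough. The sketch below assumes the two prices are input/output dollars per million tokens; the workload (1,000 chunks, about 700 input and 300 output tokens each) is hypothetical, as are both helper functions.

```python
def extraction_cost(chunks, in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Estimated USD cost of one extraction call per chunk."""
    input_cost = chunks * in_tokens * in_price_per_m / 1_000_000
    output_cost = chunks * out_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost

def wall_clock_minutes(chunks, seconds_per_chunk, workers=1):
    """Idealized duration if calls run `workers` at a time with no rate limiting."""
    batches = -(-chunks // workers)  # ceiling division
    return batches * seconds_per_chunk / 60

# Hypothetical workload priced at the gpt-4o-mini row (~$0.15 in / ~$0.60 out per M).
cost = extraction_cost(1000, 700, 300, in_price_per_m=0.15, out_price_per_m=0.60)
serial = wall_clock_minutes(1000, seconds_per_chunk=6)               # 100.0 minutes
parallel = wall_clock_minutes(1000, seconds_per_chunk=6, workers=8)  # 12.5 minutes
```

Under these assumptions the run costs roughly $0.29, so throughput rather than price tends to decide the choice of model, which is why the slow free variants are a poor fit for bulk work.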

Switch model with one env var, e.g. local Ollama:

export COGNIFY_LLM_BASE=http://localhost:11434/v1
export COGNIFY_LLM_MODEL=qwen2.5:14b
export COGNIFY_LLM_KEY=ollama        # any non-empty string

Configuration

All via env (see .env.example): COGNIFY_BACKEND, COGNIFY_DATA_DIR, COGNIFY_LLM_BASE/MODEL/KEY, COGNIFY_LLM_PROVIDER, COGNIFY_EMBED_PROVIDER (st | fastembed), COGNIFY_EXTRACT_WORKERS, COGNIFY_HOST/PORT, NEO4J_URI/USER/PASSWORD. The LLM endpoint is OpenAI-compatible (OpenRouter, OpenAI, vLLM, Ollama) or native Anthropic (Claude). See the model table above.
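For code that wraps the server, a few of these variables could be read as follows. This is a sketch, not Cognify's loader: `Settings` and `load_settings` are hypothetical, the `local` backend and port 8799 follow the defaults shown earlier in this README, and the fallbacks marked "assumed" are guesses.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    backend: str
    hosts: tuple
    port: int
    extract_workers: int

def load_settings(env=None):
    """Read a handful of the documented variables from `env` (default: os.environ)."""
    env = os.environ if env is None else env
    raw_hosts = env.get("COGNIFY_HOST", "127.0.0.1")  # comma-separated bind addresses
    hosts = tuple(h.strip() for h in raw_hosts.split(",") if h.strip())
    if "0.0.0.0" in hosts:
        raise ValueError("refusing to bind 0.0.0.0; list loopback and VPN/tailnet IPs instead")
    return Settings(
        backend=env.get("COGNIFY_BACKEND", "local"),  # 'local' is the documented default
        hosts=hosts,
        port=int(env.get("COGNIFY_PORT", "8799")),    # matches the 127.0.0.1:8799 shown earlier
        extract_workers=int(env.get("COGNIFY_EXTRACT_WORKERS", "1")),  # assumed fallback
    )

# 100.64.0.5 is a made-up tailnet-style address used only for this example.
settings = load_settings({"COGNIFY_BACKEND": "neo4j", "COGNIFY_HOST": "127.0.0.1,100.64.0.5"})
```

Passing a dict instead of reading `os.environ` directly keeps the loader easy to test and makes the wildcard-bind check explicit.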

For agents

CLAUDE.md is the operating guide. ARCHITECTURE.md explains the design. BLUEPRINT.md is a from-scratch reconstruction spec: hand this repo to an agent and it can rebuild or extend the whole thing.

MIT licensed.
