Which integrations are available for this server?

Provides a CLI and HTTP server that Hermes agents can use to ingest documents and recall information from a typed knowledge graph.

How do I use Cognify MCP Server?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Cognify MCP Server ingest meeting notes and recall action items related to design" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Cognify MCP Server

by S3YED

Overview Schema Related Servers Score Discussions

Python

Hybrid

Cognify

A lightweight document-ingestion and typed knowledge-graph engine you can hand to an agent. Drop in raw documents, get back a queryable graph of typed entities and relations plus hybrid (vector + graph) retrieval.

Two interchangeable backends behind one API:

backend	vectors	graph	needs	use
`local` (default)	ChromaDB (ONNX MiniLM)	networkx	nothing external, no torch	drop into an agent box
`neo4j`	TurboVec	Neo4j	a Neo4j instance	shared/server graph

Same 384d embedding space on both, so retrieval behaves identically.

Why not plain RAG

Plain RAG embeds chunks and does similarity search. Cognify also asks a cheap LLM to extract typed entities (Person, Project, Technology, ...) and typed relations (USES, WORKS_AT, BUILT, ...) from every chunk, builds a graph, and expands that graph around your search hits. You get the facts and how they connect, which is what makes multi-hop questions work.

Related MCP server: Modular RAG MCP Server

How it compares

	Cognify	Cognee	Mem0	Graphiti / Zep	LightRAG	plain RAG
Typed entity+relation graph	✅	✅	partial (dropped graph)	✅ (temporal)	✅	❌
Runs with zero external services	✅ (ChromaDB+networkx)	❌ (Kuzu file-lock; Neo4j for multi-agent)	❌ (hosted/Qdrant)	❌ (Neo4j)	⚠️	✅
Torch-free local install	✅ (ONNX embedder)	❌	❌	❌	❌	varies
Same API, swap local ↔ server	✅	⚠️	❌	❌	❌	n/a
Built-in multi-tenancy	✅ (every node)	⚠️	✅	✅	❌	❌
MCP server for Claude	✅	✅	❌	❌	❌	❌
Lines of core code	~1k, readable	large	large	large	medium	tiny
Reconstruction spec for agents	✅ `BLUEPRINT.md`	❌	❌	❌	❌	❌

Where each wins, honestly. Graphiti/Zep is the choice if you need temporal fact-tracking and SOC2/HIPAA compliance. Cognee has more managed connectors and a cloud tier. Mem0 is simplest for pure conversational memory. Cognify wins when you want a real typed graph that an agent can run anywhere — a laptop, an isolated box, or a shared server — with one dependency-light install, one API across backends, and code small enough to read in a sitting. It is the embed-it-in-your-agent option, not the managed-platform option.

Why it's so lightweight

Default backend needs nothing external — ChromaDB (embedded) + a networkx graph in a JSON file. No database server, no Docker, no cloud.
No PyTorch — embeddings come from ChromaDB's bundled ONNX MiniLM. The whole default install is small and CPU-only.
The LLM is the only heavy lift, and it's remote — entity/relation extraction is one cheap API call per chunk; nothing large runs locally.
~1k lines of pure-function code, src layout, one file per concern. The backend protocol is four methods; adding a store is one file.
Scales by swapping a backend, not rewriting — move to TurboVec + Neo4j for a shared graph by changing one env var; the same embeddings and API carry over.

Pipeline (ECL)

ingest(doc) ->  Extract: file/text -> heading-aware ~512-token chunks
                Cognify: per chunk, cheap LLM -> typed entities + relations
                Load:    embed chunks (384d) -> vectors ; write graph
recall(q)   ->  vector search (tenant-scoped) -> expand graph -> chunks + subgraph

Quickstart

./setup.sh local            # venv + deps + .env
source .venv/bin/activate
echo 'OPENROUTER_API_KEY=sk-or-...' >> .env
set -a && . ./.env && set +a

cognify ingest examples/sample_docs/acme.md --tenant demo
cognify recall "what does Pathfinder run on and who owns it?" --tenant demo
cognify stats --tenant demo

Python:

import cognify
be = cognify.get_backend("local")
cognify.ingest(be, "handbook.pdf", tenant="acme", namespace="hr")
res = cognify.recall(be, "who owns onboarding?", tenant="acme")
print(res.entities, res.relations)

Use with Claude

Claude as the extractor — just set the key (auto-detected):

pip install 'cognify-kg[local]'
export ANTHROPIC_API_KEY=sk-ant-...
cognify ingest notes.md --tenant demo && cognify recall "what connects to X?" --tenant demo

Cognify as MCP tools in Claude Code / Desktop:

pip install 'cognify-kg[local,claude]'
claude mcp add cognify -- cognify-mcp

Claude then has cognify_ingest, cognify_recall, cognify_stats. Details in integrations/claude/.

Use with Hermes (and any agent runtime)

The cognify CLI works as-is — a Hermes agent shells out to it. Drop integrations/hermes/SKILL.md into the agent's skills. Or run the HTTP server for a shared/long-running graph:

pip install 'cognify-kg[serve]'
cognify-serve                      # 127.0.0.1:8799
curl -s localhost:8799/recall -d '{"query":"refund policy?","tenant":"acme"}' -H 'content-type: application/json'

Multi-tenancy

Every node carries a tenant (and namespace). Pass a different tenant per client/agent and their data stays isolated: the local backend is a separate store, the neo4j backend filters every query by tenant. This is what makes it safe to run one engine across many agents.

Recommended models (extraction)

Extraction is one cheap LLM call per chunk; pick by cost vs throughput. Numbers below are from a real single-chunk extraction test, not vendor specs.

Model	Via	Cost (rough)	Notes
`openai/gpt-4o-mini`	OpenRouter / OpenAI	~$0.15/$0.60 per M	Recommended default. Fast (~6s/chunk), reliable JSON. A 40-doc KB cost ~$0.20.
`google/gemini-2.0-flash`	OpenRouter / Google	~$0.10/$0.40 per M	Cheapest solid cloud option; big context. Google free tier rate-limits (429) — use a paid key for bulk.
`deepseek/deepseek-chat`	OpenRouter	~$0.14/$0.28 per M	Same quality as gpt-4o-mini but ~3× slower (~17s/chunk). Fine for small batches.
local Qwen / Llama 3.3 / Gemma	Ollama / vLLM	free	The real free path for bulk. Run on your own GPU; point `COGNIFY_LLM_BASE` at it.
Claude Haiku	Anthropic (native)	cheap	Set `ANTHROPIC_API_KEY`; auto-detected. Highest extraction quality of the cheap tier.

Avoid OpenRouter's :free model variants for bulk — they are heavily rate-limited (429) or very slow (one free model measured ~77s/chunk). Free is only practical on local inference.

Switch model with one env var, e.g. local Ollama:

export COGNIFY_LLM_BASE=http://localhost:11434/v1
export COGNIFY_LLM_MODEL=qwen2.5:14b
export COGNIFY_LLM_KEY=ollama        # any non-empty string

Configuration

All via env (see .env.example): COGNIFY_BACKEND, COGNIFY_DATA_DIR, COGNIFY_LLM_BASE/MODEL/KEY, COGNIFY_LLM_PROVIDER, NEO4J_URI/USER/PASSWORD. The LLM endpoint is OpenAI-compatible (OpenRouter, OpenAI, vLLM, Ollama) or native Anthropic (Claude). See the model table above.

For agents

CLAUDE.md is the operating guide. ARCHITECTURE.md explains the design. BLUEPRINT.md is a from-scratch reconstruction spec: hand this repo to an agent and it can rebuild or extend the whole thing.

MIT licensed.

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source
Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/S3YED/cognify'

If you have feedback or need assistance with the MCP directory API, please join our Discord server