cognee-mcp

Overview Schema Related Servers Score Discussions

cognee
examples
pocs
disambiguation

README.md•3.32 KiB

# POC Disambiguation This folder contains proof-of-concept scripts and assets focused on entity disambiguation during knowledge graph extraction. The POC injects existing entity names into the LLM prompt to bias extraction toward reusing canonical entities instead of creating duplicates. ## What Was Done - Added a custom graph extraction step that performs a vector lookup over existing entity names and appends those candidates to the extraction prompt. - Wired that custom extraction into a full `cognify`-style pipeline to compare against the default behavior. - Provided small synthetic datasets that include aliasing, abbreviations, and spelling variants to make the disambiguation effect easy to observe. - Included a runnable example that builds and visualizes graphs with and without the POC. ## Scripts And What They Do `disambiguate_entities.py` - Provides `disambiguate_entities_pipeline(...)`, a pipeline equivalent to `cognee.cognify()` but with the graph extraction task swapped for the POC entity-disambiguation extractor. Parameters (differences vs `cognee.cognify()`): - `custom_prompt`: Optional prompt string appended with candidates from vector search. - `serach_limit`: Typo in signature; currently unused. - `temporal_cognify`: Present but unused in this POC. - `**kwargs`: Passed through to the POC extractor (used for `vector_search_limit`). `poc_extract_graph_from_data_with_entity_disambiguation.py` - Implements `extract_graph_from_data_with_entity_disambiguation(...)` which: - Looks up existing entities from the vector DB collection `Entity_name`. - Appends the top matches to the prompt (`custom_prompt`) as candidate aliases. - Runs `extract_content_graph(...)` for each chunk using the updated prompt. - Filters invalid edges when using the default `KnowledgeGraph` model. - Integrates chunk graphs and adds nodes/edges via `add_data_points`. Parameters (differences vs `extract_graph_from_data`): - No `context` parameter. - `custom_prompt`: Prompt string to seed the LLM; will be extended with vector matches. - `vector_search_limit` (via `**kwargs`): Number of existing entities to retrieve (default 5). `poc_disambiguate_entities_example.py` - End-to-end runnable example that: - Prunes existing Cognee data. - Reads a small example file line-by-line from `poc_disambiguation/data/`. - Adds each line as a separate datapoint. - Runs either the POC pipeline (`use_poc=True`) or the standard `cognee.cognify()`. - Writes graph visualizations to `poc_disambiguation/results/`. Key parameters: - `use_poc`: If true, runs `disambiguate_entities_pipeline` after each added line. - `vector_search_limit`: Number of candidate entities appended to the prompt. - `custom_prompt`: Prompt template read from `prompts/prompt1.txt`. Outputs: - `results/cognify_simple_<example>_poc_graph.html` - `results/cognify_simple_<example>_graph.html` ## Examples And Data `data/example1.txt` - Variants of “OpenAI” (spacing, punctuation, paraphrases). `data/example2.txt` - “NASA” variants including abbreviations and formal name. `data/example3.txt` - Person name spelling/spacing variants. `data/example4.txt` - Location name variants (NYC, New York, etc.). `prompts/prompt1.txt` - Base prompt for graph extraction with explicit alias-reuse rules. The POC appends existing entity names under the “Existing entities:” section.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/topoteretes/cognee'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•3.32 KiB