okfgen-mcp
Generates a knowledge graph from Firestore collections by extracting field types from sampled documents.
Generates a knowledge graph from a Git repository by scanning file structure, README, and code modules.
okfgen is a deterministic reference implementation of both sides of the
Open Knowledge Format (OKF) —
Google's vendor-neutral standard for representing the knowledge around your data
and systems as just markdown files with YAML frontmatter
(announcement).
Producers turn a source system, database, docs site, or live open-data portal into a bundle.
Consumers read a bundle back out — a viewer, a search index, a reasoning agent.
It extracts structured facts straight from the source (schemas, file
structure, READMEs, dependency manifests, page headings). No LLM and no API key
are required; an optional --llm flag adds Claude-powered enrichment where you
want it.
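To make "deterministic extraction" concrete, here is a minimal sketch of pulling function and class names out of Python source with the standard-library `ast` module. It is an illustration only: the function name `extract_symbols` and the sample source are hypothetical, and okfgen's own extractor may work differently.

```python
import ast


def extract_symbols(source: str) -> dict[str, list[str]]:
    """Return the top-level function and class names found in Python source."""
    tree = ast.parse(source)
    functions: list[str] = []
    classes: list[str] = []
    for node in tree.body:  # top-level statements only, nested defs are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
        elif isinstance(node, ast.ClassDef):
            classes.append(node.name)
    return {"functions": functions, "classes": classes}


# Hypothetical module source used only for this example.
sample = '''
def load(path):
    return open(path).read()


class Bundle:
    def save(self):
        pass
'''

symbols = extract_symbols(sample)
print(symbols)
```

The same input always yields the same output, which is the property the paragraph above relies on: no model call, no network, no API key.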
PRODUCERS                               CONSUMERS
git repo  ─┐                         ┌─ visualize → interactive HTML graph
database  ─┤                         ├─ search    → full-text index
open data ─┼─► generate ─► BUNDLE ─► ┼─ ask       → reasoning agent
local dir ─┤      │        (.md +    └─ validate  → conformance check
web docs  ─┘      ▼        frontmatter)
               enrich (pass 2: join paths, backlinks, citations)
Quickstart (30 seconds)
Zero-install with uv — turn this directory into
a knowledge graph and open it:
uvx okfgen generate . -o my-okf
uvx okfgen visualize my-okf -o my-okf/graph.html
# then open my-okf/graph.html
Or with pip:
pip install okfgen
okfgen generate . -o my-okf && okfgen visualize my-okf -o my-okf/graph.html
🎥 Demo GIF coming soon (see docs/RECORD_DEMO.md); meanwhile the live gallery is fully interactive.
Why okfgen
One command, any source → a knowledge graph. Code, databases, docs, and live open-data portals — the same tool, the same output.
Deterministic, offline, no API key. Reproducible facts, not LLM hallucinations. Runs in air-gapped environments. The LLM is strictly opt-in.
Open format, zero lock-in. Output is plain markdown + YAML you can read, diff in git, and grep — not a proprietary database.
Agent-ready. Search, a citation-backed reasoning agent, and a portable JSON index make bundles first-class context for RAG and AI agents.
A viewer you can email. The visualizer is a single self-contained HTML file — no backend, no CDN, data never leaves the page.
Reference implementation of an open standard. Tracks the OKF v0.1 spec; every bundle it emits passes its own conformance validator.
How it compares
| | okfgen | Data catalogs (DataHub / Amundsen) | LLM auto-doc tools | Hand-written wiki |
|---|---|---|---|---|
| Runs with no server/DB to deploy | ✅ | ❌ | ⚠️ | ✅ |
| No API key / no LLM required | ✅ | ✅ | ❌ | ✅ |
| Deterministic & reproducible | ✅ | ✅ | ❌ | ✅ |
| Open, plain-markdown output (no lock-in) | ✅ | ❌ | ⚠️ | ✅ |
| Code and DB and docs and live open data | ✅ | ⚠️ data only | ⚠️ | manual |
| Self-contained interactive graph viewer | ✅ | ⚠️ needs server | ❌ | ❌ |
| Agent-ready (search + reasoning over bundle) | ✅ | ⚠️ | ✅ | ❌ |
| Time to first result | seconds | hours–days | minutes | ∞ |
Install
pip install okfgen # core: git, local, web, schema (zero deps)
pip install "okfgen[all]" # + BigQuery, Firebase, MCP, PyYAML
Optional extras: [bigquery], [firebase], [mcp], [yaml], [dev].
For development from a clone: pip install -e '.[dev]'.
Producers: make a bundle from your data
okfgen generate https://github.com/psf/requests.git # a source system (git)
okfgen generate ./my-project # a source system (local)
okfgen generate schema:./warehouse.schema.json # a database (offline)
okfgen generate schema:./ddl.sql # a database (SQL DDL)
okfgen generate bq:my-gcp-project # BigQuery datasets/tables
okfgen generate firebase:my-firebase-project # Firestore collections
okfgen generate https://docs.mytool.dev/ # a documentation site
okfgen generate ckan:https://portal/dataset/some-set # a live CKAN open-data portal
okfgen generate socrata:https://data.cityofnewyork.us/d/erm2-nwe9 # a live Socrata dataset

| Input | Detected as | What it extracts |
|---|---|---|
| a git URL (e.g. https://github.com/psf/requests.git) | | shallow-clones, then scans like a local dir |
| a directory path | | README overview, per-directory code modules (functions/classes/types), doc files, dependency inventory |
| schema:./warehouse.schema.json, schema:./ddl.sql | | dataset + table concepts with full column schemas, no cloud creds needed |
| bq:my-gcp-project | | one concept per dataset and per table, with column schemas |
| firebase:my-firebase-project | | one concept per Firestore collection, fields/types inferred from sampled docs |
| ckan:https://portal/dataset/some-set | | a live CKAN open-data dataset: one concept per resource, with live column schemas + example rows from the DataStore. No auth; works against data.gov, data.gov.au, the EU portal, city portals, etc. |
| socrata:https://data.cityofnewyork.us/d/erm2-nwe9 | | a live Socrata dataset (NYC Open Data, Seattle, Chicago, many state portals): Dataset + Table concepts with live column schema + descriptions + example rows. No auth. |
| a documentation site URL (e.g. https://docs.mytool.dev/) | | crawls same-host pages (depth/page budget) into one concept per page |
Cloud sources use Application Default Credentials
(gcloud auth application-default login). Output goes to ./<name>-okf/.
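The Firestore producer is described as inferring fields and types from sampled documents. A minimal sketch of that idea, assuming the samples are already plain Python dicts: the function name `infer_field_types` and the sample documents are hypothetical, and Python type names stand in for whatever type vocabulary okfgen actually records.

```python
from collections import defaultdict


def infer_field_types(docs: list[dict]) -> dict[str, list[str]]:
    """Map each field name to the sorted set of value types seen across samples."""
    seen: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            # Record the Python type name; a field may have several across docs.
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in sorted(seen.items())}


# Hypothetical sampled documents from one collection.
samples = [
    {"name": "Ada", "age": 36, "active": True},
    {"name": "Lin", "age": None},
    {"name": "Sam", "age": 41.5, "tags": ["admin"]},
]

schema = infer_field_types(samples)
for field, types in schema.items():
    print(f"{field}: {' | '.join(types)}")
```

Because the result depends on which documents were sampled, a field that is absent or differently typed in unsampled documents will not show up; a larger sample narrows that gap but does not close it.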
The enrichment agent (pass 2)
Producers draft concepts; the enrichment agent enriches them — exactly the
two-pass pattern from the OKF blog. Deterministically, it infers join paths
between tables from foreign-key naming (customer_id → customers) and wires
backlinks so the graph is navigable both ways:
okfgen enrich ./my-okf # in place
okfgen enrich ./my-okf -o ./enriched # to a new directory
okfgen enrich ./my-okf --llm # also rewrite descriptions via Claude
Consumers: read a bundle back out
The OKF value proposition is producer/consumer independence: any consumer works on any bundle, regardless of who produced it.
# Viewer: a self-contained interactive graph (no backend, no CDN, data stays local)
okfgen visualize ./my-okf -o graph.html
# Search index: full-text, TF-IDF ranked
okfgen search ./my-okf "weekly active users"
okfgen search ./my-okf --export index.json # portable JSON index
# Reasoning agent: retrieves concepts, follows join links, answers with citations
okfgen ask ./my-okf "how do orders relate to customers?"
okfgen ask ./my-okf "..." --llm # phrase answer via Claude
# Conformance validation
okfgen validate ./my-okf --strict
okfgen ask shows its work (the retrieved concepts, the links it traversed, and
the citations behind the answer), so the reasoning is auditable.
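The search consumer is described as full-text and TF-IDF ranked. For readers unfamiliar with that scoring, here is a small self-contained sketch of one common smoothed TF-IDF variant. The exact formula okfgen uses is not stated here, so treat the weighting, the tokenizer, and the sample concept texts as assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document against the query with TF-IDF, best match first."""
    tokens = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df: Counter[str] = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    query_terms = tokenize(query)
    scores: dict[str, float] = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical concept bodies keyed by their path inside a bundle.
concepts = {
    "tables/orders.md": "orders placed by customers, one row per order",
    "tables/customers.md": "customers and their contact details",
    "docs/metrics.md": "weekly active users metric definition",
}

for name, score in rank("weekly active users", concepts):
    print(f"{score:.3f}  {name}")
```

Term frequency is normalized by document length here, so a short concept that mentions a term once outranks a longer one that mentions it once.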
Use it inside your AI agent (MCP)
okfgen ships an MCP server, so Claude Desktop, Claude Code, Cursor, and any Model Context Protocol client can produce and reason over OKF bundles without leaving the agent.
pip install "okfgen[mcp]"
okfgen-mcp # stdio MCP server
Register it (e.g. in Claude Desktop's claude_desktop_config.json, or Cursor's MCP
settings):
{
  "mcpServers": {
    "okfgen": { "command": "okfgen-mcp" }
  }
}
Exposed tools: okfgen_generate, okfgen_search, okfgen_ask,
okfgen_validate, okfgen_visualize, okfgen_list_source_types. Now an agent
can say "catalog this database and tell me how orders join to customers" and
get grounded, cited answers.
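If the client config file already lists other servers, the okfgen entry has to be merged in rather than overwriting the file. A minimal sketch of that merge, assuming the config is a JSON file with a top-level "mcpServers" object as in the snippet above: the helper name `register_server` is hypothetical, and the config file's real location depends on the client and platform, so it is passed in as a parameter.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path: Path, name: str = "okfgen",
                    command: str = "okfgen-mcp") -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstration against a throwaway file, not a real client config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text('{"mcpServers": {"other": {"command": "other-mcp"}}}',
                    encoding="utf-8")
    merged = register_server(path)

print(sorted(merged["mcpServers"]))
```

Back up the real config before running anything like this against it, and restart the client afterwards so it picks up the new server.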
Sample bundles
Browse the sample knowledge graphs online: https://bushans.github.io/okfgen/
Ready-to-browse bundles live in samples/bundles/. Open any
graph.html in a browser, or point the consumers at them. The same visualizers
are published to GitHub Pages from docs/ (regenerate with
python samples/build_pages.py).
Three offline, reproducible bundles (database, source system, docs site):
python samples/build_samples.py
One live public-data bundle (Toronto Beaches Water Quality, from the Toronto Open Data CKAN portal):
python samples/build_live_samples.py
See samples/README.md for details.
Output layout
<name>-okf/
├── index.md # root listing + okf_version: "0.1"
├── log.md # generation / enrichment log (ISO-dated)
├── overview.md # the root "Project" / "Data Project" concept
├── dependencies.md # parsed manifests (git/local)
├── docs/… # documentation concepts
├── modules/… # per-directory code concepts (git/local)
├── datasets/… tables/… # database / BigQuery concepts
├── collections/… # Firestore concepts
├── pages/… # web page concepts
└── graph.html # (after `visualize`) the interactive viewer
Every concept carries the required type frontmatter field plus recommended
title/description/resource/tags/timestamp, and bodies use the
conventional OKF # Schema, # Examples, # Citations, # Joins headings.
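Since a concept is just markdown with frontmatter, a consumer can check the fields listed above with very little code. The sketch below is a deliberately naive, standard-library-only reader that handles flat `key: value` pairs and nothing else; real YAML (lists, nesting, multi-line strings) needs a proper parser such as PyYAML. The helper names and the sample concept, including its `type` value, are hypothetical.

```python
RECOMMENDED = ("title", "description", "resource", "tags", "timestamp")


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept file into (flat frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # opening delimiter without a closing one
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Only top-level "key: value" lines; indented or comment lines are skipped.
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, "\n".join(lines[end + 1:])


def check_concept(text: str) -> list[str]:
    """List missing required and recommended frontmatter fields."""
    meta, _ = split_frontmatter(text)
    problems = []
    if "type" not in meta:
        problems.append("missing required field: type")
    problems += [f"missing recommended field: {k}" for k in RECOMMENDED if k not in meta]
    return problems


# Hypothetical concept file; the field values are made up for illustration.
sample = """---
type: Table
title: orders
description: One row per order
---
# Schema
"""

print(check_concept(sample))
```

For real conformance checking, `okfgen validate` is the authoritative tool; this sketch only shows why the format is easy to consume.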
Design notes
Deterministic by default. git/local/web/schema run on the standard library alone (zero third-party deps). Cloud SDKs and the LLM are optional extras, loaded lazily and off unless you ask.
Producer/consumer split. Consumers depend only on markdown + frontmatter (okfgen/consumer.py), never on producer internals.
Scriptable. Every command prints its primary output path to stdout and logs to stderr: BUNDLE=$(okfgen generate ./repo).
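The stdout/stderr split makes the same pattern easy from Python as from a shell. A minimal sketch using `subprocess.run`: the helper name `run_for_path` is hypothetical, and the demonstration uses a stand-in command that mimics the described behavior (path on stdout, logs on stderr) so it runs without okfgen installed.

```python
import subprocess
import sys


def run_for_path(cmd: list[str]) -> str:
    """Run a command and return its stdout, which carries the output path."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # Logs arrive separately in result.stderr and never pollute the path.
    return result.stdout.strip()


# Stand-in for a real command: prints a path to stdout and a log line to stderr.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print('repo-okf'); print('scanning...', file=sys.stderr)",
]

bundle = run_for_path(stand_in)
print(bundle)
```

With okfgen installed, the equivalent of the shell one-liner would be `run_for_path(["okfgen", "generate", "./repo"])`; `check=True` raises `CalledProcessError` if the command exits non-zero.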
Development
pip install -e '.[dev]'
pytest
New to the project? TESTING.md is a step-by-step VS Code walkthrough: environment setup, running the test suite, and driving every producer/consumer command locally.
License
Apache-2.0.