Internal Documentation Search — MCP Server

by aklianeva

What's in the box: An MCP server with 5 tools, 3 prompt templates, and 2 resources backed by a knowledge base of 16 markdown documents (standards, runbooks, ADRs). Includes RBAC with 3 roles, sliding-window rate limiting, input sanitisation, and a CLI scraper to ingest docs from URLs. RAG (ChromaDB) is available as an optional extra — keyword search works well for structured docs with clear titles and tags, but RAG adds value when engineers phrase queries differently from how documents are written (e.g. searching "why do we use Kafka" against an ADR titled "Adopt Event-Driven Architecture").

An MCP server that exposes an internal knowledge base to AI assistants, enabling engineers to search standards, runbooks, and architecture decisions from within their coding workflow.

Why this scenario? Finding internal docs is one of the highest-friction points for engineers. This brings company-specific knowledge directly into the AI assistant, keeping engineers in flow.

Architecture

AI Assistant ◄──MCP (stdio)──► MCP Server
                                 ├─ config.py   (Pydantic BaseSettings)
                                 ├─ auth.py     (RBAC + rate limiting)
                                 ├─ models.py   (Document, Category, enums)
                                 ├─ server.py   (tools, prompts, resources)
                                 ├─ scraper.py  (CLI: scrape URLs → markdown)
                                 └─ kb/         (knowledge base sub-package)
                                      ├─ store.py   (document store + search)
                                      ├─ loader.py  (markdown file loader)
                                      └─ rag.py     (ChromaDB vector search)

Documents are stored as markdown files with YAML frontmatter in docs/knowledge_base/ (16 docs: 6 Standards, 5 Runbooks, 5 ADRs). They are loaded at startup by the document loader.
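A knowledge-base file might look like the sketch below. The frontmatter field names mirror the scraper's `--id`/`--title`/`--category`/`--tags` flags, but the exact schema and the sample values are assumptions:

```markdown
---
id: STD-001
title: API Error Handling Standard
category: standard
tags: [api, errors]
---

# API Error Handling Standard

Body of the document in plain markdown...
```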

Tools

| Tool | Purpose |
| --- | --- |
| `search_internal_docs` | Free-text search with optional category/tag filters, ranked by relevance |
| `get_document` | Full document content by ID, with semantically related docs |
| `list_doc_categories` | Available categories |
| `list_doc_tags` | All tags for refining searches |
| `list_all_docs` | Browse all documents, optionally by category |

Prompts

| Prompt | Purpose |
| --- | --- |
| `investigate_incident` | Structured incident investigation using runbooks |
| `check_code_standards` | Review code against internal standards |
| `explain_architecture_decision` | Explain why a technology was adopted (ADRs) |

Resources

  • docs://categories — list of categories

  • docs://document/{doc_id} — individual document access

RAG (optional)

When enabled, search uses hybrid ranking — keyword scoring blended with ChromaDB vector similarity. Related documents in get_document are found via semantic similarity instead of hardcoded lists.

# Install with RAG support
uv pip install -e '.[rag]'

# Run with RAG enabled
RAG_ENABLED=true uv run internal-docs-mcp
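Conceptually, the hybrid ranking can be thought of as a linear blend of a normalised keyword score and the vector similarity. The sketch below illustrates that idea only — the blend weight `alpha`, the function names, and the assumption that both scores are normalised to [0, 1] are not taken from the server's actual implementation:

```python
def hybrid_score(keyword_score: float, vector_similarity: float, alpha: float = 0.5) -> float:
    # Both inputs assumed normalised to [0, 1]; alpha weights keyword vs. vector evidence.
    return alpha * keyword_score + (1.0 - alpha) * vector_similarity


def rank_documents(docs: list[dict], alpha: float = 0.5) -> list[dict]:
    # Sort candidates (each carrying precomputed scores) by blended score, best first.
    return sorted(
        docs,
        key=lambda d: hybrid_score(d["keyword"], d["vector"], alpha),
        reverse=True,
    )
```

A blend like this lets a semantically strong match (e.g. "why do we use Kafka" against an event-driven-architecture ADR) outrank a shallow keyword hit, while still rewarding exact title and tag matches.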

Scraping new documents

Scrape any URL into a knowledge base markdown file:

uv pip install -e '.[scrape]'

internal-docs-scrape https://example.com/some-doc \
    --id EXT-001 \
    --title "External Integration Guide" \
    --category standard \
    --tags integration,api

Files are saved to docs/knowledge_base/ and loaded automatically on next server start.

Setup

Prerequisites: Python 3.12+, uv (recommended) or pip

# Install (core)
uv pip install -e '.[dev]'

# Install all extras (RAG + scraping)
uv pip install -e '.[all]'

# Run
python -m internal_docs_mcp

# Test
uv run pytest tests/ -v

# Lint
uv run ruff check src/ tests/

Connect to Claude Code

claude mcp add internal-docs /path/to/internal-docs-mcp/run_server.sh

What I Would Improve

Note: This project uses local markdown files as the data source to keep the demo self-contained. In a real production setup the MCP server would connect to live APIs and databases — e.g. Confluence/GitLab wikis, PostgreSQL, Elasticsearch — so engineers always get up-to-date results without manual file management.

  1. Real data source — Connect to Confluence/GitLab wikis via API with a Redis cache.

  2. RBAC via SSO — Map caller identity from SSO tokens instead of static API keys.

  3. Prompt injection classifier — Add Lakera Guard or similar instead of relying on character-level sanitisation.

  4. Document freshness — Flag docs not updated in >6 months as potentially stale in search results.

  5. Tag match mode — Add tag_match parameter ("any" vs "all") to search_internal_docs so tags=["kafka","kubernetes"] can return docs matching either tag.

  6. Feedback loop — Thumbs-up/down on results to improve ranking.

  7. SSE transport — Centralised server deployment for the whole org.

  8. Audit log to JSONL — Append every tool call + outcome (timestamp, doc IDs, role, injection flags) to a file for compliance and debugging.

  9. Usage analytics tool — Admin-only get_usage_stats tool that tracks popular queries, most-accessed docs, and zero-result searches to surface knowledge gaps.

  10. Onboarding prompt — An onboard_engineer(team, role) prompt template that guides new hires to relevant standards and ADRs for their team, without them needing to know what to search for.
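The tag match mode from improvement 5 could be a small predicate applied during search filtering. The semantics below ("any" = at least one requested tag, "all" = every requested tag) are a sketch of the proposed parameter, not existing code:

```python
def matches_tags(doc_tags: list[str], query_tags: list[str], tag_match: str = "any") -> bool:
    # Case-insensitive comparison; "any" requires an intersection, "all" a subset.
    doc_set = {t.lower() for t in doc_tags}
    query_set = {t.lower() for t in query_tags}
    if tag_match == "all":
        return query_set <= doc_set
    return bool(doc_set & query_set)
```

With `tag_match="any"`, a search for `tags=["kafka", "kubernetes"]` returns docs tagged with either technology; `"all"` keeps today's stricter behaviour of requiring both.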
