pdf-knowledge-mcp
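pdf-knowledge-mcp is a local RAG MCP server for accumulating PDF development know-how. It ingests experience documents on PDF parsing, generation, rendering, text extraction, table recognition, layout analysis, font handling, OCR, PDF/A, signatures, encryption, performance tuning, and similar topics, and exposes retrieval and question answering through MCP tools.
The current implementation does not depend on a remote model or an external vector database. Documents are split into chunks, retrieved using local TF-IDF vectors and cosine similarity, and the index is persisted as a JSON file. The embedding/provider can later be replaced or extended in src/knowledge-store.ts.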
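The chunking step can be pictured as a sliding character window. The following is a minimal sketch, not the code in src/knowledge-store.ts: the function name chunkText is hypothetical, and the defaults of 1800 and 200 are borrowed from the chunkSize and chunkOverlap values shown in the ingest_document example, on the assumption that both are measured in characters.

```typescript
// Hypothetical helper (not taken from the project source): split text into
// overlapping character windows. Assumes chunkSize and chunkOverlap are
// character counts.
function chunkText(text: string, chunkSize = 1800, chunkOverlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)");
  }
  // Each new window starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of a slightly larger index.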
Installation and build
cd C:\src\pdf-knowledge-mcp
npm install
npm run build
Starting
npm start
This process serves MCP over stdio.
The default knowledge-base index path is data/pdf-knowledge-index.json inside the project. To use a different location:
$env:PDF_KNOWLEDGE_STORE_PATH = "C:\src\pdf-knowledge-mcp\data\pdf-knowledge-index.json"
npm start
MCP configuration
Linking the package as a local command first is recommended:
cd C:\src\pdf-knowledge-mcp
npm link
Then add it to Codex:
codex mcp add pdf-knowledge -- pdf-knowledge-mcp
Example configuration for a generic MCP client:
{
  "mcpServers": {
    "pdf-knowledge": {
      "command": "node",
      "args": ["C:/src/pdf-knowledge-mcp/dist/index.js"],
      "env": {
        "PDF_KNOWLEDGE_STORE_PATH": "C:/src/pdf-knowledge-mcp/data/pdf-knowledge-index.json"
      }
    }
  }
}
Tools
ingest_document
Imports a PDF development experience document. You can pass content directly, or pass the path to a UTF-8 text, Markdown, JSON, or HTML file.
{
  "title": "PDF text extraction notes",
  "source": "notes/text-extraction.md",
  "tags": ["parsing", "text", "font"],
  "content": "When extracting text from PDF content streams, ToUnicode CMaps are essential...",
  "chunkSize": 1800,
  "chunkOverlap": 200,
  "replaceExisting": true
}
You can also import from a file:
{
  "filePath": "C:/docs/pdf-rendering-notes.md",
  "tags": ["rendering", "performance"]
}
search_knowledge
Retrieves relevant knowledge chunks from the local vector index.
{
  "query": "ToUnicode font extraction",
  "limit": 5,
  "tags": ["text"]
}
The response includes the score, document title, source, tags, chunk id, matched terms, and an excerpt.
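To make the retrieval step concrete, here is a minimal sketch of TF-IDF scoring with cosine similarity, a tag filter, and a result limit. It is an illustration under stated assumptions rather than the server's implementation: the Chunk and Hit shapes, the tokenizer, the smoothed IDF formula, and the "match any tag" filter semantics are all hypothetical choices.

```typescript
// Hypothetical shapes; the real index format may differ.
interface Chunk { id: string; title: string; tags: string[]; text: string; }
interface Hit { score: number; chunkId: string; title: string; matchedTerms: string[]; }

// Assumed tokenizer: lowercase ASCII alphanumeric runs only.
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function searchChunks(chunks: Chunk[], query: string, limit = 5, tags: string[] = []): Hit[] {
  const tokenized = chunks.map((chunk) => ({ chunk, tokens: tokenize(chunk.text) }));

  // Document frequency: in how many chunks each term occurs.
  const df = new Map<string, number>();
  for (const { tokens } of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const n = chunks.length;
  // Smoothed IDF (an assumption), so unseen terms still get a finite weight.
  const idf = (term: string): number => Math.log((1 + n) / (1 + (df.get(term) ?? 0))) + 1;

  const vectorize = (tokens: string[]): Map<string, number> => {
    const counts = new Map<string, number>();
    for (const term of tokens) counts.set(term, (counts.get(term) ?? 0) + 1);
    const weights = new Map<string, number>();
    for (const [term, tf] of counts) weights.set(term, tf * idf(term));
    return weights;
  };
  const norm = (v: Map<string, number>): number => {
    let sum = 0;
    for (const w of v.values()) sum += w * w;
    return Math.sqrt(sum);
  };

  const queryVec = vectorize(tokenize(query));
  const queryNorm = norm(queryVec);
  const hits: Hit[] = [];
  for (const { chunk, tokens } of tokenized) {
    // Assumed semantics: keep a chunk if it carries any of the requested tags.
    if (tags.length > 0 && !tags.some((t) => chunk.tags.includes(t))) continue;
    const docVec = vectorize(tokens);
    let dot = 0;
    const matchedTerms: string[] = [];
    for (const [term, w] of queryVec) {
      const dw = docVec.get(term);
      if (dw !== undefined) { dot += w * dw; matchedTerms.push(term); }
    }
    const denom = queryNorm * norm(docVec);
    if (dot > 0 && denom > 0) {
      hits.push({ score: dot / denom, chunkId: chunk.id, title: chunk.title, matchedTerms });
    }
  }
  return hits.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Calling searchChunks(index, "ToUnicode font extraction", 5, ["text"]) on an in-memory array of chunks would mirror the request above. Recomputing vectors on every query keeps the sketch short; a persisted index would normally store them.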
ask_pdf_expert
Searches the knowledge base first, then generates a source-cited PDF development answer from the retrieved results.
{
  "question": "How should I handle fonts when extracting PDF text?",
  "limit": 5,
  "maxContextChars": 8000
}
If nothing matches, the tool states explicitly that relevant experience documents need to be imported first.
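One way to picture how maxContextChars could bound the retrieved material is the sketch below. It is an assumption about the general approach, not the tool's actual logic: the RetrievedChunk shape, the buildContext name, and the numbered citation layout are hypothetical, and a null return stands in for the "import documents first" case.

```typescript
// Hypothetical shape of a retrieved chunk passed on to answer generation.
interface RetrievedChunk { title: string; source: string; excerpt: string; }

interface AnswerContext { context: string; sources: string[]; }

// Concatenate retrieved excerpts, each with a numbered source label, until
// the character budget is used up. Returns null when there is nothing to cite.
function buildContext(hits: RetrievedChunk[], maxContextChars = 8000): AnswerContext | null {
  if (hits.length === 0) return null;
  const parts: string[] = [];
  const sources: string[] = [];
  let used = 0;
  for (const hit of hits) {
    // Blocks are joined with a blank line, which also counts against the budget.
    const separator = parts.length > 0 ? 2 : 0;
    const remaining = maxContextChars - used - separator;
    if (remaining <= 0) break;
    const label = `[${parts.length + 1}] ${hit.title} (${hit.source})`;
    // The last block is truncated rather than dropped if it does not fit.
    const block = `${label}\n${hit.excerpt}`.slice(0, remaining);
    parts.push(block);
    sources.push(hit.source);
    used += separator + block.length;
  }
  return { context: parts.join("\n\n"), sources };
}
```

A caller would check for null before generating an answer and return the "no matching content" notice in that case, so the answer never cites sources that were not retrieved.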
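Verification
npm test
The smoke test verifies that:
document import, chunking, and index persistence work;
local vector retrieval and tag filtering work;
ask_pdf_expert returns a RAG answer with sources;
the MCP server responds to an initialize request over stdio.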
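For readers unfamiliar with the last check, the sketch below shows roughly what an initialize request looks like on the wire. It follows the general MCP convention of JSON-RPC 2.0 messages sent one per line over stdio. The helper names, the client name, and the choice of protocolVersion are assumptions for illustration, not details taken from this project's test suite.

```typescript
// Minimal JSON-RPC 2.0 request shape used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Hypothetical helper: build an MCP initialize request.
function buildInitializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      // One published protocol revision; the server may negotiate another.
      protocolVersion: "2024-11-05",
      capabilities: {},
      // Placeholder client identity, purely illustrative.
      clientInfo: { name: "smoke-test-client", version: "0.0.0" },
    },
  };
}

// Over stdio, each message is a single line of JSON terminated by a newline.
function frameMessage(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

console.log(frameMessage(buildInitializeRequest(1)));
```

Writing the framed string to the stdin of the process started with npm start and reading one line back from its stdout would be enough to check that the server answers with a result carrying the same id.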
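Notes
This service is suited as a baseline version of a PDF development knowledge base: its first priority is to be local, traceable, and runnable. Possible future extensions include a real embedding model, SQLite or a vector database, PDF/Docx document parsers, automatic directory synchronization, and federated queries with pdf-debug-mcp and pdf-specification-mcp.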