# MCP Docling Server

An MCP server that provides document processing capabilities using the Docling library.
## Installation

You can install the package using pip:

```shell
pip install -e .
```
## Usage

Start the server using stdio (default) or SSE transport:

```shell
# Using stdio transport (default)
mcp-server-lls

# Using SSE transport on a custom port
mcp-server-lls --transport sse --port 8000
```

If you use uv, you can run the server directly without installing it:

```shell
# Using stdio transport (default)
uv run mcp-server-lls

# Using SSE transport on a custom port
uv run mcp-server-lls --transport sse --port 8000
```

## Available Tools
The server exposes the following tools:

- `convert_document`: Convert a document from a URL or local path to markdown format
  - `source`: URL or local file path of the document (required)
  - `enable_ocr`: Whether to enable OCR for scanned documents (optional, default: false)
  - `ocr_language`: List of language codes for OCR, e.g. `["en", "fr"]` (optional)
- `convert_document_with_images`: Convert a document and extract its embedded images
  - `source`: URL or local file path of the document (required)
  - `enable_ocr`: Whether to enable OCR for scanned documents (optional, default: false)
  - `ocr_language`: List of language codes for OCR (optional)
- `extract_tables`: Extract tables from a document as structured data
  - `source`: URL or local file path of the document (required)
- `convert_batch`: Process multiple documents in batch mode
  - `sources`: List of document URLs or file paths (required)
  - `enable_ocr`: Whether to enable OCR for scanned documents (optional, default: false)
  - `ocr_language`: List of language codes for OCR (optional)
- `qna_from_document`: Create a Q&A document in YAML format from a URL or local path
  - `source`: URL or local file path of the document (required)
  - `no_of_qnas`: Expected number of Q&A pairs (optional, default: 5)
  - Note: this tool requires IBM Watson X credentials to be set as environment variables:
    - `WATSONX_PROJECT_ID`: Your Watson X project ID
    - `WATSONX_APIKEY`: Your IBM Cloud API key
    - `WATSONX_URL`: The Watson X API URL (default: https://us-south.ml.cloud.ibm.com)
- `get_system_info`: Get information about system configuration and acceleration status
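Under the hood, each tool is invoked over MCP's JSON-RPC 2.0 protocol with the `tools/call` method. As an illustration, a request for `convert_document` would look roughly like the following sketch (the request `id` and `source` URL are placeholder values, not anything this server mandates):

```python
import json

# Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke
# the convert_document tool; id and source here are example values.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "convert_document",
        "arguments": {
            "source": "https://arxiv.org/pdf/2004.07606",
            "enable_ocr": False,
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice an MCP client library (or Llama Stack, as shown below) builds and sends these messages for you; you normally never write them by hand.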
## Llama Stack Example

https://github.com/user-attachments/assets/8ad34e50-cbf7-4ec8-aedd-71c42a5de0a1

You can use this server with Llama Stack to provide document processing capabilities to your LLM applications. Make sure you have a Llama Stack server running, then configure your `INFERENCE_MODEL`:
```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.types.shared_params.url import URL
from llama_stack_client import LlamaStackClient
import os

# Set your model ID
model_id = os.environ["INFERENCE_MODEL"]

client = LlamaStackClient(
    base_url=f"http://localhost:{os.environ.get('LLAMA_STACK_PORT', '8080')}"
)

# Register MCP tools
client.toolgroups.register(
    toolgroup_id="mcp::docling",
    provider_id="model-context-protocol",
    mcp_endpoint=URL(uri="http://0.0.0.0:8000/sse"),
)

# Define an agent with the MCP toolgroup
agent_config = AgentConfig(
    model=model_id,
    instructions="""You are a helpful assistant with access to tools to manipulate documents.
Always use the appropriate tool when asked to process documents.""",
    toolgroups=["mcp::docling"],
    tool_choice="auto",
    max_tool_calls=3,
)

# Create the agent and a session
agent = Agent(client, agent_config)
session_id = agent.create_session("test-session")

def _run_turn(prompt):
    # Create a turn
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
    )
    # Log the response
    for log in EventLogger().log(response):
        log.print()

def _summary_and_qna(source: str):
    _run_turn(f"Please convert the document at {source} to markdown and summarize its content.")
    _run_turn(f"Please generate a Q&A document with 3 items for source at {source} and display it in YAML format.")

_summary_and_qna('https://arxiv.org/pdf/2004.07606')
```

## Caching
The server caches processed documents in `~/.cache/mcp-docling/` to improve performance for repeated requests.
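If you want to force documents to be re-processed, you can simply delete the cache directory. A small sketch, assuming the path stated above:

```python
from pathlib import Path
import shutil

# Cache location used by the server (per the note above)
cache_dir = Path.home() / ".cache" / "mcp-docling"

# Remove the cache so subsequent requests re-process documents
if cache_dir.exists():
    shutil.rmtree(cache_dir)
```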