pageindex-local-mcp
A local-first MCP (Model Context Protocol) server for PageIndex — the vectorless, reasoning-based RAG framework.
This server lets local AI agents (Claude Desktop, Cursor, Claude Code, Cline, Continue, OpenAI Agents SDK, LangChain, or any MCP-compatible client) index and query local PDF and Markdown documents through a self-hosted PageIndex installation, without requiring any PageIndex cloud API key.
Security Warning: This MCP server exposes local file indexing and tree-query capabilities to MCP clients. Only connect trusted clients. Review PAGEINDEX_ALLOWED_ROOTS before deploying in shared environments.
What This Project Does
Wraps a locally installed PageIndex repository and exposes its capabilities as MCP tools.
Indexes local PDF and Markdown files by calling run_pageindex.py from the PageIndex repo.
Builds and stores a hierarchical PageIndex tree structure for each document.
Performs vectorless, reasoning-based document search over those trees using a local OpenAI-compatible LLM endpoint (LM Studio, Ollama, vLLM, etc.).
Returns traceable results: document ID, node ID, title, summary, page/line range, reasoning path.
Maintains a local document registry with full metadata.
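To make the tree idea concrete, here is a minimal Python sketch of a hierarchical document tree and a depth-limited outline of it, the kind of compact view a reasoning model can be shown in place of embeddings. The field names (`node_id`, `title`, `summary`, `nodes`), the sample tree, and the `outline` helper are assumptions for illustration; check the tree.json produced by your PageIndex version for the actual schema.

```python
from typing import Any

# Hypothetical sample tree. The field names are assumed for illustration
# and are not taken from this project's source.
SAMPLE_TREE: list[dict[str, Any]] = [
    {
        "node_id": "0001",
        "title": "Introduction",
        "summary": "Motivation and scope.",
        "nodes": [
            {"node_id": "0002", "title": "Background",
             "summary": "Prior work.", "nodes": []},
        ],
    },
    {"node_id": "0003", "title": "Results",
     "summary": "Main findings.", "nodes": []},
]


def outline(nodes: list[dict[str, Any]], max_depth: int, depth: int = 1) -> list[str]:
    """Return one indented line per node, descending at most max_depth levels."""
    lines: list[str] = []
    for node in nodes:
        indent = "  " * (depth - 1)
        lines.append(f"{indent}[{node['node_id']}] {node['title']}: {node.get('summary', '')}")
        if depth < max_depth:
            lines.extend(outline(node.get("nodes", []), max_depth, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE_TREE, max_depth=2)))
```

Each line keeps the node ID next to the title and summary, which is what lets a search answer point back to a specific node.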
What This Project Does Not Do
Does not call https://api.pageindex.ai or any PageIndex cloud API.
Does not require a PAGEINDEX_API_KEY.
Does not use vector databases or embeddings.
Does not provide a web UI.
Does not perform cloud OCR. Local PDF parsing quality depends on your PageIndex installation and the underlying Python PDF library (PyPDF2). Complex scanned PDFs may parse poorly compared to the cloud pipeline.
How It Differs from the Official PageIndex MCP
Feature | Official pageindex-mcp | This project |
Backend | PageIndex Cloud API | Local PageIndex repo |
API key required | Yes | No |
Runs locally | No | Yes |
Vector DB | No (tree-based) | No (tree-based) |
LLM for indexing | Cloud models | Configurable local/remote |
LLM for querying | Cloud models | Local OpenAI-compatible endpoint |
OCR quality | Cloud (best) | Local (depends on PageIndex/PyPDF2) |
Prerequisites
Node.js 18+ (for this MCP server)
Python 3.9+ (for the PageIndex repo)
A local clone of VectifyAI/PageIndex with dependencies installed
A local OpenAI-compatible LLM endpoint (LM Studio, Ollama, vLLM) — required for both indexing (if PageIndex is configured to use it) and querying
1. Install the Local PageIndex Repository
git clone https://github.com/VectifyAI/PageIndex.git
cd PageIndex
pip install -r requirements.txt
PageIndex needs an LLM to generate tree structures. Configure it to use your local endpoint by editing pageindex/config.yaml:
model: local-model # must match what your local server loads
Or set the model via the --model argument at indexing time.
Note: PageIndex's indexing currently calls LLM APIs. Point its config at your local endpoint (LM Studio, Ollama, vLLM) so no internet calls are made during indexing.
2. Install the MCP Server
git clone https://github.com/jamesbubenik/pageindex-local-mcp.git
cd pageindex-local-mcp
npm install
npm run build
3. Configure Environment Variables
Copy the example and edit:
cp examples/sample.env .env
# or: cp .env.example .env
Edit .env:
PAGEINDEX_REPO_PATH=/home/user/PageIndex
PAGEINDEX_PYTHON=python3
PAGEINDEX_WORKSPACE=/home/user/.pageindex-local-mcp
PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:1234/v1
PAGEINDEX_LLM_API_KEY=lm-studio
PAGEINDEX_MODEL=local-model
All Configuration Options
Variable | Required | Default | Description |
PAGEINDEX_REPO_PATH | Yes | — | Absolute path to cloned PageIndex repo |
PAGEINDEX_PYTHON | No | | Python executable with PageIndex deps |
PAGEINDEX_WORKSPACE | No | | Where the MCP server stores artifacts |
PAGEINDEX_MODEL | No | | Default model name for indexing/querying |
PAGEINDEX_LLM_BASE_URL | No | | OpenAI-compatible endpoint for queries |
PAGEINDEX_LLM_API_KEY | No | | API key (any non-empty value for local servers) |
| No | | LLM request timeout (ms) |
PAGEINDEX_TOOL_TIMEOUT_MS | No | | Max ms for a PageIndex Python subprocess. Raise for large PDFs or slow machines. |
| No | | Pages scanned for TOC (PDF only) |
| No | | Max pages per tree node (PDF only) |
| No | | Max tokens per tree node |
PAGEINDEX_ALLOWED_ROOTS | No | | Semicolon (Windows) or colon (Unix) separated list of allowed directories |
| No | | |
| No | | |
4. Configure Your MCP Client
Claude Desktop
macOS/Linux — config file: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or ~/.config/Claude/claude_desktop_config.json (Linux)
{
"mcpServers": {
"pageindex-local": {
"command": "node",
"args": ["/home/user/pageindex-local-mcp/dist/index.js"],
"env": {
"PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
"PAGEINDEX_PYTHON": "python3",
"PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
"PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
"PAGEINDEX_LLM_API_KEY": "lm-studio",
"PAGEINDEX_MODEL": "local-model",
"PAGEINDEX_ALLOWED_ROOTS": "/home/user/Documents:/home/user/Downloads"
}
}
}
}
Windows — config file: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"pageindex-local": {
"command": "node",
"args": ["C:\\Users\\user\\pageindex-local-mcp\\dist\\index.js"],
"env": {
"PAGEINDEX_REPO_PATH": "C:\\Users\\user\\PageIndex",
"PAGEINDEX_PYTHON": "C:\\Users\\user\\miniconda3\\envs\\pageindex\\python.exe",
"PAGEINDEX_WORKSPACE": "C:\\Users\\user\\.pageindex-local-mcp",
"PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
"PAGEINDEX_LLM_API_KEY": "lm-studio",
"PAGEINDEX_MODEL": "local-model",
"PAGEINDEX_ALLOWED_ROOTS": "C:\\Users\\user\\Documents;C:\\Users\\user\\Downloads"
}
}
}
}
Cursor
Add to .cursor/mcp.json in your project root:
{
"mcpServers": {
"pageindex-local": {
"command": "node",
"args": ["/home/user/pageindex-local-mcp/dist/index.js"],
"env": {
"PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
"PAGEINDEX_PYTHON": "python3",
"PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
"PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
"PAGEINDEX_LLM_API_KEY": "lm-studio",
"PAGEINDEX_MODEL": "local-model"
}
}
}
}
Claude Code
Add to your project's .claude/settings.json under mcpServers, using the same format as Cursor above.
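Because the Claude Desktop and Cursor entries above share one shape, it can be convenient to generate the entry rather than hand-edit it for each client. The following is a minimal Python sketch under that assumption; `build_server_entry` is a hypothetical helper, and the default values simply mirror the example configs in this section.

```python
import json
from typing import Optional


def build_server_entry(server_js: str, repo_path: str,
                       env_overrides: Optional[dict[str, str]] = None) -> dict:
    """Build an mcpServers config with one "pageindex-local" entry.

    Defaults mirror the README examples; pass env_overrides to change
    them or to add variables such as PAGEINDEX_ALLOWED_ROOTS.
    """
    env = {
        "PAGEINDEX_REPO_PATH": repo_path,
        "PAGEINDEX_PYTHON": "python3",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "local-model",
    }
    env.update(env_overrides or {})
    return {
        "mcpServers": {
            "pageindex-local": {"command": "node", "args": [server_js], "env": env}
        }
    }


if __name__ == "__main__":
    config = build_server_entry(
        "/home/user/pageindex-local-mcp/dist/index.js",
        "/home/user/PageIndex",
        {"PAGEINDEX_ALLOWED_ROOTS": "/home/user/Documents"},
    )
    # json.dumps handles escaping, which matters for Windows backslash paths.
    print(json.dumps(config, indent=2))
```

Letting `json.dumps` do the escaping avoids the most common hand-editing mistake on Windows, a single backslash inside a JSON string.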
LM Studio (as MCP client)
LM Studio 0.3.17+ can act as an MCP host, meaning it can call this server's tools directly from its chat UI — no separate MCP client needed.
Note: This section is about using LM Studio as the MCP client. For using LM Studio as the LLM backend for indexing and querying, see Section 5 below.
Requirements:
LM Studio 0.3.17 or later
A tool-use-capable model loaded in LM Studio (e.g., Mistral Nemo Instruct, Qwen2.5 Instruct, LLaMA 3.1 Instruct, Gemma 3). Pure base models will not invoke tools reliably.
Step 1 — Edit mcp.json
Open LM Studio, switch to the Program tab in the right sidebar, then click Install → Edit mcp.json. This opens the config file in LM Studio's built-in editor.
The file lives at:
macOS / Linux: ~/.lmstudio/mcp.json
Windows: %USERPROFILE%\.lmstudio\mcp.json
Step 2 — Add the server
Paste the following, adjusting paths for your system:
macOS / Linux:
{
"mcpServers": {
"pageindex-local": {
"command": "node",
"args": ["/home/user/pageindex-local-mcp/dist/index.js"],
"timeout": 600,
"env": {
"PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
"PAGEINDEX_PYTHON": "python3",
"PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
"PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
"PAGEINDEX_LLM_API_KEY": "lm-studio",
"PAGEINDEX_MODEL": "your-loaded-model-name",
"PAGEINDEX_TOOL_TIMEOUT_MS": "600000",
"PAGEINDEX_LOG_LEVEL": "info"
}
}
}
}
Windows:
{
"mcpServers": {
"pageindex-local": {
"command": "node",
"args": ["C:\\Users\\user\\pageindex-local-mcp\\dist\\index.js"],
"timeout": 600,
"env": {
"PAGEINDEX_REPO_PATH": "C:\\Users\\user\\PageIndex",
"PAGEINDEX_PYTHON": "C:\\Users\\user\\miniconda3\\envs\\pageindex\\python.exe",
"PAGEINDEX_WORKSPACE": "C:\\Users\\user\\.pageindex-local-mcp",
"PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
"PAGEINDEX_LLM_API_KEY": "lm-studio",
"PAGEINDEX_MODEL": "your-loaded-model-name",
"PAGEINDEX_TOOL_TIMEOUT_MS": "600000",
"PAGEINDEX_LOG_LEVEL": "info"
}
}
}
}
Set PAGEINDEX_MODEL to the exact model name shown in LM Studio's server status bar (e.g., mistral-nemo-instruct-2407). Save the file. LM Studio usually picks up changes immediately; if it does not, restart it.
Timeout configuration — required for large PDFs
Indexing a PDF can take several minutes because PageIndex makes multiple LLM calls. LM Studio's default MCP request timeout is 60 seconds, which is not long enough. You must set two values or you will see MCP error -32001: Request timed out:
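Setting | Where | What it does |
"timeout": 600 | mcp.json, on the server entry | Tells LM Studio to wait up to 600 seconds (10 min) for a tool response |
PAGEINDEX_TOOL_TIMEOUT_MS=600000 | mcp.json, in the env block | Tells the server how long to let the Python subprocess run before killing it |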
Both values are already included in the example configs above. Make sure they are present in your actual mcp.json — LM Studio does not have a default that is long enough.
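A quick way to confirm both settings are present is to parse mcp.json and compare them. The sketch below is a hypothetical checker, not part of this project. It assumes `timeout` is in seconds and `PAGEINDEX_TOOL_TIMEOUT_MS` in milliseconds, as described above, and it adds one rule of its own: the subprocess limit should not exceed the client timeout.

```python
import json


def check_timeouts(mcp_json_text: str, server_name: str = "pageindex-local") -> list[str]:
    """Return a list of problems found in the timeout settings (empty if fine)."""
    config = json.loads(mcp_json_text)
    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no server entry named {server_name!r}"]

    problems: list[str] = []
    timeout_s = server.get("timeout")  # client-side limit, seconds
    tool_ms = server.get("env", {}).get("PAGEINDEX_TOOL_TIMEOUT_MS")  # server-side, ms

    if timeout_s is None:
        problems.append('missing "timeout" on the server entry')
    if tool_ms is None:
        problems.append("missing PAGEINDEX_TOOL_TIMEOUT_MS in env")
    if timeout_s is not None and tool_ms is not None:
        # Assumed rule: the subprocess should be killed no later than
        # the moment the client gives up waiting.
        if int(tool_ms) > int(timeout_s) * 1000:
            problems.append("subprocess limit exceeds the client timeout")
    return problems
```

For example, `check_timeouts(open(path).read())` with `path` pointing at your mcp.json returns an empty list when both values from the example configs are in place.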
The server also sends heartbeat notifications every 5 seconds while indexing or searching. Clients that support resetTimeoutOnProgress (Claude Desktop, Cursor, Claude Code) will reset their timer on each one. LM Studio will additionally receive supplemental log notifications that may reset its connection timer depending on version.
Step 3 — Enable tool use
Go to App Settings → Tools & Integrations and ensure tool calling is enabled. You can allow individual tools once or permanently when the confirmation dialog appears.
Step 4 — Start the LM Studio local server
The MCP server's query engine calls LM Studio's OpenAI-compatible endpoint (http://127.0.0.1:1234/v1) to reason over document trees. Make sure the local server is running: Developer tab → Start Server (default port 1234).
Step 5 — Chat with your documents
Load a tool-capable model, open a new chat, and ask naturally:
Index the file at /home/user/Documents/research-paper.pdf
Search my indexed documents for information about climate feedback loops
List all my indexed documents
When the model decides to call a tool, LM Studio will show a confirmation dialog with the tool name and arguments. Review and approve. Results are returned inline in the chat.
Tip: Run pageindex_local_health first to confirm the server, PageIndex repo, and Python environment are all reachable before attempting to index.
5. LM Studio Setup
Download and install LM Studio.
Load a model (e.g., Mistral 7B Instruct, LLaMA 3, Qwen 2.5).
Start the local server: Server tab → Start Server (default port 1234).
Set:
PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:1234/v1
PAGEINDEX_LLM_API_KEY=lm-studio
PAGEINDEX_MODEL=<model-name-from-lm-studio>
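To find the exact model name to use for PAGEINDEX_MODEL, you can ask the endpoint which models it serves. The sketch below queries the OpenAI-compatible `/models` route using only the Python standard library. The helper names are hypothetical, and the response shape (`{"data": [{"id": ...}]}`) is the standard OpenAI list format, which your local server is assumed to follow.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Join the base URL (e.g. http://127.0.0.1:1234/v1) with the models route."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model IDs from an OpenAI-style model list response."""
    return [item["id"] for item in json.loads(payload).get("data", [])]


def list_models(base_url: str, api_key: str, timeout: float = 5.0) -> list[str]:
    """Fetch the model IDs served by the endpoint. Raises on connection errors."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running local server; values match the settings above.
    print(list_models("http://127.0.0.1:1234/v1", "lm-studio"))
```

Copy one of the printed IDs into PAGEINDEX_MODEL. The same check works against an Ollama endpoint by changing the base URL.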
Ollama Setup
ollama serve
ollama pull llama3
PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:11434/v1
PAGEINDEX_LLM_API_KEY=ollama
PAGEINDEX_MODEL=llama3
6. Using the MCP Tools
Check Health
pageindex_local_health
Verifies the PageIndex repo, Python, workspace, and LLM config. Run this first.
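The JSON snippets in this section name a tool and its arguments. On the wire, an MCP client wraps them in a JSON-RPC 2.0 `tools/call` request; most clients do this for you, so the sketch below is only useful when scripting or debugging. `tool_call_request` is a hypothetical helper, not part of this project.

```python
import json
from itertools import count
from typing import Any, Optional

# JSON-RPC requires each request to carry an ID; a simple counter is enough here.
_request_ids = count(1)


def tool_call_request(name: str, arguments: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Wrap a tool name and its arguments in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    print(json.dumps(tool_call_request("pageindex_local_health"), indent=2))
```

A real session also needs the MCP `initialize` handshake before any tool call, which this sketch leaves out.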
Index a PDF
{
"tool": "pageindex_local_index_document",
"arguments": {
"path": "/home/user/Documents/research-paper.pdf",
"addNodeSummary": true,
"addNodeId": true,
"addDocDescription": true
}
}
Index with node text (larger output, enables source text in search results):
{
"path": "/home/user/Documents/research-paper.pdf",
"addNodeText": true
}
Index a Markdown File
{
"tool": "pageindex_local_index_document",
"arguments": {
"path": "/home/user/notes/project-spec.md"
}
}
List Indexed Documents
{
"tool": "pageindex_local_list_documents",
"arguments": { "status": "indexed", "limit": 20 }
}
Get Tree Structure
{
"tool": "pageindex_local_get_tree",
"arguments": {
"documentId": "550e8400-e29b-41d4-a716-446655440000",
"maxDepth": 3
}
}
Query (Vectorless Search)
{
"tool": "pageindex_local_search",
"arguments": {
"query": "What are the main conclusions about climate change?",
"maxResults": 5,
"includeReasoningPath": true
}
}
Search across specific documents:
{
"query": "What is the recommended dosage?",
"documentIds": ["doc-id-1", "doc-id-2"],
"includeSourceText": true
}
Remove a Document
{
"tool": "pageindex_local_remove_document",
"arguments": {
"documentId": "550e8400-e29b-41d4-a716-446655440000",
"deleteFiles": true
}
}
Re-index a Document
{
"tool": "pageindex_local_reindex_document",
"arguments": {
"documentId": "550e8400-e29b-41d4-a716-446655440000",
"addNodeText": true
}
}
7. Workspace Layout
The server stores all artifacts under PAGEINDEX_WORKSPACE:
~/.pageindex-local-mcp/
registry.json ← document registry
documents/
<document-id>/
original/
source.pdf ← copy of original file
index/
tree.json ← PageIndex tree structure
metadata.json ← indexing metadata
stdout.log ← PageIndex stdout
stderr.log ← PageIndex stderr
queries/
<query-id>.json ← query results (future)
8. Development and Testing
# Type-check only
npm run typecheck
# Run tests
npm test
# Run smoke tests (requires configured .env and PageIndex repo)
npm run smoke:health
npm run smoke:index -- /absolute/path/to/document.pdf
npm run smoke:list
npm run smoke:query -- "What is this document about?"
# Dev mode (runs from TypeScript source, no build needed)
npm run dev
9. Troubleshooting
run_pageindex.py not found
Verify PAGEINDEX_REPO_PATH points to the root of the cloned PageIndex repository and that run_pageindex.py exists there.
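The same check can be scripted. This minimal Python sketch reads PAGEINDEX_REPO_PATH and reports the first problem it finds; `find_run_script` and `check_repo` are hypothetical helpers, and the messages are illustrative rather than the server's own.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_run_script(repo_path: str) -> Optional[Path]:
    """Return the path to run_pageindex.py in the repo root, or None if absent."""
    script = Path(repo_path).expanduser() / "run_pageindex.py"
    return script if script.is_file() else None


def check_repo(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ok" or a short description of what is wrong with the repo path."""
    env = os.environ if env is None else env
    repo = env.get("PAGEINDEX_REPO_PATH")
    if not repo:
        return "PAGEINDEX_REPO_PATH is not set"
    if not Path(repo).is_absolute():
        return f"PAGEINDEX_REPO_PATH must be an absolute path, got {repo}"
    if find_run_script(repo) is None:
        return f"run_pageindex.py not found in {repo}"
    return "ok"


if __name__ == "__main__":
    print(check_repo())
```

A common cause of this error is pointing the variable at a subdirectory such as pageindex/ instead of the repository root.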
Python import errors during indexing
Make sure the PageIndex Python dependencies are installed in the Python environment pointed to by PAGEINDEX_PYTHON:
pip install -r /path/to/PageIndex/requirements.txt
Tree file not found after indexing
PageIndex saves output to <PAGEINDEX_REPO_PATH>/results/<filename>_structure.json. If your version saves elsewhere, check stdout.log in the document workspace for the actual output path and open an issue.
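To see which file to look for, the expected path can be computed from the source document. This sketch assumes `<filename>` means the file name without its extension; that reading is an assumption, so compare the result against stdout.log if it does not match. `expected_tree_path` is a hypothetical helper.

```python
from pathlib import Path


def expected_tree_path(repo_path: str, source_file: str) -> Path:
    """Return <repo>/results/<stem>_structure.json for a source document.

    Assumes the output is named after the file stem (name without extension).
    """
    stem = Path(source_file).stem
    return Path(repo_path) / "results" / f"{stem}_structure.json"


if __name__ == "__main__":
    path = expected_tree_path("/home/user/PageIndex",
                              "/home/user/Documents/research-paper.pdf")
    print(path, "exists" if path.is_file() else "missing")
```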
LLM connection failed during search
Verify your local LLM server is running and that PAGEINDEX_LLM_BASE_URL is correct. Test manually:
curl http://127.0.0.1:1234/v1/models
File outside allowed roots
Add the file's parent directory to PAGEINDEX_ALLOWED_ROOTS in your environment config.
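To see why a path is rejected, it helps to reproduce the containment check. The sketch below is one way such a check can work and is not the server's actual implementation: it splits the variable on the platform separator (`os.pathsep`, colon on Unix and semicolon on Windows), resolves paths so that `..` segments cannot escape a root, and treats an empty list as unrestricted, matching the "when set" wording in the Security Notes. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def parse_allowed_roots(value: str, sep: str = os.pathsep) -> list[Path]:
    """Split a PAGEINDEX_ALLOWED_ROOTS value into resolved directory paths."""
    return [Path(part).resolve() for part in value.split(sep) if part.strip()]


def is_allowed(path: str, roots: list[Path]) -> bool:
    """Return True if path is one of the roots or lies beneath one of them."""
    if not roots:
        return True  # assumption: an unset variable means no restriction
    resolved = Path(path).resolve()  # collapses ".." and follows symlinks
    return any(resolved == root or root in resolved.parents for root in roots)


if __name__ == "__main__":
    roots = parse_allowed_roots(os.environ.get("PAGEINDEX_ALLOWED_ROOTS", ""))
    print(is_allowed("/home/user/Documents/research-paper.pdf", roots))
```

Comparing against `resolved.parents` rather than using a string prefix test means /home/user/DocumentsOld is not mistaken for a child of /home/user/Documents.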
Low-quality indexing results on scanned PDFs
PageIndex uses PyPDF2 for local PDF parsing, which does not perform OCR. Scanned PDFs without embedded text will produce poor results. For scanned documents, consider pre-processing with an OCR tool or using the PageIndex cloud service.
MCP error -32001: Request timed out in LM Studio (or other clients)
The timeout is enforced by the MCP client, not this server. LM Studio's default is 60 seconds — not long enough for PDF indexing.
Checklist (do all three):
"timeout": 600 must be present in your mcp.json under the server entry. This raises LM Studio's per-request timeout to 10 minutes. Without this field, LM Studio uses 60 seconds regardless of how fast the server is.
PAGEINDEX_TOOL_TIMEOUT_MS=600000 in the env block (or .env) keeps the server-side Python subprocess limit in sync.
Restart LM Studio after editing mcp.json, because changes are not always picked up without a restart.
The server sends heartbeat notifications every 5 seconds (progress + log) while indexing and searching. If you are still seeing -32001 after adding "timeout": 600, set PAGEINDEX_LOG_LEVEL=debug and check the stderr output to confirm whether hasProgressToken: true appears — if it does, LM Studio is sending progress tokens and the heartbeats are active. If hasProgressToken: false, the heartbeats are log-only and you must rely on the "timeout" field.
MCP server logs
All logs go to stderr (not stdout, which is reserved for the MCP protocol). Check your MCP client's stderr console or increase log level:
PAGEINDEX_LOG_LEVEL=debug
10. Security Notes
PAGEINDEX_ALLOWED_ROOTS: When set, only files within these directories can be indexed. Always configure this in shared or multi-user environments.
No shell interpolation: All Python subprocess calls use argument arrays (shell: false). Path arguments are never interpolated into shell strings.
No cloud calls: This server never contacts api.pageindex.ai, chat.pageindex.ai, or any PageIndex cloud endpoint.
Secrets: Never place API keys in document paths or document IDs. All config comes from environment variables.
Trusted clients only: The MCP protocol grants tool invocation to any connected client. Run this server only in trusted local environments.
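The "no shell interpolation" point is easiest to see in code. The server itself runs on Node.js, so the following is a Python analogue of the same idea, not the project's implementation: the command is built as a list of arguments and run with `shell=False`, so a file name containing spaces or shell metacharacters stays a single argument. The `--pdf_path` and `--md_path` flag names are assumptions about run_pageindex.py (only `--model` is mentioned in this README); confirm them with `--help` in your PageIndex checkout.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_index_command(python_exe: str, repo_path: str, document_path: str,
                        model: Optional[str] = None) -> list[str]:
    """Build the argument list for an indexing run; nothing is executed here."""
    script = str(Path(repo_path) / "run_pageindex.py")
    # Assumed flag names; verify against your PageIndex version.
    is_markdown = document_path.lower().endswith((".md", ".markdown"))
    flag = "--md_path" if is_markdown else "--pdf_path"
    argv = [python_exe, script, flag, document_path]
    if model:
        argv += ["--model", model]
    return argv


def run_index(argv: list[str], cwd: str, timeout_ms: int) -> subprocess.CompletedProcess:
    """Run the command without a shell, enforcing a timeout given in milliseconds."""
    return subprocess.run(argv, cwd=cwd, shell=False, capture_output=True,
                          text=True, timeout=timeout_ms / 1000)
```

Because the path is passed as its own list element, a document named `report; rm -rf ~.pdf` is handed to Python as a literal file name and is never parsed by a shell.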
11. Known Limitations
SQLite registry backend: The sqlite option for PAGEINDEX_REGISTRY_BACKEND is planned but not yet implemented. Use the default json backend.
Concurrent indexing: Only one indexing job should run per server instance at a time. Concurrent calls are not prevented but may produce race conditions in the registry.
Source text extraction: Full source text in search results (includeSourceText: true) only works when the document was indexed with addNodeText: true. Otherwise, results include node summaries only.
Markdown line references: PageIndex uses line numbers (not pages) for Markdown files. Search results will show line ranges instead of page numbers.
Large documents: Indexing very large PDFs may exceed LLM context windows. Adjust maxPagesPerNode and maxTokensPerNode to reduce node size.
Model compatibility: The query engine uses a simple JSON-structured prompt. Some smaller local models may not reliably output valid JSON. Use instruction-tuned models (Mistral Instruct, LLaMA Instruct, Qwen Instruct, etc.).
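To illustrate the model compatibility issue: small models often wrap a JSON answer in extra prose. A tolerant parser can recover the object in many of those cases. The sketch below is a generic technique and not this server's parser; `extract_json_object` is a hypothetical helper and the `node_ids` key in the example is made up.

```python
import json
from typing import Any, Optional


def extract_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next opening brace
    return None


if __name__ == "__main__":
    reply = 'Here is the answer: {"node_ids": ["0002"]} Hope that helps.'
    print(extract_json_object(reply))
```

This recovers answers with surrounding chatter, but it cannot repair JSON that is itself malformed, so an instruction-tuned model is still the more reliable fix.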
12. Using with an AI Agent
AGENT_SYSTEM_PROMPT.md contains a ready-to-use system prompt for any AI agent that will drive this MCP server. It covers all 8 tools, every parameter and response field, typical workflows, error handling, and usage constraints.
How to use it:
Copy the full contents of AGENT_SYSTEM_PROMPT.md.
Paste it into your agent's system prompt (or include it as a context file if your framework supports file injection).
The agent will know how to index documents, search them, handle failures, and avoid common mistakes without needing further instruction.
This is useful when building automated pipelines, custom agents, or assistants that need to interact with local documents through this server.
MCP Tools Reference
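Tool | Description |
pageindex_local_health | Check configuration and connectivity |
pageindex_local_index_document | Index a local PDF or Markdown file |
pageindex_local_list_documents | List all registered documents |
| Get full metadata for one document |
pageindex_local_get_tree | Retrieve the PageIndex tree structure |
pageindex_local_search | Vectorless reasoning-based search |
pageindex_local_remove_document | Remove a document from the registry |
pageindex_local_reindex_document | Re-run indexing for an existing document |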
License
MIT