
pageindex-local-mcp

by jamesbubenik


TypeScript

Local


A local-first MCP (Model Context Protocol) server for PageIndex — the vectorless, reasoning-based RAG framework.

This server lets local AI agents (Claude Desktop, Cursor, Claude Code, Cline, Continue, OpenAI Agents SDK, LangChain, or any MCP-compatible client) index and query local PDF and Markdown documents through a self-hosted PageIndex installation, without requiring any PageIndex cloud API key.

Security Warning: This MCP server exposes local file indexing and tree-query capabilities to MCP clients. Only connect trusted clients. Review PAGEINDEX_ALLOWED_ROOTS before deploying in shared environments.

What This Project Does

Wraps a locally installed PageIndex repository and exposes its capabilities as MCP tools.
Indexes local PDF and Markdown files by calling run_pageindex.py from the PageIndex repo.
Builds and stores a hierarchical PageIndex tree structure for each document.
Performs vectorless, reasoning-based document search over those trees using a local OpenAI-compatible LLM endpoint (LM Studio, Ollama, vLLM, etc.).
Returns traceable results: document ID, node ID, title, summary, page/line range, reasoning path.
Maintains a local document registry with full metadata.
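To make the idea of a tree search with a traceable reasoning path concrete, here is a minimal sketch. It is not this project's query engine: the `Node` fields, the result keys, and the `pick_child` helper are all hypothetical, and `pick_child` uses simple word overlap as a stand-in for the LLM call that would normally choose which branch to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node shape; the real tree.json schema may differ.
    node_id: str
    title: str
    summary: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)


def pick_child(query: str, children: list[Node]) -> Node:
    """Stand-in for the LLM: pick the child sharing the most words with the query."""
    words = set(query.lower().split())

    def score(node: Node) -> int:
        text = f"{node.title} {node.summary}".lower().split()
        return len(words & set(text))

    return max(children, key=score)


def search(root: Node, query: str) -> dict:
    """Descend from the root to a leaf, recording every title visited."""
    path = [root.title]
    node = root
    while node.children:
        node = pick_child(query, node.children)
        path.append(node.title)
    return {
        "nodeId": node.node_id,
        "title": node.title,
        "summary": node.summary,
        "pages": (node.start_page, node.end_page),
        "reasoningPath": path,
    }


tree = Node("0000", "Annual Report", "Company report", 1, 40, [
    Node("0001", "Financials", "Revenue and costs", 1, 20, [
        Node("0002", "Revenue", "Revenue by region", 1, 10),
        Node("0003", "Costs", "Operating costs breakdown", 11, 20),
    ]),
    Node("0004", "Outlook", "Risks and forecast", 21, 40),
])

result = search(tree, "operating costs")
print(result["reasoningPath"])  # ['Annual Report', 'Financials', 'Costs']
```

No embeddings are involved: every step is a choice among a handful of titled, summarized children, which is why the path can be reported back alongside the result.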


What This Project Does Not Do

Does not call https://api.pageindex.ai or any PageIndex cloud API.
Does not require a PAGEINDEX_API_KEY.
Does not use vector databases or embeddings.
Does not provide a web UI.
Does not perform cloud OCR. Local PDF parsing quality depends on your PageIndex installation and the underlying Python PDF library (PyPDF2). Complex scanned PDFs may parse poorly compared to the cloud pipeline.

How It Differs from the Official PageIndex MCP

Feature	Official pageindex-mcp	This project
Backend	PageIndex Cloud API	Local PageIndex repo
API key required	Yes	No
Runs locally	No	Yes
Vector DB	No (tree-based)	No (tree-based)
LLM for indexing	Cloud models	Configurable local/remote
LLM for querying	Cloud models	Local OpenAI-compatible endpoint
OCR quality	Cloud (best)	Local (depends on PageIndex/PyPDF2)

Prerequisites

Node.js 18+ (for this MCP server)
Python 3.9+ (for the PageIndex repo)
A local clone of VectifyAI/PageIndex with dependencies installed
A local OpenAI-compatible LLM endpoint (LM Studio, Ollama, vLLM) — required for both indexing (if PageIndex is configured to use it) and querying

1. Install the Local PageIndex Repository

git clone https://github.com/VectifyAI/PageIndex.git
cd PageIndex
pip install -r requirements.txt

PageIndex needs an LLM to generate tree structures. Configure it to use your local endpoint by editing pageindex/config.yaml:

model: local-model           # must match what your local server loads

Or set the model via the --model argument at indexing time.

Note: PageIndex's indexing currently calls LLM APIs. Point its config at your local endpoint (LM Studio, Ollama, vLLM) so no internet calls are made during indexing.

2. Install the MCP Server

git clone https://github.com/jamesbubenik/pageindex-local-mcp.git
cd pageindex-local-mcp
npm install
npm run build

3. Configure Environment Variables

Copy the example and edit:

cp examples/sample.env .env
# or: cp .env.example .env

Edit .env:

PAGEINDEX_REPO_PATH=/home/user/PageIndex
PAGEINDEX_PYTHON=python3
PAGEINDEX_WORKSPACE=/home/user/.pageindex-local-mcp
PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:1234/v1
PAGEINDEX_LLM_API_KEY=lm-studio
PAGEINDEX_MODEL=local-model

All Configuration Options

Variable	Required	Default	Description
`PAGEINDEX_REPO_PATH`	Yes	—	Absolute path to cloned PageIndex repo
`PAGEINDEX_PYTHON`	No	`python3`	Python executable with PageIndex deps
`PAGEINDEX_WORKSPACE`	No	`~/.pageindex-local-mcp`	Where the MCP server stores artifacts
`PAGEINDEX_MODEL`	No	`local-model`	Default model name for indexing/querying
`PAGEINDEX_LLM_BASE_URL`	No	`http://127.0.0.1:1234/v1`	OpenAI-compatible endpoint for queries
`PAGEINDEX_LLM_API_KEY`	No	`local`	API key (any non-empty value for local servers)
`PAGEINDEX_LLM_TIMEOUT_MS`	No	`120000`	LLM request timeout (ms)
`PAGEINDEX_TOOL_TIMEOUT_MS`	No	`600000`	Max ms for a PageIndex Python subprocess. Raise for large PDFs or slow machines.
`PAGEINDEX_TOC_CHECK_PAGES`	No	`20`	Pages scanned for TOC (PDF only)
`PAGEINDEX_MAX_PAGES_PER_NODE`	No	`10`	Max pages per tree node (PDF only)
`PAGEINDEX_MAX_TOKENS_PER_NODE`	No	`20000`	Max tokens per tree node
`PAGEINDEX_ALLOWED_ROOTS`	No	`""` (all)	Semicolon (Win) or colon (Unix) separated allowed dirs
`PAGEINDEX_REGISTRY_BACKEND`	No	`json`	`json` (supported) or `sqlite` (future)
`PAGEINDEX_LOG_LEVEL`	No	`info`	`debug`, `info`, `warn`, `error`
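The table can be read as "one required variable, everything else has a default". The sketch below shows one way to resolve a handful of these variables in Python. It is illustrative only: the `Settings` class and `load_settings` function are hypothetical names, and the server's own TypeScript loader may behave differently. Defaults are copied from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    repo_path: str
    python: str
    model: str
    llm_base_url: str
    tool_timeout_ms: int
    allowed_roots: tuple[str, ...]


def load_settings(env: dict[str, str]) -> Settings:
    """Resolve a subset of the documented variables, applying the documented defaults."""
    repo_path = env.get("PAGEINDEX_REPO_PATH", "")
    if not repo_path:
        raise ValueError("PAGEINDEX_REPO_PATH is required")
    # os.pathsep is ';' on Windows and ':' on Unix, matching the table.
    raw_roots = env.get("PAGEINDEX_ALLOWED_ROOTS", "")
    roots = tuple(root for root in raw_roots.split(os.pathsep) if root)
    return Settings(
        repo_path=repo_path,
        python=env.get("PAGEINDEX_PYTHON", "python3"),
        model=env.get("PAGEINDEX_MODEL", "local-model"),
        llm_base_url=env.get("PAGEINDEX_LLM_BASE_URL", "http://127.0.0.1:1234/v1"),
        tool_timeout_ms=int(env.get("PAGEINDEX_TOOL_TIMEOUT_MS", "600000")),
        allowed_roots=roots,
    )


settings = load_settings({"PAGEINDEX_REPO_PATH": "/home/user/PageIndex"})
print(settings.model, settings.tool_timeout_ms)
```

Passing `dict(os.environ)` instead of a literal dictionary would read the values from the current process environment.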

4. Configure Your MCP Client

Claude Desktop

macOS/Linux — config file: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or ~/.config/Claude/claude_desktop_config.json (Linux)

{
  "mcpServers": {
    "pageindex-local": {
      "command": "node",
      "args": ["/home/user/pageindex-local-mcp/dist/index.js"],
      "env": {
        "PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
        "PAGEINDEX_PYTHON": "python3",
        "PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "local-model",
        "PAGEINDEX_ALLOWED_ROOTS": "/home/user/Documents:/home/user/Downloads"
      }
    }
  }
}

Windows — config file: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "pageindex-local": {
      "command": "node",
      "args": ["C:\\Users\\user\\pageindex-local-mcp\\dist\\index.js"],
      "env": {
        "PAGEINDEX_REPO_PATH": "C:\\Users\\user\\PageIndex",
        "PAGEINDEX_PYTHON": "C:\\Users\\user\\miniconda3\\envs\\pageindex\\python.exe",
        "PAGEINDEX_WORKSPACE": "C:\\Users\\user\\.pageindex-local-mcp",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "local-model",
        "PAGEINDEX_ALLOWED_ROOTS": "C:\\Users\\user\\Documents;C:\\Users\\user\\Downloads"
      }
    }
  }
}
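Hand-writing Windows paths in JSON is error-prone because every backslash must be doubled. As a convenience, the entry can be generated instead. The sketch below is an optional helper, not part of this project: `server_entry` is a hypothetical function, and it only fills in a few of the variables shown above.

```python
import json


def server_entry(server_js: str, repo_path: str, allowed_roots: list[str], sep: str) -> dict:
    """Build an mcpServers entry; sep is ';' on Windows and ':' on macOS/Linux."""
    return {
        "mcpServers": {
            "pageindex-local": {
                "command": "node",
                "args": [server_js],
                "env": {
                    "PAGEINDEX_REPO_PATH": repo_path,
                    "PAGEINDEX_ALLOWED_ROOTS": sep.join(allowed_roots),
                },
            }
        }
    }


config = server_entry(
    r"C:\Users\user\pageindex-local-mcp\dist\index.js",
    r"C:\Users\user\PageIndex",
    [r"C:\Users\user\Documents", r"C:\Users\user\Downloads"],
    sep=";",
)
# json.dumps escapes each backslash, producing the doubled form JSON requires.
text = json.dumps(config, indent=2)
print(text)
```

The printed text can be merged into claude_desktop_config.json by hand; the script does not touch any files.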

Cursor

Add to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "pageindex-local": {
      "command": "node",
      "args": ["/home/user/pageindex-local-mcp/dist/index.js"],
      "env": {
        "PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
        "PAGEINDEX_PYTHON": "python3",
        "PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "local-model"
      }
    }
  }
}

Claude Code

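Add the server to a .mcp.json file in your project root under mcpServers, using the same format as Cursor above. Claude Code reads project-scoped MCP servers from .mcp.json rather than from .claude/settings.json.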

LM Studio (as MCP client)

LM Studio 0.3.17+ can act as an MCP host, meaning it can call this server's tools directly from its chat UI — no separate MCP client needed.

Note: This section is about using LM Studio as the MCP client. For using LM Studio as the LLM backend for indexing and querying, see Section 5 below.

Requirements:

LM Studio 0.3.17 or later
A tool-use-capable model loaded in LM Studio (e.g., Mistral Nemo Instruct, Qwen2.5 Instruct, LLaMA 3.1 Instruct, Gemma 3). Pure base models will not invoke tools reliably.

Step 1 — Edit mcp.json

Open LM Studio, switch to the Program tab in the right sidebar, then click Install → Edit mcp.json. This opens the config file in LM Studio's built-in editor.

The file lives at:

macOS / Linux: ~/.lmstudio/mcp.json
Windows: %USERPROFILE%\.lmstudio\mcp.json

Step 2 — Add the server

Paste the following, adjusting paths for your system:

macOS / Linux:

{
  "mcpServers": {
    "pageindex-local": {
      "command": "node",
      "args": ["/home/user/pageindex-local-mcp/dist/index.js"],
      "timeout": 600,
      "env": {
        "PAGEINDEX_REPO_PATH": "/home/user/PageIndex",
        "PAGEINDEX_PYTHON": "python3",
        "PAGEINDEX_WORKSPACE": "/home/user/.pageindex-local-mcp",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "your-loaded-model-name",
        "PAGEINDEX_TOOL_TIMEOUT_MS": "600000",
        "PAGEINDEX_LOG_LEVEL": "info"
      }
    }
  }
}

Windows:

{
  "mcpServers": {
    "pageindex-local": {
      "command": "node",
      "args": ["C:\\Users\\user\\pageindex-local-mcp\\dist\\index.js"],
      "timeout": 600,
      "env": {
        "PAGEINDEX_REPO_PATH": "C:\\Users\\user\\PageIndex",
        "PAGEINDEX_PYTHON": "C:\\Users\\user\\miniconda3\\envs\\pageindex\\python.exe",
        "PAGEINDEX_WORKSPACE": "C:\\Users\\user\\.pageindex-local-mcp",
        "PAGEINDEX_LLM_BASE_URL": "http://127.0.0.1:1234/v1",
        "PAGEINDEX_LLM_API_KEY": "lm-studio",
        "PAGEINDEX_MODEL": "your-loaded-model-name",
        "PAGEINDEX_TOOL_TIMEOUT_MS": "600000",
        "PAGEINDEX_LOG_LEVEL": "info"
      }
    }
  }
}

Set PAGEINDEX_MODEL to the exact model name shown in LM Studio's server status bar (e.g., mistral-nemo-instruct-2407), then save the file. LM Studio normally picks up changes on save; if the server does not appear or keeps its old settings, restart LM Studio.

Timeout configuration — required for large PDFs

Indexing a PDF can take several minutes because PageIndex makes multiple LLM calls. LM Studio's default MCP request timeout is 60 seconds, which is not long enough. You must set two values or you will see MCP error -32001: Request timed out:

Setting	Where	What it does
`"timeout": 600`	`mcp.json` server entry	Tells LM Studio to wait up to 600 seconds (10 min) for a tool response
`PAGEINDEX_TOOL_TIMEOUT_MS=600000`	`env` block or `.env`	Tells the server how long to let the Python subprocess run before killing it

Both values are already included in the example configs above. Make sure they are present in your actual mcp.json — LM Studio does not have a default that is long enough.

The server also sends heartbeat notifications every 5 seconds while indexing or searching. Clients that support resetTimeoutOnProgress (Claude Desktop, Cursor, Claude Code) will reset their timer on each one. LM Studio will additionally receive supplemental log notifications that may reset its connection timer depending on version.
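Because the client timeout is given in seconds and the server timeout in milliseconds, it is easy to leave them out of sync. The following sketch checks an mcp.json document for that mismatch. It is a hypothetical helper (`check_timeouts` is not shipped with this project) and assumes the entry layout shown in the examples above.

```python
import json

DEFAULT_TOOL_TIMEOUT_MS = 600_000  # documented default for PAGEINDEX_TOOL_TIMEOUT_MS


def check_timeouts(mcp_json_text: str, server_name: str = "pageindex-local") -> list[str]:
    """Return human-readable problems; an empty list means the two values agree."""
    entry = json.loads(mcp_json_text)["mcpServers"][server_name]
    tool_ms = int(entry.get("env", {}).get("PAGEINDEX_TOOL_TIMEOUT_MS", DEFAULT_TOOL_TIMEOUT_MS))
    client_s = entry.get("timeout")
    if client_s is None:
        return ['no "timeout" field: the client default applies']
    if client_s * 1000 < tool_ms:
        return [f"client waits {client_s}s but the subprocess may run {tool_ms // 1000}s"]
    return []


sample = json.dumps({
    "mcpServers": {
        "pageindex-local": {
            "command": "node",
            "args": ["/home/user/pageindex-local-mcp/dist/index.js"],
            "timeout": 60,
            "env": {"PAGEINDEX_TOOL_TIMEOUT_MS": "600000"},
        }
    }
})
problems = check_timeouts(sample)
print(problems)  # ['client waits 60s but the subprocess may run 600s']
```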

Step 3 — Enable tool use

Go to App Settings → Tools & Integrations and ensure tool calling is enabled. You can allow individual tools once or permanently when the confirmation dialog appears.

Step 4 — Start the LM Studio local server

The MCP server's query engine calls LM Studio's OpenAI-compatible endpoint (http://127.0.0.1:1234/v1) to reason over document trees. Make sure the local server is running: Developer tab → Start Server (default port 1234).

Step 5 — Chat with your documents

Load a tool-capable model, open a new chat, and ask naturally:

Index the file at /home/user/Documents/research-paper.pdf

Search my indexed documents for information about climate feedback loops

List all my indexed documents

When the model decides to call a tool, LM Studio will show a confirmation dialog with the tool name and arguments. Review and approve. Results are returned inline in the chat.

Tip: Run pageindex_local_health first to confirm the server, PageIndex repo, and Python environment are all reachable before attempting to index.

5. LM Studio Setup

Download and install LM Studio.
Load a model (e.g., Mistral 7B Instruct, LLaMA 3, Qwen 2.5).
Start the local server: Developer tab (the Server tab in older versions) → Start Server (default port 1234).

Set:

PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:1234/v1
PAGEINDEX_LLM_API_KEY=lm-studio
PAGEINDEX_MODEL=<model-name-from-lm-studio>

Ollama Setup

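ollama serve          # starts the server on port 11434 and keeps running; skip if the Ollama app is already running
ollama pull llama3    # run in a second terminal while the server is up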

PAGEINDEX_LLM_BASE_URL=http://127.0.0.1:11434/v1
PAGEINDEX_LLM_API_KEY=ollama
PAGEINDEX_MODEL=llama3

6. Using the MCP Tools

Check Health

pageindex_local_health

Verifies the PageIndex repo, Python, workspace, and LLM config. Run this first.

Index a PDF

{
  "tool": "pageindex_local_index_document",
  "arguments": {
    "path": "/home/user/Documents/research-paper.pdf",
    "addNodeSummary": true,
    "addNodeId": true,
    "addDocDescription": true
  }
}
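The tool examples in this section use a compact tool-plus-arguments form. MCP clients normally build the wire message for you, but when scripting against the server it can help to see how such an example maps onto an MCP `tools/call` JSON-RPC request. The sketch below assumes the compact form is only documentation shorthand; `to_tools_call` is a hypothetical helper.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id per connection


def to_tools_call(example: dict) -> dict:
    """Wrap a {"tool": ..., "arguments": ...} example in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": example["tool"],
            "arguments": example.get("arguments", {}),
        },
    }


request = to_tools_call({
    "tool": "pageindex_local_index_document",
    "arguments": {"path": "/home/user/Documents/research-paper.pdf", "addNodeSummary": True},
})
print(json.dumps(request, indent=2))
```

A tool without arguments, such as pageindex_local_health, produces the same envelope with an empty arguments object.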

Index with node text (larger output, enables source text in search results):

{
  "tool": "pageindex_local_index_document",
  "arguments": {
    "path": "/home/user/Documents/research-paper.pdf",
    "addNodeText": true
  }
}

Index a Markdown File

{
  "tool": "pageindex_local_index_document",
  "arguments": {
    "path": "/home/user/notes/project-spec.md"
  }
}

List Indexed Documents

{
  "tool": "pageindex_local_list_documents",
  "arguments": { "status": "indexed", "limit": 20 }
}

Get Tree Structure

{
  "tool": "pageindex_local_get_tree",
  "arguments": {
    "documentId": "550e8400-e29b-41d4-a716-446655440000",
    "maxDepth": 3
  }
}

Query (Vectorless Search)

{
  "tool": "pageindex_local_search",
  "arguments": {
    "query": "What are the main conclusions about climate change?",
    "maxResults": 5,
    "includeReasoningPath": true
  }
}

Search across specific documents:

{
  "tool": "pageindex_local_search",
  "arguments": {
    "query": "What is the recommended dosage?",
    "documentIds": ["doc-id-1", "doc-id-2"],
    "includeSourceText": true
  }
}

Remove a Document

{
  "tool": "pageindex_local_remove_document",
  "arguments": {
    "documentId": "550e8400-e29b-41d4-a716-446655440000",
    "deleteFiles": true
  }
}

Re-index a Document

{
  "tool": "pageindex_local_reindex_document",
  "arguments": {
    "documentId": "550e8400-e29b-41d4-a716-446655440000",
    "addNodeText": true
  }
}

7. Workspace Layout

The server stores all artifacts under PAGEINDEX_WORKSPACE:

~/.pageindex-local-mcp/
  registry.json                       ← document registry
  documents/
    <document-id>/
      original/
        source.pdf                    ← copy of original file
      index/
        tree.json                     ← PageIndex tree structure
        metadata.json                 ← indexing metadata
        stdout.log                    ← PageIndex stdout
        stderr.log                    ← PageIndex stderr
      queries/
        <query-id>.json               ← query results (future)
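When inspecting artifacts by hand or from a script, the layout above translates directly into paths. The sketch below derives them for one document ID. The `document_paths` helper is hypothetical, and it only mirrors the layout as drawn here, including the source.pdf file name, which may differ for Markdown inputs.

```python
from pathlib import Path


def document_paths(workspace: Path, document_id: str) -> dict[str, Path]:
    """Map the documented workspace layout to concrete paths for one document."""
    doc_dir = workspace / "documents" / document_id
    index_dir = doc_dir / "index"
    return {
        "registry": workspace / "registry.json",
        "source": doc_dir / "original" / "source.pdf",
        "tree": index_dir / "tree.json",
        "metadata": index_dir / "metadata.json",
        "stdout": index_dir / "stdout.log",
        "stderr": index_dir / "stderr.log",
    }


workspace = Path("/home/user/.pageindex-local-mcp")
paths = document_paths(workspace, "550e8400-e29b-41d4-a716-446655440000")
for name, path in paths.items():
    print(f"{name:9} {path}")
```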

8. Development and Testing

# Type-check only
npm run typecheck

# Run tests
npm test

# Run smoke tests (requires configured .env and PageIndex repo)
npm run smoke:health
npm run smoke:index -- /absolute/path/to/document.pdf
npm run smoke:list
npm run smoke:query -- "What is this document about?"

# Dev mode (runs from TypeScript source, no build needed)
npm run dev

9. Troubleshooting

run_pageindex.py not found

Verify PAGEINDEX_REPO_PATH points to the root of the cloned PageIndex repository and that run_pageindex.py exists there.

Python import errors during indexing

Make sure the PageIndex Python dependencies are installed in the Python environment pointed to by PAGEINDEX_PYTHON:

pip install -r /path/to/PageIndex/requirements.txt

Tree file not found after indexing

PageIndex saves output to <PAGEINDEX_REPO_PATH>/results/<filename>_structure.json. If your version saves elsewhere, check stdout.log in the document workspace for the actual output path and open an issue.

LLM connection failed during search

Verify your local LLM server is running and that PAGEINDEX_LLM_BASE_URL is correct. Test manually:

curl http://127.0.0.1:1234/v1/models
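The same check can be scripted, which is handy for confirming that PAGEINDEX_MODEL matches a model the endpoint actually serves. This is a minimal sketch using only the Python standard library; `models_url` and `list_models` are hypothetical helper names, and it assumes the endpoint returns the usual OpenAI-style response with a `data` list of objects carrying an `id`.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Append /models to the base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url: str, api_key: str = "local", timeout_s: float = 5.0) -> list[str]:
    """Return the model IDs reported by an OpenAI-compatible endpoint."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__":
    try:
        print(list_models("http://127.0.0.1:1234/v1"))
    except OSError as error:  # URLError and connection failures derive from OSError
        print(f"LLM endpoint not reachable: {error}")
```

If the list prints but does not contain your configured model name, fix PAGEINDEX_MODEL rather than the URL.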

File outside allowed roots

Add the file's parent directory to PAGEINDEX_ALLOWED_ROOTS in your environment config.
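To predict whether a path will pass the allowed-roots check before calling the tool, a containment test along these lines can help. This is a sketch of the general technique, not the server's actual check: `is_allowed` is a hypothetical function, and treating an empty list as "no restriction" mirrors the documented default.

```python
from pathlib import Path


def is_allowed(file_path: str, allowed_roots: list[str]) -> bool:
    """Return True if file_path resolves to a location inside one of the roots."""
    if not allowed_roots:
        return True  # empty setting means all paths are permitted
    target = Path(file_path).resolve()  # collapses '..' and follows symlinks
    for root in allowed_roots:
        try:
            target.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue  # not under this root; try the next one
    return False


roots = ["/home/user/Documents", "/home/user/Downloads"]
print(is_allowed("/home/user/Documents/research-paper.pdf", roots))
print(is_allowed("/home/user/Documents/../.ssh/id_rsa", roots))
```

Comparing resolved path components, rather than string prefixes, keeps a sibling directory such as /home/user/Documents2 from being accepted by accident.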

Low-quality indexing results on scanned PDFs

PageIndex uses PyPDF2 for local PDF parsing, which does not perform OCR. Scanned PDFs without embedded text will produce poor results. For scanned documents, consider pre-processing with an OCR tool or using the PageIndex cloud service.

MCP error -32001: Request timed out in LM Studio (or other clients)

The timeout is enforced by the MCP client, not this server. LM Studio's default is 60 seconds — not long enough for PDF indexing.

Checklist (do all three):

"timeout": 600 must be present in your mcp.json under the server entry. This raises LM Studio's per-request timeout to 10 minutes. Without this field, LM Studio uses 60 seconds regardless of how fast the server is.
PAGEINDEX_TOOL_TIMEOUT_MS=600000 in the env block (or .env) — keeps the server-side Python subprocess limit in sync.
Restart LM Studio after editing mcp.json — changes are not always picked up without a restart.

The server sends heartbeat notifications every 5 seconds (progress + log) while indexing and searching. If you are still seeing -32001 after adding "timeout": 600, set PAGEINDEX_LOG_LEVEL=debug and check the stderr output to confirm whether hasProgressToken: true appears — if it does, LM Studio is sending progress tokens and the heartbeats are active. If hasProgressToken: false, the heartbeats are log-only and you must rely on the "timeout" field.

MCP server logs

All logs go to stderr (not stdout, which is reserved for the MCP protocol). Check your MCP client's stderr console or increase log level:

PAGEINDEX_LOG_LEVEL=debug

10. Security Notes

PAGEINDEX_ALLOWED_ROOTS: When set, only files within these directories can be indexed. Always configure this in shared or multi-user environments.
No shell interpolation: All Python subprocess calls use argument arrays (shell: false). Path arguments are never interpolated into shell strings.
No cloud calls: This server never contacts api.pageindex.ai, chat.pageindex.ai, or any PageIndex cloud endpoint.
Secrets: Never place API keys in document paths or document IDs. All config comes from environment variables.
Trusted clients only: The MCP protocol grants tool invocation to any connected client. Run this server only in trusted local environments.
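The "no shell interpolation" point is easiest to see with a file name that would be dangerous inside a shell string. The sketch below is a Python analogue of the argument-array approach, not the server's TypeScript code: it passes a hostile-looking name to a child process as a single argument and shows that it arrives unchanged. `echo_argument` is a hypothetical helper.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass value to a child Python process as one argv entry and return what it saw."""
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        shell=False,          # no shell parses the arguments
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


tricky = "report; echo injected $(whoami).pdf"
print(echo_argument(tricky))  # report; echo injected $(whoami).pdf
```

Because no shell is involved, the semicolon and the `$(...)` are ordinary characters in a file name and nothing extra is executed.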

11. Known Limitations

SQLite registry backend: The sqlite option for PAGEINDEX_REGISTRY_BACKEND is planned but not yet implemented. Use the default json backend.
Concurrent indexing: Only one indexing job should run per server instance at a time. Concurrent calls are not prevented but may produce race conditions in the registry.
Source text extraction: Full source text in search results (includeSourceText: true) only works when the document was indexed with addNodeText: true. Otherwise, results include node summaries only.
Markdown line references: PageIndex uses line numbers (not pages) for Markdown files. Search results will show line ranges instead of page numbers.
Large documents: Indexing very large PDFs may exceed LLM context windows. Lower maxPagesPerNode and maxTokensPerNode (or the PAGEINDEX_MAX_PAGES_PER_NODE and PAGEINDEX_MAX_TOKENS_PER_NODE environment variables) to reduce node size.
Model compatibility: The query engine uses a simple JSON-structured prompt. Some smaller local models may not reliably output valid JSON. Use instruction-tuned models (Mistral Instruct, LLaMA Instruct, Qwen Instruct, etc.).

12. Using with an AI Agent

AGENT_SYSTEM_PROMPT.md contains a ready-to-use system prompt for any AI agent that will drive this MCP server. It covers all 8 tools, every parameter and response field, typical workflows, error handling, and usage constraints.

How to use it:

Copy the full contents of AGENT_SYSTEM_PROMPT.md.
Paste it into your agent's system prompt (or include it as a context file if your framework supports file injection).
The agent will know how to index documents, search them, handle failures, and avoid common mistakes — without needing further instruction.

This is useful when building automated pipelines, custom agents, or assistants that need to interact with local documents through this server.

MCP Tools Reference

Tool	Description
`pageindex_local_health`	Check configuration and connectivity
`pageindex_local_index_document`	Index a local PDF or Markdown file
`pageindex_local_list_documents`	List all registered documents
`pageindex_local_get_document`	Get full metadata for one document
`pageindex_local_get_tree`	Retrieve the PageIndex tree structure
`pageindex_local_search`	Vectorless reasoning-based search
`pageindex_local_remove_document`	Remove a document from the registry
`pageindex_local_reindex_document`	Re-run indexing for an existing document

License

MIT
