paperlessngx-mcp
Allows interaction with a Paperless-ngx server, providing tools for searching, tagging, and organizing documents with privacy tiers, including metadata-only operations, local extraction of fields, and full document content retrieval.
paperless-mcp
A privacy-first Model Context Protocol (MCP) server for Paperless-ngx. It lets an LLM agent search, organize, tag, and reference your documents without the full text of those documents entering the model's context window — unless you explicitly ask for it.
Built with the official @modelcontextprotocol/sdk as a stdio server, TypeScript/ESM, Node 20+.
Why this exists: the three privacy tiers
The whole point of this server is to enforce a boundary between document metadata and document content. Most document management — "find my 2024 tax return", "tag this as an invoice", "what did the electric company send me" — needs metadata, not the OCR'd text of the document. Yet the naive Paperless API call (GET /api/documents/{id}/) returns the entire OCR content field by default, which would silently dump full document text into the model's context.
Every tool here falls into exactly one tier, and the tier dictates what data reaches the calling model:
Tier 1 — Metadata only (default)
Returns titles, dates, tags, correspondents, document types, custom fields, page counts, file sizes. Never returns OCR content. This covers the large majority of practical use cases. Search runs server-side inside Paperless and comes back as metadata.
Two mechanisms enforce this:
List/search calls pass Paperless's sparse fieldset (fields=id,title,correspondent,...) so the content field is never even serialized by the server.
The single-document metadata tool calls the detail endpoint (which includes content) but explicitly deletes the content field before returning.
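The two mechanisms can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper names `buildSearchUrl` and `stripContent`, the exact field list, and the use of the `query` search parameter are assumptions.

```typescript
// Fields requested for list/search calls. "content" is deliberately absent.
// The exact list is an assumption; the README only shows "id,title,correspondent,...".
const METADATA_FIELDS = ["id", "title", "correspondent", "document_type", "tags", "created"];

// Mechanism 1: build a list/search URL that asks Paperless for a sparse fieldset.
// Note: an absolute path is used, so a base URL with a path prefix would lose it.
function buildSearchUrl(baseUrl: string, query: string): string {
  const url = new URL("/api/documents/", baseUrl);
  url.searchParams.set("query", query);
  url.searchParams.set("fields", METADATA_FIELDS.join(","));
  return url.toString();
}

// Mechanism 2: copy a detail response and drop "content" before returning it.
function stripContent(doc: Record<string, unknown>): Record<string, unknown> {
  const metadata = { ...doc };
  delete metadata["content"];
  return metadata;
}

const detail = { id: 7, title: "Electric bill", content: "full OCR text..." };
console.log(buildSearchUrl("http://localhost:8000", "electric bill"));
console.log(stripContent(detail));
```

The first mechanism keeps content off the wire entirely; the second is a fallback for the detail endpoint, where the content has already reached the server process and must be removed there.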
Tier 2 — Local extraction (server-side processing)
The MCP server fetches content into its own process, runs a local parser (regex today; a local LLM in the future), and returns only the extracted value(s). The full text never leaves the server. Example: extract every dollar amount, or the "amount due" line, from an invoice — without the invoice body entering the model context.
This is the extensibility point: paperless_extract_field is designed so a future local-model extraction backend drops in behind the same contract (read content locally → parse → return only the field).
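A rough sketch of that contract, assuming a regex backend: content goes in, only the extracted values and a `privacy_note` come out. The `ExtractionResult` shape, the function name, and the regex itself are illustrative assumptions rather than the server's real implementation.

```typescript
// Hypothetical result shape: extracted values plus a note, never the content itself.
interface ExtractionResult {
  field: string;
  values: string[];
  privacy_note: string;
}

// Matches amounts such as $12, $1234.56 or $1,234.56.
const DOLLAR_AMOUNT = /\$\d+(?:,\d{3})*(?:\.\d{2})?/g;

// Read content locally, parse it, and return only the matched amounts.
function extractDollarAmounts(content: string): ExtractionResult {
  const values = Array.from(content.match(DOLLAR_AMOUNT) ?? []);
  return {
    field: "dollar_amounts",
    values,
    privacy_note: "Document content was processed locally and not returned.",
  };
}

console.log(extractDollarAmounts("Amount due: $1,234.56. Late fee $25."));
```

Because the return type has no slot for the document body, a replacement backend that honors the same shape cannot accidentally widen what reaches the model.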
Tier 3 — Full content (explicit intent required)
Returns the full OCR text or the document binary to the calling model. These tools are clearly named and their descriptions carry a privacy_warning. Reserve them for when the user explicitly asks to read, summarize, or analyze a document's contents.
The tiers are deliberately not collapsed. A tool that searches returns metadata; a tool that fetches content is a separate, clearly-marked tool.
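One way to keep the tiers visibly separate is to attach the tier to each tool definition and add the warning only at Tier 3. The sketch below is an assumption about how that could look; `ToolDef`, `defineTool`, and the warning wording are hypothetical, and only the two tool names come from this README.

```typescript
type Tier = 1 | 2 | 3;

interface ToolDef {
  name: string;
  tier: Tier;
  description: string;
}

// Hypothetical wording; the README only states that Tier 3 descriptions carry a privacy_warning.
const PRIVACY_WARNING = "privacy_warning: returns full document content to the calling model.";

// Build a tool definition, appending the warning only for Tier 3 tools.
function defineTool(name: string, tier: Tier, summary: string): ToolDef {
  const description = tier === 3 ? `${summary} ${PRIVACY_WARNING}` : summary;
  return { name, tier, description };
}

const metadataTool = defineTool("paperless_get_document_metadata", 1, "Returns one document's metadata.");
const contentTool = defineTool("paperless_get_document_content", 3, "Returns the full OCR text.");
console.log(metadataTool.description);
console.log(contentTool.description);
```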
Tools by tier
| Tool | Tier | Returns |
| | 1 | Matching docs as metadata (id, title, names, tags, dates, page_count) |
| paperless_get_document_metadata | 1 | One doc's full metadata (content stripped), incl. checksum, size, custom fields, notes |
| | 1 | All tags: id, name, document_count |
| | 1 | All correspondents: id, name, document_count |
| | 1 | All document types: id, name, document_count |
| | 1 | Totals, inbox count, type & tag breakdowns |
| | 1 | Sets a document's tags (by name) |
| | 1 | Sets a document's correspondent (by name) |
| | 1 | Sets a document's type (by name) |
| | 1 | Paperless's own server-side suggestions (no content to the model) |
| paperless_extract_field | 2 | Only the field(s) extracted locally from content |
| paperless_get_document_content | 3 | Full OCR text ⚠ |
| paperless_download_document | 3 | Saves the binary PDF to disk, returns the path ⚠ |
All Tier 1 tools resolve IDs to human-readable names. There is exactly one tool that returns OCR text (paperless_get_document_content) and one that materializes the binary (paperless_download_document).
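ID-to-name resolution can be sketched as a lookup over a cached list of tags, correspondents, or document types. The `TtlCache` class, `resolveNames` function, and the fallback label for unknown IDs are assumptions for illustration; the real client in src/paperless-client.ts may be structured differently.

```typescript
interface NamedItem {
  id: number;
  name: string;
}

// Minimal TTL cache for a single value. The clock is injectable so expiry can be tested.
class TtlCache<V> {
  private entry: { value: V; expiresAt: number } | undefined;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise reload and cache it.
  get(load: () => V): V {
    if (this.entry && this.now() < this.entry.expiresAt) return this.entry.value;
    const value = load();
    this.entry = { value, expiresAt: this.now() + this.ttlMs };
    return value;
  }
}

// Map numeric IDs to names, falling back to a placeholder for unknown IDs.
function resolveNames(ids: number[], items: NamedItem[]): string[] {
  const byId = new Map<number, string>(items.map((item): [number, string] => [item.id, item.name]));
  return ids.map((id) => byId.get(id) ?? `unknown (${id})`);
}

const tagCache = new TtlCache<NamedItem[]>(60_000);
const tags = tagCache.get(() => [{ id: 1, name: "invoice" }, { id: 2, name: "tax" }]);
console.log(resolveNames([2, 1, 9], tags));
```

In a real client the loader would be an HTTP call, so `V` would typically be a promise of the item list rather than the list itself.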
Configuration
All config is via environment variables.
| Variable | Required | Default | Description |
| PAPERLESS_BASE_URL | yes | — | e.g. |
| PAPERLESS_API_TOKEN | yes | — | Paperless API token |
| PAPERLESS_VERIFY_SSL | no | | Set |
| | no | | Where Tier 3 downloads are written |
Get an API token in Paperless under Settings → My Profile → API Token.
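A minimal sketch of reading this configuration, assuming the two required variables fail fast when missing. The `loadConfig` and `authHeaders` helpers are hypothetical, and treating SSL verification as on unless PAPERLESS_VERIFY_SSL is "false" is an assumption, since the default is not stated here. The `Authorization: Token ...` header is the scheme Paperless-ngx uses for API tokens.

```typescript
interface PaperlessConfig {
  baseUrl: string;
  apiToken: string;
  verifySsl: boolean;
}

// Read config from an environment-like object and fail fast on missing required values.
function loadConfig(env: Record<string, string | undefined>): PaperlessConfig {
  const baseUrl = env["PAPERLESS_BASE_URL"];
  const apiToken = env["PAPERLESS_API_TOKEN"];
  if (!baseUrl || !apiToken) {
    throw new Error("PAPERLESS_BASE_URL and PAPERLESS_API_TOKEN are required");
  }
  return {
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop trailing slashes
    apiToken,
    // Assumption: verification stays on unless explicitly set to "false".
    verifySsl: env["PAPERLESS_VERIFY_SSL"] !== "false",
  };
}

// Paperless-ngx token authentication header.
function authHeaders(config: PaperlessConfig): Record<string, string> {
  return { Authorization: `Token ${config.apiToken}` };
}

const config = loadConfig({
  PAPERLESS_BASE_URL: "http://localhost:8000/",
  PAPERLESS_API_TOKEN: "xxxxxxxx",
});
console.log(config.baseUrl, config.verifySsl);
```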
Running locally (stdio)
npm install
npm run build
PAPERLESS_BASE_URL=http://localhost:8000 \
PAPERLESS_API_TOKEN=xxxxxxxx \
node dist/index.js
The server speaks JSON-RPC over stdio; it is launched by your MCP client, not run as a standalone HTTP service.
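To make the stdio framing concrete, the sketch below builds the kind of message an MCP client writes to the server's stdin: one JSON-RPC object per line, using the MCP `tools/call` method. The helper names and the `document_id` argument are hypothetical; `extraction_pattern` is the parameter named in this README. A real session also begins with an `initialize` handshake, which MCP client libraries handle.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build an MCP tools/call request for a named tool.
function toolCallRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport is newline-delimited: one JSON message per line, no embedded newlines.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const request = toolCallRequest(1, "paperless_extract_field", {
  document_id: 42, // hypothetical argument name
  extraction_pattern: "dollar_amounts",
});
console.log(frame(request));
```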
Example MCP client config:
{
  "mcpServers": {
    "paperless": {
      "command": "node",
      "args": ["/path/to/paperless-mcp/dist/index.js"],
      "env": {
        "PAPERLESS_BASE_URL": "http://localhost:8000",
        "PAPERLESS_API_TOKEN": "xxxxxxxx",
        "PAPERLESS_VERIFY_SSL": "false"
      }
    }
  }
}
Deploying via Portainer + using with hermes
Because this is a stdio server (JSON-RPC over stdin/stdout, exits when stdin closes), it is not a long-lived HTTP service. Two patterns:
Pattern A — on-demand container (recommended). Build the image, then have hermes launch a fresh container per session:
docker run -i --rm --env-file .env paperless-mcp:latest
The -i is essential: it keeps stdin open for the JSON-RPC stream.
Pattern B — persistent container you exec into. Deploy the included docker-compose.yml as a Portainer Stack. It builds the image and keeps a container alive (via a sleep loop) so hermes can attach the MCP entrypoint on demand:
docker exec -i paperless-mcp node /app/dist/index.js
Portainer steps
In Portainer: Stacks → Add stack.
Paste the contents of docker-compose.yml (or point it at this repo).
Add the environment variables (PAPERLESS_BASE_URL, PAPERLESS_API_TOKEN, PAPERLESS_VERIFY_SSL) in the stack's Environment variables section, or upload your .env.
Deploy. If Paperless runs in the same Docker network, use its service name as the host (e.g. http://paperless-ngx:8000).
Then point hermes' MCP configuration at whichever launch command matches your chosen pattern (docker run -i ... or docker exec -i ...).
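As an illustration, the two launch commands can be expressed in the same `mcpServers` shape as the client config shown earlier. Whether hermes accepts exactly this shape is an assumption; the `dockerLaunch` and `mcpServerEntry` helpers are hypothetical, and only the docker arguments come from this README.

```typescript
interface LaunchSpec {
  command: string;
  args: string[];
}

// Pattern A ("run") starts a fresh container per session; Pattern B ("exec") attaches to a running one.
function dockerLaunch(pattern: "run" | "exec"): LaunchSpec {
  return pattern === "run"
    ? { command: "docker", args: ["run", "-i", "--rm", "--env-file", ".env", "paperless-mcp:latest"] }
    : { command: "docker", args: ["exec", "-i", "paperless-mcp", "node", "/app/dist/index.js"] };
}

// Wrap the launch spec in an mcpServers entry (assumed config shape).
function mcpServerEntry(pattern: "run" | "exec"): { mcpServers: { paperless: LaunchSpec } } {
  return { mcpServers: { paperless: dockerLaunch(pattern) } };
}

console.log(JSON.stringify(mcpServerEntry("run"), null, 2));
```

Both variants keep `-i` in the argument list, since the JSON-RPC stream depends on stdin staying open.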
Privacy guarantees & limits
Tier 1 list/search calls send fields= so Paperless never serializes content.
paperless_get_document_metadata deletes content from the detail response in-process before returning.
paperless_extract_field reads content only inside the server process and returns just the extracted value, with a privacy_note confirming content was not returned.
Only paperless_get_document_content and paperless_download_document surface full content; both carry an explicit privacy_warning.
This server controls what it returns. It cannot stop a client/agent from separately calling the Tier 3 tools — that's exactly why those tools are named and described to make the privacy cost obvious to the model and the user.
Extending Tier 2 extraction
src/tools.ts contains an EXTRACTORS registry of named, dependency-free regex extractors (dollar_amounts, dates, emails, phone_numbers, addresses, total_amount). extraction_pattern also accepts a raw regex.
To plug in a local LLM (the design goal): keep the same contract — fetch content into the process, run your local model, return only the requested field(s). Replace the body of paperless_extract_field's handler (or add a new named extractor that calls your local inference endpoint). The privacy boundary is preserved as long as only the extracted value is returned.
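A sketch of how a local-model backend could sit beside the regex extractors under one contract. Everything here is an assumption about shape: the `Extractor` and `LocalInfer` types, the `extractors` map (which stands in for, but is not, the real EXTRACTORS registry), and the prompt wording. The inference function is injected, so any local endpoint could be wired in.

```typescript
// An extractor receives document content and resolves to the extracted values only.
type Extractor = (content: string) => Promise<string[]>;

// Hypothetical signature for a local inference call (for example, a request to a model on localhost).
type LocalInfer = (prompt: string) => Promise<string>;

// Regex backend: return every match of the pattern.
function regexExtractor(pattern: RegExp): Extractor {
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const global = new RegExp(pattern.source, flags);
  return async (content) => Array.from(content.match(global) ?? []);
}

// Local-model backend: content goes to the local model, only its parsed answer comes back.
function localModelExtractor(infer: LocalInfer, field: string): Extractor {
  return async (content) => {
    const answer = await infer(`Return only the ${field}, one value per line:\n${content}`);
    return answer
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
  };
}

// Illustrative registry; both backends satisfy the same Extractor type.
const extractors: Record<string, Extractor> = {
  emails: regexExtractor(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/),
};
```

Registering a model-backed extractor would then be a matter of `extractors["amount_due"] = localModelExtractor(myInfer, "amount due")`, where `myInfer` is whatever function calls your local inference endpoint.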
Project layout
paperless-mcp/
├── src/
│ ├── index.ts # entrypoint: stdio transport, tool registration, dispatch
│ ├── paperless-client.ts # REST client: auth, pagination, TTL cache, ID↔name resolution
│ └── tools.ts # tool defs + handlers, grouped by privacy tier
├── package.json
├── tsconfig.json
├── Dockerfile
├── docker-compose.yml # Portainer stack
├── .env.example
└── README.md
License
MIT