paperlessngx-mcp

by Milli42

paperless-mcp

A privacy-first Model Context Protocol (MCP) server for Paperless-ngx. It lets an LLM agent search, organize, tag, and reference your documents without the full text of those documents entering the model's context window — unless you explicitly ask for it.

Built as a stdio server on the official @modelcontextprotocol/sdk, written in TypeScript (ESM) and targeting Node 20+.

Why this exists: the three privacy tiers

The whole point of this server is to enforce a boundary between document metadata and document content. Most document management — "find my 2024 tax return", "tag this as an invoice", "what did the electric company send me" — needs metadata, not the OCR'd text of the document. Yet the naive Paperless API call (GET /api/documents/{id}/) returns the entire OCR content field by default, which would silently dump full document text into the model's context.

Every tool here falls into exactly one tier, and the tier dictates what data reaches the calling model:

Tier 1 — Metadata only (default)

Returns titles, dates, tags, correspondents, document types, custom fields, page counts, file sizes. Never returns OCR content. This covers the large majority of practical use cases. Search runs server-side inside Paperless and comes back as metadata.

Two mechanisms enforce this:

- List/search calls pass Paperless's sparse fieldset (`fields=id,title,correspondent,...`), so Paperless never serializes the `content` field at all.
- The single-document metadata tool (`paperless_get_document_metadata`) calls the detail endpoint, which does include `content`, and explicitly deletes that field before returning.
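Both mechanisms can be sketched in a few lines, assuming Node 20's global `fetch` and `URL`. The helper names (`buildSearchUrl`, `stripContent`, `searchDocuments`), the exact field list, and the use of the `query` parameter for full-text search are illustrative assumptions, not this project's actual code.

```typescript
// Hypothetical Tier 1 field list; `content` is deliberately absent.
const METADATA_FIELDS = [
  "id", "title", "correspondent", "document_type", "tags",
  "created", "added", "page_count",
];

// Mechanism 1: ask Paperless to serialize only metadata fields.
function buildSearchUrl(baseUrl: string, query: string): URL {
  const url = new URL("/api/documents/", baseUrl);
  url.searchParams.set("query", query);
  url.searchParams.set("fields", METADATA_FIELDS.join(","));
  return url;
}

// Mechanism 2: the detail endpoint always includes `content`, so drop it in-process.
function stripContent<T extends Record<string, unknown>>(doc: T): Omit<T, "content"> {
  const { content: _content, ...metadata } = doc;
  return metadata;
}

// Fetch one page of metadata-only search results using the Paperless token header.
async function searchDocuments(baseUrl: string, token: string, query: string): Promise<unknown> {
  const res = await fetch(buildSearchUrl(baseUrl, query), {
    headers: { Authorization: `Token ${token}`, Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`Paperless returned HTTP ${res.status}`);
  return res.json();
}
```

Stripping is the weaker of the two: the text still crosses the network into the server process, which is why list/search paths rely on `fields=` instead.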

Tier 2 — Local extraction (server-side processing)

The MCP server fetches content into its own process, runs a local parser (regex today; a local LLM in the future), and returns only the extracted value(s). The full text never leaves the server. Example: extract every dollar amount, or the "amount due" line, from an invoice — without the invoice body entering the model context.

This is the extensibility point: paperless_extract_field is designed so a future local-model extraction backend drops in behind the same contract (read content locally → parse → return only the field).
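The contract can be sketched as a function that takes a content loader and an extractor and returns only the extracted values. The names `extractField`, `Extractor` and `ExtractionResult`, and the exact `privacy_note` wording, are assumptions for illustration; only the read, parse, return-only-the-field flow comes from this README.

```typescript
// Result returned to the model: extracted values only, never the source text.
interface ExtractionResult {
  document_id: number;
  field: string;
  values: string[];
  privacy_note: string;
}

// An extractor may be synchronous (regex) or asynchronous (a future local model).
type Extractor = (content: string) => string[] | Promise<string[]>;

// Read content locally, parse it, and return only the requested field.
async function extractField(
  documentId: number,
  field: string,
  fetchContent: (id: number) => Promise<string>,
  extractor: Extractor,
): Promise<ExtractionResult> {
  const content = await fetchContent(documentId); // stays inside this process
  const values = await extractor(content);
  return {
    document_id: documentId,
    field,
    values,
    privacy_note: "Document content was processed locally and not returned.",
  };
}
```

Because the backend is just an `Extractor`, swapping regex for a local model does not change what the caller receives.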

Tier 3 — Full content (explicit intent required)

Returns the full OCR text or the document binary to the calling model. These tools are clearly named and their descriptions carry a privacy_warning. Reserve them for when the user explicitly asks to read, summarize, or analyze a document's contents.

The tiers are deliberately not collapsed. A tool that searches returns metadata; a tool that fetches content is a separate, clearly-marked tool.


Tools by tier

| Tool | Tier | Returns |
|---|---|---|
| `paperless_search_documents` | 1 | Matching docs as metadata (id, title, names, tags, dates, page_count) |
| `paperless_get_document_metadata` | 1 | One doc's full metadata (content stripped), incl. checksum, size, custom fields, notes |
| `paperless_list_tags` | 1 | All tags: id, name, document_count |
| `paperless_list_correspondents` | 1 | All correspondents: id, name, document_count |
| `paperless_list_document_types` | 1 | All document types: id, name, document_count |
| `paperless_get_statistics` | 1 | Totals, inbox count, type & tag breakdowns |
| `paperless_tag_document` | 1 | Sets a document's tags (by name) |
| `paperless_set_correspondent` | 1 | Sets a document's correspondent (by name) |
| `paperless_set_document_type` | 1 | Sets a document's type (by name) |
| `paperless_suggest_tags` | 1 | Paperless's own server-side suggestions (no content to the model) |
| `paperless_extract_field` | 2 | Only the field(s) extracted locally from content |
| `paperless_get_document_content` | 3 | Full OCR text ⚠ |
| `paperless_download_document` | 3 | Saves the binary PDF to disk, returns the path ⚠ |

All Tier 1 tools resolve IDs to human-readable names. There is exactly one tool that returns OCR text (paperless_get_document_content) and one that materializes the binary (paperless_download_document).
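One way to keep that invariant from drifting is a tier registry checked at startup. The assignments below are transcribed from the table; the `TOOL_TIERS` map and `toolsInTier` helper are hypothetical, not taken from `src/tools.ts`.

```typescript
type Tier = 1 | 2 | 3;

// Tier assignments transcribed from the tools table; the registry itself is illustrative.
const TOOL_TIERS: Record<string, Tier> = {
  paperless_search_documents: 1,
  paperless_get_document_metadata: 1,
  paperless_list_tags: 1,
  paperless_list_correspondents: 1,
  paperless_list_document_types: 1,
  paperless_get_statistics: 1,
  paperless_tag_document: 1,
  paperless_set_correspondent: 1,
  paperless_set_document_type: 1,
  paperless_suggest_tags: 1,
  paperless_extract_field: 2,
  paperless_get_document_content: 3,
  paperless_download_document: 3,
};

// List the tools in one tier, sorted for stable output.
function toolsInTier(tier: Tier): string[] {
  return Object.entries(TOOL_TIERS)
    .filter(([, t]) => t === tier)
    .map(([name]) => name)
    .sort();
}
```

A check such as `toolsInTier(3).length === 2` at startup would flag any newly added tool that quietly surfaces full content.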

Configuration

All config is via environment variables.

| Variable | Required | Default | Description |
|---|---|---|---|
| `PAPERLESS_BASE_URL` | yes | (none) | e.g. `http://paperless-ngx:8000` |
| `PAPERLESS_API_TOKEN` | yes | (none) | Paperless API token (`Authorization: Token <...>`) |
| `PAPERLESS_VERIFY_SSL` | no | `true` | Set `false` for self-signed / plain-http local instances |
| `PAPERLESS_DOWNLOAD_DIR` | no | `<tmp>/paperless-mcp-downloads` | Where Tier 3 downloads are written |
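A minimal sketch of reading these variables, assuming `"false"` (case-insensitive) is the only value that disables SSL verification. `loadConfig` is a hypothetical helper, not part of the project's documented API, and it only parses the flag; actually disabling certificate checks for `fetch` needs separate setup that is not shown here.

```typescript
import { tmpdir } from "node:os";
import { join } from "node:path";

interface PaperlessConfig {
  baseUrl: string;
  apiToken: string;
  verifySsl: boolean;
  downloadDir: string;
}

// Read the four documented variables, failing fast on the required ones.
function loadConfig(env: Record<string, string | undefined> = process.env): PaperlessConfig {
  const baseUrl = env.PAPERLESS_BASE_URL;
  const apiToken = env.PAPERLESS_API_TOKEN;
  if (!baseUrl) throw new Error("PAPERLESS_BASE_URL is required");
  if (!apiToken) throw new Error("PAPERLESS_API_TOKEN is required");
  return {
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop trailing slashes
    apiToken,
    // Assumed parsing rule: anything other than "false" keeps verification on.
    verifySsl: (env.PAPERLESS_VERIFY_SSL ?? "true").trim().toLowerCase() !== "false",
    downloadDir: env.PAPERLESS_DOWNLOAD_DIR ?? join(tmpdir(), "paperless-mcp-downloads"),
  };
}
```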

Get an API token in Paperless under Settings → My Profile → API Token.

Running locally (stdio)

```bash
npm install
npm run build
PAPERLESS_BASE_URL=http://localhost:8000 \
PAPERLESS_API_TOKEN=xxxxxxxx \
node dist/index.js
```

The server speaks JSON-RPC over stdio; it is launched by your MCP client, not run as a standalone HTTP service.
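To make "launched by your MCP client" concrete, here is a sketch of what a client does: spawn the process with piped stdin/stdout and write one JSON-RPC message per line. In a real session the client first sends an `initialize` request and an `initialized` notification; only the later `tools/call` is shown. The `query` argument name and the helper names are assumptions.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// The MCP stdio transport frames each message as one line of JSON ending in "\n".
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// How a client might start the server: stdin/stdout carry JSON-RPC, stderr is left for logs.
function launchServer(entryPoint: string, env: NodeJS.ProcessEnv): ChildProcess {
  return spawn("node", [entryPoint], { stdio: ["pipe", "pipe", "inherit"], env });
}

const callSearch: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "paperless_search_documents",
    arguments: { query: "utility bill" }, // argument name is an assumption
  },
};
```

A client would then call `launchServer("dist/index.js", process.env).stdin?.write(frame(callSearch))` after the handshake and read newline-delimited responses from stdout.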

Example MCP client config:

```json
{
  "mcpServers": {
    "paperless": {
      "command": "node",
      "args": ["/path/to/paperless-mcp/dist/index.js"],
      "env": {
        "PAPERLESS_BASE_URL": "http://localhost:8000",
        "PAPERLESS_API_TOKEN": "xxxxxxxx",
        "PAPERLESS_VERIFY_SSL": "false"
      }
    }
  }
}
```

Deploying via Portainer + using with hermes

Because this is a stdio server (JSON-RPC over stdin/stdout, exits when stdin closes), it is not a long-lived HTTP service. There are two deployment patterns:

Pattern A — on-demand container (recommended). Build the image, then have hermes launch a fresh container per session:

```bash
docker run -i --rm --env-file .env paperless-mcp:latest
```

The -i is essential — it keeps stdin open for the JSON-RPC stream.

Pattern B — persistent container you exec into. Deploy the included docker-compose.yml as a Portainer Stack. It builds the image and keeps a container alive (via a sleep loop) so hermes can attach the MCP entrypoint on demand:

```bash
docker exec -i paperless-mcp node /app/dist/index.js
```

Portainer steps

1. In Portainer: Stacks → Add stack.
2. Paste the contents of docker-compose.yml (or point it at this repo).
3. Add the environment variables (PAPERLESS_BASE_URL, PAPERLESS_API_TOKEN, PAPERLESS_VERIFY_SSL) in the stack's Environment variables section, or upload your .env.
4. Deploy. If Paperless runs on the same Docker network, use its service name as the host (e.g. http://paperless-ngx:8000).

Then point hermes' MCP configuration at whichever launch command matches your chosen pattern (docker run -i ... or docker exec -i ...).
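The two launch commands can be expressed as `command`/`args` entries in the same shape as the generic `mcpServers` client config shown earlier. Whether hermes accepts exactly this shape is an assumption, and the helpers `onDemandEntry` and `execEntry` are hypothetical.

```typescript
// Shape of one entry under "mcpServers", as in the generic client config example.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Pattern A: fresh container per session; -i keeps stdin open, --rm removes it on exit.
function onDemandEntry(image = "paperless-mcp:latest", envFile = ".env"): McpServerEntry {
  return { command: "docker", args: ["run", "-i", "--rm", "--env-file", envFile, image] };
}

// Pattern B: exec the MCP entrypoint inside the long-lived compose container.
function execEntry(container = "paperless-mcp"): McpServerEntry {
  return { command: "docker", args: ["exec", "-i", container, "node", "/app/dist/index.js"] };
}
```

`JSON.stringify({ mcpServers: { paperless: onDemandEntry() } }, null, 2)` yields a config in the earlier format. Docker resolves a relative `--env-file` path against the working directory it is started from, so an absolute path is safer if the client launches processes elsewhere.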

Privacy guarantees & limits

- Tier 1 list/search calls send `fields=` so Paperless never serializes `content`.
- `paperless_get_document_metadata` deletes `content` from the detail response in-process before returning.
- `paperless_extract_field` reads content only inside the server process and returns just the extracted value, with a `privacy_note` confirming content was not returned.
- Only `paperless_get_document_content` and `paperless_download_document` surface full content; both carry an explicit `privacy_warning`.
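These guarantees hold by construction in the handlers. A defensive runtime check is one possible way to keep them from regressing; the sketch below is a hypothetical addition, not something this README says the project does, and the warning text is illustrative. MCP's own tool-result envelope also has a `content` array, so a check like this must run on the handler's data before it is wrapped for the client.

```typescript
// Recursively look for a `content` key anywhere in a handler's result data.
function containsContentField(value: unknown): boolean {
  if (Array.isArray(value)) return value.some((v) => containsContentField(v));
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(([key, v]) => key === "content" || containsContentField(v));
  }
  return false;
}

// Guard a Tier 1 handler's data before it is wrapped into an MCP tool result.
function assertMetadataOnly<T>(toolName: string, data: T): T {
  if (containsContentField(data)) {
    throw new Error(`${toolName} attempted to return document content`);
  }
  return data;
}

// Tier 3 data travels with an explicit warning attached.
function withPrivacyWarning<T extends object>(data: T): T & { privacy_warning: string } {
  return { ...data, privacy_warning: "This result contains full document content." };
}
```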

This server controls what it returns. It cannot stop a client/agent from separately calling the Tier 3 tools — that's exactly why those tools are named and described to make the privacy cost obvious to the model and the user.

Extending Tier 2 extraction

`src/tools.ts` contains an `EXTRACTORS` registry of named, dependency-free regex extractors (`dollar_amounts`, `dates`, `emails`, `phone_numbers`, `addresses`, `total_amount`). The `extraction_pattern` argument accepts either one of these names or a raw regex.
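A registry of this kind can be sketched as a map from names to global regexes, with a fallback that compiles any other string as a raw pattern. The three patterns below are simplified illustrations, not the project's actual regexes, and `resolvePattern`/`runExtractor` are hypothetical names.

```typescript
// Simplified stand-ins for a few of the named extractors.
const EXTRACTORS: Record<string, RegExp> = {
  dollar_amounts: /\$\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?/g,
  emails: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  dates: /\b\d{4}-\d{2}-\d{2}\b/g,
};

// Use a named extractor if one exists; otherwise treat the input as a raw regex.
function resolvePattern(extractionPattern: string): RegExp {
  const named = EXTRACTORS[extractionPattern];
  if (named) return named;
  return new RegExp(extractionPattern, "g"); // throws SyntaxError on an invalid pattern
}

// Return every match; the content itself never leaves this function.
function runExtractor(content: string, extractionPattern: string): string[] {
  // matchAll requires the "g" flag and iterates over an internal copy of the regex.
  return Array.from(content.matchAll(resolvePattern(extractionPattern)), (m) => m[0]);
}
```

For the text `Total $1,234.56, paid $42.10 on 2024-03-01.`, `runExtractor(text, "dollar_amounts")` returns `["$1,234.56", "$42.10"]`.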

To plug in a local LLM (the design goal): keep the same contract — fetch content into the process, run your local model, return only the requested field(s). Replace the body of paperless_extract_field's handler (or add a new named extractor that calls your local inference endpoint). The privacy boundary is preserved as long as only the extracted value is returned.
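A local-model extractor can be sketched as a factory that returns a function with the same `content in, values out` shape as the regex extractors. The inference server, its URL, and its request/response JSON (`{ field, text }` in, `{ values: string[] }` out) are hypothetical placeholders; real local runtimes expose different APIs. The endpoint must really be local, or the document text leaves the machine and the Tier 2 boundary is gone.

```typescript
// Minimal fetch surface this sketch relies on; Node 20's global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build an extractor that delegates parsing to a hypothetical local inference endpoint.
function makeLocalModelExtractor(
  endpoint: string,
  field: string,
  fetchImpl: FetchLike,
  maxValueLength = 200,
) {
  return async (content: string): Promise<string[]> => {
    const res = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ field, text: content }), // content goes only to the local model
    });
    if (!res.ok) throw new Error(`local model returned HTTP ${res.status}`);
    const data = (await res.json()) as { values?: unknown };
    if (!Array.isArray(data.values)) return [];
    // Keep only short strings so a model echoing the document cannot leak it wholesale.
    return data.values.filter(
      (v): v is string => typeof v === "string" && v.length <= maxValueLength,
    );
  };
}
```

The length cap is a design choice of this sketch: a model can echo large parts of its input, so bounding each returned value keeps the "only the field" promise closer to true.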

Project layout

```text
paperless-mcp/
├── src/
│   ├── index.ts            # entrypoint: stdio transport, tool registration, dispatch
│   ├── paperless-client.ts # REST client: auth, pagination, TTL cache, ID↔name resolution
│   └── tools.ts            # tool defs + handlers, grouped by privacy tier
├── package.json
├── tsconfig.json
├── Dockerfile
├── docker-compose.yml      # Portainer stack
├── .env.example
└── README.md
```
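The client's "TTL cache, ID↔name resolution" role can be sketched as a small expiring cache plus a two-way index built from one list (for example all tags). `TtlCache` and `indexByIdAndName` are hypothetical names, and case-insensitive name matching is an assumption; the reverse lookup is what lets write tools such as `paperless_tag_document` accept names.

```typescript
// Tiny TTL cache: each entry records when it expires.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: force a refetch
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

interface NamedObject {
  id: number;
  name: string;
}

// Build both lookup directions from one list of tags, correspondents or document types.
function indexByIdAndName(items: NamedObject[]) {
  const byId = new Map(items.map((i) => [i.id, i.name] as const));
  const byName = new Map(items.map((i) => [i.name.toLowerCase(), i.id] as const));
  return {
    nameOf: (id: number): string => byId.get(id) ?? `#${id}`, // readable fallback for unknown IDs
    idOf: (name: string): number | undefined => byName.get(name.toLowerCase()),
  };
}
```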

License

MIT
