
paperlessngx-mcp

by Milli42

paperless-mcp

A privacy-first Model Context Protocol (MCP) server for Paperless-ngx. It lets an LLM agent search, organize, tag, and reference your documents without the full text of those documents entering the model's context window — unless you explicitly ask for it.

Built as a stdio server on the official @modelcontextprotocol/sdk, written in TypeScript (ESM) for Node 20+.


Why this exists: the three privacy tiers

The whole point of this server is to enforce a boundary between document metadata and document content. Most document management — "find my 2024 tax return", "tag this as an invoice", "what did the electric company send me" — needs metadata, not the OCR'd text of the document. Yet the naive Paperless API call (GET /api/documents/{id}/) returns the entire OCR content field by default, which would silently dump full document text into the model's context.

Every tool here falls into exactly one tier, and the tier dictates what data reaches the calling model:

Tier 1 — Metadata only (default)

Returns titles, dates, tags, correspondents, document types, custom fields, page counts, file sizes. Never returns OCR content. This covers the large majority of practical use cases. Search runs server-side inside Paperless and comes back as metadata.

Two mechanisms enforce this:

  1. List/search calls pass Paperless's sparse fieldset (fields=id,title,correspondent,...) so the content field is never even serialized by the server.

  2. The single-document metadata tool calls the detail endpoint (which includes content) but explicitly deletes the content field before returning.
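A minimal TypeScript sketch of both mechanisms. It assumes the search endpoint takes a query parameter and that the metadata field list looks roughly like the one below. The constant and function names are hypothetical and are not taken from src/.

```typescript
// Fields requested from list/search endpoints (an assumed list);
// "content" is deliberately absent.
const METADATA_FIELDS = [
  "id",
  "title",
  "correspondent",
  "document_type",
  "tags",
  "created",
  "added",
  "page_count",
];

// Mechanism 1: ask Paperless to serialize only metadata fields.
function buildSearchUrl(baseUrl: string, query: string): string {
  const params = [
    "query=" + encodeURIComponent(query),
    "fields=" + METADATA_FIELDS.join(","),
  ];
  return baseUrl.replace(/\/+$/, "") + "/api/documents/?" + params.join("&");
}

// Mechanism 2: the detail endpoint always includes content, so drop it from
// a copy of the parsed response before anything is returned to the model.
function stripContent(detail: Record<string, unknown>): Record<string, unknown> {
  const copy = { ...detail };
  delete copy.content;
  return copy;
}
```

With mechanism 1, the OCR text is never transmitted at all. With mechanism 2, it reaches the server's memory but is removed before serialization back to the client.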

Tier 2 — Local extraction (server-side processing)

The MCP server fetches content into its own process, runs a local parser (regex today; a local LLM in the future), and returns only the extracted value(s). The full text never leaves the server. Example: extract every dollar amount, or the "amount due" line, from an invoice — without the invoice body entering the model context.

This is the extensibility point: paperless_extract_field is designed so a future local-model extraction backend drops in behind the same contract (read content locally → parse → return only the field).
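The following TypeScript sketch illustrates that contract using the "amount due" example. The parser sees the full text, but the result carries only the matching line. The ExtractionResult shape, the Parser type and the privacy_note wording are assumptions.

```typescript
// Hypothetical shapes; only the read-locally, return-only-the-field principle
// comes from this README.
interface ExtractionResult {
  document_id: number;
  field: string;
  values: string[];
  privacy_note: string;
}

// A parser sees the full text but returns only what it extracted.
type Parser = (content: string) => string[];

// The "amount due" example: keep only lines that mention the amount due.
const amountDueLine: Parser = (content) =>
  content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /amount due/i.test(line));

function extractField(
  documentId: number,
  content: string, // fetched into this process by the caller; never returned
  field: string,
  parse: Parser,
): ExtractionResult {
  return {
    document_id: documentId,
    field,
    values: parse(content),
    privacy_note: "Content was processed locally and not returned.",
  };
}
```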

Tier 3 — Full content (explicit intent required)

Returns the full OCR text or the document binary to the calling model. These tools are clearly named and their descriptions carry a privacy_warning. Reserve them for when the user explicitly asks to read, summarize, or analyze a document's contents.
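The sketch below shows one way a Tier 3 result might be packaged. It assumes the warning travels inside the JSON payload next to the text. The { content: [{ type: "text", text }] } envelope is the standard MCP tool-result shape; the payload fields and the warning wording are hypothetical.

```typescript
// Standard MCP text tool-result envelope.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical Tier 3 payload: the full OCR text plus an explicit warning.
function fullContentResult(documentId: number, ocrText: string): ToolResult {
  const payload = {
    document_id: documentId,
    privacy_warning:
      "This response contains the full OCR text of the document. " +
      "Only request it when the user explicitly asked to read or analyze it.",
    content: ocrText,
  };
  return { content: [{ type: "text", text: JSON.stringify(payload, null, 2) }] };
}
```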

The tiers are deliberately not collapsed. A tool that searches returns metadata; a tool that fetches content is a separate, clearly-marked tool.



Tools by tier

| Tool | Tier | Returns |
|---|---|---|
| paperless_search_documents | 1 | Matching docs as metadata (id, title, names, tags, dates, page_count) |
| paperless_get_document_metadata | 1 | One doc's full metadata (content stripped), incl. checksum, size, custom fields, notes |
| paperless_list_tags | 1 | All tags: id, name, document_count |
| paperless_list_correspondents | 1 | All correspondents: id, name, document_count |
| paperless_list_document_types | 1 | All document types: id, name, document_count |
| paperless_get_statistics | 1 | Totals, inbox count, type & tag breakdowns |
| paperless_tag_document | 1 | Sets a document's tags (by name) |
| paperless_set_correspondent | 1 | Sets a document's correspondent (by name) |
| paperless_set_document_type | 1 | Sets a document's type (by name) |
| paperless_suggest_tags | 1 | Paperless's own server-side suggestions (no content to the model) |
| paperless_extract_field | 2 | Only the field(s) extracted locally from content |
| paperless_get_document_content | 3 | Full OCR text |
| paperless_download_document | 3 | Saves the binary PDF to disk, returns the path |

All Tier 1 tools resolve IDs to human-readable names. There is exactly one tool that returns OCR text (paperless_get_document_content) and one that materializes the binary (paperless_download_document).
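The tier assignments in the table can be expressed as data, which makes properties such as "only these tools surface content" easy to check. This registry and helper are a hypothetical sketch, not the actual layout of src/tools.ts.

```typescript
type Tier = 1 | 2 | 3;

// Mirrors the table above (hypothetical data structure).
const TOOL_TIERS: Record<string, Tier> = {
  paperless_search_documents: 1,
  paperless_get_document_metadata: 1,
  paperless_list_tags: 1,
  paperless_list_correspondents: 1,
  paperless_list_document_types: 1,
  paperless_get_statistics: 1,
  paperless_tag_document: 1,
  paperless_set_correspondent: 1,
  paperless_set_document_type: 1,
  paperless_suggest_tags: 1,
  paperless_extract_field: 2,
  paperless_get_document_content: 3,
  paperless_download_document: 3,
};

// Tools that surface full content (text or binary) to the calling model.
function contentBearingTools(): string[] {
  return Object.entries(TOOL_TIERS)
    .filter(([, tier]) => tier === 3)
    .map(([name]) => name);
}
```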


Configuration

All config is via environment variables.

| Variable | Required | Default | Description |
|---|---|---|---|
| PAPERLESS_BASE_URL | yes | | Base URL of your Paperless instance, e.g. http://paperless-ngx:8000 |
| PAPERLESS_API_TOKEN | yes | | Paperless API token (sent as Authorization: Token <...>) |
| PAPERLESS_VERIFY_SSL | no | true | Set to false for self-signed / plain-http local instances |
| PAPERLESS_DOWNLOAD_DIR | no | <tmp>/paperless-mcp-downloads | Where Tier 3 downloads are written |

Get an API token in Paperless under Settings → My Profile → API Token.
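The following sketch shows one way to load these variables and build the Paperless auth header. loadConfig and authHeaders are hypothetical names. The environment is passed in (for example process.env, with os.tmpdir() as the temp directory) so the parsing logic stays testable. Joining the path with "/" assumes a POSIX host.

```typescript
interface PaperlessConfig {
  baseUrl: string;
  apiToken: string;
  verifySsl: boolean;
  downloadDir: string;
}

function loadConfig(
  env: Record<string, string | undefined>,
  tmpDir: string, // e.g. os.tmpdir()
): PaperlessConfig {
  const baseUrl = env.PAPERLESS_BASE_URL;
  const apiToken = env.PAPERLESS_API_TOKEN;
  if (!baseUrl || !apiToken) {
    throw new Error("PAPERLESS_BASE_URL and PAPERLESS_API_TOKEN are required");
  }
  return {
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop trailing slashes
    apiToken,
    // Only an explicit "false" disables verification; the default is true.
    verifySsl: (env.PAPERLESS_VERIFY_SSL ?? "true").toLowerCase() !== "false",
    downloadDir: env.PAPERLESS_DOWNLOAD_DIR ?? `${tmpDir}/paperless-mcp-downloads`,
  };
}

// Paperless token authentication: "Authorization: Token <token>".
function authHeaders(cfg: PaperlessConfig): Record<string, string> {
  return { Authorization: `Token ${cfg.apiToken}`, Accept: "application/json" };
}
```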


Running locally (stdio)

npm install
npm run build
PAPERLESS_BASE_URL=http://localhost:8000 \
PAPERLESS_API_TOKEN=xxxxxxxx \
node dist/index.js

The server speaks JSON-RPC over stdio; it is launched by your MCP client, not run as a standalone HTTP service.
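The following sketch makes the transport concrete. MCP's stdio transport carries one JSON-RPC 2.0 message per line on stdin/stdout, and the code frames a tools/call request that way. The "query" argument name is an assumption about paperless_search_documents' input schema. A real session first completes the initialize handshake, which is omitted here.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// One message per line: JSON.stringify without indentation escapes any
// newlines inside string values, so the framed message is a single line.
function frame(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// A tools/call request as a client would write it to the server's stdin.
const callSearch = frame({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "paperless_search_documents",
    arguments: { query: "electric company 2024" }, // argument name assumed
  },
});
```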

Example MCP client config:

{
  "mcpServers": {
    "paperless": {
      "command": "node",
      "args": ["/path/to/paperless-mcp/dist/index.js"],
      "env": {
        "PAPERLESS_BASE_URL": "http://localhost:8000",
        "PAPERLESS_API_TOKEN": "xxxxxxxx",
        "PAPERLESS_VERIFY_SSL": "false"
      }
    }
  }
}

Deploying via Portainer and using with hermes

Because this is a stdio server (JSON-RPC over stdin/stdout, exits when stdin closes), it is not a long-lived HTTP service. Two patterns:

Pattern A — on-demand container (recommended). Build the image, then have hermes launch a fresh container per session:

docker run -i --rm --env-file .env paperless-mcp:latest

The -i is essential — it keeps stdin open for the JSON-RPC stream.

Pattern B — persistent container you exec into. Deploy the included docker-compose.yml as a Portainer Stack. It builds the image and keeps a container alive (via a sleep loop) so hermes can attach the MCP entrypoint on demand:

docker exec -i paperless-mcp node /app/dist/index.js

Portainer steps

  1. In Portainer: Stacks → Add stack.

  2. Paste the contents of docker-compose.yml (or point it at this repo).

  3. Add the environment variables (PAPERLESS_BASE_URL, PAPERLESS_API_TOKEN, PAPERLESS_VERIFY_SSL) in the stack's Environment variables section, or upload your .env.

  4. Deploy. If Paperless runs in the same Docker network, use its service name as the host (e.g. http://paperless-ngx:8000).

Then point hermes' MCP configuration at whichever launch command matches your chosen pattern (docker run -i ... or docker exec -i ...).
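One way to keep the two launch commands straight is to generate the client entry from the chosen pattern. This hypothetical helper emits the same command/args shape as the JSON client config shown earlier. Whether hermes accepts exactly this shape is an assumption.

```typescript
type Pattern = "run" | "exec";

interface McpServerEntry {
  command: string;
  args: string[];
}

function dockerLaunch(pattern: Pattern): McpServerEntry {
  if (pattern === "run") {
    // Pattern A: fresh container per session; -i keeps stdin open.
    return {
      command: "docker",
      args: ["run", "-i", "--rm", "--env-file", ".env", "paperless-mcp:latest"],
    };
  }
  // Pattern B: attach the MCP entrypoint to the long-lived stack container.
  return {
    command: "docker",
    args: ["exec", "-i", "paperless-mcp", "node", "/app/dist/index.js"],
  };
}
```

Docker resolves --env-file .env against the working directory of whatever process runs docker, so an absolute path is usually safer in a client config.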


Privacy guarantees & limits

  • Tier 1 list/search calls send fields= so Paperless never serializes content.

  • paperless_get_document_metadata deletes content from the detail response in-process before returning.

  • paperless_extract_field reads content only inside the server process and returns just the extracted value, with a privacy_note confirming content was not returned.

  • Only paperless_get_document_content and paperless_download_document surface full content; both carry an explicit privacy_warning.

This server controls what it returns. It cannot stop a client/agent from separately calling the Tier 3 tools — that's exactly why those tools are named and described to make the privacy cost obvious to the model and the user.
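For an additional check on the Tier 1 guarantees, a handler could walk its payload before returning it. This guard is a hypothetical sketch, not something the project is stated to implement. Run it on the payload before wrapping it in the MCP result envelope, because that envelope has its own field named content.

```typescript
// Recursively fail if any object in a Tier 1 payload still has a "content" key.
function assertNoContent(value: unknown, path = "$"): void {
  if (Array.isArray(value)) {
    value.forEach((v, i) => assertNoContent(v, `${path}[${i}]`));
  } else if (value !== null && typeof value === "object") {
    for (const [key, v] of Object.entries(value)) {
      if (key === "content") {
        throw new Error(`Tier 1 payload leaks content at ${path}.${key}`);
      }
      assertNoContent(v, `${path}.${key}`);
    }
  }
}
```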


Extending Tier 2 extraction

src/tools.ts contains an EXTRACTORS registry of named, dependency-free regex extractors (dollar_amounts, dates, emails, phone_numbers, addresses, total_amount). extraction_pattern also accepts a raw regex.
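A sketch of such a registry with a raw-regex fallback. The names mirror the list above, but these patterns are illustrative assumptions, not the ones in src/tools.ts. The addresses and total_amount extractors are omitted for brevity.

```typescript
// Named extractors (illustrative patterns); all use the global flag.
const EXTRACTORS: Record<string, RegExp> = {
  dollar_amounts: /\$\d+(?:,\d{3})*(?:\.\d{2})?/g,
  dates: /\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}\/\d{1,2}\/\d{2,4}\b/g,
  emails: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  phone_numbers: /\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b/g,
};

// extraction_pattern is either a registry key or a raw regex source string.
function resolvePattern(extractionPattern: string): RegExp {
  if (Object.prototype.hasOwnProperty.call(EXTRACTORS, extractionPattern)) {
    return EXTRACTORS[extractionPattern];
  }
  // Not a known name: treat it as a raw regex (throws SyntaxError if invalid).
  return new RegExp(extractionPattern, "g");
}

// String.prototype.match with a global regex returns every match (and resets
// lastIndex first), so sharing the registry's RegExp objects is safe here.
function extract(content: string, extractionPattern: string): string[] {
  return content.match(resolvePattern(extractionPattern)) ?? [];
}
```

Accepting a raw regex lets the model supply arbitrary patterns. A hardened version might cap content length or pattern complexity, since some regexes backtrack catastrophically.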

To plug in a local LLM (the design goal): keep the same contract — fetch content into the process, run your local model, return only the requested field(s). Replace the body of paperless_extract_field's handler (or add a new named extractor that calls your local inference endpoint). The privacy boundary is preserved as long as only the extracted value is returned.
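The following sketch shows that drop-in boundary, assuming an async backend interface. ExtractionBackend, RegexBackend, LocalModelBackend and the injected infer function (whatever calls your local inference endpoint) are hypothetical names. The handler's output shape is the same whichever backend runs.

```typescript
// Every backend receives content and returns only values.
interface ExtractionBackend {
  extract(content: string, field: string): Promise<string[]>;
}

class RegexBackend implements ExtractionBackend {
  private readonly patterns: Record<string, RegExp>;
  constructor(patterns: Record<string, RegExp>) {
    this.patterns = patterns;
  }
  async extract(content: string, field: string): Promise<string[]> {
    const re = this.patterns[field];
    if (!re) throw new Error(`unknown field: ${field}`);
    return content.match(re) ?? [];
  }
}

// Stand-in for a local model; infer is assumed, not provided by this project.
class LocalModelBackend implements ExtractionBackend {
  private readonly infer: (prompt: string) => Promise<string>;
  constructor(infer: (prompt: string) => Promise<string>) {
    this.infer = infer;
  }
  async extract(content: string, field: string): Promise<string[]> {
    const answer = await this.infer(
      `Extract the ${field} from the document below. Reply with one value per line.\n\n${content}`,
    );
    return answer.split("\n").map((s) => s.trim()).filter((s) => s.length > 0);
  }
}

async function handleExtract(backend: ExtractionBackend, content: string, field: string) {
  const values = await backend.extract(content, field);
  // Only the values leave the process; content is not part of the result.
  return { field, values, privacy_note: "Content was processed locally and not returned." };
}
```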


Project layout

paperless-mcp/
├── src/
│   ├── index.ts            # entrypoint: stdio transport, tool registration, dispatch
│   ├── paperless-client.ts # REST client: auth, pagination, TTL cache, ID↔name resolution
│   └── tools.ts            # tool defs + handlers, grouped by privacy tier
├── package.json
├── tsconfig.json
├── Dockerfile
├── docker-compose.yml      # Portainer stack
├── .env.example
└── README.md
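For orientation, here is a sketch of three helpers of the kind paperless-client.ts is described as containing: following Paperless's paginated next links, a TTL cache, and case-insensitive name-to-ID resolution. All names, as well as the injected getPage and clock functions, are assumptions.

```typescript
interface NamedItem {
  id: number;
  name: string;
}

// Paperless list endpoints return paginated { count, next, previous, results }.
interface Page<T> {
  next: string | null;
  results: T[];
}

// Follow "next" links until the last page.
async function fetchAll<T>(
  firstUrl: string,
  getPage: (url: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let url: string | null = firstUrl;
  while (url !== null) {
    const page: Page<T> = await getPage(url);
    all.push(...page.results);
    url = page.next;
  }
  return all;
}

// Caches one loaded value (e.g. the full tag list) for ttlMs milliseconds.
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | undefined;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(load: () => Promise<T>): Promise<T> {
    const t = this.now();
    if (this.entry !== undefined && t < this.entry.expiresAt) return this.entry.value;
    const value = await load();
    this.entry = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}

// Case-insensitive name -> ID lookup, e.g. before a tag tool sends tag IDs.
function resolveNames(items: NamedItem[], names: string[]): number[] {
  const byName = new Map(items.map((i) => [i.name.toLowerCase(), i.id] as const));
  return names.map((n) => {
    const id = byName.get(n.toLowerCase());
    if (id === undefined) throw new Error(`unknown name: ${n}`);
    return id;
  });
}
```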

License

MIT
