procurement-knowledge-mcp

by trussler-leveragepointdata

Procurement Knowledge MCP

Local data pipeline and MCP server for querying a procurement and inventory document corpus (invoices, purchase orders, shipping orders, inventory reports, contracts).

Status: Pipeline and MCP server complete. Run make ingest once, then connect an MCP client to query preprocessed artifacts.

Design decisions and trade-offs: CAPABILITY_TRACKER.md.

Requirements

Python 3.11+
uv package manager
Tesseract OCR (brew install tesseract on macOS)
Cursor or another MCP-compatible client

Setup

Extract corpus data

From the project root, unzip the bundled archive. It creates data/ in place (no copying or rearranging files):

unzip test-data.zip

Expected layout: data/invoices/, data/purchase_orders/, data/shipping_orders/, data/inventory_reports/, data/contracts/ (~45 documents).
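To confirm the archive unpacked as expected, you can count the visible files in each folder. The sketch below is not part of the project: `check_layout` is a hypothetical helper that assumes only the five folder names listed above.

```python
from pathlib import Path

# Folder names taken from the expected layout above.
EXPECTED_FOLDERS = (
    "invoices",
    "purchase_orders",
    "shipping_orders",
    "inventory_reports",
    "contracts",
)


def check_layout(data_dir: str = "data") -> dict[str, int]:
    """Return a visible-file count per expected folder; raise if one is missing."""
    root = Path(data_dir)
    counts: dict[str, int] = {}
    for name in EXPECTED_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        # Skip hidden entries such as .DS_Store and AppleDouble ._* files.
        counts[name] = sum(
            1 for p in folder.iterdir() if p.is_file() and not p.name.startswith(".")
        )
    return counts
```

Calling `sum(check_layout().values())` from the project root should land near the document count quoted above if the unzip succeeded.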

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Virtual environment on external drives

If the project lives on an external volume, uv may fail to create .venv on that drive because of macOS ._* (AppleDouble) files. To work around this, point uv at a virtual environment on the local disk:

cp .env.example .env

Edit .env:

UV_PROJECT_ENVIRONMENT=/Users/you/.venvs/procurement-knowledge-mcp

Create the parent directory:

mkdir -p ~/.venvs

Use make, or source .env before running bare uv run commands.

Install dependencies

make sync
# or: uv sync --all-groups

Development workflow

make          # same as make check
make check    # ruff + mypy + full pytest with coverage (CI / pre-commit gate)
make test-fast # unit tests only, no coverage (skips corpus ingest)
make test     # full pytest with terminal coverage
make test-integration # slow corpus / ingest tests only
make test-cov # pytest + htmlcov/
make cursor-setup # sync deps + verify MCP launcher
make smoke    # MCP tool smoke test (requires make ingest)
make ingest   # test-fast, then full ingest
make mcp      # test-fast, then MCP server (stdio)

Coverage artifacts (.coverage, htmlcov/) are gitignored.

Architecture

flowchart LR
    subgraph ingest [ingest.py]
        D[discover] --> E[extract] --> M[model]
        M --> C[compare] --> CH[chunk] --> I[index]
    end

    subgraph artifacts [processed/ and index/]
        DJ[documents.jsonl]
        OI[order_index.json]
        CAT[catalog.json]
        OC[order_comparisons.json]
        CJ[chunks.jsonl]
        DB[knowledge.db]
    end

    subgraph mcp [mcp_server.py]
        T1[search_documents]
        T2[list_documents]
        T3[get_document]
        T4[find_document_gaps]
        T5[compare_order_documents]
        T6[get_knowledge_base_summary]
    end

    data[(data/)] --> D
    M --> DJ
    M --> OI
    C --> CAT
    C --> OC
    CH --> CJ
    I --> DB
    DJ --> mcp
    CAT --> mcp
    OC --> mcp
    DB --> T1

Pipeline

discover -> extract -> model -> compare -> chunk -> index

Stage	Module	Output
discover	`discover.py`	File registry, stable `doc_id`, basename collision index
extract	`extract.py`	PyMuPDF text; Tesseract OCR for JPGs / empty PDFs
model	`parse.py`, `model.py`	`documents.jsonl`, `order_index.json`
compare	`compare.py`	`catalog.json`, `order_comparisons.json`
chunk	`chunk.py`	`chunks.jsonl`
index	`index.py`	`knowledge.db` (SQLite FTS5)

Typical ingest result: 45 documents, 19 orders, 130 chunks.

Data modeling

Content-primary fields: order_id, totals, line items, inventory period parsed from document body (or OCR), with field_provenance on each field.
Path-based identity only: doc_id, source_path, doc_type (from folder). Filename patterns are fallbacks, never silent authority for linking.
Order index: maps content-derived order_id to doc_id lists. Scanned JPG invoices without OCR order_id are excluded and listed in catalog.json.
Comparisons: precomputed per-order presence, total matching (±0.01), and line-item checks in order_comparisons.json.
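The order index and the total check can be pictured with a minimal sketch. It is illustrative only and does not reproduce `model.py` or `compare.py`: the record shape, the function names, and the second `doc_id` are assumptions, and the tolerance is assumed to be an absolute 0.01.

```python
from collections import defaultdict
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed absolute tolerance for total matching


def build_order_index(documents: list[dict]) -> dict[str, list[str]]:
    """Map each content-derived order_id to the doc_ids that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc in documents:
        order_id = doc.get("order_id")
        if order_id:  # documents without an extracted order_id stay out of the index
            index[order_id].append(doc["doc_id"])
    return dict(index)


def totals_match(a: str, b: str, tolerance: Decimal = TOLERANCE) -> bool:
    """Compare two monetary totals, allowing a small rounding difference."""
    return abs(Decimal(a) - Decimal(b)) <= tolerance


docs = [
    {"doc_id": "invoices__invoice_10687", "order_id": "10687"},
    {"doc_id": "shipping_orders__order_10687", "order_id": "10687"},  # hypothetical id
    {"doc_id": "invoices__scan_01", "order_id": None},  # e.g. a scan with no OCR order_id
]
order_index = build_order_index(docs)
```

Using `Decimal` rather than `float` keeps the tolerance comparison exact for currency strings.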

Retrieval strategy

Query type	Mechanism
Structured lookup (order, doc type, gaps)	`documents.jsonl`, `order_index.json`, `catalog.json`
Cross-document reconciliation	`order_comparisons.json`
Text evidence (contracts, keywords)	FTS5 over `chunks.jsonl` in `knowledge.db`

Chunking: one chunk per page (invoice, PO, shipping, contract); one chunk per inventory report; contract boilerplate stripped.
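For readers unfamiliar with FTS5, here is a small sketch of keyword search over chunks with an optional `doc_type` filter. The table layout, column names, and function names are assumptions, not the schema `index.py` actually writes, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3
from typing import Optional


def build_index(chunks: list[dict]) -> sqlite3.Connection:
    """Load chunks into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # Only `text` is tokenized; the other columns are stored for filters and citations.
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5("
        "chunk_id UNINDEXED, doc_id UNINDEXED, doc_type UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO chunks (chunk_id, doc_id, doc_type, text) "
        "VALUES (:chunk_id, :doc_id, :doc_type, :text)",
        chunks,
    )
    return conn


def search(
    conn: sqlite3.Connection,
    query: str,
    doc_type: Optional[str] = None,
    limit: int = 5,
) -> list[tuple]:
    """Return (chunk_id, doc_id, snippet) rows ordered by FTS5 relevance."""
    sql = (
        "SELECT chunk_id, doc_id, snippet(chunks, 3, '[', ']', '...', 8) "
        "FROM chunks WHERE chunks MATCH ?"
    )
    params: list = [query]
    if doc_type is not None:
        sql += " AND doc_type = ?"
        params.append(doc_type)
    sql += " ORDER BY rank LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

A query such as `search(conn, "supply goods", doc_type="contract")` matches chunks containing both terms. The default tokenizer does not stem, so "supply" will not match "supplies".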

Identity and duplicate handling

Rule	Behavior
`source_path`	Normalized relative path; duplicate paths in one run are skipped
`doc_id`	Slug from folder + stem (`invoices__invoice_10687`); hash suffix on collision
`basename_index`	Recorded in `catalog.json` when the same filename appears in multiple folders
Hidden / junk files	`._*`, `.DS_Store`, and hidden files are not ingested
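The identity rules in the table can be sketched as follows. This is an illustration under assumptions, not the code in `discover.py`: the slug normalization, the 8-character SHA-1 suffix, and both function names are hypothetical.

```python
import hashlib
import re
from pathlib import PurePosixPath


def is_ingestable(rel_path: str) -> bool:
    """Reject hidden and junk entries such as ._foo.pdf and .DS_Store."""
    return not any(part.startswith(".") for part in PurePosixPath(rel_path).parts)


def make_doc_id(rel_path: str, taken: set[str]) -> str:
    """Build a folder__stem slug, adding a short hash suffix on collision."""
    path = PurePosixPath(rel_path)
    folder = path.parent.name
    stem = re.sub(r"[^a-z0-9]+", "_", path.stem.lower()).strip("_")
    doc_id = f"{folder}__{stem}"
    if doc_id in taken:
        # Assumed scheme: disambiguate with a hash of the full relative path.
        digest = hashlib.sha1(rel_path.encode("utf-8")).hexdigest()[:8]
        doc_id = f"{doc_id}__{digest}"
    taken.add(doc_id)
    return doc_id
```

Because the suffix is derived from the path rather than from discovery order, the same file keeps the same `doc_id` across runs as long as its path does not change.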

Run the ingest pipeline

make ingest

Custom paths:

uv run python ingest.py --data ./data --processed ./processed --index ./index

Run the MCP server

Smoke test all six tools against ingested artifacts (no client required):

make ingest   # once, if processed/ and index/ are missing
make smoke    # or: uv run python scripts/smoke_test_mcp.py

Start the stdio server for Cursor or another MCP client (make mcp runs test-fast first, then blocks):

make mcp

Press Ctrl+C to stop the server.

Required artifacts

make ingest must complete successfully before MCP tools work:

processed/documents.jsonl
processed/order_index.json
processed/catalog.json
processed/order_comparisons.json
index/knowledge.db
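A small pre-flight check can report which of these files are absent before you start a client. `missing_artifacts` is a hypothetical helper, not a function shipped with the project; the paths are the five listed above.

```python
from pathlib import Path

# Artifact paths copied from the list above, relative to the project root.
REQUIRED_ARTIFACTS = (
    "processed/documents.jsonl",
    "processed/order_index.json",
    "processed/catalog.json",
    "processed/order_comparisons.json",
    "index/knowledge.db",
)


def missing_artifacts(root: str = ".") -> list[str]:
    """Return the required artifacts that do not exist under root."""
    base = Path(root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (base / rel).is_file()]
```

An empty list means ingest output is in place; anything else suggests running `make ingest` again.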

MCP tools

Every tool returns a top-level sources[] array (citations). Use these for grounded answers.

Tool	Purpose
`search_documents`	FTS keyword search; optional `doc_type`, `order_id`, `period` filters
`list_documents`	List documents by metadata
`get_document`	Fetch one record by `doc_id`; `include_text=false` by default
`find_document_gaps`	Gap lists from catalog (`invoices_missing_po`, etc.)
`compare_order_documents`	Precomputed order comparison + field citations
`get_knowledge_base_summary`	Corpus counts, inventory periods, ingest metadata

Citation schema

Each entry in sources[] includes:

doc_id, source_path, doc_type, page, chunk_id, field, value, snippet, field_provenance, extraction_method, confidence, citation_label

Built by procurement/citations.py (build_citation, build_citation_from_chunk).
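To make the field list concrete, the sketch below models one `sources[]` entry as a dataclass. The field names come from the list above; the types, the defaults, and the label format are assumptions and may differ from what `build_citation` produces.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Citation:
    """One sources[] entry; types and defaults are illustrative guesses."""

    doc_id: str
    source_path: str
    doc_type: str
    page: Optional[int] = None
    chunk_id: Optional[str] = None
    field: Optional[str] = None
    value: Optional[str] = None
    snippet: Optional[str] = None
    field_provenance: Optional[str] = None
    extraction_method: Optional[str] = None
    confidence: Optional[float] = None
    citation_label: str = ""

    def __post_init__(self) -> None:
        # Hypothetical label format: "[doc_id, p.N]" when a page is known.
        if not self.citation_label:
            page = f", p.{self.page}" if self.page is not None else ""
            self.citation_label = f"[{self.doc_id}{page}]"


example = Citation(
    doc_id="invoices__invoice_10687",
    source_path="data/invoices/invoice_10687.pdf",  # file extension assumed
    doc_type="invoice",
    page=1,
    field="total",
)
```

`asdict(example)` yields a plain dict with the twelve keys, which is the shape a client would see after JSON decoding.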

Connect Cursor

Project MCP config is committed at .cursor/mcp.json. It runs scripts/run_mcp.sh, which uses uv and loads .env (for UV_PROJECT_ENVIRONMENT on external drives).

make cursor-setup   # uv sync + chmod launcher
make ingest         # required once before MCP queries work

In Cursor: Developer: Reload Window, then Settings → MCP and confirm procurement-knowledge is connected. In Agent mode, ask the agent to use procurement-knowledge tools.

To override paths locally, copy the server block to ~/.cursor/mcp.json or edit the project file.
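The committed .cursor/mcp.json is not reproduced here. As a rough illustration of an override entry, the sketch below prints a server block in the `mcpServers` shape Cursor reads. The project path is a placeholder, the empty `args` list is an assumption, and the real file may set additional keys.

```python
import json
from pathlib import Path


def cursor_server_block(project_root: str) -> dict:
    """Build an mcpServers entry that points at the project's launcher script."""
    launcher = Path(project_root) / "scripts" / "run_mcp.sh"
    return {
        "mcpServers": {
            "procurement-knowledge": {
                "command": str(launcher),
                "args": [],  # assumed: the launcher needs no arguments
            }
        }
    }


# Placeholder path; replace with the absolute location of your checkout.
print(json.dumps(cursor_server_block("/path/to/procurement-knowledge-mcp"), indent=2))
```

An absolute launcher path matters when the block lives in ~/.cursor/mcp.json, since that file is not resolved relative to the project.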

Validate basic questions (Cursor Agent)

Use tests/prompts/test_questions.txt for a manual end-to-end check after ingesting and connecting the MCP client:

1. Run make ingest and confirm procurement-knowledge is connected (see above).
2. Open Agent chat and attach or paste the prompt file (@tests/prompts/test_questions.txt).
3. Ask the agent to follow the file's instructions: answer one question at a time, ground each answer in procurement-knowledge tool results, and show each answer before moving to the next question.

The prompt covers assignment-style questions (missing POs, order 10687, shipment vs invoice, contract terms, mismatches, inventory periods). Answers should cite sources[] from tool results.

Prompt question (summary)	Primary tool(s)
Invoices missing a PO	`find_document_gaps`
PO for invoice 10248	`list_documents`
Shipment 10603 vs invoice	`compare_order_documents`
Contract supply of goods	`search_documents` (`doc_type=contract`)
Documents for order 10687	`list_documents`
Mismatches across doc types	`find_document_gaps`, `compare_order_documents`
Inventory reports and periods	`get_knowledge_base_summary`, `list_documents`

Automated coverage of the same paths lives in tests/test_mcp_tools.py; make smoke exercises tools without a client.

Example queries (assignment-style)

After make ingest, automated tests in tests/test_mcp_tools.py exercise these paths. Expected results on the bundled corpus:

Question	Tool	Expected
Which invoices lack a PO?	`find_document_gaps(gap_type="invoices_missing_po")`	`10436`, `10687`, `10839`
Documents for order 10687?	`list_documents(order_id="10687")`	invoice + shipping_order
Does shipment match invoice for 10687?	`compare_order_documents(order_id="10687")`	`missing_purchase_order`; invoice/shipping totals match
Contract supply terms?	`search_documents(query="supply goods", doc_type="contract")`	Hits on TotalEnergies master contract
Inventory periods?	`get_knowledge_base_summary()`	`2016-07` → `2018-01`

Sample compare_order_documents("10687") summary:

Order 10687: invoice and shipping totals match; purchase order missing.
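A summary line of this kind could be derived from presence and total checks roughly as follows. The function names and input shape are hypothetical, and only the `missing_purchase_order` flag is confirmed by the table above; the other flag names are assumed by analogy.

```python
from typing import Optional

EXPECTED_TYPES = ("purchase_order", "invoice", "shipping_order")


def presence_flags(present: set[str]) -> list[str]:
    """Return a missing_<doc_type> flag for each expected type that is absent."""
    return [f"missing_{t}" for t in EXPECTED_TYPES if t not in present]


def summarize(order_id: str, present: set[str], totals_match: Optional[bool]) -> str:
    """Render a one-line, human-readable comparison summary."""
    parts: list[str] = []
    if totals_match is not None:
        verdict = "match" if totals_match else "differ"
        parts.append(f"invoice and shipping totals {verdict}")
    for flag in presence_flags(present):
        doc_type = flag[len("missing_"):].replace("_", " ")
        parts.append(f"{doc_type} missing")
    return f"Order {order_id}: " + ("; ".join(parts) or "no findings") + "."


summary = summarize("10687", {"invoice", "shipping_order"}, True)
```

Passing `totals_match=None` covers orders where one of the two totals could not be extracted, so no verdict is claimed.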

Project layout

procurement-knowledge-mcp/
  test-data.zip         # Corpus archive; unzip at repo root → data/
  data/                 # Source documents (read-only; from test-data.zip)
  processed/            # Generated JSON/JSONL (gitignored)
  index/                # SQLite FTS index (gitignored)
  procurement/          # Pipeline library
  ingest.py             # Ingest CLI
  mcp_server.py         # MCP server (stdio)
  tests/                # pytest suite
    prompts/
      test_questions.txt  # Cursor Agent manual validation prompt
  pyproject.toml
  Makefile

Testing

make test-fast        # quick unit loop during development
make test             # full suite with coverage
make test-integration # corpus / full-ingest tests only
make check            # lint + typecheck + full suite with coverage

Current suite: 130 passed, 100% coverage on measured source (make check).

Integration tests (marked @pytest.mark.integration) run discover → ingest on data/ and validate known orders (10687, 10248) and gap lists. make test-fast skips them for a faster feedback loop.

For interactive validation in Cursor, use tests/prompts/test_questions.txt (see Validate basic questions under Connect Cursor).

Verbose:

set -a && source .env && set +a && uv run pytest -v

Troubleshooting

Issue	Solution
`uv run` fails with `._ruff` on external drive	Use `.env` with `UV_PROJECT_ENVIRONMENT` on local disk; prefer `make`
`Processed artifacts missing`	Run `make ingest`
`make mcp` hangs	Normal: the stdio server is waiting for a client; press Ctrl+C to exit
SQLite readonly on external drive	Ingest builds `knowledge.db` in a temp dir then moves it (handled in `index.py`)
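The build-in-temp-then-move workaround from the last row can be sketched like this. It is a simplified stand-in for what `index.py` does, with a made-up two-column table and a hypothetical function name.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def build_db_via_temp(target: str, rows: list[tuple[str, str]]) -> Path:
    """Write the database in a local temp dir, then move it to its final path."""
    target_path = Path(target)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_db = Path(tmp) / target_path.name
        conn = sqlite3.connect(tmp_db)
        try:
            conn.execute("CREATE TABLE chunks (chunk_id TEXT PRIMARY KEY, text TEXT)")
            conn.executemany("INSERT INTO chunks VALUES (?, ?)", rows)
            conn.commit()
        finally:
            conn.close()  # close before moving so no handle points at the temp file
        if target_path.exists():
            target_path.unlink()  # replace a stale database from an earlier run
        shutil.move(str(tmp_db), str(target_path))
    return target_path
```

All SQLite writes happen on the local temp volume, so the external drive only ever receives a finished file.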

Known limitations

No vector embeddings (FTS keyword search only).
Scanned JPG invoices without OCR order_id are excluded from the order index.
Line-item matching is best-effort regex parsing, not layout-aware extraction.
Contract text is page-chunked plain text (no clause segmentation).

AI-assisted development

This project was developed with AI assistance (Cursor). Design decisions follow a content-primary modeling policy with explicit trade-offs for a 3-5 hour take-home scope. Validation: make check (lint, types, 100% coverage), make smoke, pytest in tests/test_mcp_tools.py, and manual Agent runs via tests/prompts/test_questions.txt.
