Procurement Knowledge MCP
Local data pipeline and MCP server for querying a procurement and inventory document corpus (invoices, purchase orders, shipping orders, inventory reports, contracts).
Status: Pipeline and MCP server complete. Run make ingest once, then connect an MCP client to query preprocessed artifacts.
Design decisions and trade-offs: CAPABILITY_TRACKER.md.
Requirements
Python 3.11+
uv package manager
Tesseract OCR (brew install tesseract on macOS)
Cursor or another MCP-compatible client
Setup
Extract corpus data
From the project root, unzip the bundled archive. It creates data/ in place (no copying or rearranging files):
unzip test-data.zip
Expected layout: data/invoices/, data/purchase_orders/, data/shipping_orders/, data/inventory_reports/, data/contracts/ (~45 documents).
Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
Virtual environment on external drives
If the project lives on an external volume, uv may fail to create .venv on that drive due to macOS ._* (AppleDouble) files.
cp .env.example .env
Edit .env:
UV_PROJECT_ENVIRONMENT=/Users/you/.venvs/procurement-knowledge-mcp
mkdir -p ~/.venvs
Use make or source .env before bare uv run commands.
Install dependencies
make sync
# or: uv sync --all-groups
Development workflow
make # same as make check
make check # ruff + mypy + full pytest with coverage (CI / pre-commit gate)
make test-fast # unit tests only, no coverage (skips corpus ingest)
make test # full pytest with terminal coverage
make test-integration # slow corpus / ingest tests only
make test-cov # pytest + htmlcov/
make cursor-setup # sync deps + verify MCP launcher
make smoke # MCP tool smoke test (requires make ingest)
make ingest # test-fast, then full ingest
make mcp # test-fast, then MCP server (stdio)
Coverage artifacts (.coverage, htmlcov/) are gitignored.
Architecture
flowchart LR
subgraph ingest [ingest.py]
D[discover] --> E[extract] --> M[model]
M --> C[compare] --> CH[chunk] --> I[index]
end
subgraph artifacts [processed/ and index/]
DJ[documents.jsonl]
OI[order_index.json]
CAT[catalog.json]
OC[order_comparisons.json]
CJ[chunks.jsonl]
DB[knowledge.db]
end
subgraph mcp [mcp_server.py]
T1[search_documents]
T2[list_documents]
T3[get_document]
T4[find_document_gaps]
T5[compare_order_documents]
T6[get_knowledge_base_summary]
end
data[(data/)] --> D
M --> DJ
M --> OI
C --> CAT
C --> OC
CH --> CJ
I --> DB
DJ --> mcp
CAT --> mcp
OC --> mcp
DB --> T1
Pipeline
discover -> extract -> model -> compare -> chunk -> index
Stage | Module | Output |
discover | | File registry, stable |
extract | | PyMuPDF text; Tesseract OCR for JPGs / empty PDFs |
model | | |
compare | | |
chunk | | |
index | | |
Typical ingest result: 45 documents, 19 orders, 130 chunks.
Data modeling
Content-primary fields: order_id, totals, line items, and inventory period are parsed from the document body (or OCR), with field_provenance on each field.
Path-based identity only: doc_id, source_path, doc_type (from folder). Filename patterns are fallbacks, never silent authority for linking.
Order index: maps content-derived order_id to doc_id lists. Scanned JPG invoices without an OCR order_id are excluded and listed in catalog.json.
Comparisons: precomputed per-order presence, total matching (±0.01), and line-item checks in order_comparisons.json.
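As a rough illustration of the rules above, the sketch below builds an order index from content-derived order_id values and compares two totals within the 0.01 tolerance. The Document class, build_order_index, totals_match, the sample doc_id strings, and the totals are hypothetical and chosen only for this example; they are not the project's actual API or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record shape; the project's real model may differ."""

    doc_id: str                     # path-based identity
    doc_type: str                   # taken from the folder name
    order_id: str | None = None     # parsed from body or OCR text, not from the filename
    total: float | None = None
    field_provenance: dict[str, str] = field(default_factory=dict)


def build_order_index(docs: list[Document]) -> tuple[dict[str, list[str]], list[str]]:
    """Map content-derived order_id -> doc_ids; collect docs that have no order_id."""
    index: dict[str, list[str]] = {}
    excluded: list[str] = []
    for doc in docs:
        if doc.order_id is None:
            excluded.append(doc.doc_id)  # would be reported in the catalog instead
        else:
            index.setdefault(doc.order_id, []).append(doc.doc_id)
    return index, excluded


def totals_match(a: float, b: float, tolerance: float = 0.01) -> bool:
    """Treat two totals as equal when they differ by at most the tolerance."""
    # Rounding guards against float noise right at the tolerance boundary.
    return round(abs(a - b), 6) <= tolerance


# Illustrative values only (placeholder totals and doc_ids).
docs = [
    Document("invoices-invoice-10687", "invoice", "10687", 100.0, {"order_id": "body"}),
    Document("shipping-orders-order-10687", "shipping_order", "10687", 100.0),
    Document("invoices-scan-01", "invoice"),  # scanned image with no OCR order_id
]
order_index, excluded = build_order_index(docs)
```

Keeping the excluded list separate from the index means a document with no reliable order_id is reported rather than silently linked by filename.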
Retrieval strategy
Query type | Mechanism |
Structured lookup (order, doc type, gaps) | |
Cross-document reconciliation | |
Text evidence (contracts, keywords) | FTS5 over |
Chunking: one chunk per page (invoice, PO, shipping, contract); one chunk per inventory report; contract boilerplate stripped.
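To make the page-level chunking and FTS5 lookup concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column layout, chunk_id format, and function names are assumptions for illustration; the schema of the real knowledge.db is not described here and may differ. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3


def chunk_pages(doc_id: str, pages: list[str]) -> list[dict]:
    """One chunk per page, numbered from 1 (hypothetical chunk_id format)."""
    return [
        {"chunk_id": f"{doc_id}#p{n}", "doc_id": doc_id, "page": n, "text": text}
        for n, text in enumerate(pages, start=1)
    ]


def build_index(chunks: list[dict]) -> sqlite3.Connection:
    """Load chunks into an in-memory FTS5 table; only the text column is searchable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5("
        "chunk_id UNINDEXED, doc_id UNINDEXED, page UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO chunks (chunk_id, doc_id, page, text) "
        "VALUES (:chunk_id, :doc_id, :page, :text)",
        chunks,
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """Return (chunk_id, snippet) pairs ordered by FTS5 relevance."""
    rows = conn.execute(
        "SELECT chunk_id, snippet(chunks, 3, '[', ']', '...', 8) "
        "FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Illustrative two-page contract; the text is made up for the example.
chunks = chunk_pages(
    "contracts-master",
    ["Supply of goods and delivery terms.", "Payment within 30 days."],
)
conn = build_index(chunks)
hits = search(conn, "supply")
```

Because each chunk carries its doc_id and page, a keyword hit can be turned into a page-level citation without re-reading the source file.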
Identity and duplicate handling
Rule | Behavior |
| Normalized relative path; duplicate paths in one run are skipped |
| Slug from folder + stem ( |
| Recorded in |
Hidden / junk files | |
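The identity rules can be sketched roughly as follows: normalize each relative path, skip hidden files and repeated paths, and derive a slug from the folder and file stem. The function names and the exact slug format are assumptions made for this example; the project's own discover stage may normalize differently.

```python
import re
from pathlib import PurePosixPath


def make_doc_id(relative_path: str) -> str:
    """Slug built from the parent folder and the file stem (illustrative format)."""
    path = PurePosixPath(relative_path)
    raw = f"{path.parent.name}-{path.stem}".lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")


def is_junk(relative_path: str) -> bool:
    """Hidden files such as .DS_Store or macOS AppleDouble ._* files."""
    return PurePosixPath(relative_path).name.startswith(".")


def discover(paths: list[str]) -> list[dict]:
    """Build a file registry, skipping junk files and duplicate paths."""
    seen: set[str] = set()
    registry: list[dict] = []
    for raw in paths:
        # PurePosixPath collapses "./" prefixes and repeated slashes.
        normalized = PurePosixPath(raw.replace("\\", "/")).as_posix()
        if is_junk(normalized) or normalized in seen:
            continue
        seen.add(normalized)
        registry.append({"doc_id": make_doc_id(normalized), "source_path": normalized})
    return registry


# Illustrative file names only.
registry = discover(
    [
        "invoices/invoice_10248.pdf",
        "./invoices/invoice_10248.pdf",   # same file after normalization
        "invoices/._invoice_10248.pdf",   # AppleDouble sidecar
        "contracts/Master Contract.pdf",
    ]
)
```

Deriving doc_id from the path alone keeps it stable across runs, while order linking still relies on content-derived fields.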
Run the ingest pipeline
make ingest
Custom paths:
uv run python ingest.py --data ./data --processed ./processed --index ./index
Run the MCP server
Smoke test all six tools against ingested artifacts (no client required):
make ingest # once, if processed/ and index/ are missing
make smoke # or: uv run python scripts/smoke_test_mcp.py
Start the stdio server for Cursor or another MCP client (make mcp runs test-fast first, then blocks):
make mcp
Press Ctrl+C to stop the server.
Required artifacts
make ingest must complete successfully before MCP tools work:
processed/documents.jsonl
processed/order_index.json
processed/catalog.json
processed/order_comparisons.json
index/knowledge.db
MCP tools
Every tool returns a top-level sources[] array (citations). Use these for grounded answers.
Tool | Purpose |
search_documents | FTS keyword search; optional |
list_documents | List documents by metadata |
get_document | Fetch one record by |
find_document_gaps | Gap lists from catalog ( |
compare_order_documents | Precomputed order comparison + field citations |
get_knowledge_base_summary | Corpus counts, inventory periods, ingest metadata |
Citation schema
Each entry in sources[] includes:
doc_id, source_path, doc_type, page, chunk_id, field, value, snippet, field_provenance, extraction_method, confidence, citation_label
Built by procurement/citations.py (build_citation, build_citation_from_chunk).
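For orientation, a citation entry with the fields listed above could be modeled along these lines. The field names come from the schema above; the types, the defaults, the sample values, and the with_sources helper are assumptions for illustration and do not reproduce build_citation or build_citation_from_chunk.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class Citation:
    """Hypothetical typing of one sources[] entry; real types may differ."""

    doc_id: str
    source_path: str
    doc_type: str
    page: int | None = None
    chunk_id: str | None = None
    field: str | None = None
    value: str | None = None
    snippet: str | None = None
    field_provenance: str | None = None
    extraction_method: str | None = None
    confidence: float | None = None
    citation_label: str | None = None


def with_sources(payload: dict, citations: list[Citation]) -> dict:
    """Attach citations as a top-level sources[] array on a tool result."""
    return {**payload, "sources": [asdict(c) for c in citations]}


# Illustrative values only.
citation = Citation(
    doc_id="invoices-invoice-10687",
    source_path="invoices/invoice_10687.pdf",
    doc_type="invoice",
    page=1,
    field="order_id",
    value="10687",
)
response = with_sources({"order_id": "10687"}, [citation])
```

Keeping sources[] at the top level of every response gives a client one consistent place to look for evidence, whichever tool was called.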
Connect Cursor
Project MCP config is committed at .cursor/mcp.json. It runs scripts/run_mcp.sh, which uses uv and loads .env (for UV_PROJECT_ENVIRONMENT on external drives).
make cursor-setup # uv sync + chmod launcher
make ingest # required once before MCP queries work
In Cursor: Developer: Reload Window, then Settings → MCP and confirm procurement-knowledge is connected. In Agent mode, ask the agent to use procurement-knowledge tools.
To override paths locally, copy the server block to ~/.cursor/mcp.json or edit the project file.
Validate basic questions (Cursor Agent)
Use tests/prompts/test_questions.txt for a manual end-to-end check after ingest and MCP connect:
Run make ingest and confirm procurement-knowledge is connected (see above).
Open Agent chat and attach or paste the prompt file (@tests/prompts/test_questions.txt).
Ask the agent to follow the file instructions: one question at a time, grounded answers with procurement-knowledge tools, and show the answer before the next question.
The prompt covers assignment-style questions (missing POs, order 10687, shipment vs invoice, contract terms, mismatches, inventory periods). Answers should cite sources[] from tool results.
Prompt question (summary) | Primary tool(s) |
Invoices missing a PO | |
PO for invoice 10248 | |
Shipment 10603 vs invoice | |
Contract supply of goods | |
Documents for order 10687 | |
Mismatches across doc types | |
Inventory reports and periods | |
Automated coverage of the same paths lives in tests/test_mcp_tools.py; make smoke exercises tools without a client.
Example queries (assignment-style)
After make ingest, automated tests in tests/test_mcp_tools.py exercise these paths. Expected results on the bundled corpus:
Question | Tool | Expected |
Which invoices lack a PO? | | |
Documents for order 10687? | | invoice + shipping_order |
Does shipment match invoice for 10687? | | |
Contract supply terms? | | Hits on TotalEnergies master contract |
Inventory periods? | | |
Sample compare_order_documents("10687") summary:
Order 10687: invoice and shipping totals match; purchase order missing.
Project layout
procurement-knowledge-mcp/
  test-data.zip # Corpus archive; unzip at repo root → data/
  data/ # Source documents (read-only; from test-data.zip)
  processed/ # Generated JSON/JSONL (gitignored)
  index/ # SQLite FTS index (gitignored)
  procurement/ # Pipeline library
  ingest.py # Ingest CLI
  mcp_server.py # MCP server (stdio)
  tests/ # pytest suite
    prompts/
      test_questions.txt # Cursor Agent manual validation prompt
  pyproject.toml
  Makefile
Testing
make test-fast # quick unit loop during development
make test # full suite with coverage
make test-integration # corpus / full-ingest tests only
make check # lint + typecheck + full suite with coverage
Current suite: 130 passed, 100% coverage on measured source (make check).
Integration tests (marked @pytest.mark.integration) run discover → ingest on data/ and validate known orders (10687, 10248) and gap lists. make test-fast skips them for a faster feedback loop.
For interactive validation in Cursor, use tests/prompts/test_questions.txt (see Validate basic questions under Connect Cursor).
Verbose:
set -a && source .env && set +a && uv run pytest -v
Troubleshooting
Issue | Solution |
| Use |
| Run |
| Normal: the stdio server is waiting for a client; press Ctrl+C to exit |
SQLite readonly on external drive | Ingest builds |
Known limitations
No vector embeddings (FTS keyword search only).
Scanned JPG invoices without an OCR order_id are excluded from the order index.
Line-item matching is best-effort regex parsing, not layout-aware extraction.
Contract text is page-chunked plain text (no clause segmentation).
AI-assisted development
This project was developed with AI assistance (Cursor). Design decisions follow a content-primary modeling policy with explicit trade-offs for a 3 to 5 hour take-home scope. Validation: make check (lint, types, 100% coverage), make smoke, pytest in tests/test_mcp_tools.py, and manual Agent runs via tests/prompts/test_questions.txt.