by singhh879

FinDocs MCP

An eval-first, reliability-first MCP server for semantic search and grounded Q&A over a financial-docs corpus: Postgres + pgvector for retrieval, plus a first-class eval-loop that fails CI on regression.

FinDocs MCP gives an AI agent three tools over MCP: search a corpus of broker API documentation (Zerodha Kite Connect + Finvasia Shoonya), ask grounded questions that come back with citations, and ingest new documents. The interesting part isn't the RAG but the evaluation harness: every change is scored on retrieval recall, ranking quality, answer faithfulness, and refusal correctness, and a regression below baseline turns the build red.

This is the "tick-data validation, zero production mis-fires" discipline from quant trading infrastructure, applied to AI tooling: a confident wrong answer is worse than an honest "not found."

📚 Learning the codebase? The source is written as a reverse-learning layer: read it top-down from src/mcp/server.ts (where an agent calls in) and follow the ▼ LEARN comment blocks down through retrieval, embeddings, cosine/pgvector, chunking, the refusal gate, and the eval-loop, all the way to the linear algebra at the bottom. Each concept is taught inline, right where it's implemented.


Architecture

                 ┌─────────────────────────────────────────────┐
   MCP client    │                 MCP server (stdio)          │
 (Claude Code/   │   search_docs · answer_question · ingest_doc│
  Desktop) ─────▶│                                             │
                 └───────┬───────────────┬───────────────┬─────┘
                         │               │               │
                  ┌──────▼─────┐   ┌─────▼──────┐   ┌─────▼──────┐
                  │  Embedder  │   │ Retrieval  │   │   Ingest   │
                  │ (local     │   │ + QA gate  │   │ chunk→embed│
                  │  MiniLM)   │   │ + citations│   │  →upsert   │
                  └──────┬─────┘   └─────┬──────┘   └─────┬──────┘
                         └───────────────┼────────────────┘
                                  ┌──────▼───────┐
                                  │  Postgres +  │
                                  │   pgvector   │  HNSW cosine
                                  └──────────────┘

   evals/  ──▶  runner ──▶ metrics (recall@k · MRR · faithfulness · refusal)
                                  │
                                  ▼
                          baseline.json gate ──▶ CI pass/fail

Everything is provider-agnostic behind thin adapters:

| Concern | Default (zero cost, no secrets) | Swap-in |
| --- | --- | --- |
| Embeddings | @xenova/transformers MiniLM-L6-v2 (384-dim) | OpenAI / Voyage |
| LLM | deterministic heuristic (extractive + overlap judge) | local Ollama, or Anthropic / OpenAI |
| Store | Postgres + pgvector (HNSW, cosine) | (none) |

The defaults run with no API keys and no per-call cost, which is exactly what makes the eval gate reproducible in CI.
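
As a rough illustration of what a thin adapter can look like, the sketch below defines an embedder interface and a factory that picks an implementation by name. Every name here (Embedder, ToyEmbedder, makeEmbedder, the registry) is hypothetical and is not taken from the repository, and the toy implementation is a character-count stand-in, not MiniLM.

```typescript
// Hypothetical adapter shape; the repository's real interface may differ.
interface Embedder {
  readonly dim: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy stand-in for a real model: buckets character codes into a fixed-size,
// L2-normalized vector. Deterministic, but NOT semantically meaningful.
class ToyEmbedder implements Embedder {
  readonly dim: number;

  constructor(dim = 8) {
    this.dim = dim;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const v = new Array<number>(this.dim).fill(0);
      for (let i = 0; i < text.length; i++) {
        const slot = text.charCodeAt(i) % this.dim;
        v[slot] = (v[slot] ?? 0) + 1;
      }
      const norm = Math.hypot(...v) || 1; // avoid dividing by zero for ""
      return v.map((x) => x / norm);
    });
  }
}

// A real registry would also list the local MiniLM and hosted providers.
const registry: Record<string, () => Embedder> = {
  toy: () => new ToyEmbedder(),
};

function makeEmbedder(name: string): Embedder {
  const build = registry[name];
  if (!build) throw new Error(`unknown embedder: ${name}`);
  return build();
}
```

Callers depend only on the interface, so swapping providers comes down to registering another implementation under a new name.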


MCP tools

| Tool | Description |
| --- | --- |
| search_docs(query, k?) | Top-k chunks with cosine similarity scores + source metadata. |
| answer_question(question) | Retrieves, applies a confidence gate, synthesizes a grounded answer with citations, or refuses with "not found" when retrieval confidence is low. |
| ingest_doc({ url \| text, source?, title? }) | Chunk → embed → upsert. Idempotent on content. |
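
One common way to make an ingest step idempotent on content is to derive each chunk's key from a hash of its text, so re-ingesting the same document overwrites rows instead of duplicating them. The sketch below assumes that approach; the key scheme, the chunkId helper, and the in-memory MemoryStore are all hypothetical stand-ins, and the actual schema may do this differently.

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  id: string;
  source: string;
  text: string;
}

// Hypothetical key scheme: SHA-256 over the source label and the chunk text.
function chunkId(source: string, text: string): string {
  return createHash("sha256")
    .update(source)
    .update("\0") // separator so ("ab", "c") and ("a", "bc") hash differently
    .update(text)
    .digest("hex");
}

// In-memory stand-in for a database upsert keyed on the content hash.
class MemoryStore {
  private rows = new Map<string, Chunk>();

  upsert(source: string, text: string): Chunk {
    const id = chunkId(source, text);
    const row: Chunk = { id, source, text };
    this.rows.set(id, row); // same content => same key => no duplicate row
    return row;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

Ingesting the same chunk twice leaves the store at one row, while changing a single character yields a new key and a second row.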

The reliability core: the refusal gate

answer_question never synthesizes when retrieval confidence is below the configured floor. It refuses instead. The eval set includes out-of-corpus negative cases specifically to prove this behavior holds (see src/qa/gate.ts). With the default thresholds there is a clean margin between in-corpus questions (top cosine ≥ 0.35) and out-of-corpus questions (top cosine ≤ 0.31).
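
The gate logic can be sketched in a few lines. This is not the code in src/qa/gate.ts: the Hit and GateDecision types, the gate function, and the 0.33 floor are assumptions chosen for illustration, with the floor simply placed inside the margin quoted above.

```typescript
interface Hit {
  chunkId: string;
  score: number; // cosine similarity between query and chunk
  text: string;
}

type GateDecision =
  | { kind: "answer"; evidence: Hit[] }
  | { kind: "refuse"; reason: string };

// ASSUMPTION: illustrative floor between the out-of-corpus ceiling (0.31)
// and the in-corpus floor (0.35); the configured value may differ.
const DEFAULT_FLOOR = 0.33;

function gate(hits: Hit[], floor: number = DEFAULT_FLOOR): GateDecision {
  // Confidence is taken to be the best similarity among the retrieved chunks.
  const top = hits.reduce((best, h) => Math.max(best, h.score), -Infinity);
  if (hits.length === 0 || top < floor) {
    return { kind: "refuse", reason: "not found" };
  }
  return { kind: "answer", evidence: hits };
}
```

Under these assumptions a top score of 0.31 refuses and a top score of 0.35 proceeds to synthesis, so synthesis is never attempted on weak evidence.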


The eval-loop (the centerpiece)

A labeled dataset of ~50 cases (evals/dataset.jsonl): question → expected supporting document(s), including negative/out-of-corpus cases.

Metrics (evals/harness/metrics.ts):

| Metric | Question it answers |
| --- | --- |
| recall@k | Did the right document make it into the top-k? |
| MRR | How highly was the right document ranked? |
| faithfulness | Is the answer actually supported by the retrieved chunks? (LLM-as-judge; deterministic fallback) |
| refusal accuracy | Does it answer in-corpus questions and refuse out-of-corpus ones? |
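
To make the two retrieval metrics concrete, here is a minimal sketch of how they are commonly computed. It is not the code in evals/harness/metrics.ts: the EvalCase shape and function names are hypothetical, and recall@k is assumed to be the hit-rate variant (a case counts if any expected document appears in the top k), which matches the question in the table but may differ from the repository's exact definition.

```typescript
interface EvalCase {
  expected: string[]; // ids of documents that should be retrieved
  ranked: string[];   // retrieved document ids, best first
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// 1 if any expected document appears in the top k results, else 0.
function hitAtK(c: EvalCase, k: number): number {
  return c.ranked.slice(0, k).some((id) => c.expected.includes(id)) ? 1 : 0;
}

// 1 / (1-based rank of the first expected document), or 0 if it is absent.
function reciprocalRank(c: EvalCase): number {
  const idx = c.ranked.findIndex((id) => c.expected.includes(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function recallAtK(cases: EvalCase[], k: number): number {
  return mean(cases.map((c) => hitAtK(c, k)));
}

function mrr(cases: EvalCase[]): number {
  return mean(cases.map(reciprocalRank));
}
```

For three cases where the expected document is ranked first, second, and missing, the reciprocal ranks are 1, 1/2, and 0, so MRR is 0.5 and recall@2 is 2/3.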

Runner: pnpm eval prints a scorecard, writes evals/results/{timestamp}.json, and appends a row to evals/history.ndjson so you can track the score-over-time curve.

Regression gate: pnpm eval:gate compares the scorecard against evals/baseline.json and exits non-zero if any metric drops below threshold (minus a small epsilon). CI runs this on every PR.
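
The comparison behind such a gate can be sketched as follows. The Scorecard type, the checkGate function, and the epsilon of 0.01 are assumptions made for illustration; the source only says "a small epsilon" and does not describe the layout of baseline.json.

```typescript
type Scorecard = Record<string, number>;

interface GateResult {
  passed: boolean;
  failures: string[];
}

// ASSUMPTION: epsilon = 0.01 is illustrative, not the project's value.
function checkGate(
  current: Scorecard,
  baseline: Scorecard,
  epsilon = 0.01,
): GateResult {
  const failures: string[] = [];
  for (const [metric, floor] of Object.entries(baseline)) {
    const value = current[metric];
    if (value === undefined) {
      failures.push(`${metric}: missing from scorecard`);
    } else if (value < floor - epsilon) {
      failures.push(`${metric}: ${value.toFixed(3)} is below ${floor.toFixed(3)}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

// A CLI wrapper would typically end with something like:
//   if (!result.passed) process.exitCode = 1;
```

The epsilon absorbs small run-to-run noise, so a metric sitting a hair under its threshold does not fail the build while a real drop still does.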

Current baseline (calibrated against the real corpus):

recall@5  0.92   ·   MRR  0.80   ·   faithfulness  0.80   ·   refusal accuracy  0.90

Offline smoke test: pnpm calibrate runs the entire scoring pipeline with the real embedder against an in-memory index (no database required), which is useful for tuning thresholds and sanity-checking retrieval quality locally.
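
An in-memory index of this kind usually amounts to a brute-force cosine scan over all chunk vectors. The sketch below shows that idea under stated assumptions: cosine, IndexedChunk, and topK are hypothetical names, not the contents of scripts/calibrate.ts.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface IndexedChunk {
  id: string;
  vector: number[];
}

// Exact top-k by scoring every chunk: O(n) per query, fine for a small corpus.
function topK(
  query: number[],
  index: IndexedChunk[],
  k: number,
): { id: string; score: number }[] {
  return index
    .map((c) => ({ id: c.id, score: cosine(query, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A scan like this returns exact nearest neighbours, whereas an HNSW index is approximate, so offline scores can differ slightly from database-backed ones. Note also that pgvector's `<=>` operator returns cosine distance, which is 1 minus the similarity computed here.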


Quickstart

Prerequisites: Node 20+, pnpm (corepack enable pnpm), and Docker (for the pgvector container).

pnpm install
cp .env.example .env          # defaults match docker-compose

pnpm db:up                    # start Postgres + pgvector (host port 5433)
pnpm db:wait                  # wait until it accepts connections
pnpm migrate                  # apply schema + HNSW index
pnpm ingest                   # chunk → embed → upsert the corpus

pnpm eval                     # print the scorecard
pnpm eval:gate                # run the regression gate (CI uses this)

pnpm dev                      # run the MCP server over stdio

The first pnpm ingest / pnpm eval downloads the MiniLM model (~90 MB) and caches it under .models/.


Using it from Claude Desktop / Claude Code

Build first (pnpm build), then point your MCP client at dist/mcp/server.js.

Claude Desktop (add to claude_desktop_config.json):

{
  "mcpServers": {
    "findocs": {
      "command": "node",
      "args": ["/absolute/path/to/findocs-mcp/dist/mcp/server.js"],
      "env": {
        "DATABASE_URL": "postgres://findocs:findocs@localhost:5433/findocs"
      }
    }
  }
}

Claude Code (register the server from the repo root):

claude mcp add findocs \
  --env DATABASE_URL=postgres://findocs:findocs@localhost:5433/findocs \
  -- node ./dist/mcp/server.js

Then ask things like "Search the docs for how GTT OCO orders work" or "How is the Kite Connect access token checksum computed?", and try an out-of-corpus question to watch it refuse.


2-minute demo

Demo recording goes here; replace with an asciinema cast or GIF:

# record:
asciinema rec demo.cast -c "pnpm eval && pnpm dev"


Project layout

src/
  config.ts              zod-validated env
  db/                    postgres.js client + repo (upsert / vectorSearch / getChunk)
  embeddings/            Embedder interface + local transformers.js impl + factory
  llm/                   LLMProvider {synthesize, judge}: heuristic + ollama
  ingest/                chunk · load · pipeline
  retrieval/search.ts    search_docs core
  qa/                    confidence gate + grounded answer with citations
  mcp/server.ts          MCP stdio server (3 tools, zod schemas)
evals/
  dataset.jsonl          labeled cases (incl. negatives)
  harness/               metrics · runner · scorecard · gate (first-class module)
  baseline.json          regression thresholds
corpus/                  vendored broker API docs (deterministic eval base)
db/                      schema.sql · migrate · wait
scripts/calibrate.ts     offline eval (no DB) for threshold tuning

Notes & scope

  • Corpus is a curated, vendored subset of public broker API documentation for demo and reproducibility; it may lag the official docs. Treat it as a fixture, not a source of truth for live trading.

  • TypeScript strict throughout (exactOptionalPropertyTypes, noUncheckedIndexedAccess, …), ESM, no any in core paths. Tests in vitest.

  • Out of scope for v1: rerankers, hybrid BM25+vector, auth, web UI. The adapters are structured so these slot in without a rewrite.

License

MIT (see LICENSE).
