
mcp-retrieve

by maxbaluev


Python

Local


An MCP server that exposes late-interaction document retrieval (ColBERT-style MaxSim) over a local folder. Point it at a directory of text/markdown/code, and any MCP client — Claude Desktop, an IDE agent, your own host — can index it and search it with token-level relevance.

It ships with a deterministic, model-free default embedder, so the server and its full test suite run offline with no model weights, no API key, no network. When you want production-grade semantics, drop in a real ColBERT / ColQwen encoder behind a small protocol — ranking code does not change.

What is MCP?

The Model Context Protocol is an open standard that lets LLM applications connect to external tools and data through a uniform server interface. A host (e.g. Claude Desktop) launches MCP servers and calls the tools they advertise. mcp-retrieve is such a server; it advertises two tools:

Tool	Purpose
`index_folder(folder)`	Read text files under `folder`, chunk them, embed each chunk into a multi-vector representation, and build an in-memory index.
`search(query, k=5)`	Rank indexed chunks by MaxSim late interaction and return the top `k` with source path, score, and snippet.
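The README does not say how index_folder splits files into chunks. As a minimal sketch, assuming fixed-size word windows with overlap (a common choice, not necessarily what mcp-retrieve does), the read-and-chunk step could look like this. `Chunk`, `chunk_text`, `chunk_folder`, and the size, overlap, and file-suffix defaults are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Chunk:
    source: str  # path of the file the chunk came from
    text: str


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(source, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def chunk_folder(folder: str, suffixes=(".txt", ".md", ".py"), **kwargs) -> list[Chunk]:
    """Chunk every file under `folder` (recursively) whose suffix is in `suffixes`."""
    chunks = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            chunks.extend(chunk_text(text, str(path), **kwargs))
    return chunks
```

Overlapping windows keep a phrase that straddles a boundary intact in at least one chunk, at the cost of indexing some tokens twice.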


The retrieval approach: late interaction (ColBERT)

Most dense retrievers compress a passage into one vector and compare it to one query vector — cheap, but lossy. Late interaction, introduced by ColBERT (Khattab & Zaharia, ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, SIGIR 2020, arXiv:2004.12832), keeps one vector per token for both the query and the document and defers their interaction to scoring time via the MaxSim operator:

score(q, d) = Σ_i  max_j  sim(q_i, d_j)

Each query token q_i is matched to its single most similar document token d_j, and those per-token maxima are summed. This preserves fine-grained term matching (a query term can find its evidence anywhere in the passage) while staying efficient: document token vectors are computed once at index time, so a search only has to encode the query. With L2-normalised vectors, sim(q_i, d_j) is cosine similarity and equals the dot product q_i · d_j, so MaxSim reduces to one matrix product Q Dᵀ followed by a row-wise max and a sum, which is exactly what mcp_retrieve.retrieval.maxsim computes.
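A minimal NumPy sketch of MaxSim plus top-k ranking over precomputed document matrices follows. The names `l2_normalise`, `maxsim`, and `rank` are illustrative and not necessarily the signatures used in mcp_retrieve.retrieval; the sketch assumes every row is already L2-normalised.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (all-zero rows stay zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    """score(q, d) = sum_i max_j q_i . d_j for (n_q, dim) and (n_d, dim) row-normalised matrices."""
    sim = q @ d.T                        # (n_q, n_d): cosine similarity of every token pair
    return float(sim.max(axis=1).sum())  # best document token per query token, summed


def rank(q: np.ndarray, docs: list[np.ndarray], k: int = 5) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest-scoring documents."""
    scores = [maxsim(q, d) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


q = l2_normalise(np.array([[1.0, 0.0], [0.0, 1.0]]))
docs = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0], [0.0, 1.0]])]
print(rank(q, docs, k=2))  # → [(1, 2.0), (0, 1.0)]
```

With the two query tokens as the 2-D basis vectors, the document containing both of them scores 1 + 1 = 2, while the document containing only [1, 0] scores 1 + 0 = 1: its second query token finds no matching evidence.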

Install

pip install -e .          # core: mcp + numpy
pip install -e ".[dev]"   # plus pytest for the test suite

Register with an MCP client

For Claude Desktop, add the server to its mcpServers config (claude_desktop_config.json):

{
  "mcpServers": {
    "mcp-retrieve": {
      "command": "mcp-retrieve"
    }
  }
}

If you installed into a virtual environment, use the absolute path to that environment's mcp-retrieve console script, or set "command" to the environment's Python interpreter and pass "args": ["-m", "mcp_retrieve"]. Restart the client, then ask it to index a folder and search it; the model will call the index_folder and search tools.
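To avoid copying interpreter paths by hand, a small script can print an mcpServers entry with absolute paths. This is a hypothetical helper, not part of mcp-retrieve: run it with the virtual environment's interpreter (ideally with the environment activated) so that `sys.executable` and the PATH lookup point into that environment, then merge the printed entry into claude_desktop_config.json.

```python
import json
import os
import shutil
import sys


def claude_desktop_entry(name: str = "mcp-retrieve") -> dict:
    """Build an mcpServers entry that does not rely on the client's PATH (hypothetical helper)."""
    script = shutil.which("mcp-retrieve")  # console script on the current PATH, if any
    if script:
        server = {"command": os.path.abspath(script)}
    else:
        # Fall back to running the package as a module with the interpreter running this script.
        server = {"command": sys.executable, "args": ["-m", "mcp_retrieve"]}
    return {"mcpServers": {name: server}}


if __name__ == "__main__":
    print(json.dumps(claude_desktop_entry(), indent=2))
```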

Usage from Python

from mcp_retrieve import RetrievalIndex

index = RetrievalIndex()                 # deterministic default embedder
index.index_folder("./docs")
for hit in index.search("late interaction maxsim", k=5):
    print(f"{hit.score:.3f}  {hit.chunk.source}  {hit.snippet}")

Plugging in a real late-interaction model

The default HashingEmbedder makes the project run anywhere, but it matches on character n-grams, not meaning. For real semantics, implement the Embedder protocol around a trained encoder and pass it in:

import numpy as np
from mcp_retrieve import RetrievalIndex
from mcp_retrieve.server import create_server

class ColbertEmbedder:
    """Wrap a ColBERT checkpoint as a multi-vector Embedder."""
    def __init__(self, checkpoint: str) -> None:
        from colbert.modeling.checkpoint import Checkpoint
        from colbert.infra import ColBERTConfig
        self._ckpt = Checkpoint(checkpoint, ColBERTConfig())

    @property
    def dim(self) -> int:
        return 128

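    def embed(self, text: str) -> "np.ndarray":
        # The protocol has a single embed(), so this document encoder also embeds queries.
        # docFromText takes a batch of passages; [0] is this passage's (num_tokens, 128) tensor.
        vecs = self._ckpt.docFromText([text])[0]
        vecs = np.asarray(vecs.detach().cpu(), dtype=np.float32)   # torch -> NumPy, even from GPU
        # Re-normalise defensively so every non-zero row has unit L2 norm, as the protocol requires.
        return vecs / np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)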

# Use it from Python …
index = RetrievalIndex(embedder=ColbertEmbedder("colbert-ir/colbertv2.0"))

# … or run the MCP server with it.
server = create_server(embedder=ColbertEmbedder("colbert-ir/colbertv2.0"))
server.run()

Any object exposing dim: int and embed(text) -> ndarray[num_tokens, dim] with L2-normalised rows satisfies the protocol — ColBERT, ColQwen, ColPali, or your own. The ranking and chunking code is encoder-agnostic.
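The protocol can be written down as a `typing.Protocol`, together with a toy embedder and a shape and normalisation check. Everything here is a hypothetical sketch: `NgramHashEmbedder` only imitates the idea of a deterministic character n-gram embedder and is not the package's HashingEmbedder; its `dim=64` default, whitespace tokenisation, and BLAKE2-based bucketing are assumptions.

```python
import hashlib
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class Embedder(Protocol):
    @property
    def dim(self) -> int: ...

    def embed(self, text: str) -> np.ndarray: ...


class NgramHashEmbedder:
    """Hypothetical: one vector per whitespace token, built from hashed character n-grams."""

    def __init__(self, dim: int = 64, n: int = 3) -> None:
        self._dim, self._n = dim, n

    @property
    def dim(self) -> int:
        return self._dim

    def embed(self, text: str) -> np.ndarray:
        tokens = text.lower().split() or [""]
        out = np.zeros((len(tokens), self._dim), dtype=np.float32)
        for row, tok in enumerate(tokens):
            padded = f"#{tok}#"  # boundary markers so prefixes and suffixes form their own n-grams
            for i in range(max(1, len(padded) - self._n + 1)):
                gram = padded[i:i + self._n].encode()
                # A keyed-free cryptographic hash is stable across runs, unlike Python's salted hash().
                bucket = int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "little")
                out[row, bucket % self._dim] += 1.0
        norms = np.linalg.norm(out, axis=1, keepdims=True)
        return out / np.maximum(norms, 1e-12)


def check_embedder(e: Embedder, text: str) -> np.ndarray:
    """Raise ValueError unless e.embed(text) is a (num_tokens, dim) matrix with unit-norm rows."""
    v = np.asarray(e.embed(text))
    if v.ndim != 2 or v.shape[1] != e.dim:
        raise ValueError(f"expected shape (num_tokens, {e.dim}), got {v.shape}")
    if not np.allclose(np.linalg.norm(v, axis=1), 1.0, atol=1e-5):
        raise ValueError("rows must be L2-normalised")
    return v
```

Because the protocol is structural, `NgramHashEmbedder` never inherits from `Embedder`; `isinstance` only checks that `dim` and `embed` exist, so `check_embedder` is what actually verifies the shape and normalisation contract.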

Architecture

src/mcp_retrieve/
  embedder.py    # Embedder protocol + deterministic HashingEmbedder default
  retrieval.py   # chunking, MaxSim, RetrievalIndex (pure — no MCP dependency)
  server.py      # FastMCP server exposing index_folder + search (only MCP import)

The retrieval and embedding cores import nothing MCP-related, so they are testable and reusable on their own; the SDK is isolated to server.py and imported lazily.
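Isolating the SDK behind a lazy import can be sketched as below. `require` and `build_server` are hypothetical names; `FastMCP` and its `tool()` decorator come from the official MCP Python SDK, and the tool bodies assume an index object with the `index_folder`/`search` methods and the hit attributes (`score`, `chunk.source`, `snippet`) shown in the Python usage example above.

```python
import importlib
from types import ModuleType


def require(module: str, hint: str = "pip install mcp") -> ModuleType:
    """Import `module` only when first needed; turn a missing dependency into a clear error."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise RuntimeError(f"{module!r} is not installed; try: {hint}") from exc


def build_server(index):
    """Hypothetical wiring of an MCP-free index into FastMCP; the SDK is imported here, not at module load."""
    FastMCP = require("mcp.server.fastmcp").FastMCP
    server = FastMCP("mcp-retrieve")

    @server.tool()
    def index_folder(folder: str) -> str:
        """Read, chunk, embed and index the text files under `folder`."""
        index.index_folder(folder)
        return f"indexed {folder}"

    @server.tool()
    def search(query: str, k: int = 5) -> list[dict]:
        """Return the top-k chunks for `query` with source path, score and snippet."""
        return [
            {"source": hit.chunk.source, "score": float(hit.score), "snippet": hit.snippet}
            for hit in index.search(query, k=k)
        ]

    return server
```

Importing this module never touches `mcp`; only calling `build_server` does, which is what lets the retrieval tests run on a machine without the SDK.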

Testing

python -m pytest

All retrieval and embedder tests run offline with the default embedder. The end-to-end FastMCP tool test is skipped automatically when the mcp package is not installed.

License

Permissive license.
