mcp-lancedb

🗄️ LanceDB MCP Server for LLMS

A Model Context Protocol (MCP) server that enables LLMs to interact directly the documents that they have on-disk through agentic RAG and hybrid search in LanceDB. Ask LLMs questions about the dataset as a whole or about specific documents.

✨ Features

  • 🔍 LanceDB-powered serverless vector index and document summary catalog.
  • 📊 Efficient use of LLM tokens. The LLM itself looks up what it needs when it needs.
  • 📈 Security. The index is stored locally so no data is transferred to the Cloud when using a local LLM.

🚀 Quick Start

To get started, create a local directory to store the index and add this configuration to your Claude Desktop config file:

MacOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
Windows: %APPDATA%/Claude/claude_desktop_config.json

{ "mcpServers": { "lancedb": { "command": "npx", "args": [ "lance-mcp", "PATH_TO_LOCAL_INDEX_DIR" ] } } }

Prerequisites

  • Node.js 18+
  • npx
  • MCP Client (Claude Desktop App for example)
  • Summarization and embedding models installed (see config.ts - by default we use Ollama models)
    • ollama pull snowflake-arctic-embed2
    • ollama pull llama3.1:8b

Demo

<img src="https://github.com/user-attachments/assets/90bfdea9-9edd-4cf6-bb04-94c9c84e4825" width="50%">

Local Development Mode:

{ "mcpServers": { "lancedb": { "command": "node", "args": [ "PATH_TO_LANCE_MCP/dist/index.js", "PATH_TO_LOCAL_INDEX_DIR" ] } } }

Use npm run build to build the project.

Use npx @modelcontextprotocol/inspector dist/index.js PATH_TO_LOCAL_INDEX_DIR to run the MCP tool inspector.

Seed Data

The seed script creates two tables in LanceDB - one for the catalog of document summaries, and another one - for vectorized documents' chunks. To run the seed script use the following command:

npm run seed -- --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS>

You can use sample data from the docs/ directory. Feel free to adjust the default summarization and embedding models in the config.ts file. If you need to recreate the index, simply rerun the seed script with the --overwrite option.

Catalog

  • Document summary
  • Metadata

Chunks

  • Vectorized document chunk
  • Metadata

🎯 Example Prompts

Try these prompts with Claude to explore the functionality:

"What documents do we have in the catalog?" "Why is the US healthcare system so broken?"

📝 Available Tools

The server provides these tools for interaction with the index:

Catalog Tools

  • catalog_search: Search for relevant documents in the catalog

Chunks Tools

  • chunks_search: Find relevant chunks based on a specific document from the catalog
  • all_chunks_search: Find relevant chunks from all known documents

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

-
security - not tested
A
license - permissive license
-
quality - not tested

A Model Context Protocol (MCP) server that enables LLMs to interact directly the documents that they have on-disk through agentic RAG and hybrid search in LanceDB. Ask LLMs questions about the dataset as a whole or about specific documents.

  1. ✨ Features
    1. 🚀 Quick Start
      1. Prerequisites
        1. Demo
          1. Local Development Mode:
          2. Seed Data
            1. Catalog
              1. Chunks
            2. 🎯 Example Prompts
              1. 📝 Available Tools
                1. Catalog Tools
                  1. Chunks Tools
                  2. 📜 License