DuckDB-RAG-MCP-Sample

by nananaman

local-only server

The server can only run on the client’s local machine because it depends on local resources.

Integrations

  • Uses DuckDB for vector search capabilities to enable retrieval augmented generation (RAG) from markdown documents

  • Processes markdown files by extracting text and converting it to vector embeddings for semantic search

DuckDB RAG MCP Sample

This sample embeds markdown documents as vectors so that an MCP client can search them and answer questions about them using RAG.

We use Plamo-Embedding-1B for vectorization.

Features

  • Extracting and vectorizing text from markdown files
  • Vector search with DuckDB
  • Persisting vector data in Parquet files
  • Vector search via MCP
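
The features above come down to ranking stored chunk vectors by their similarity to a query vector. The following is a minimal pure-Python sketch of that ranking step, not the sample's actual code: the `cosine_similarity` and `top_k` helpers, the three-dimensional toy vectors, and the chunk texts are all hypothetical. Real embeddings would come from Plamo-Embedding-1B, and the sample performs the search in DuckDB.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data standing in for embedded markdown chunks.
ROWS = [
    ("Installing the tool", [1.0, 0.0, 0.0]),
    ("Configuring the server", [0.0, 1.0, 0.0]),
    ("Troubleshooting installs", [0.9, 0.1, 0.0]),
]

for text, score in top_k([1.0, 0.0, 0.0], ROWS):
    print(f"{score:.3f}  {text}")
```

In DuckDB a comparable ranking can be written in SQL, for example by ordering on `list_cosine_similarity(embedding, ?)` over `read_parquet('vectors.parquet')`, where `embedding` is an assumed column name.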

How to use

Vector data generation

First, place the markdown files you want to search in a directory of your choice, then convert them to a Parquet file with the following command.

uv run main.py --directory ~/path/to/markdown/files --parquet vectors.parquet
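
To make the two flags concrete, here is a rough sketch of how a script could accept `--directory` and `--parquet` and collect the markdown files to embed. It is an illustration under assumptions rather than the contents of main.py: `build_parser` and `find_markdown_files` are hypothetical names, and whether the real script searches subdirectories recursively is not stated in this document.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the two flags shown above."""
    parser = argparse.ArgumentParser(
        description="Embed markdown files and write the vectors to a Parquet file."
    )
    parser.add_argument("--directory", type=Path, required=True,
                        help="directory containing the markdown files")
    parser.add_argument("--parquet", type=Path, required=True,
                        help="output Parquet file for the vectors")
    return parser


def find_markdown_files(directory: Path) -> list[Path]:
    """Collect *.md files in a stable order (recursion is an assumption)."""
    return sorted(directory.expanduser().rglob("*.md"))


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--directory", "~/path/to/markdown/files", "--parquet", "vectors.parquet"]
)
print(args.directory, args.parquet)
```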

Configuring MCP

Build

The following command will generate a single binary at dist/server.

uv run pyinstaller --clean --strip --noconfirm --onefile server.py

MCP Client Configuration

Configure the server according to the MCP client you want to use.

For Claude Desktop, run the following command. For VECTOR_PARQUET, specify the Parquet file you just generated.

uv run mcp install server.py -v VECTOR_PARQUET=/path/to/vectors.parquet

The configuration is then set as follows:

{
  "mcpServers": {
    "DuckDB-RAG-MCP-Sample": {
      "command": "/path/to/dist/server",
      "env": {
        "VECTOR_PARQUET": "/path/to/vectors.parquet"
      }
    }
  }
}
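
The `env` entry in this configuration is how the server process learns where the vector file lives. As a hedged sketch of the receiving side (the helper name `resolve_parquet_path` and the error message are hypothetical, and the real server.py may handle this differently), reading the variable could look like this:

```python
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_parquet_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the Parquet path configured through VECTOR_PARQUET."""
    value = env.get("VECTOR_PARQUET")
    if not value:
        # Fail early with a clear message instead of a confusing error later.
        raise RuntimeError("VECTOR_PARQUET is not set; point it at the generated Parquet file")
    return Path(value).expanduser()


# Example with an explicit mapping instead of the real process environment.
print(resolve_parquet_path({"VECTOR_PARQUET": "/path/to/vectors.parquet"}))
```

Passing the environment as a parameter keeps the lookup easy to exercise without changing the real process environment.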

Start the development server

uv run mcp dev server.py

License

The DuckDB RAG MCP Sample is provided under the Apache License, Version 2.0.


An MCP server that enables RAG (Retrieval-Augmented Generation) on markdown documents by converting them to embedding vectors and performing vector search using DuckDB.
