Multimodal RAG MCP Server

by brett-hardiman

Multimodal RAG with MCP

A personal knowledge base that an AI assistant can search.

Architecture

What this is, in plain terms

When you talk to an AI assistant like Claude, it only knows two things: what it learned during training, and what you paste into the conversation. It does not know anything about your work — your code, your notes, the documents on your computer, the things you discussed last week.

This project fixes that. It takes a pile of my own material — code from my projects, PDFs, images, and written notes — and organizes it into a searchable knowledge base. Then it connects that knowledge base directly to Claude, so when I ask a question, Claude can look things up in my own files and answer from them instead of guessing.

Think of it like giving the AI a personal filing cabinet, plus the ability to instantly find the right folder in it. Ask "what was the directory path for my portfolio project?" and instead of saying "I don't know," it searches the filing cabinet and answers with the real path from my actual files.

Why that matters

AI assistants are useful, but they forget. Every conversation starts mostly from scratch, and the AI has no reliable memory of your specific projects. For someone doing real work — writing code, managing projects, building things — that gap is the difference between a tool that gives generic advice and one that knows your actual situation.

Companies are running into this same problem at a much larger scale. As they adopt AI to help their teams, they need the AI to understand their internal documents, codebases, and history — not just general knowledge from the internet. This project is a small, working example of exactly that pattern: connecting an AI to a private body of knowledge so its answers are grounded in real, specific information.

What it can do

Search across very different kinds of content at once. Code, written documents, images, and notes all live in the same searchable place. A single question can pull from any of them.
Understand images, not just text. When an image goes in — a screenshot, a diagram, a chart — the system writes a detailed description of what's in it, so the image becomes findable by searching its contents.
Find things by meaning and by exact wording. If I search for a vague concept, it finds related material even if I didn't use the exact words. If I search for an exact function name or error message, it finds that precise match too. (More on why both matter below.)
Work directly inside a Claude conversation. Once connected, Claude can search the knowledge base on its own, mid-conversation, and answer using what it finds.
Move with me to a new computer. The knowledge itself lives in the cloud, so switching machines doesn't mean rebuilding everything.

How it works (a level deeper)

The system has two halves: getting information in, and getting answers out.

Getting information in (ingestion)

Different kinds of files need different handling, so each type takes its own path:

Documents (PDFs, notes) are split into readable chunks.
Code is split along natural boundaries — each function or class stays whole instead of being cut in half — so a search returns complete, sensible pieces of code.
Images are passed to an AI vision model that writes a rich description of what the image shows. That description is what becomes searchable.
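The code path above can be sketched with Python's standard `ast` module. This is a minimal illustration, not the project's actual loader: the function name `chunk_python_source` and the chunk fields are assumptions made for the example, and only top-level functions and classes are handled.

```python
import ast


def chunk_python_source(source: str, path: str = "<memory>") -> list[dict]:
    """Return one chunk per top-level function or class in `source`.

    Hypothetical helper for illustration; field names are assumptions.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "symbol": node.name,            # kept as metadata for exact-name search
                "kind": type(node).__name__,
                "path": path,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # The full source text of the definition, never cut in half.
                "text": ast.get_source_segment(source, node),
            })
    return chunks


SAMPLE = '''import os

def load(path):
    with open(path) as fh:
        return fh.read()

class Store:
    def put(self, key, value):
        self.data = {key: value}
'''

for chunk in chunk_python_source(SAMPLE):
    print(chunk["kind"], chunk["symbol"], chunk["start_line"], chunk["end_line"])
```

Each chunk carries a whole definition plus its symbol name, so a search hit can be shown as a complete function or class.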

Every chunk is then converted into a list of numbers called an embedding, which captures its meaning in a form a computer can compare quickly. All of it gets stored in a database.
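To make "a form a computer can compare quickly" concrete, here is a small sketch of comparing embeddings by cosine similarity. The three-number vectors and chunk names are made up for the example; real embeddings come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, for illustration only).
query = [0.9, 0.1, 0.0]
stored = {
    "login handler": [0.8, 0.2, 0.1],
    "chart caption": [0.0, 0.1, 0.9],
}

# Rank stored chunks from most to least similar to the query.
ranked = sorted(stored, key=lambda name: cosine_similarity(query, stored[name]), reverse=True)
print(ranked)
```

In the real system the database performs this comparison, so the sketch only shows the idea behind it.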

Getting answers out (retrieval)

When a question comes in, the system runs two kinds of search at the same time and combines them:

Meaning-based search finds content that's about the same thing, even with different wording.
Keyword search finds exact matches — a specific file name, a function, an error string.

Neither alone is enough. Meaning-based search is bad at exact terms; keyword search is bad at concepts. Combining them (a technique called hybrid search) covers both, and the two result lists are merged by rank position rather than by raw score, so neither method drowns out the other.
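The engineering notes further down name the merge method as Reciprocal Rank Fusion (RRF). A minimal sketch of that idea follows. The document IDs are invented, and `k=60` is a commonly used default rather than a value confirmed for this project, where the fusion runs in SQL.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
semantic = ["auth_notes", "login_fn", "diagram"]
keyword = ["login_fn", "error_log", "auth_notes"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Only rank positions enter the score, so it does not matter that the two searches report relevance on different scales. A document that places well in both lists rises above one that appears in only one.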

The combined, ranked results get handed back to Claude, which uses them to answer.

The connection layer

The piece that lets Claude actually use all this is an MCP server (Model Context Protocol — a standard way to give AI assistants new tools). It exposes the knowledge base to Claude as a set of search tools. Claude decides when to use them, runs a search, gets the results, and answers — all within a normal conversation.

The engineering decisions (for the technically inclined)

These are the choices that separate this from a tutorial clone:

Caption-then-embed instead of CLIP-style image embeddings. The corpus is diagrams, screenshots, and code — content that carries text meaning. A vision-model caption embedded as text retrieves better than a visual embedding for this material, and keeps everything in one text vector space. True cross-modal embeddings only earn their complexity for photo-heavy corpora where visual appearance dominates.
Structure-aware code chunking. Python files are split on function/class boundaries using the ast module, with the symbol name preserved in metadata. Naive line-window chunking severs functions and wrecks retrieval; symbol names in metadata are also what make exact-identifier keyword search work.
Hybrid search via Reciprocal Rank Fusion. Pure vector search misses exact tokens; pure keyword misses paraphrase. The two are fused with RRF rather than raw-score addition, because cosine distance and ts_rank live on different scales and one would otherwise dominate. RRF fuses on rank position, which is scale-free.
One database, no separate vector store. pgvector handles semantic search and a generated tsvector column handles keyword search, both in Postgres. Reuses infrastructure already in place and removes an entire moving part.
Idempotent ingestion. A content hash plus a unique constraint makes re-running ingestion safe and incremental — already-stored content is skipped, not duplicated.
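The idempotent-ingestion point can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for Postgres so it runs anywhere; the table and column names are assumptions, and in Postgres the same effect is typically achieved with `INSERT ... ON CONFLICT DO NOTHING` against the unique constraint.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(conn: sqlite3.Connection, chunks: list[str]) -> int:
    """Insert chunks, skipping any whose hash is already stored. Returns rows added."""
    before = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    conn.executemany(
        # The UNIQUE constraint rejects duplicates; OR IGNORE turns that into a skip.
        "INSERT OR IGNORE INTO chunks (content_hash, content) VALUES (?, ?)",
        [(content_hash(text), text) for text in chunks],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (content_hash TEXT UNIQUE, content TEXT)")

first = ingest(conn, ["def a(): pass", "notes on login"])
second = ingest(conn, ["def a(): pass", "notes on login", "new spec"])
print(first, second)
```

The second run adds only the new chunk, which is what makes re-running ingestion over the same folder safe.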

Tech stack

| Layer | Choice |
| --- | --- |
| Vector + keyword storage | Supabase (Postgres) with `pgvector` and full-text search |
| Embeddings | Voyage `voyage-3.5` (1024-dim) |
| Image captioning | Claude vision, at ingestion time |
| AI connection | MCP server (Python, FastMCP) |
| Language | Python 3.12 |

Project layout

sql/01_schema.sql                 table: pgvector + generated tsvector + indexes
sql/02_search_fns.sql             semantic / keyword / hybrid (RRF) search functions
ingest/core.py                    config, embedding, image captioning, database writes
ingest/loaders/code_loader.py     structure-aware code chunking
ingest/loaders/doc_loaders.py     pdf / image / note / link loaders
ingest/ingest.py                  ingestion command-line tool
ingest/retrieval.py               semantic / keyword / hybrid search
server.py                         MCP server exposing search to Claude

Setup

Database. Run sql/01_schema.sql then sql/02_search_fns.sql in Supabase.
Keys. Copy .env.example to .env and fill in Supabase, Voyage, and Anthropic keys.
Install. python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

Use

Ingest content:

python ingest/ingest.py path  ~/projects/my-repo     # a whole folder
python ingest/ingest.py file  ~/docs/spec.pdf        # a single file
python ingest/ingest.py link  https://example.com    # a web page

Test a search directly:

python ingest/retrieval.py "how does the login flow work"

Connect to Claude Desktop by adding the server to its config (see claude_desktop_config.example.json), then restart it. The search tools become available inside any conversation.
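For orientation, the snippet below prints the general shape of a Claude Desktop entry for a local MCP server. The repository's claude_desktop_config.example.json is the authoritative version; the server label `multimodal-rag` and both paths here are placeholders assumed for illustration.

```python
import json

# General shape of a local MCP server entry in Claude Desktop's config.
# The label and paths are placeholders, not values taken from the repository.
config = {
    "mcpServers": {
        "multimodal-rag": {
            "command": "/absolute/path/to/.venv/bin/python",
            "args": ["/absolute/path/to/server.py"],
        }
    }
}

print(json.dumps(config, indent=2))
```

Pointing `command` at the virtual environment's interpreter is one way to make sure the server starts with the packages installed during setup.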

Honest notes

On "token savings": retrieval reduces tokens only versus a baseline of pasting large context by hand. Against careful, surgical pasting it mostly buys better recall, not cheaper conversations. The real value is accurate grounding across a scattered body of work, not a headline percentage.
Conversations: there is no live feed of chat history into the knowledge base; chats are exported to text and ingested like any other document.
Cost: embeddings are inexpensive; image captioning is the main cost, paid once per image at ingestion.

License

MIT — see LICENSE.
