Which integrations are available for this server?

Integrates with Ollama to run local LLMs (e.g., llama3, mistral) and embedding models (e.g., qwen3-embedding), enabling document Q&A, summarization, data analysis, and entity extraction entirely offline.

How do I use mcp-rag-assistant?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@mcp-rag-assistant what were the revenue figures in Q1?"

The server responds to your query, and you can keep using it as needed.

mcp-rag-assistant

by Hassan-Butt4356


Python

Local

MCP-Powered AI Assistant (Local — LlamaIndex + Ollama)

Privacy-first document intelligence: all models run locally via Ollama — no data leaves your machine.

What is MCP?

Model Context Protocol (MCP) is an open standard (by Anthropic) that defines how AI models discover and invoke tools at runtime. Think of it as a "USB-C port for AI" — any MCP-compatible client (Claude Desktop, your own agent, etc.) can connect to any MCP server and immediately use its tools.

MCP vs. Advanced RAG — What's the Difference?

| Dimension | Advanced RAG | MCP |
|---|---|---|
| Purpose | Improve retrieval accuracy | Standardise tool/capability exposure |
| Core idea | Better chunking, re-ranking, hybrid search | JSON-RPC tool registry with discovery |
| What the LLM gets | Retrieved context injected into prompt | A menu of callable functions with schemas |
| Execution | Single pipeline (query → retrieve → generate) | Multi-step agent loop (plan → pick tool → call → observe → repeat) |
| Tools | Retrieval only | Any function: retrieval, APIs, databases, code |
| State | Stateless per query | Stateful agent sessions possible |
| This project | RAG is one tool inside the MCP server | MCP wraps 8 RAG tools, discoverable at runtime |

In short: Advanced RAG makes retrieval smarter. MCP makes the entire AI system composable and interoperable.
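The difference in execution style can be sketched in a few lines. This is a toy illustration, not the project's code: the tools are stand-in lambdas, and the `plan` list stands in for the tool choices an LLM would make step by step after seeing each observation.

```python
from typing import Callable

# Hypothetical stand-ins for real tools; each maps a string to a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve": lambda q: f"[chunks relevant to: {q}]",
    "summarize": lambda text: f"summary of {text}",
}


def rag_pipeline(query: str) -> str:
    """Advanced RAG: one fixed pass, query -> retrieve -> generate."""
    context = TOOLS["retrieve"](query)
    return f"answer based on {context}"


def agent_loop(query: str, plan: list[str], max_steps: int = 5) -> list[str]:
    """MCP-style agent: pick a tool, call it, observe the result, repeat.

    `plan` replaces the LLM's planning here so the sketch stays runnable
    without a model.
    """
    observations: list[str] = []
    current = query
    for tool_name in plan[:max_steps]:
        current = TOOLS[tool_name](current)  # call the chosen tool
        observations.append(current)         # observe its output
    return observations


print(rag_pipeline("Q1 revenue"))
print(agent_loop("Q1 revenue", ["retrieve", "summarize"]))
```

The pipeline always runs the same steps, while the loop can chain any registered tool in any order, which is the composability the comparison refers to.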


Project Architecture

mcp_rag_assistant/
├── config.py               ← Central config (LLM, embed, chunking, server)
├── rag_engine.py           ← LlamaIndex: load docs → build index → query engine
├── main.py                 ← CLI entrypoint (serve / index / query / demo)
├── mcp_client.py           ← Example client that calls server tools
│
├── mcp_server/
│   └── server.py           ← HTTP JSON-RPC server exposing all tools
│
├── tools/
│   └── rag_tools.py        ← 8 MCP tool implementations
│
├── utils/
│   └── logger.py           ← Structured logging
│
├── my_data/                ← ⬅ DROP YOUR FILES HERE (PDF, DOCX, XLSX, CSV)
├── storage/                ← ChromaDB persistence (auto-created)
├── logs/                   ← Log files (auto-created)
│
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md

Data Flow

User Query
    │
    ▼
MCP Client (mcp_client.py or Claude Desktop or your agent)
    │  JSON-RPC POST /mcp  {"method": "tools/call", "params": {...}}
    ▼
MCP Server (mcp_server/server.py)
    │  dispatches to matching tool function
    ▼
Tool Function (tools/rag_tools.py)
    │  calls get_query_engine().query(...)
    ▼
LlamaIndex Query Engine (rag_engine.py)
    │  embeds query with qwen3-embedding:0.6b via Ollama
    ▼
ChromaDB Vector Store
    │  returns top-K similar chunks
    ▼
Ollama LLM (llama3 or mistral)
    │  synthesises answer from retrieved context
    ▼
JSON response back through MCP → Client
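The "returns top-K similar chunks" step can be illustrated with a brute-force cosine-similarity search. This is a sketch with made-up two-dimensional vectors; a real vector store typically uses an index rather than scanning every chunk, and the project's actual distance metric is not stated here, so cosine similarity is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 5) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration only.
chunks = [
    ("revenue rose in Q1", [1.0, 0.0]),
    ("office relocation notice", [0.0, 1.0]),
    ("Q1 revenue by region", [0.8, 0.6]),
]
results = top_k([1.0, 0.0], chunks, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The retrieved chunks are what the LLM receives as context in the next step of the flow.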

Available MCP Tools

| Tool | Description |
|---|---|
| `query_documents` | General Q&A over all indexed documents |
| `list_indexed_files` | Show files in `my_data/` |
| `rebuild_index` | Re-index after adding/removing files |
| `summarize_document` | Summarise a specific file by name |
| `analyze_data` | Plain-English data analysis (CSV/XLSX) |
| `generate_report` | Generate summary / detailed / executive report |
| `compare_documents` | Compare two documents on a given aspect |
| `extract_entities` | Extract people, orgs, dates, numbers |
Prerequisites

Python 3.10+
Ollama running locally — ollama.com
The following models pulled into Ollama (for example with `ollama pull llama3`):
- llama3:latest
- mistral:latest
- qwen3-embedding:0.6b
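A quick preflight check can confirm the models are present before indexing. This sketch assumes Ollama is listening on its default address, `http://localhost:11434`, and reads the model list from its `/api/tags` endpoint; the helper names are hypothetical and not part of this project.

```python
import json
import urllib.request

REQUIRED_MODELS = ["llama3:latest", "mistral:latest", "qwen3-embedding:0.6b"]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models the local Ollama instance has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]


def missing_models(installed: list[str],
                   required: list[str] = REQUIRED_MODELS) -> list[str]:
    """List required models that are not installed, in the required order."""
    return [name for name in required if name not in installed]
```

With Ollama running, `missing_models(installed_models())` returns an empty list when everything is in place; any names it returns still need an `ollama pull`.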

Setup

# 1. Clone / unzip the project
cd mcp_rag_assistant

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure (optional — defaults work out of the box)
cp .env.example .env
# Edit .env to change models, ports, chunk sizes etc.

# 5. Add your documents
#    Copy PDFs, DOCX, XLSX, CSV files into:
#    my_data/

# 6. Build the index
python main.py index

# 7. Start the MCP server
python main.py serve

Usage

Start the server

python main.py serve
# MCP server listening on http://0.0.0.0:8080

One-shot query (no server needed)

python main.py query "What are the key findings in the Q1 report?"

Run the demo client (server must be running)

# In a second terminal:
python main.py demo

Rebuild index after adding new files

python main.py index
# or via MCP tool:
# call rebuild_index tool from any client

Health check

GET http://localhost:8080/health
GET http://localhost:8080/tools

Switching Models

Edit config.py or your .env:

# Use mistral instead of llama3
LLM_MODEL=mistral:latest

# Use nomic-embed-text for embeddings
EMBED_MODEL=nomic-embed-text:latest

Tuning Chunk Size

In config.py or .env:

| Setting | Default | Notes |
|---|---|---|
| `CHUNK_SIZE` | 256 | Tokens per chunk. Smaller = more precise retrieval |
| `CHUNK_OVERLAP` | 25 | Overlap between chunks. Helps preserve context at boundaries |
| `SIMILARITY_TOP_K` | 5 | Chunks retrieved per query |
| `RESPONSE_MODE` | compact | `compact` \| `tree_summarize` \| `refine` |

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Hassan-Butt4356/mcp-rag-assistant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.