rag-mcp
Allows ingestion of YouTube video transcripts and metadata for semantic search and retrieval.
⚡ RAG-MCP
Persistent memory for MCP clients, powered by retrieval-augmented generation.
RAG-MCP turns documents, notes, web pages, transcripts, and local files into a searchable knowledge layer that MCP-compatible clients can ingest, retrieve, and manage over time. It is designed for assistants that need memory beyond a single chat session.
Overview
RAG-MCP is an MCP server that provides a practical memory and retrieval layer for AI clients.
It supports:
ingestion from raw text
ingestion from URLs
ingestion from YouTube transcripts
ingestion from local files
semantic retrieval with optional source metadata
document listing, searching, deletion, and status inspection
browser-based secure upload sessions for document ingestion
Prometheus-compatible metrics for runtime visibility
At a high level, the system parses content, chunks it, embeds it, stores vectors in ChromaDB, stores metadata in SQLite, and exposes the entire workflow through MCP tools.
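The chunking step in that pipeline can be pictured with a minimal sketch. It assumes fixed-size character windows with overlap; the source does not state which chunking strategy RAG-MCP actually uses, and `chunk_text`, `size`, and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text,
        # so no tiny trailing duplicate is emitted.
        if start + size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of storing some text twice.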
Why this exists
Most MCP clients are excellent at reasoning in the moment, but weak at remembering useful context across sessions.
RAG-MCP solves that by giving clients a persistent, queryable memory layer.
Use it when you want to:
give an assistant long-term memory across conversations
search documentation, notes, transcripts, or uploaded files semantically
attach citations and source metadata to retrieval results
keep knowledge isolated by namespace for teams, projects, or environments
support both direct ingestion and user-friendly browser uploads
Core capabilities
Ingestion
Store knowledge from:
Text via ingest_text
Web pages via ingest_url
YouTube transcripts via ingest_youtube
Local files via ingest_file
Browser upload sessions via create_upload_session + upload UI
Supported local file types:
.txt
.md
.markdown
.pdf
.docx
.doc
Retrieval
Query stored knowledge using:
retrieve for compact semantic matches
retrieve_with_sources for source-aware responses with document and chunk metadata
Document management
Manage the knowledge base with:
list_documents
search_documents
delete_document
get_ingestion_status
check_upload_status
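Since document metadata is stored in SQLite, namespace-scoped listing and deletion can be sketched with the standard `sqlite3` module. The table layout and the helper names (`open_metadata_db`, `insert_document`, `list_namespace`, `remove_document`) are assumptions for illustration, not RAG-MCP's actual schema or API.

```python
import sqlite3


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a metadata database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               namespace TEXT NOT NULL,
               title TEXT NOT NULL
           )"""
    )
    return conn


def insert_document(conn: sqlite3.Connection, namespace: str, title: str) -> int:
    cur = conn.execute(
        "INSERT INTO documents (namespace, title) VALUES (?, ?)",
        (namespace, title),
    )
    conn.commit()
    return cur.lastrowid


def list_namespace(conn: sqlite3.Connection, namespace: str, limit: int = 20):
    """Return (id, title) rows for one namespace only."""
    rows = conn.execute(
        "SELECT id, title FROM documents WHERE namespace = ? ORDER BY id LIMIT ?",
        (namespace, limit),
    )
    return rows.fetchall()


def remove_document(conn: sqlite3.Connection, doc_id: int) -> bool:
    """Delete one document row; True if a row was actually removed."""
    cur = conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
    return cur.rowcount == 1
```

Filtering every query by `namespace` is what keeps one project's documents out of another project's listings. A full implementation would also need to delete the matching vectors from the vector store.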
Runtime features
Streamable HTTP MCP transport at /mcp
SSE MCP transport at /sse / /messages
Upload UI under /upload
Metrics endpoint at /metrics
Architecture-level mental model
Think of RAG-MCP as a dedicated memory service for MCP clients:
Ingest content from text, files, URLs, or YouTube
Parse and normalize the content into plain text
Chunk the text into retrievable segments
Embed the chunks into vector representations
Store vectors in ChromaDB
Store metadata in SQLite
Query semantically and return either compact or citation-rich results
This makes the system practical for assistants that need to remember information across time without relying on chat history alone.
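The final "query semantically" step amounts to ranking stored chunk vectors by similarity to the query vector. The sketch below shows that idea with brute-force cosine similarity over tiny hand-written vectors. It is an illustration only: in RAG-MCP the vectors live in ChromaDB, which performs the search itself, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 5):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A larger `top_k` widens the candidate set, which is why raising it is a reasonable first step when retrieval comes back empty or too narrow.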
Quick start
Local development
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
python -m rag_mcp.main
Verify the server:
curl -i http://127.0.0.1:8080/mcp
curl -i http://127.0.0.1:8080/sse
curl -i http://127.0.0.1:8080/metrics
Optional extras
Install optional parsing extras when needed:
pip install -e ".[pdf]"
pip install -e ".[docx]"
Docker usage
Run with Docker Compose
docker compose up --build -d
docker compose ps
Check the running service
curl -i http://127.0.0.1:8080/metrics
curl -i http://127.0.0.1:8080/mcp
Stop the stack
docker compose down
The Compose setup mounts persistent storage for:
ChromaDB vectors
SQLite metadata database
Configuration
Configuration is managed through environment variables and loaded by Settings.
Start by copying the sample file:
cp .env.example .env
Common settings
RAG_MCP_CHROMA_PATH=/data/chroma
RAG_MCP_METADATA_DB_PATH=/data/metadata.db
RAG_MCP_LOG_LEVEL=INFO
RAG_MCP_EMBEDDING_MODEL=all-MiniLM-L6-v2
RAG_MCP_METRICS_ENABLED=true
RAG_MCP_METRICS_PATH=/metrics
RAG_MCP_METRICS_REQUIRE_AUTH=false
RAG_MCP_UPLOAD_SESSION_SECRET=change-me-in-production
Important notes
RAG_MCP_UPLOAD_SESSION_SECRET should always be set explicitly in real deployments.
If metrics auth is enabled, configure the metrics token as well.
Chroma and SQLite paths should point to persistent storage in containerized environments.
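To make the environment-driven configuration concrete, here is a minimal sketch of loading a few of these variables. It is not the project's real Settings class: `SketchSettings` and `load_settings` are hypothetical, the fallback values are simply the sample values shown above, and refusing the placeholder secret is an assumption about what "set explicitly" could mean in practice.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SketchSettings:
    chroma_path: str
    metadata_db_path: str
    log_level: str
    metrics_enabled: bool
    upload_session_secret: str


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Read RAG_MCP_* variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    secret = env.get("RAG_MCP_UPLOAD_SESSION_SECRET", "")
    # Assumption: treat a missing or placeholder secret as a hard error.
    if not secret or secret == "change-me-in-production":
        raise ValueError("set RAG_MCP_UPLOAD_SESSION_SECRET explicitly")
    return SketchSettings(
        chroma_path=env.get("RAG_MCP_CHROMA_PATH", "/data/chroma"),
        metadata_db_path=env.get("RAG_MCP_METADATA_DB_PATH", "/data/metadata.db"),
        log_level=env.get("RAG_MCP_LOG_LEVEL", "INFO"),
        metrics_enabled=_as_bool(env.get("RAG_MCP_METRICS_ENABLED", "true")),
        upload_session_secret=secret,
    )
```

Failing at startup on a weak secret is cheaper than discovering it after upload links have been issued.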
Upload Documents (UI)
RAG-MCP includes a browser-based upload flow for cases where direct local file ingestion is not convenient.
The flow is:
Call create_upload_session
Open the returned secure upload URL in a browser
Upload supported files
Poll check_upload_status if needed
This is especially useful when:
the MCP client cannot directly access a file path
the user wants a friendlier document upload flow
files need to be uploaded from another machine or browser session
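The polling step of the upload flow can be sketched as a small loop with a timeout. The shape of the status payload is not documented in this README, so the `"state"` key and the `"completed"` / `"failed"` values below are assumptions, and `check_status` stands in for whatever function actually invokes the check_upload_status tool.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call check_status repeatedly until it reports a terminal state."""
    deadline = clock() + timeout
    while True:
        status = check_status()
        # Hypothetical terminal states; adjust to the real payload.
        if status.get("state") in {"completed", "failed"}:
            return status
        if clock() >= deadline:
            raise TimeoutError("upload session did not finish in time")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.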
Upload behavior
invalid or expired session token returns an error
unsupported files are rejected during parsing
upload limits are enforced for file count and size
indexed files are written into the target namespace
MCP tool usage patterns
1. Ingest text directly
{
  "name": "ingest_text",
  "arguments": {
    "title": "Team Notes",
    "namespace": "default",
    "text": "Release checklist: create tag, run tests, publish image"
  }
}
2. Ingest a web page
{
  "name": "ingest_url",
  "arguments": {
    "url": "https://example.com/docs",
    "namespace": "docs"
  }
}
3. Retrieve compact results
{
  "name": "retrieve",
  "arguments": {
    "query": "How does release publishing work?",
    "namespace": "default",
    "top_k": 5
  }
}
4. Retrieve with sources
{
  "name": "retrieve_with_sources",
  "arguments": {
    "query": "What are the deployment steps?",
    "namespace": "docs",
    "top_k": 5
  }
}
5. List stored documents
{
  "name": "list_documents",
  "arguments": {
    "namespace": "docs",
    "limit": 20
  }
}
6. Create an upload session
{
  "name": "create_upload_session",
  "arguments": {
    "namespace": "project-x"
  }
}
Recommended usage pattern
A common lifecycle looks like this:
ingest into a namespace
retrieve against the same namespace
inspect with list_documents
delete or re-ingest as documents change
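On the wire, each of these tool invocations travels as an MCP `tools/call` JSON-RPC 2.0 request. The sketch below builds such requests for the ingest, retrieve, and inspect steps of the lifecycle above. `tool_call` is a hypothetical helper; a real client would normally use an MCP SDK, which also handles the initialization handshake and the transport.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Ingest, retrieve, and inspect against the same namespace.
lifecycle = [
    tool_call("ingest_text", {"title": "Team Notes", "namespace": "default",
                              "text": "Release checklist: create tag, run tests, publish image"}),
    tool_call("retrieve", {"query": "How does release publishing work?",
                           "namespace": "default", "top_k": 5}),
    tool_call("list_documents", {"namespace": "default", "limit": 20}),
]
```

Keeping the namespace identical across all three calls matters: a retrieval against a different namespace will not see the ingested text.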
Observability / metrics
The service exposes Prometheus-compatible metrics at /metrics.
Current instrumentation includes request-level visibility such as:
total HTTP requests
request latency histogram
in-flight requests
exception counters
default Python/process metrics from the Prometheus client runtime
Example:
curl -i http://127.0.0.1:8080/metrics
This makes it straightforward to plug RAG-MCP into:
Prometheus
Grafana
container monitoring dashboards
local ops/debugging workflows
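For quick local debugging without a full Prometheus setup, the text returned by /metrics can be read with a few lines of Python. This is a simplified sketch of the Prometheus text exposition format that ignores timestamps and escaped label edge cases. The metric names in the usage example are made up, since the README does not list the actual series names.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus optional {labels}) to its sample value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        if "}" in line:
            cut = line.rindex("}") + 1  # series ends at the closing brace
            series, rest = line[:cut], line[cut:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])  # ignore optional timestamp
    return samples


# Hypothetical sample; real series names will differ.
sample = """# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/mcp"} 12
http_requests_in_flight 1
"""
parsed = parse_metrics(sample)
```

For anything beyond a quick look, an established Prometheus parser or a real scrape target is the safer choice.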
Security notes
RAG-MCP includes practical safeguards for production-style deployments:
SSRF protection for URL ingestion
signed upload session tokens with expiry
upload file count and size limits
optional metrics authentication and CIDR controls
request rate limiting for sensitive paths like upload and metrics
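The idea behind signed upload session tokens with expiry can be illustrated with an HMAC over a small payload. This is a generic sketch, not RAG-MCP's actual token format: `sign_session`, `verify_session`, the claim names `ns` and `exp`, and the default lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _signature(secret: str, body: str) -> str:
    return hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()


def sign_session(secret: str, namespace: str, ttl_seconds: int = 900,
                 now: Optional[float] = None) -> str:
    """Return 'payload.signature' binding a namespace to an expiry time."""
    now = time.time() if now is None else now
    claims = {"ns": namespace, "exp": int(now) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return f"{body}.{_signature(secret, body)}"


def verify_session(secret: str, token: str, now: Optional[float] = None) -> str:
    """Return the namespace if the token is authentic and unexpired."""
    now = time.time() if now is None else now
    body, _, sig = token.partition(".")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), _signature(secret, body).encode()):
        raise ValueError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now >= claims["exp"]:
        raise ValueError("session expired")
    return claims["ns"]
```

Because the secret is the only thing preventing forged upload links, a guessable value defeats the whole scheme.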
Operational recommendations:
set a strong RAG_MCP_UPLOAD_SESSION_SECRET
keep metrics private or authenticated in shared environments
use persistent storage for /data
run behind a reverse proxy when exposing publicly
Troubleshooting
Upload UI says static files are missing
If the upload page does not render correctly, rebuild and restart after updating the image:
docker compose build
docker compose up -d --force-recreate
/metrics returns 503
If metrics auth is enabled without the required token configuration, the endpoint can fail closed. Check your .env values.
/mcp returns a redirect
That is expected. The server supports transport-specific behavior and may redirect to the canonical mounted route.
URL ingestion fails
Private IPs, loopback targets, metadata endpoints, and blocked schemes are intentionally rejected by SSRF validation.
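To see why a given URL is refused, it helps to picture the kind of check involved. The sketch below rejects non-HTTP schemes and IP-literal hosts in private, loopback, or link-local ranges (the latter covers the common cloud metadata address 169.254.169.254). It is a simplified illustration, not RAG-MCP's validator: `validate_url` is a hypothetical name, and a production check would also resolve hostnames and test every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_blocked_ip_literal(host: str) -> bool:
    """True if host is an IP literal in a range that should not be fetched."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; real code must resolve it and re-check
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def validate_url(url: str) -> None:
    """Raise ValueError for URLs this sketch considers unsafe to ingest."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"blocked scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or is_blocked_ip_literal(host):
        raise ValueError(f"blocked host: {host}")
```

If a legitimate internal page is rejected this way, ingesting its content with ingest_text or ingest_file sidesteps the URL fetch entirely.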
Retrieval returns empty results
Check these in order:
confirm ingestion completed successfully
confirm you are querying the correct namespace
broaden the query wording
increase top_k
verify the document exists with list_documents
Repository structure
Useful entry points:
Contributing
Contributions are welcome.
A solid contribution workflow is:
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
pytest
Before opening a PR:
keep changes focused
verify the local server still starts
run tests
update docs when behavior changes
License
MIT — see LICENSE.