RAG MCP Server (Pinecone)
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@RAG MCP Server (Pinecone) search for 'quarterly results' in my PDFs".
That's it! The server will respond to your query, and you can continue using it as needed.
FastMCP server exposing search_docs and ask_docs tools backed by a Pinecone
vector index. Embeddings (Qwen3-Embedding-0.6B) and answer generation
(Qwen2.5-1.5B-Instruct GGUF) run locally via transformers / llama.cpp — no
LLM API key required, only a Pinecone account.
Setup
1. Copy .env.example to .env and fill in:

    PINECONE_KEY=your_pinecone_api_key
    PINECONE_INDEX=your_index_name

2. Drop PDFs into data/.

3. Install deps and ingest:

    uv sync
    uv run python ingest.py

4. Run the server:

    uv run python main.py

5. Query it:

    uv run python client.py "your question"
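Before running the ingest step, it can help to confirm that .env actually defines both Pinecone settings. The sketch below is a standalone check using only the standard library; the helper names parse_env_text and missing_keys are hypothetical and not part of this repository, and the project itself may load .env differently.

```python
# Hypothetical pre-flight check; not part of the repository.
REQUIRED_KEYS = ("PINECONE_KEY", "PINECONE_INDEX")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Read an env file and report which required keys still need a value."""
    with open(path, encoding="utf-8") as handle:
        return missing_keys(parse_env_text(handle.read()))
```

Calling check_env_file() from the project root returns an empty list when both keys are set. It does not detect placeholder values such as your_pinecone_api_key that were never replaced.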
Docker
Build and run with Docker Compose (recommended — persists the HuggingFace model cache in a named volume so models aren't re-downloaded on every restart):

    docker compose up --build

The server listens on http://localhost:8000/mcp. data/ is mounted as a
volume, so PDFs added on the host are visible inside the container.
To ingest PDFs into Pinecone from inside the running container:

    docker compose exec rag-mcp-server uv run python ingest.py

Without Compose

    docker build -t rag-mcp-server .
    docker run --rm -p 8000:8000 --env-file .env -v "$(pwd)/data:/app/data" rag-mcp-server

Notes
- The image installs build tools to compile llama-cpp-python; first build is slow, subsequent ones are cached.
- Models are downloaded from HuggingFace on first run, not baked into the image; mount a volume over /root/.cache/huggingface (already done in docker-compose.yml) to avoid re-downloading.
- MCP_HOST / MCP_PORT env vars override the listen address (default 0.0.0.0:8000 in the container, 127.0.0.1:8000 for local uv run).
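The MCP_HOST / MCP_PORT override described in the notes can be pictured with the sketch below. It is an illustration of the documented defaults, not the code in main.py: the function name resolve_listen_address and the in_container flag are assumptions, and the real project may set the container default some other way (for example through an ENV line in the Dockerfile).

```python
import os


def resolve_listen_address(env=None, in_container=False):
    """Return (host, port), letting MCP_HOST / MCP_PORT override the defaults.

    in_container is a hypothetical flag used only to pick between the two
    documented defaults.
    """
    env = os.environ if env is None else env
    default_host = "0.0.0.0" if in_container else "127.0.0.1"
    host = env.get("MCP_HOST") or default_host
    port = int(env.get("MCP_PORT") or 8000)
    return host, port
```

For example, running with MCP_PORT=9000 set and no MCP_HOST would yield 127.0.0.1:9000 for a local run under these assumptions.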
Files
- ingest.py: chunk PDFs from data/, embed, upsert into Pinecone.
- server.py: MCP tools search_docs (retrieval + rerank) and ask_docs (retrieval + local generation).
- models.py: local embedding/generation models.
- main.py: server entrypoint.
- client.py: example MCP client for manual testing.
- eval.py: Ragas evaluation (Faithfulness, AnswerRelevancy) of the RAG pipeline using the local Qwen models; writes eval_results.csv.
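To give a feel for the chunking step that ingest.py performs before embedding, here is a minimal sketch of fixed-size chunking with overlap. The README does not say which chunking strategy, chunk size, or overlap the project uses, so the function chunk_text and its default values are assumptions chosen for illustration only.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character windows of chunk_size that overlap by overlap.

    Illustrative only; the actual ingest.py may chunk by tokens, sentences,
    or pages instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and upserted into the Pinecone index. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.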
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sujalg4888/rag_based_mcp_server_using_pinecone'
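The same request can be issued from Python with the standard library. This is a sketch that assumes the endpoint returns JSON, which the text above does not state; the helper names server_url and fetch_server_info are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, repo):
    """Build the directory API URL for one server entry."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_server_info(owner, repo, timeout=10):
    """GET the server entry and decode it, assuming a JSON response body."""
    request = urllib.request.Request(
        server_url(owner, repo),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_server_info("sujalg4888", "rag_based_mcp_server_using_pinecone") performs the same GET as the curl command and requires network access.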
If you have feedback or need assistance with the MCP directory API, please join our Discord server.