Which integrations are available for this server?

Ollama: Allows using Ollama's embedding models for document indexing and retrieval, enabling local RAG capabilities.
OpenAI: Supports OpenAI's embedding API as an embedding provider for generating document embeddings in RAG workflows.

How do I use mcp-rag-server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@mcp-rag-server query my documents for details on quantum computing".

That's it! The server will respond to your query, and you can continue using it as needed.

mcp-rag-server

by kwanLeeFrmVi

TypeScript

Local

A Model Context Protocol (MCP) server that enables Retrieval Augmented Generation (RAG). It indexes your documents and serves relevant context to Large Language Models via the MCP protocol.

Integration Examples

Generic MCP Client Configuration

{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "EMBEDDING_MODEL": "nomic-embed-text",
        "VECTOR_STORE_PATH": "./vector_store",
        "CHUNK_SIZE": "500"
      }
    }
  }
}

Example Interaction

# Index documents
>> tool:embedding_documents {"path":"./docs"}

# Check status
>> resource:embedding-status

<< rag://embedding/status
Current Path: ./docs/file1.md
Completed: 10
Failed: 0
Total chunks: 15
Failed Reason:
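
A client that wants to act on the status text above could parse it into a typed object. The following is a minimal sketch that assumes the status is returned as plain "Key: value" lines exactly as shown in the example; `parseEmbeddingStatus` and `EmbeddingStatus` are hypothetical names, not part of the server.

```typescript
// Hypothetical client-side helper; assumes "Key: value" lines as in the example above.
interface EmbeddingStatus {
  currentPath: string;
  completed: number;
  failed: number;
  totalChunks: number;
  failedReason: string;
}

function parseEmbeddingStatus(text: string): EmbeddingStatus {
  const fields = new Map<string, string>();
  for (const line of text.split("\n")) {
    // Split on the first colon only, so values such as paths may contain colons.
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    fields.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
  }
  // Missing or non-numeric counters fall back to 0.
  const count = (key: string): number =>
    Number.parseInt(fields.get(key) ?? "0", 10) || 0;
  return {
    currentPath: fields.get("Current Path") ?? "",
    completed: count("Completed"),
    failed: count("Failed"),
    totalChunks: count("Total chunks"),
    failedReason: fields.get("Failed Reason") ?? "",
  };
}
```

With such an object, a client could for example poll the resource until `completed + failed` reaches the expected number of items, although whether the counters refer to files or chunks is not stated here.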

Features

Index documents in .txt, .md, .json, .jsonl, and .csv formats
Customizable chunk size for splitting text
Local vector store powered by SQLite (via LangChain's LibSQLVectorStore)
Supports multiple embedding providers (OpenAI, Ollama, Granite, Nomic)
Exposes MCP tools and resources over stdio for seamless integration with MCP clients

Installation

From npm

npm install -g mcp-rag-server

From Source

git clone https://github.com/kwanLeeFrmVi/mcp-rag-server.git
cd mcp-rag-server
npm install
npm run build
npm start

Quick Start

export BASE_LLM_API=http://localhost:11434/v1
export EMBEDDING_MODEL=granite-embedding-278m-multilingual-Q6_K-1743674737397:latest
export VECTOR_STORE_PATH=./vector_store
export CHUNK_SIZE=500

# Run (global install)
mcp-rag-server

# Or via npx
npx mcp-rag-server

💡 Tip: We recommend using Ollama for embedding. Install and pull the nomic-embed-text model:

ollama pull nomic-embed-text
export EMBEDDING_MODEL=nomic-embed-text

Configuration

| Variable | Description | Default |
| --- | --- | --- |
| `BASE_LLM_API` | Base URL for embedding API | `http://localhost:11434/v1` |
| `LLM_API_KEY` | API key for your LLM provider | (empty) |
| `EMBEDDING_MODEL` | Embedding model identifier | `nomic-embed-text` |
| `VECTOR_STORE_PATH` | Directory for local vector store | `./vector_store` |
| `CHUNK_SIZE` | Characters per text chunk (number) | `500` |

💡 Recommendation: Use Ollama embedding models like nomic-embed-text for best performance.
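
To illustrate how the variables and defaults in the table fit together, here is a minimal sketch of reading them into a typed configuration object. `RagConfig` and `loadConfig` are hypothetical names used for illustration only; the server's actual loading code may differ.

```typescript
// Hypothetical helper: mirrors the documented variables and defaults,
// but is not taken from the mcp-rag-server source.
interface RagConfig {
  baseLlmApi: string;
  llmApiKey: string;
  embeddingModel: string;
  vectorStorePath: string;
  chunkSize: number;
}

function loadConfig(env: Record<string, string | undefined>): RagConfig {
  // CHUNK_SIZE arrives as a string and must be converted to a number.
  const rawChunkSize = env["CHUNK_SIZE"] ?? "500";
  const chunkSize = Number.parseInt(rawChunkSize, 10);
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error(`CHUNK_SIZE must be a positive integer, got "${rawChunkSize}"`);
  }
  return {
    baseLlmApi: env["BASE_LLM_API"] ?? "http://localhost:11434/v1",
    llmApiKey: env["LLM_API_KEY"] ?? "",
    embeddingModel: env["EMBEDDING_MODEL"] ?? "nomic-embed-text",
    vectorStorePath: env["VECTOR_STORE_PATH"] ?? "./vector_store",
    chunkSize,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.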

Usage

MCP Tools

Once running, the server exposes these tools via MCP:

embedding_documents(path: string): Index documents under the given path
query_documents(query: string, k?: number): Retrieve top k chunks (default 15)
remove_document(path: string): Remove a specific document
remove_all_documents(confirm: boolean): Clear the entire index (confirm=true)
list_documents(): List all indexed document paths
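
For readers writing their own client, the sketch below shows what calls to these tools might look like on the wire. It assumes the standard MCP JSON-RPC 2.0 `tools/call` method; `buildToolCall` is a hypothetical helper, and the argument names are taken from the signatures listed above.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Index a folder, then ask for the five most relevant chunks.
const indexRequest = buildToolCall(1, "embedding_documents", { path: "./docs" });
const queryRequest = buildToolCall(2, "query_documents", {
  query: "quantum computing",
  k: 5,
});

// Over stdio, each message is sent as a single line of JSON.
console.log(JSON.stringify(indexRequest));
console.log(JSON.stringify(queryRequest));
```

A real client must complete the MCP `initialize` handshake before sending these requests. In practice an MCP client library usually handles both the handshake and the message framing.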

MCP Resources

Clients can also read resources via URIs:

rag://documents — List all document URIs
rag://document/{path} — Fetch full content of a document
rag://query-document/{numberOfChunks}/{query} — Query documents as a resource
rag://embedding/status — Check current indexing status (completed, failed, total)
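
The query resource takes its parameters from the URI itself, so a client has to assemble that URI. The sketch below is one way to do it, under the assumption that the server accepts a percent-encoded query segment (not confirmed here). `queryDocumentUri` and `buildResourceRead` are hypothetical helpers; `resources/read` is the standard MCP method for reading a resource.

```typescript
// Hypothetical helper: fills in the rag://query-document/{numberOfChunks}/{query} template.
// Assumption: the server decodes a percent-encoded query segment.
function queryDocumentUri(numberOfChunks: number, query: string): string {
  if (!Number.isInteger(numberOfChunks) || numberOfChunks <= 0) {
    throw new RangeError("numberOfChunks must be a positive integer");
  }
  return `rag://query-document/${numberOfChunks}/${encodeURIComponent(query)}`;
}

// Shape of an MCP "resources/read" request (JSON-RPC 2.0).
interface ResourceReadRequest {
  jsonrpc: "2.0";
  id: number;
  method: "resources/read";
  params: { uri: string };
}

function buildResourceRead(id: number, uri: string): ResourceReadRequest {
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}

const statusRead = buildResourceRead(1, "rag://embedding/status");
const queryRead = buildResourceRead(2, queryDocumentUri(5, "quantum computing"));
console.log(JSON.stringify(statusRead));
console.log(JSON.stringify(queryRead));
```

Encoding the query keeps spaces and slashes from being read as extra path segments. If the server expects the raw text instead, drop the `encodeURIComponent` call.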

How RAG Works

Indexing: Reads files, splits text into chunks based on CHUNK_SIZE, and queues them for embedding.
Embedding: Processes each chunk sequentially against the embedding API, storing vectors in SQLite.
Querying: Embeds the query and retrieves nearest text chunks from the vector store, returning them to the client.
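
The three stages above can be sketched in a few lines. This is a simplified illustration, not the server's implementation: it assumes fixed-size character chunks with no overlap and cosine similarity as the distance measure, and it uses hand-written toy vectors where the server would call the embedding API. All names (`chunkText`, `cosineSimilarity`, `topK`, `StoredChunk`) are hypothetical.

```typescript
// Indexing: split text into chunks of at most chunkSize characters (no overlap assumed).
function chunkText(text: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Similarity measure assumed for this sketch; returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embedding output: a chunk together with its vector (stored in SQLite by the server).
interface StoredChunk {
  text: string;
  vector: number[];
}

// Querying: rank stored chunks by similarity to the query vector and keep the best k.
function topK(queryVector: number[], store: StoredChunk[], k = 15): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy example with two-dimensional vectors standing in for real embeddings.
const toyStore: StoredChunk[] = [
  { text: "chunk about cooking", vector: [0, 1] },
  { text: "chunk about physics", vector: [1, 0] },
  { text: "chunk about both", vector: [0.7, 0.7] },
];
const nearest = topK([1, 0], toyStore, 2);
console.log(nearest.map((chunk) => chunk.text));
```

A brute-force scan like this is only practical for small collections. A vector store exists precisely to perform the same nearest-neighbour lookup efficiently and persistently.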

Development

npm install
npm run build      # Compile TypeScript
npm start          # Run server
npm run watch      # Watch for changes

Contributing

Contributions are welcome! Please open issues or pull requests on GitHub.

License

MIT 2025 Quan Le
