rag
Provides semantic search and RAG over markdown documentation using Ollama for embeddings and chat.
rag is a CLI tool and MCP server that turns markdown documentation into a searchable, queryable knowledge base.
It chunks .md files by heading, embeds them via Ollama, stores vectors in LanceDB, and exposes search + RAG through both a terminal CLI and MCP.
Prerequisites
Minimum hardware
Component | Requirement |
RAM | 4 GB (8 GB for larger doc sets) |
CPU | Any x86-64 or ARM64, 2+ cores |
GPU | Optional. Any NVIDIA GPU with 2+ GB VRAM. CPU-only fallback is functional but slower |
Disk | 100 MB for index (scales with doc count) |
Indexing 5000 chunks: ~25s on RTX 3060, ~3min on CPU-only.
Install
git clone https://github.com/FrameMuse/llm-rag.git
cd llm-rag
bun install
Add shell alias:
alias rag='bun /path/to/llm-rag/scripts/cli.ts'
Quick start
cd my-docs-project
rag init # create .rag/ project scope
rag index # chunk, embed, index all .md files
rag mcp search "..." # semantic search
rag mcp query "..." # RAG: synthesize answer from docs
Commands
Command | Description |
rag init | Create .rag/ config, mcp.json, .gitignore |
rag index | Chunk files by heading, embed via Ollama, store in LanceDB |
rag serve | Start MCP server (STDIO) for current .rag/ scope |
rag mcp | One-shot CLI proxy for MCP tools |
| Show index statistics |
| Show usage |
rag mcp tools
Tool | Usage | Description |
search | rag mcp search "..." | Semantic vector search |
query | rag mcp query "..." | RAG: retrieve chunks, synthesize answer |
| | List all indexed files |
| | Show full document content |
config | rag mcp config | Print mcp.json for opencode.json adoption |
Project scope (.rag/)
project/
├── .rag/
│   ├── config.json     # { name, embedModel, ragModel, pattern }
│   ├── mcp.json        # MCP config snippet for opencode.json
│   ├── .gitignore      # *
│   └── data/lancedb/   # Vector index (generated by rag index)
├── *.md
└── ...
Each project keeps its index local. rag discovers .rag/ by walking up from current directory (like git).
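The git-style discovery described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `findRagRoot` and its `isScopeRoot` parameter are hypothetical names, and the only behavior taken from the README is "walk up from the current directory until a `.rag/` directory is found".

```typescript
import { existsSync, statSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Default check (assumption): a directory is a scope root if it contains a `.rag/` directory.
const hasRagDir = (dir: string): boolean => {
  const candidate = join(dir, ".rag");
  return existsSync(candidate) && statSync(candidate).isDirectory();
};

// Hypothetical helper: walk up from `startDir` until `isScopeRoot` matches.
// Returns the matching directory, or null once the filesystem root is passed.
function findRagRoot(
  startDir: string,
  isScopeRoot: (dir: string) => boolean = hasRagDir,
): string | null {
  let dir = resolve(startDir);
  while (true) {
    if (isScopeRoot(dir)) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // dirname of the root is the root itself
    dir = parent;
  }
}
```

Calling `findRagRoot(process.cwd())` from any subdirectory of a project would return the directory that holds `.rag/`, or `null` when no scope exists. The injectable predicate is only there so the walk can be exercised without touching the disk.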
MCP integration
Register in opencode.json:
{
  "mcp": {
    "my-docs": {
      "type": "local",
      "command": ["rag", "serve"],
      "cwd": "/path/to/project",
      "enabled": true
    }
  }
}
Run rag mcp config from project directory to print the snippet with cwd pre-filled.
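For readers who generate this entry from a script, the shape of the snippet can be sketched as below. `LocalMcpEntry` and `buildMcpSnippet` are hypothetical names used for illustration; the field names and values mirror the opencode.json example above, and nothing here is claimed to be how `rag mcp config` is implemented.

```typescript
// One local MCP server entry, mirroring the fields in the example above.
interface LocalMcpEntry {
  type: "local";
  command: string[];
  cwd: string;
  enabled: boolean;
}

// Hypothetical helper: build the `mcp` block for a scope name and project directory.
function buildMcpSnippet(
  name: string,
  cwd: string,
): { mcp: Record<string, LocalMcpEntry> } {
  return {
    mcp: {
      [name]: { type: "local", command: ["rag", "serve"], cwd, enabled: true },
    },
  };
}

// Print the snippet as indented JSON, ready to paste into opencode.json.
console.log(JSON.stringify(buildMcpSnippet("my-docs", "/path/to/project"), null, 2));
```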
Architecture
flowchart LR
MD[.md files] --> Chunker
Chunker -->|heading split| Chunks
Chunks -->|Ollama embed| Vectors
Vectors -->|store| LanceDB
Query -->|embed| LanceDB
LanceDB -->|search| Results
Question -->|embed + search| Context
Context -->|Ollama chat| Answer
Chunker: splits by ##/### headings, preserves heading hierarchy, merges tiny sections
Embedder: Ollama /api/embed in batches of 20, truncates to 500 tokens per chunk
Store: LanceDB embedded vector database (no external server)
RAG: retrieve top 8 chunks, build context prompt, call Ollama chat for synthesis
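The chunking and batching steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `Chunk`, `chunkByHeading`, and `toBatches` are hypothetical names, the 200-character threshold for "tiny" sections is an invented placeholder, and the sketch does not skip headings that appear inside fenced code blocks. Only the `##`/`###` split, the heading path, the merge of small sections, and the batch size of 20 come from the description.

```typescript
// One chunk of a markdown file: its heading path plus the text under it.
interface Chunk {
  headings: string[]; // e.g. ["Install", "Shell alias"]
  text: string;
}

// Sketch of heading-based chunking. `minChars` is an assumed threshold for "tiny" sections.
function chunkByHeading(markdown: string, minChars = 200): Chunk[] {
  const chunks: Chunk[] = [];
  let path: string[] = [];
  let body: string[] = [];

  const flush = (): void => {
    const text = body.join("\n").trim();
    body = [];
    if (text === "") return;
    const prev = chunks[chunks.length - 1];
    if (prev !== undefined && text.length < minChars) {
      // Tiny section: fold it into the previous chunk, keeping its heading path as a label.
      const label = path.join(" > ");
      prev.text += "\n\n" + (label === "" ? "" : label + "\n") + text;
    } else {
      chunks.push({ headings: [...path], text });
    }
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{2,3})\s+(.+)$/.exec(line);
    if (match === null) {
      body.push(line);
      continue;
    }
    flush(); // close the section that ended at this heading
    const title = match[2].trim();
    if (match[1].length === 2 || path.length === 0) {
      path = [title]; // "##" starts a new top-level section
    } else {
      path = [path[0], title]; // "###" nests under the current top-level heading
    }
  }
  flush();
  return chunks;
}

// Split items into fixed-size groups, e.g. 20 chunk texts per embedding request.
function toBatches<T>(items: T[], size = 20): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch from `toBatches(chunks.map((c) => c.text))` would then be sent to the embedder in a single request. Keeping the heading path on every chunk lets search results show where in a document a match came from.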
Configuration
.rag/config.json:
{
  "name": "my-docs",
  "embedModel": "nomic-embed-text",
  "ragModel": "llama3.2:3b",
  "pattern": "*.md"
}
Models auto-pull if missing. Override via rag init or edit config.json directly.
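A script that edits config.json by hand may want to validate it first. The sketch below shows one way to do that. `RagConfig`, `EXAMPLE_DEFAULTS`, and `parseRagConfig` are hypothetical names; the four keys come from the example above, while the fallback values are simply copied from that example and may not match the tool's real defaults.

```typescript
// The four keys shown in the example .rag/config.json.
interface RagConfig {
  name: string;
  embedModel: string;
  ragModel: string;
  pattern: string;
}

// Assumed fallbacks, copied from the example config; the tool's real defaults may differ.
const EXAMPLE_DEFAULTS: Omit<RagConfig, "name"> = {
  embedModel: "nomic-embed-text",
  ragModel: "llama3.2:3b",
  pattern: "*.md",
};

// Hypothetical helper: parse config.json text, fill missing optional keys, reject wrong types.
function parseRagConfig(json: string): RagConfig {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const name = obj["name"];
  if (typeof name !== "string" || name === "") {
    throw new Error('config.json needs a non-empty string "name"');
  }
  const pick = (key: keyof typeof EXAMPLE_DEFAULTS): string => {
    const value = obj[key];
    if (value === undefined) return EXAMPLE_DEFAULTS[key];
    if (typeof value !== "string") throw new Error(`"${key}" must be a string`);
    return value;
  };
  return {
    name,
    embedModel: pick("embedModel"),
    ragModel: pick("ragModel"),
    pattern: pick("pattern"),
  };
}
```

For example, `parseRagConfig('{"name":"my-docs","ragModel":"custom"}')` keeps the custom chat model and falls back to the example values for the other two keys.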
License
MIT