# RAGStack MCP Server
MCP (Model Context Protocol) server for RAGStack knowledge bases. It lets AI assistants search and chat with your knowledge base, upload documents and media, and scrape websites into it.
## Installation
```bash
# Using uvx (recommended - no install needed)
uvx ragstack-mcp
# Or install globally
pip install ragstack-mcp
```
## Configuration
Get your GraphQL endpoint and API key from the RAGStack dashboard:
**Settings → API Key**
### Claude Desktop
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```
### Amazon Q CLI
Edit `~/.aws/amazonq/mcp.json`:
```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```
### Cursor
Open **Settings → MCP Servers → Add Server**, or edit `.cursor/mcp.json`:
```json
{
  "ragstack-kb": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
      "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
    }
  }
}
```
### VS Code + Cline
Edit `.vscode/cline_mcp_settings.json`:
```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```
### VS Code + Continue
Edit `~/.continue/config.json` and add an entry to the `mcpServers` array:
```json
{
  "mcpServers": [
    {
      "name": "ragstack-kb",
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  ]
}
```
## Available Tools
### search_knowledge_base
Search for relevant documents in the knowledge base.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | The search query |
| `max_results` | int | No | 5 | Maximum results to return |
### chat_with_knowledge_base
Ask questions and get AI-generated answers with source citations.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Your question |
| `conversation_id` | string | No | null | ID to maintain conversation context |
### start_scrape_job
Scrape a website into the knowledge base.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | Yes | - | Starting URL to scrape |
| `max_pages` | int | No | 50 | Maximum pages to scrape |
| `max_depth` | int | No | 3 | How deep to follow links (0 = start page only) |
| `scope` | string | No | "HOSTNAME" | `SUBPAGES`, `HOSTNAME`, or `DOMAIN` |
| `include_patterns` | list[str] | No | null | Only scrape URLs matching these glob patterns |
| `exclude_patterns` | list[str] | No | null | Skip URLs matching these glob patterns |
| `scrape_mode` | string | No | "AUTO" | `AUTO`, `FAST` (HTTP only), or `FULL` (browser) |
| `cookies` | string | No | null | Cookie string for authenticated sites |
| `force_rescrape` | bool | No | false | Re-scrape even if content unchanged |
**Scope values:**
- `SUBPAGES` - Only URLs under the starting path
- `HOSTNAME` - All pages on the same subdomain
- `DOMAIN` - All subdomains of the domain
**Scrape mode values:**
- `AUTO` - Tries fast mode first, falls back to full for single-page apps (SPAs)
- `FAST` - HTTP only, faster but may miss JavaScript content
- `FULL` - Uses headless browser, handles all JavaScript
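The `include_patterns` and `exclude_patterns` behave like shell-style globs. The server's exact matching rules aren't specified here, but Python's `fnmatch` gives a reasonable mental model of how a URL is accepted or skipped (the patterns and URLs below are purely illustrative):

```python
from fnmatch import fnmatch

# Hypothetical patterns for a docs-only crawl; the server's glob
# semantics may differ slightly from fnmatch's.
include_patterns = ["https://example.com/docs/*"]
exclude_patterns = ["*/changelog/*", "*.pdf"]

def should_scrape(url: str) -> bool:
    """A URL is kept if it matches an include pattern and no exclude pattern."""
    included = any(fnmatch(url, p) for p in include_patterns)
    excluded = any(fnmatch(url, p) for p in exclude_patterns)
    return included and not excluded

print(should_scrape("https://example.com/docs/intro"))         # True
print(should_scrape("https://example.com/docs/changelog/v2"))  # False (excluded)
print(should_scrape("https://example.com/blog/post"))          # False (not included)
```

When no `include_patterns` are given, scoping is governed by `scope` alone.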
### get_scrape_job_status
Check the status of a scrape job.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `job_id` | string | Yes | The scrape job ID |
### list_scrape_jobs
List recent scrape jobs.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `limit` | int | No | 10 | Maximum jobs to return |
### upload_document_url
Get a presigned URL to upload a document or media file.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `filename` | string | Yes | Name of the file (e.g., 'report.pdf', 'meeting.mp4') |
**Supported formats:**
- Documents: PDF, DOCX, XLSX, HTML, TXT, CSV, JSON, XML, EML, EPUB, Markdown
- Images: JPG, PNG, GIF, WebP, AVIF, BMP, TIFF
- Video: MP4, WebM
- Audio: MP3, WAV, M4A, OGG, FLAC
Video/audio files are transcribed using AWS Transcribe and segmented for search.
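The tool returns a presigned S3 URL; the upload itself is an ordinary HTTP `PUT` of the file bytes. A minimal sketch in Python, assuming the presigned URL has already been obtained from `upload_document_url` (the required headers may differ depending on how the URL was signed):

```python
import mimetypes
import urllib.request
from pathlib import Path

def guess_content_type(filename: str) -> str:
    # Fall back to a generic binary type when the extension is unknown.
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"

def upload_to_presigned_url(presigned_url: str, path: str) -> int:
    """PUT the file bytes to the presigned URL; returns the HTTP status code."""
    req = urllib.request.Request(
        presigned_url,
        data=Path(path).read_bytes(),
        method="PUT",
        headers={"Content-Type": guess_content_type(path)},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# e.g. upload_to_presigned_url(url_from_tool, "quarterly-report.pdf")
```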
### upload_image_url
Get a presigned URL to upload an image (step 1 of the image upload workflow).
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `filename` | string | Yes | Name of the image file (e.g., 'photo.jpg') |
Supported formats: JPEG, PNG, GIF, WebP, AVIF, BMP, TIFF
### generate_image_caption
Generate an AI caption for an uploaded image using a vision model (step 2, optional).
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `s3_uri` | string | Yes | S3 URI returned by upload_image_url |
### submit_image
Finalize an image upload and trigger indexing (step 3).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image_id` | string | Yes | - | Image ID from upload_image_url |
| `caption` | string | No | null | Primary caption |
| `user_caption` | string | No | null | User-provided caption |
| `ai_caption` | string | No | null | AI-generated caption |
---
## Configuration Tools (Read-Only)
### get_configuration
Get all current RAGStack configuration settings organized by category.
Returns settings for:
- **Chat:** Models, quotas, system prompt, document access
- **Metadata Extraction:** Enabled, model, mode (auto/manual), max keys
- **Query-Time Filtering:** Filter generation, multi-slice retrieval settings
- **Public Access:** Which endpoints allow unauthenticated access
- **Document Processing:** OCR backend, image caption prompt
- **Media Processing:** Transcribe language, speaker diarization, segment duration
- **Budget:** Alert thresholds
**Note:** Read-only. To modify settings, use the admin dashboard (Cognito auth required).
---
## Metadata Analysis Tools
These tools help you understand and optimize metadata extraction and filtering.
### get_metadata_stats
Get statistics about metadata keys extracted from documents.
Returns key names, data types, occurrence counts, sample values, and status.
### get_filter_examples
Get AI-generated filter examples for metadata-based search queries.
Returns filter patterns with name, description, use case, and JSON filter syntax.
**Filter syntax reference:**
- Basic operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`
- Logical operators: `$and`, `$or`
- Example: `{"topic": {"$eq": "genealogy"}}`
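Operators can be nested under `$and`/`$or` to express compound conditions. A sketch of a compound filter, where `topic`, `year`, `source`, and `reviewed` are hypothetical keys (use `get_key_library` to see which keys actually exist in your knowledge base):

```python
import json

# Match documents about genealogy or history, from 1900 onward, that
# either come from the "archive" source or carry a "reviewed" flag.
filter_expr = {
    "$and": [
        {"topic": {"$in": ["genealogy", "history"]}},
        {"year": {"$gte": 1900}},
        {"$or": [
            {"source": {"$eq": "archive"}},
            {"reviewed": {"$exists": True}},
        ]},
    ]
}

# Filters are plain JSON, so they serialize cleanly for a tool call.
print(json.dumps(filter_expr, indent=2))
```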
### get_key_library
Get the complete metadata key library with all discovered keys.
Returns all keys available for filtering with data types and sample values.
### check_key_similarity
Check if a proposed metadata key is similar to existing keys.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `key_name` | string | Yes | - | Proposed key name to check |
| `threshold` | float | No | 0.8 | Similarity threshold (0.0-1.0) |
Use this before adding documents with new keys to avoid duplicates.
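The server's similarity metric isn't documented here, but a simple string-similarity ratio illustrates why a near-duplicate like `authors` would exceed the default 0.8 threshold against `author` while an unrelated key would not:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Case-insensitive string similarity; the server's actual metric
    # (possibly embedding-based) may score pairs differently.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("author", "authors"))           # high: likely a duplicate
print(similarity("author", "publication_date"))  # low: a genuinely new key
```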
### analyze_metadata
Trigger metadata analysis to discover keys and generate filter examples.
**Note:** This is a long-running operation (1-2 minutes). It samples up to 1000 vectors and uses LLM analysis.
Run this after ingesting new documents or when filter generation isn't working as expected.
---
## Usage Examples
Once configured, just ask your AI assistant naturally:
**Search & Chat:**
- "Search my knowledge base for authentication best practices"
- "What does our documentation say about API rate limits?"
- "What was discussed in the team meeting about deadlines?" (searches video/audio transcripts)
**Web Scraping:**
- "Scrape the React docs at react.dev/reference"
- "Check the status of my scrape job"
**Document, Image & Media Upload:**
- "Upload a new document called quarterly-report.pdf"
- "Upload this image and generate a caption for it"
- "Upload the meeting recording meeting-2024-01.mp4"
**Metadata Analysis:**
- "What metadata keys are available for filtering?"
- "Analyze the metadata in my knowledge base"
- "Show me the filter examples"
- "Check if 'author' is similar to any existing keys"
**Configuration:**
- "What are my current RAGStack settings?"
- "What model is being used for chat?"
- "Is multi-slice retrieval enabled?"
- "What are my quota limits?"
- "What language is configured for transcription?"
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `RAGSTACK_GRAPHQL_ENDPOINT` | Yes | Your RAGStack GraphQL API URL |
| `RAGSTACK_API_KEY` | Yes | Your RAGStack API key |
## Development
```bash
# Clone the repository, then install dependencies
cd src/ragstack-mcp
uv sync
# Run locally
uv run ragstack-mcp
# Build package
uv build
# Publish to PyPI
uv publish
```
## License
MIT