MCP RAG Server

# MCP Server with FAISS for RAG

This project provides a proof-of-concept implementation of a Machine Conversation Protocol (MCP) server that allows an AI agent to query a vector database and retrieve relevant documents for Retrieval-Augmented Generation (RAG).

## Features

- FastAPI server with MCP endpoints
- FAISS vector database integration
- Document chunking and embedding
- GitHub Move file extraction and processing
- LLM integration for complete RAG workflow
- Simple client example
- Sample documents

## Installation

### Using pipx (Recommended)

[pipx](https://pypa.github.io/pipx/) is a tool to help you install and run Python applications in isolated environments.

1. First, install pipx if you don't have it:

   ```bash
   # On macOS
   brew install pipx
   pipx ensurepath

   # On Ubuntu/Debian
   sudo apt update
   sudo apt install python3-pip python3-venv
   python3 -m pip install --user pipx
   python3 -m pipx ensurepath

   # On Windows with pip
   pip install pipx
   pipx ensurepath
   ```

2. Install the MCP Server package directly from the project directory:

   ```bash
   # Navigate to the directory containing the mcp_server folder
   cd /path/to/mcp-server-project

   # Install in editable mode
   pipx install -e .
   ```

3. (Optional) Configure environment variables:
   - Copy `.env.example` to `.env`
   - Add your GitHub token for higher rate limits: `GITHUB_TOKEN=your_token_here`
   - Add your OpenAI or other LLM API key for RAG integration: `OPENAI_API_KEY=your_key_here`

### Manual Installation

If you prefer not to use pipx:

1. Clone the repository
2. Install dependencies:

   ```bash
   cd mcp_server
   pip install -r requirements.txt
   ```

## Usage with pipx

After installing with pipx, you'll have access to the following commands:

### Downloading Move Files from GitHub

```bash
# Download Move files with default settings
mcp-download --query "use sui" --output-dir docs/move_files

# Download with more options
mcp-download --query "module sui::coin" --max-results 50 --new-index --verbose
```

### Improved GitHub Search and Indexing (Recommended)

```bash
# Search GitHub and index files with default settings
mcp-search-index --keywords "sui move"

# Search multiple keywords and customize options
mcp-search-index --keywords "sui move,move framework" --max-repos 30 --output-results --verbose

# Save search results and use a custom index location
mcp-search-index --keywords "sui coin,sui::transfer" --index-file custom/path/index.bin --output-results
```

The `mcp-search-index` command provides enhanced GitHub repository search capabilities:

- Searches repositories first, then recursively extracts Move files
- Supports multiple search keywords (comma-separated)
- Intelligently filters for Move files containing "use sui" references
- Always rebuilds the vector database after downloading

### Indexing Move Files

```bash
# Index files in the default location
mcp-index

# Index with custom options
mcp-index --docs-dir path/to/files --index-file path/to/index.bin --verbose
```

### Querying the Vector Database

```bash
# Basic query
mcp-query "What is a module in Sui Move?"

# Advanced query with options
mcp-query "How do I define a struct in Sui Move?" -k 3 -f
```

### Using RAG with LLM Integration

```bash
# Basic RAG query (will use simulated LLM if no API key is provided)
mcp-rag "What is a module in Sui Move?"

# Using with a specific LLM API
mcp-rag "How do I define a struct in Sui Move?" --api-key your_api_key --top-k 3

# Output as JSON for further processing
mcp-rag "What are the benefits of sui::coin?" --output-json > rag_response.json
```

### Running the Server

```bash
# Start the server with default settings
mcp-server

# Start with custom settings
mcp-server --host 127.0.0.1 --port 8080 --index-file custom/path/index.bin
```

## Manual Usage (without pipx)

### Starting the server

```bash
cd mcp_server
python main.py
```

The server will start on http://localhost:8000

### Downloading Move Files from GitHub

To download Move files from GitHub and populate your vector database:

```bash
# Download Move files with default query "use sui"
./run.sh --download-move

# Customize the search query
./run.sh --download-move --github-query "module sui::coin" --max-results 50

# Download, index, and start the server
./run.sh --download-move --index
```

You can also use the Python script directly:

```bash
python download_move_files.py --query "use sui" --output-dir docs/move_files
```

### Indexing documents

Before querying, you need to index your documents. You can place your text files (.txt), Markdown files (.md), or Move files (.move) in the `docs` directory.

To index the documents, you can either:

1. Use the run script with the `--index` flag:

   ```bash
   ./run.sh --index
   ```

2. Use the index script directly:

   ```bash
   python index_move_files.py --docs-dir docs/move_files --index-file data/faiss_index.bin
   ```

### Querying documents

You can use the local query script:

```bash
python local_query.py "What is RAG?"

# With more options
python local_query.py -k 3 -f "How to define a struct in Sui Move?"
```

### Using RAG with LLM Integration

```bash
# Direct RAG query with an LLM
python rag_integration.py "What is a module in Sui Move?" --index-file data/faiss_index.bin

# With API key (if you have one)
OPENAI_API_KEY=your_key_here python rag_integration.py "How do coins work in Sui?"
```

### MCP API Endpoint

The MCP API endpoint is available at `/mcp/action`. You can use it to perform different actions:

- `retrieve_documents`: Retrieve relevant documents for a query
- `index_documents`: Index documents from a directory

Example:

```bash
curl -X POST "http://localhost:8000/mcp/action" \
  -H "Content-Type: application/json" \
  -d '{"action_type": "retrieve_documents", "payload": {"query": "What is RAG?", "top_k": 3}}'
```

## Complete RAG Pipeline

The full RAG (Retrieval-Augmented Generation) pipeline works as follows:

1. **Search Query**: The user submits a question
2. **Retrieval**: The system searches the vector database for relevant documents
3. **Context Formation**: Retrieved documents are formatted into a prompt
4. **LLM Generation**: The prompt is sent to an LLM with the retrieved context
5. **Enhanced Response**: The LLM provides an answer based on the retrieved information

This workflow is fully implemented in the `rag_integration.py` module, which can be used either through the command line or as a library in your own applications.

## GitHub Move File Extraction

The system can extract Move files from GitHub based on search queries. It implements two methods:

1. **GitHub API** (preferred): Requires a GitHub token for higher rate limits
2. **Web Scraping fallback**: Used when API method fails or when no token is provided

To configure your GitHub token, set it in the `.env` file or as an environment variable:

```
GITHUB_TOKEN=your_github_token_here
```

## Project Structure

```
mcp_server/
├── __init__.py               # Package initialization
├── main.py                   # Main server file
├── mcp_api.py                # MCP API implementation
├── index_move_files.py       # File indexing utility
├── local_query.py            # Local query utility
├── download_move_files.py    # GitHub Move file extractor
├── rag_integration.py        # LLM integration for RAG
├── pyproject.toml            # Package configuration
├── requirements.txt          # Dependencies
├── .env.example              # Example environment variables
├── README.md                 # This file
├── data/                     # Storage for the FAISS index
├── docs/                     # Sample documents
│   └── move_files/           # Downloaded Move files
├── models/                   # Model implementations
│   └── vector_store.py       # FAISS vector store implementation
└── utils/
    ├── document_processor.py # Document processing utilities
    └── github_extractor.py   # GitHub file extraction utilities
```

## Extending the Project

To extend this proof-of-concept:

1. Add authentication and security features
2. Implement more sophisticated document processing
3. Add support for more document types
4. Integrate with other LLM providers
5. Add monitoring and logging
6. Improve the Move language parsing for more structured data extraction

## License

MIT
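## Example: Calling the MCP Endpoint from Python

The `curl` request in the MCP API Endpoint section can also be issued from Python. The following is a minimal sketch using only the standard library, not the project's own client. The helper names (`build_retrieve_action`, `build_request`, `send_action`) are hypothetical; the endpoint path, `action_type`, and payload fields are taken from the `curl` example. The response schema is not documented here, so the sketch returns the decoded JSON unchanged.

```python
import json
import urllib.request

# Endpoint path as documented in the "MCP API Endpoint" section.
MCP_ACTION_PATH = "/mcp/action"


def build_retrieve_action(query: str, top_k: int = 3) -> dict:
    """Build the JSON body for a `retrieve_documents` action (hypothetical helper)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "action_type": "retrieve_documents",
        "payload": {"query": query, "top_k": top_k},
    }


def build_request(base_url: str, action: dict) -> urllib.request.Request:
    """Wrap an action in a POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + MCP_ACTION_PATH,
        data=json.dumps(action).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_action(base_url: str, action: dict, timeout: float = 10.0):
    """Send the action and return the decoded JSON response as-is.

    Assumption: the server replies with a JSON body. Its exact shape is
    not specified in this README.
    """
    request = build_request(base_url, action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `send_action("http://localhost:8000", build_retrieve_action("What is RAG?"))` would send the same request as the `curl` example. Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.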
