# Project Summary: MCP-Based RAG System
## 🎯 Main Goal
**Demonstrate how MCP (Model Context Protocol) works and its purpose** through a working RAG (Retrieval-Augmented Generation) system for semiconductor component search.
## ✅ Requirements Implemented
### 1. ✅ MCP (Model Context Protocol)
- **MCP Server** (`mcp_server.py`): Implements an MCP server for structured context retrieval
- **MCP Client** (`mcp_client_example.py`): Demonstrates how to query the server through MCP
- **Purpose**: Shows how MCP provides a standardized, tool-based interface for context retrieval
### 2. ✅ ChromaDB
- **Vector Database**: Used for storing and retrieving semantic embeddings
- **Collection**: `semiconductor_components` collection for document storage
- **Integration**: Fully integrated with RAG pipeline for semantic search
### 3. ✅ Llama Model (Decoding)
- **Primary**: Attempts to load `meta-llama/Llama-2-7b-chat-hf` from HuggingFace
- **Fallback**: Uses GPT-2 if Llama is not accessible
- **Purpose**: Generates answers based on retrieved context
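
The primary/fallback behaviour described above can be sketched as a small helper that tries model names in order. This is only an illustration under assumptions: `load_first_available` and `fake_loader` are hypothetical names, and the real loading code in `rag_pipeline.py` may be organised differently.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(
    candidates: Sequence[str],
    loader: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each model name in order and return the first one that loads."""
    errors = []
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:  # e.g. gated repo, missing token, out of memory
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no model could be loaded: " + "; ".join(errors))


# Hypothetical loader used only for this illustration: it pretends that the
# gated Llama checkpoint is not accessible and that GPT-2 loads fine.
def fake_loader(name: str) -> str:
    if name.startswith("meta-llama/"):
        raise PermissionError("access to gated model denied")
    return f"<model {name}>"


MODEL_CANDIDATES = ["meta-llama/Llama-2-7b-chat-hf", "gpt2"]
chosen_name, model = load_first_available(MODEL_CANDIDATES, fake_loader)
print(chosen_name)
```

In a real pipeline the `loader` argument could wrap a call such as `transformers.AutoModelForCausalLM.from_pretrained`, so the fallback order stays in one place.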
### 4. ✅ Encoding Model (Embeddings)
- **Model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Purpose**: Converts text to embeddings for semantic search
- **Source**: HuggingFace Hub
### 5. ✅ Backend API
- **Framework**: FastAPI
- **Endpoints**:
  - `POST /upload`: Upload Excel documents
  - `POST /ask`: Ask questions and get answers
  - `GET /info`: Get collection information
  - `GET /health`: Health check
- **Features**: File upload, question-answering, RAG integration
### 6. ✅ RAG Flow with MCP
- **Document Processing**: Excel → Text chunks → Embeddings → ChromaDB
- **Query Processing**: Question → Embeddings → ChromaDB retrieval → LLM generation
- **MCP Integration**: Demonstrates MCP protocol for context retrieval
### 7. ✅ Example Excel Document
- **File**: `examples/semiconductor_components.xlsx`
- **Content**: 10 semiconductor components with details
- **Fields**: Component ID, Name, Category, Manufacturer, Part Number, Ratings, etc.
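
One plausible way to turn a spreadsheet row with fields like these into a text chunk plus metadata is sketched below. The project's actual chunking strategy is not described here, so `row_to_chunk`, the `field: value` format, and the choice of metadata keys are all assumptions, and the example row is made up.

```python
from typing import Dict, Tuple


def row_to_chunk(row: Dict[str, object]) -> Tuple[str, Dict[str, str]]:
    """Flatten one spreadsheet row into a text chunk plus metadata."""
    # Skip empty cells so the chunk contains only meaningful "field: value" pairs.
    parts = [
        f"{field}: {value}"
        for field, value in row.items()
        if value is not None and str(value).strip()
    ]
    text = "; ".join(parts)
    # Keep a few fields as metadata so search results can be filtered later.
    metadata = {
        key: str(row[key])
        for key in ("Component ID", "Category", "Manufacturer")
        if row.get(key) is not None
    }
    return text, metadata


# Made-up example row; the values are not taken from the project's spreadsheet.
example_row = {
    "Component ID": "C001",
    "Name": "Example N-channel MOSFET",
    "Category": "MOSFET",
    "Manufacturer": "ExampleCorp",
    "Part Number": "EX-1234",
    "Ratings": None,
}
chunk_text, chunk_metadata = row_to_chunk(example_row)
print(chunk_text)
```

Keeping one row per chunk means each retrieved result maps back to exactly one component, which suits a small catalogue like this one.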
## Project Structure
```
MCP2/
├── main.py                    # FastAPI backend server
├── rag_pipeline.py            # RAG pipeline (embeddings + LLM)
├── mcp_server.py              # MCP server for ChromaDB
├── mcp_client_example.py      # Example MCP client usage
├── config.py                  # Configuration (API keys, models)
├── create_example_excel.py    # Generate example Excel file
├── test_api.py                # API testing script
├── requirements.txt           # Python dependencies
├── README.md                  # Main documentation
├── USAGE.md                   # Detailed usage guide
├── PROJECT_SUMMARY.md         # This file
├── .gitignore                 # Git ignore rules
├── examples/                  # Example Excel files
│   └── semiconductor_components.xlsx
├── uploads/                   # Uploaded files (created at runtime)
└── chroma_db/                 # ChromaDB storage (created at runtime)
```
## How MCP Works in This Project
### MCP Purpose Demonstrated:
1. **Standardized Tool Interface**
   - MCP defines tools (`query_semiconductor_data`, `get_collection_info`)
   - Tools provide structured access to ChromaDB
   - Protocol-level abstraction for data retrieval
2. **Context Retrieval Flow**
   ```
   User Query
   ↓
   MCP Tool Call (query_semiconductor_data)
   ↓
   ChromaDB Semantic Search
   ↓
   Retrieved Context
   ↓
   LLM Answer Generation
   ```
3. **Protocol Benefits**
   - **Modularity**: MCP tools can be reused across different systems
   - **Standardization**: Consistent interface for context retrieval
   - **Extensibility**: Easy to add new tools or data sources
### MCP Implementation:
- **MCP Server**: Defines tools for ChromaDB operations
- **MCP Client**: Demonstrates tool discovery and usage
- **RAG Integration**: Uses MCP principles for context retrieval
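
To make the tool-based interface concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below mimics that message shape with the standard library only. It is a simplified stand-in, not the MCP SDK and not the code in `mcp_server.py`; the tool names come from this project, while the argument names (`query`, `n_results`) and the canned tool outputs are assumptions.

```python
import json
from typing import Any, Callable, Dict


def query_semiconductor_data(query: str, n_results: int = 3) -> str:
    # Stand-in for the ChromaDB search; returns a canned string for illustration.
    return f"top {n_results} results for: {query}"


def get_collection_info() -> str:
    # Stand-in for the real collection lookup.
    return "collection: semiconductor_components"


# Registry mapping tool names to the callables that implement them.
TOOLS: Dict[str, Callable[..., str]] = {
    "query_semiconductor_data": query_semiconductor_data,
    "get_collection_info": get_collection_info,
}


def make_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request in the shape of an MCP tools/call message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tools/call request to the local registry."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "tools/call":
        return {**base, "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return {**base, "error": {"code": -32602, "message": "Unknown tool"}}
    text = tool(**params.get("arguments", {}))
    # Tool results are returned as a list of content items.
    return {**base, "result": {"content": [{"type": "text", "text": text}]}}


request = make_tool_call(1, "query_semiconductor_data", {"query": "MOSFET", "n_results": 2})
response = handle_request(request)
print(json.dumps(response, indent=2))
```

Because the client only needs a tool name and a JSON arguments object, the same request shape would work against any server exposing these tools, which is the modularity benefit listed above.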
## Quick Start
1. **Install dependencies:**
```bash
pip install -r requirements.txt
```
2. **Create example Excel:**
```bash
python create_example_excel.py
```
3. **Start API server:**
```bash
python main.py
```
4. **Upload document:**
```bash
curl -X POST "http://localhost:8000/upload" \
-F "file=@examples/semiconductor_components.xlsx"
```
5. **Ask question:**
```bash
curl -X POST "http://localhost:8000/ask" \
-H "Content-Type: application/json" \
-d '{"question": "What MOSFET components are available?"}'
```
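
The same question can also be sent from Python. The sketch below builds the `POST /ask` request with the standard library, mirroring the curl call in step 5; `build_ask_request` is a hypothetical helper name, and the shape of the server's JSON response is not assumed here.

```python
import json
import urllib.request


def build_ask_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /ask request with a JSON body, without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request("What MOSFET components are available?")
print(request.get_method(), request.full_url)

# With the server from step 3 running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```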
## 🔧 Technologies Used
- **MCP**: Model Context Protocol (v0.9.0)
- **ChromaDB**: Vector database (v0.4.18)
- **HuggingFace**: Models and Transformers
  - Encoding: `sentence-transformers/all-MiniLM-L6-v2`
  - Decoding: Llama-2 or GPT-2
- **FastAPI**: REST API framework
- **Python**: 3.8+
- **PyTorch**: Deep learning framework
## Data Flow
```
Excel Document
↓
Parse to Text Chunks
↓
Generate Embeddings (Encoding Model)
↓
Store in ChromaDB (with metadata)
↓
[User asks question]
↓
Generate Query Embedding
↓
Semantic Search in ChromaDB (MCP tool)
↓
Retrieve Relevant Context
↓
Generate Answer (LLM Decoding Model)
↓
Return Response
```
```
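
The semantic search step in this flow comes down to ranking stored vectors by similarity to the query vector. The sketch below shows that idea with a tiny in-memory store and cosine similarity. It is not how ChromaDB is implemented, the distance metric used by the project's collection is not stated here, and the 3-dimensional vectors are hand-written toys (real `all-MiniLM-L6-v2` embeddings have 384 dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" chosen by hand so the ranking is easy to follow.
store = {
    "mosfet": [1.0, 0.0, 0.0],
    "regulator": [0.0, 1.0, 0.0],
    "sensor": [0.0, 0.0, 1.0],
}
query_vec = [0.9, 0.1, 0.0]
results = top_k(query_vec, store, k=2)
print([doc_id for doc_id, _ in results])
```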
## Key Concepts Demonstrated
1. **RAG (Retrieval-Augmented Generation)**
   - Retrieval phase: ChromaDB semantic search
   - Augmentation phase: Combine context with query
   - Generation phase: LLM generates answer
2. **MCP (Model Context Protocol)**
   - Tool-based interface
   - Standardized protocol
   - Context retrieval abstraction
3. **Semantic Search**
   - Embeddings for semantic similarity
   - Vector database for efficient retrieval
   - Metadata filtering capabilities
4. **Document Processing**
   - Excel parsing
   - Chunking strategy
   - Metadata preservation
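
The augmentation phase of RAG, combining retrieved context with the user's question, can be sketched as a simple prompt builder. The prompt wording and the `build_prompt` name are assumptions for illustration; the template actually used in `rag_pipeline.py` may differ, and the context strings below are made up.

```python
from typing import Sequence


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Combine retrieved chunks and the question into a single LLM prompt."""
    # Number the chunks so the model (and a reader) can tell them apart.
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What MOSFET components are available?",
    [
        "Category: MOSFET; Name: Example part A",
        "Category: MOSFET; Name: Example part B",
    ],
)
print(prompt)
```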
## Example Questions
- "What MOSFET components are available?"
- "Show me voltage regulators from Texas Instruments"
- "What components work with 5V?"
- "List all temperature sensors"
- "What components are used for power switching?"
## Configuration
**Important**: Create a `.env` file in the root directory with your Hugging Face API key:
```
HF_API_KEY=your_api_key_here
```
Get your API key from: https://huggingface.co/settings/tokens
Models can be changed in `config.py`:
- `HF_EMBEDDING_MODEL`: Encoding model
- `HF_LLM_MODEL`: Decoding model
- `CHROMA_COLLECTION_NAME`: Collection name
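
As a rough illustration of how these settings could be gathered in one place, the sketch below reads them from environment-style key/value pairs and falls back to the model and collection names mentioned in this summary. Whether `config.py` reads these values from the environment or hard-codes them is an assumption, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_api_key: Optional[str]
    embedding_model: str
    llm_model: str
    collection_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, using this summary's names as defaults."""
    return Settings(
        hf_api_key=env.get("HF_API_KEY"),
        embedding_model=env.get(
            "HF_EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        llm_model=env.get("HF_LLM_MODEL", "meta-llama/Llama-2-7b-chat-hf"),
        collection_name=env.get("CHROMA_COLLECTION_NAME", "semiconductor_components"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"HF_API_KEY": "your_api_key_here"})
print(settings.embedding_model)
```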
## ✅ Testing
Run automated tests:
```bash
python test_api.py
```
Test MCP client:
```bash
python mcp_client_example.py
```
## Documentation
- **README.md**: Overview and installation
- **USAGE.md**: Detailed usage instructions
- **PROJECT_SUMMARY.md**: This file - project summary
- **API Docs**: Available at `http://localhost:8000/docs`
## 🎯 Project Goals Achieved
- ✅ **MCP Integration**: Fully implemented MCP server and client
- ✅ **ChromaDB**: Vector database for semantic search
- ✅ **HuggingFace Models**: Both encoding and decoding models
- ✅ **RAG Flow**: Complete retrieval-augmented generation pipeline
- ✅ **Backend API**: REST API for document upload and Q&A
- ✅ **Example Data**: Semiconductor component Excel document
- ✅ **Working System**: Fully functional end-to-end system
## 🚦 Status
**Project Status**: ✅ **COMPLETE** and **WORKING**
All requirements have been implemented and the system is ready for demonstration and use.