# MCP Enhanced Data Retrieval System
An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.
## Project Overview
This system implements the Model Context Protocol to provide:
- Standardized AI context sharing across organizational knowledge sources
- GitHub repository integration with OAuth 2.1 authentication
- Vector-based semantic search using embeddings
- 1500-token context chunking, tuned for a sub-500ms time to first token (TTFT)
- Parallel retrieval across knowledge sources with a 2-second timeout
- Streamable HTTP transport using FastAPI
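
The parallel retrieval with a 2-second timeout listed above can be sketched with `asyncio`. This is a minimal illustration, not the project's implementation: `retrieve_all`, `fake_github`, and `fake_slow_docs` are hypothetical names, and the stand-in sources only sleep to simulate latency.

```python
import asyncio

RETRIEVAL_TIMEOUT_S = 2.0  # the 2-second budget named in the overview


async def retrieve_all(sources, query, timeout=RETRIEVAL_TIMEOUT_S):
    """Query every source concurrently and keep what finishes in time.

    `sources` maps a source name to an async callable that takes the query.
    Returns {name: result} for sources that completed without raising.
    """
    tasks = {
        asyncio.create_task(fetch(query)): name
        for name, fetch in sources.items()
    }
    if not tasks:
        return {}
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # drop slow sources instead of delaying the response
    # Let cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return {
        tasks[task]: task.result()
        for task in done
        if task.exception() is None
    }


async def fake_github(query):
    # Hypothetical stand-in for a GitHub lookup.
    await asyncio.sleep(0.01)
    return [f"github hit for {query}"]


async def fake_slow_docs(query):
    # Hypothetical stand-in for a source that misses the deadline.
    await asyncio.sleep(1)
    return ["arrives too late"]


if __name__ == "__main__":
    sources = {"github": fake_github, "docs": fake_slow_docs}
    print(asyncio.run(retrieve_all(sources, "oauth", timeout=0.1)))
```

Slow sources are cancelled rather than awaited, so one unresponsive backend cannot push the whole response past the timeout.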
## Architecture
```
AI Applications
↓
Authentication (OAuth 2.1 + RBAC)
↓
MCP Client
↓
MCP Protocol (JSON-RPC + HTTP)
↓
MCP Server
• Multi-threaded parallel retrieval
• 1500-token chunking
↓
Knowledge Tiers (Public, Internal, Restricted)
↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
```
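
The "MCP Protocol (JSON-RPC + HTTP)" layer in the diagram exchanges JSON-RPC 2.0 messages. The sketch below shows the shape of a request and how a response might be unpacked. `tools/call` is a method defined by MCP, but the tool name `search_repository` and its arguments are assumptions made for illustration, as are the helper names.

```python
import json
from itertools import count

_request_ids = count(1)  # each request needs an id so responses can be matched


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def parse_response(raw):
    """Return the `result` of a JSON-RPC 2.0 response, or raise on `error`."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"JSON-RPC error {error['code']}: {error['message']}")
    return message["result"]


if __name__ == "__main__":
    request = make_request(
        "tools/call",
        # Hypothetical tool name and arguments.
        {"name": "search_repository", "arguments": {"query": "oauth flow"}},
    )
    print(json.dumps(request, indent=2))
```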
## Features
- **MCP Protocol Compliance**: JSON-RPC 2.0 over Streamable HTTP
- **GitHub Integration**: Repository data retrieval and contextualization
- **Vector Embeddings**: Semantic search using ChromaDB and Sentence Transformers
- **Context Optimization**: 1500-token chunking with parallel retrieval
- **OAuth 2.1 Security**: Secure authentication for GitHub access
- **Performance**: Sub-500ms TTFT target with a 2-second retrieval timeout
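
The 1500-token chunking named above can be sketched as follows. To stay dependency-free, this example treats whitespace-separated words as tokens; the project lists tiktoken for token processing, so in practice the `encode` and `decode` arguments would likely come from a tiktoken encoding (which encoding is not stated here). The function names are hypothetical.

```python
MAX_CHUNK_TOKENS = 1500  # chunk size named in the feature list


def chunk_tokens(tokens, max_tokens=MAX_CHUNK_TOKENS):
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def chunk_text(text, max_tokens=MAX_CHUNK_TOKENS, encode=str.split, decode=" ".join):
    """Tokenize text, chunk it, and turn each chunk back into a string.

    The default encode/decode pair is a whitespace stand-in for a real tokenizer.
    """
    return [decode(chunk) for chunk in chunk_tokens(encode(text), max_tokens)]


if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(3200))
    for chunk in chunk_text(document):
        print(len(chunk.split()), "tokens")
```

Fixed-size chunks keep the amount of context sent to the model predictable, which is what makes a latency target such as sub-500ms TTFT tractable.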
## Project Structure
```
.
├── src/
│ ├── server/ # MCP server core and FastAPI app
│ ├── auth/ # OAuth 2.1 authentication
│ ├── github/ # GitHub API integration
│ ├── vector/ # Vector database and embeddings
│ └── utils/ # Utilities and helpers
├── tests/ # Test suite
├── config/ # Configuration files
├── data/ # Data storage (vector DB, cache)
├── logs/ # Application logs
├── requirements.txt # Python dependencies
└── .env.example # Environment variables template
```
## Setup
1. **Navigate to the cloned project directory:**
```bash
cd "MCP Enhanced Data Retrieval"
```
2. **Create virtual environment:**
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Configure environment variables:**
```bash
cp .env.example .env
# Edit .env with your credentials
```
5. **Run the server:**
```bash
uvicorn src.server.main:app --reload
```
## Milestone 1 Goals
Status key: ✅ complete, 🔄 in progress.
- ✅ MCP protocol analysis and communication flow evaluation
- ✅ High-level architecture design for enterprise knowledge integration
- 🔄 Functional MCP server with GitHub integration
- 🔄 OAuth 2.1 authentication implementation
- 🔄 1500-token context chunking mechanism
- 🔄 Vector-based semantic search
## Success Criteria
- Functional MCP server that can retrieve and contextualize GitHub repository information
- OAuth 2.1 authentication for secure GitHub access
- 1500-token context chunking that keeps TTFT under 500 ms
- Parallel retrieval with 2-second timeout
- Vector-based semantic search for relevant content
## Technologies
- **MCP SDK**: Anthropic MCP Python SDK
- **Web Framework**: FastAPI with Streamable HTTP transport
- **GitHub API**: PyGithub
- **Authentication**: OAuth 2.1 (Authlib)
- **Vector Database**: ChromaDB
- **Embeddings**: Sentence Transformers (all-MiniLM-L6-v2)
- **Token Processing**: tiktoken
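
To illustrate what vector-based semantic search does with embeddings, here is a dependency-free sketch that ranks documents by cosine similarity. In the project this work is delegated to ChromaDB and Sentence Transformers; the toy 3-dimensional vectors, the document ids, and the helper names below are assumptions for illustration (all-MiniLM-L6-v2 produces 384-dimensional vectors), and whether ChromaDB is configured for cosine distance is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query vector.

    `docs` is a list of (doc_id, embedding) pairs.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings standing in for real model output.
    docs = [
        ("readme", [1.0, 0.0, 0.0]),
        ("auth", [0.0, 1.0, 0.0]),
        ("mixed", [1.0, 1.0, 0.0]),
    ]
    for doc_id, score in top_k([1.0, 0.1, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```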
## Author
Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology
## License
Academic Project - RIT Capstone