# MCP Enhanced Data Retrieval System

An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.

## Project Overview

This system implements the Model Context Protocol to provide:

- Standardized AI context sharing across organizational knowledge sources
- GitHub repository integration with OAuth 2.1 authentication
- Vector-based semantic search using embeddings
- Optimized 1500-token context chunking for sub-500ms TTFT
- Parallel retrieval strategy with 2-second timeout
- Streamable HTTP transport using FastAPI

## Architecture

```
AI Applications
      ↓
Authentication (OAuth 2.1 + RBAC)
      ↓
MCP Client
      ↓
MCP Protocol (JSON-RPC + HTTP)
      ↓
MCP Server
  • Multi-threaded parallel retrieval
  • 1500-token chunking
      ↓
Knowledge Tiers (Public, Internal, Restricted)
      ↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
```

## Features

- **MCP Protocol Compliance**: JSON-RPC 2.0 over Streamable HTTP
- **GitHub Integration**: Repository data retrieval and contextualization
- **Vector Embeddings**: Semantic search using ChromaDB and Sentence Transformers
- **Context Optimization**: 1500-token chunking with parallel retrieval
- **OAuth 2.1 Security**: Secure authentication for GitHub access
- **Performance**: Sub-500ms response times with 2-second retrieval timeout

## Project Structure

```
.
├── src/
│   ├── server/          # MCP server core and FastAPI app
│   ├── auth/            # OAuth 2.1 authentication
│   ├── github/          # GitHub API integration
│   ├── vector/          # Vector database and embeddings
│   └── utils/           # Utilities and helpers
├── tests/               # Test suite
├── config/              # Configuration files
├── data/                # Data storage (vector DB, cache)
├── logs/                # Application logs
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variables template
```

## Setup

1. **Clone and navigate to the project:**

   ```bash
   cd "MCP Enhanced Data Retrieval"
   ```

2. **Create virtual environment:**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure environment variables:**

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

5. **Run the server:**

   ```bash
   uvicorn src.server.main:app --reload
   ```

## Milestone 1 Goals

- ✅ MCP protocol analysis and communication flow evaluation
- ✅ High-level architecture design for enterprise knowledge integration
- 🔄 Functional MCP server with GitHub integration
- 🔄 OAuth 2.1 authentication implementation
- 🔄 1500-token context chunking mechanism
- 🔄 Vector-based semantic search

## Success Criteria

- Functional MCP server that can retrieve and contextualize GitHub repository information
- OAuth 2.1 authentication for secure GitHub access
- 1500-token context chunking maintaining sub-500ms TTFT
- Parallel retrieval with 2-second timeout
- Vector-based semantic search for relevant content

## Technologies

- **MCP SDK**: Anthropic MCP Python SDK
- **Web Framework**: FastAPI with Streamable HTTP transport
- **GitHub API**: PyGithub
- **Authentication**: OAuth 2.1 (authlib)
- **Vector Database**: ChromaDB
- **Embeddings**: Sentence Transformers (all-MiniLM-L6-v2)
- **Token Processing**: tiktoken

Illustrative sketches of several of these components appear in the example sections at the end of this README.

## Author

Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology

## License

Academic Project - RIT Capstone
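
## Example: JSON-RPC 2.0 over HTTP (sketch)

The server speaks JSON-RPC 2.0 over a Streamable HTTP transport via FastAPI, with the actual transport provided by the Anthropic MCP Python SDK. The sketch below only illustrates the request/response envelope with a plain FastAPI endpoint; the `/mcp` path and the `ping` method are placeholders, not the repository's actual routes.

```python
"""Hedged sketch: a FastAPI endpoint accepting JSON-RPC 2.0 requests.

The MCP SDK handles the real Streamable HTTP transport; this only shows
the envelope format. Path and method names are illustrative.
"""
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class JsonRpcRequest(BaseModel):
    jsonrpc: str = "2.0"
    id: Optional[int] = None
    method: str
    params: dict = {}


@app.post("/mcp")
async def handle_rpc(req: JsonRpcRequest) -> dict:
    # Dispatch on the JSON-RPC method name (real dispatch lives in the MCP SDK).
    if req.method == "ping":
        return {"jsonrpc": "2.0", "id": req.id, "result": {"status": "ok"}}
    return {
        "jsonrpc": "2.0",
        "id": req.id,
        "error": {"code": -32601, "message": f"Method not found: {req.method}"},
    }
```

Saved as `sketch.py`, this runs with `uvicorn sketch:app --reload`, mirroring the `uvicorn src.server.main:app --reload` command in the Setup section.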
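
## Example: 1500-Token Chunking (sketch)

Retrieved content is split into 1500-token chunks before being returned as context. A minimal sketch using `tiktoken` follows; the `cl100k_base` encoding and the `chunk_text` helper are assumptions for illustration, not the repository's actual implementation.

```python
"""Hedged sketch of fixed-size token chunking with tiktoken.

The encoding choice (cl100k_base) and function name are illustrative.
"""
from typing import List

import tiktoken


def chunk_text(text: str, max_tokens: int = 1500) -> List[str]:
    """Split `text` into chunks of at most `max_tokens` tokens each."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


if __name__ == "__main__":
    sample = "MCP standardizes AI context sharing across knowledge sources. " * 400
    chunks = chunk_text(sample)
    print(f"{len(chunks)} chunks; first chunk is {len(chunks[0])} characters")
```

Capping each chunk at a fixed token budget bounds the amount of context assembled per request, which is what makes the sub-500ms TTFT target tractable.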
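
## Example: Parallel Retrieval with a 2-Second Timeout (sketch)

Each query fans out to multiple sources (GitHub, internal docs) in parallel, and any source that misses the 2-second deadline is dropped rather than blocking the response. A minimal `asyncio` sketch follows; the `fetch_github` and `fetch_docs` coroutines are placeholders for the repository's real retrievers.

```python
"""Hedged sketch of parallel retrieval with a per-source timeout.

Fetcher names and return values are placeholders for illustration.
"""
import asyncio
from typing import List


async def fetch_github(query: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a GitHub API call
    return f"GitHub context for {query!r}"


async def fetch_docs(query: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an internal-docs lookup
    return f"Docs context for {query!r}"


async def retrieve(query: str, timeout: float = 2.0) -> List[str]:
    """Query every source concurrently; discard sources that time out or fail."""
    pending = [fetch_github(query), fetch_docs(query)]
    results = await asyncio.gather(
        *(asyncio.wait_for(coro, timeout) for coro in pending),
        return_exceptions=True,
    )
    return [r for r in results if isinstance(r, str)]


if __name__ == "__main__":
    print(asyncio.run(retrieve("How is OAuth configured?")))
```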
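
## Example: Semantic Search with ChromaDB and Sentence Transformers (sketch)

Content is embedded with the `all-MiniLM-L6-v2` Sentence Transformers model and stored in ChromaDB for semantic lookup. The sketch below uses an in-memory ChromaDB client with made-up documents; the collection name and sample data are illustrative, and the repository presumably persists its index under `data/`.

```python
"""Hedged sketch of embedding + vector search with ChromaDB.

Collection name and documents are illustrative, not from the repository.
"""
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()  # in-memory; a persistent client could target data/
collection = client.get_or_create_collection("repo_context")

documents = [
    "README: the service exposes a /health endpoint for liveness checks.",
    "auth module: GitHub tokens are obtained and refreshed via OAuth 2.1.",
]
collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],
    documents=documents,
    embeddings=model.encode(documents).tolist(),
)

query = "How does authentication against GitHub work?"
hits = collection.query(
    query_embeddings=model.encode([query]).tolist(),
    n_results=1,
)
print(hits["documents"][0][0])  # most similar stored document for the query
```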
