Enables retrieval and contextualization of GitHub repository information with OAuth 2.1 authentication, providing AI agents with access to organizational code repositories and documentation.
MCP Enhanced Data Retrieval System
An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.
Project Overview
This system implements the Model Context Protocol to provide:
Standardized AI context sharing across organizational knowledge sources
GitHub repository integration with OAuth 2.1 authentication
Vector-based semantic search using embeddings
1500-token context chunking, sized to keep time to first token (TTFT) under 500 ms
Parallel retrieval strategy with 2-second timeout
Streamable HTTP transport using FastAPI
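The 1500-token chunking step can be illustrated with a minimal sketch. This is not the project's implementation: `chunk_tokens` and its `overlap` parameter are hypothetical names introduced here, and the integer list stands in for token IDs produced by a real tokenizer.

```python
from typing import List, Sequence

MAX_TOKENS = 1500  # chunk size stated in this README


def chunk_tokens(tokens: Sequence[int], max_tokens: int = MAX_TOKENS,
                 overlap: int = 0) -> List[List[int]]:
    """Split a token sequence into windows of at most max_tokens tokens.

    `overlap` is an assumption (not from the README): it repeats that many
    trailing tokens of one chunk at the start of the next.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        # Stop once this window has reached the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks


# Stand-in token IDs; in practice these would come from a tokenizer.
tokens = list(range(4000))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

With tiktoken, the token IDs could come from `tiktoken.get_encoding("cl100k_base").encode(text)`, and each chunk could be turned back into text with the encoding's `decode` method. Which encoding the project actually uses is not stated here.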
Architecture
Features
MCP Protocol Compliance: JSON-RPC 2.0 over Streamable HTTP
GitHub Integration: Repository data retrieval and contextualization
Vector Embeddings: Semantic search using ChromaDB and Sentence Transformers
Context Optimization: 1500-token chunking with parallel retrieval
OAuth 2.1 Security: Secure authentication for GitHub access
Performance: Sub-500ms TTFT target, with retrieval bounded by a 2-second timeout
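To make the JSON-RPC 2.0 framing concrete, here is a rough sketch of how a client might build an MCP `tools/call` request and sanity-check a response. The tool name `search_repository`, its arguments, and the helper names are hypothetical; the tools this server exposes are not listed in this README.

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # simple monotonically increasing request IDs


def make_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def is_valid_response(message: Dict[str, Any], request_id: int) -> bool:
    """Check the basic JSON-RPC 2.0 response shape.

    A response carries the same id as the request and exactly one of
    `result` or `error`.
    """
    return (
        message.get("jsonrpc") == "2.0"
        and message.get("id") == request_id
        and (("result" in message) != ("error" in message))
    )


# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(
    "search_repository",
    {"query": "token chunking", "repo": "example-org/example-repo"},
)
print(json.dumps(request, indent=2))
```

The serialized request would be sent as the body of an HTTP POST to the server's MCP endpoint; the endpoint path is not given in this README.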
Project Structure
Setup
Clone and navigate to the project:
cd "MCP Enhanced Data Retrieval"
Create virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Configure environment variables:
cp .env.example .env # Edit .env with your credentials
Run the server:
uvicorn src.server.main:app --reload
Milestone 1 Goals
✅ MCP protocol analysis and communication flow evaluation
✅ High-level architecture design for enterprise knowledge integration
🔄 Functional MCP server with GitHub integration
🔄 OAuth 2.1 authentication implementation
🔄 1500-token context chunking mechanism
🔄 Vector-based semantic search
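OAuth 2.1 requires PKCE for the authorization code flow, so the authentication goal implies generating a code verifier and an S256 code challenge. The sketch below follows RFC 7636 using only the standard library. The function names are hypothetical, and how the project wires this into authlib is not shown in this README.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as RFC 7636 specifies."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random verifier; 32 bytes encode to 43 characters,
    the minimum length RFC 7636 allows (the range is 43 to 128)."""
    return _b64url(secrets.token_bytes(n_bytes))


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(ASCII(verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
print(len(verifier), len(challenge))
```

The client would send `code_challenge` with `code_challenge_method=S256` in the authorization request and the original verifier in the token request, letting the authorization server confirm that both came from the same client.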
Success Criteria
Functional MCP server that can retrieve and contextualize GitHub repository information
OAuth 2.1 authentication for secure GitHub access
1500-token context chunking maintaining sub-500ms TTFT
Parallel retrieval with 2-second timeout
Vector-based semantic search for relevant content
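The parallel retrieval criterion can be sketched with `asyncio`: query every source concurrently, keep whatever finishes within the 2-second budget, and cancel the rest. This is a sketch under stated assumptions, not the project's code; `retrieve_all`, `fetch_github`, and `fetch_docs` are hypothetical names, and the fetchers only simulate latency.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

RETRIEVAL_TIMEOUT_S = 2.0  # timeout stated in this README

Fetcher = Callable[[str], Awaitable[List[str]]]


async def retrieve_all(sources: Dict[str, Fetcher], query: str,
                       timeout: float = RETRIEVAL_TIMEOUT_S) -> Dict[str, List[str]]:
    """Query all sources concurrently and return results that arrive in time.

    Sources that time out or raise are omitted from the result.
    """
    if not sources:
        return {}
    tasks = {name: asyncio.create_task(fetch(query)) for name, fetch in sources.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    results: Dict[str, List[str]] = {}
    for name, task in tasks.items():
        if task in done and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
    return results


# Hypothetical sources that only simulate latency.
async def fetch_github(query: str) -> List[str]:
    await asyncio.sleep(0.01)
    return [f"github result for {query}"]


async def fetch_docs(query: str) -> List[str]:
    await asyncio.sleep(5)  # slower than the timeout, so it gets dropped
    return [f"docs result for {query}"]


if __name__ == "__main__":
    found = asyncio.run(retrieve_all({"github": fetch_github, "docs": fetch_docs}, "oauth"))
    print(found)
```

Returning partial results instead of failing the whole request is a design choice assumed here: one slow knowledge source then costs at most the timeout rather than blocking the response.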
Technologies
MCP SDK: Anthropic MCP Python SDK
Web Framework: FastAPI with Streamable HTTP transport
GitHub API: PyGithub
Authentication: OAuth 2.1 (authlib)
Vector Database: ChromaDB
Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
Token Processing: tiktoken
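In this stack, Sentence Transformers produces the embeddings and ChromaDB stores and queries them. The underlying idea, ranking stored chunks by cosine similarity to a query embedding, can be sketched in pure Python. The three-dimensional vectors and document names below are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k entries most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index with made-up embeddings (hypothetical document names).
index = {
    "auth.md": [0.9, 0.1, 0.0],
    "chunking.md": [0.1, 0.9, 0.2],
    "transport.md": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, index, k=2))
```

A vector database replaces this linear scan with an approximate nearest-neighbor index, which is what keeps lookups fast as the number of stored chunks grows.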
Author
Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology
License
Academic Project - RIT Capstone