---
applyTo: "**"
---
# Copilot MCP Compression & Routing – Planning Instructions
## Project Overview
This project creates a modular, extensible toolchain using GitHub Copilot’s MCP (Model Context Protocol) support to:
- Compress large prompt contexts using frequency-domain and convolutional techniques (FreqKV and LoCoCo).
- Dynamically route prompts to local LLMs (Tiny, Medium, or Large) based on prompt complexity.
- Extend the effective working context of Copilot during large, multi-file or long-lived coding sessions.
Goal: Enable faster, smarter, and longer coding interactions with Copilot using lightweight local tools.
---
## Architecture Overview
- **Technology Stack**:
  - Python 3.10+ (FastAPI for microservices)
  - NumPy / SciPy (for KV processing)
  - Ollama (for local LLM orchestration)
  - GitHub Copilot Agent Mode (MCP tool interface)
  - Redis (for optional routing/memory cache)
- **Design Patterns**:
  - Microservice Architecture
  - Mixture of Experts (MoE) Routing
  - MCP Toolchain Composition
  - Functional Service Isolation
- **Key Components** (a service skeleton is sketched after this list):
  - `freqkv_service`: Frequency-domain compressor for the KV cache.
  - `lococo_service`: Convolution-based KV fuser for context retention.
  - `routing_service`: Classifies prompt complexity and forwards requests to the appropriate model.
  - `model_registry`: Maps prompt types to LLM endpoints.
  - `copilot_agent_chain`: Chains all services together as part of Copilot Agent Mode.
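As a concrete shape for one of these services, below is a minimal sketch of `freqkv_service` as a FastAPI microservice. The endpoint path, payload schema, and `keep_ratio` parameter are illustrative assumptions, and the DCT truncation is a toy stand-in for FreqKV rather than a faithful reimplementation.

```python
# freqkv_service sketch -- endpoint path, payload schema, and keep_ratio are
# assumptions; the DCT truncation is a toy stand-in for the FreqKV method.
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from scipy.fft import dct, idct

app = FastAPI(title="freqkv_service")

class KVIn(BaseModel):
    kv: list[list[float]]    # KV cache as a (tokens x hidden) matrix
    keep_ratio: float = 0.5  # fraction of low-frequency components to keep

@app.post("/compress")
def compress(req: KVIn) -> dict:
    kv = np.asarray(req.kv, dtype=np.float32)
    coeffs = dct(kv, axis=0, norm="ortho")  # token axis -> frequency domain
    cutoff = max(1, int(kv.shape[0] * req.keep_ratio))
    # Keep only low-frequency coefficients, then invert into fewer token slots.
    out = idct(coeffs[:cutoff], axis=0, norm="ortho")
    return {"kv": out.tolist()}
```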
---
## Current Focus Areas
- [x] **Phase 1**: Architecture research and prototype planning (completed).
- [ ] **Phase 2**: Build and test FreqKV, LoCoCo, and prompt routing services independently.
- [ ] **Phase 3**: Chain tools via MCP and integrate with GitHub Copilot Agent Mode.
- [ ] **Phase 4**: Add caching, test datasets, and benchmarking for local dev performance.
---
## Key Architectural Decisions
1. **Prompt Routing via Local Classifier**:
   Use a lightweight LLM (e.g., Phi-3 or TinyLlama) to classify prompt complexity and route accordingly. This avoids overloading larger models and cuts latency on simple tasks (see the routing sketch after this list).
2. **KV Compression via DCT + Convolution**:
   FreqKV handles frequency-domain compression (dropping high-frequency components), and LoCoCo handles spatial compression (fusing tokens into fewer slots). Combining both maintains prompt fidelity while shrinking the context the model must consume (see the fusing sketch after this list).
3. **Model Agnosticism via Registry**:
   All model endpoints are abstracted behind a registry layer, allowing plug-and-play swapping of LLMs (Ollama, llama.cpp, vLLM) without changing routing logic (see the registry sketch after this list).
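A minimal sketch of decision 1, assuming Ollama's local `/api/generate` endpoint; the tier labels, model tags, and classification prompt are illustrative choices, not project commitments.

```python
# Prompt-complexity router sketch. Tier labels, model tags, and the
# classification prompt are assumptions; adjust to the models you run.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
TIERS = {"simple": "llama3.2:1b", "moderate": "llama3.1:8b", "complex": "llama3.1:70b"}

def classify(prompt: str) -> str:
    """Ask a small local model to grade prompt complexity with one word."""
    instruction = ("Classify this coding request as simple, moderate, or complex. "
                   "Reply with exactly one word.\n\nRequest: " + prompt)
    resp = requests.post(OLLAMA_URL, json={"model": "phi3", "prompt": instruction,
                                           "stream": False}, timeout=30)
    label = resp.json()["response"].strip().lower()
    return label if label in TIERS else "moderate"  # safe default on odd replies

def route(prompt: str) -> str:
    return TIERS[classify(prompt)]  # model tag to forward the prompt to
```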
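For decision 2, the FreqKV half is sketched above under Key Components; the convolutional half could look like the following. LoCoCo learns its fusing kernel, so the fixed averaging kernel here illustrates only the shape of the operation.

```python
# Toy stand-in for LoCoCo-style fusing: a strided 1-D convolution that merges
# runs of token slots. LoCoCo learns its kernel; a fixed averaging kernel is
# used here purely to show the shape of the operation.
import numpy as np

def fuse_kv(kv: np.ndarray, target_slots: int) -> np.ndarray:
    """Fuse a (tokens, hidden) KV matrix down to (target_slots, hidden)."""
    tokens, hidden = kv.shape
    stride = max(1, tokens // target_slots)
    kernel = np.ones(stride) / stride  # fixed averaging kernel
    fused = np.stack(
        [np.convolve(kv[:, h], kernel, mode="valid")[::stride] for h in range(hidden)],
        axis=1,
    )
    return fused[:target_slots]
```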
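And for decision 3, the registry can start as a plain lookup table; every entry below is an illustrative placeholder.

```python
# Model registry sketch -- endpoints live behind one lookup so routing logic
# never hard-codes a backend. All entries are illustrative placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEndpoint:
    backend: str  # "ollama", "llama.cpp", "vllm", ...
    url: str
    model: str

REGISTRY: dict[str, ModelEndpoint] = {
    "simple":   ModelEndpoint("ollama", "http://localhost:11434", "llama3.2:1b"),
    "moderate": ModelEndpoint("ollama", "http://localhost:11434", "llama3.1:8b"),
    "complex":  ModelEndpoint("vllm",   "http://localhost:8000",  "llama3.1:70b"),
}

def resolve(tier: str) -> ModelEndpoint:
    return REGISTRY[tier]  # swap backends by editing the table, not the router
```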
---
## Integration Points
- **External APIs**:
  - GitHub Copilot MCP Toolchain (Agent Mode)
  - Optional: Claude Sonnet 4 / GPT-4 as fallback cloud inference endpoints
- **Internal Services**:
  - `freqkv_service`
  - `lococo_service`
  - `routing_service`
  - `model_proxy_service` (for LLMs via Ollama or similar)
- **Data Flow** (sketched end-to-end below):
  1. Editor → Copilot context buffer
  2. KV → FreqKV → LoCoCo
  3. Prompt → Router → LLM (via model registry)
  4. Output → Copilot response
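Tied together, the flow might read as below; the service ports, endpoint paths, and payload fields are assumptions that must match whatever the services above actually expose.

```python
# End-to-end data-flow sketch. Ports, paths, and payload fields are
# assumptions; they must match the real service contracts.
import requests

def handle(prompt: str, kv: list[list[float]]) -> str:
    kv = requests.post("http://localhost:8001/compress",         # freqkv_service
                       json={"kv": kv, "keep_ratio": 0.5}).json()["kv"]
    kv = requests.post("http://localhost:8002/fuse",             # lococo_service
                       json={"kv": kv, "target_slots": 256}).json()["kv"]
    model = requests.post("http://localhost:8003/route",         # routing_service
                          json={"prompt": prompt}).json()["model"]
    # Injecting the compressed KV into the model's cache is backend-specific
    # and out of scope here; only the routed generation call is shown.
    resp = requests.post("http://localhost:11434/api/generate",  # chosen local LLM
                         json={"model": model, "prompt": prompt, "stream": False})
    return resp.json()["response"]  # handed back as the Copilot response
```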
---
## Security Considerations
- No data leaves the dev machine unless the optional cloud-model fallback is enabled.
- LLM endpoints run locally with isolated access.
- Tools can run in a container sandbox (Docker, or Firejail for extra isolation).
- No persistent user data is logged unless telemetry is explicitly opted into.
---
## Performance Requirements
- **Expected Load**:
  - Active contexts of 5–20 files (~200K–2M tokens).
  - Up to 10 requests/minute per developer session.
- **Performance Targets**:
  - Compression + routing latency ≤ 250 ms.
  - Total toolchain response time (incl. LLM) ≤ 2 s.
- **Optimization Priorities**:
  - Prefer routing simple prompts to 1.5B–3B models to reduce GPU usage.
  - Cache model outputs and classifier results (see the caching sketch below).
  - Optimize FreqKV and LoCoCo for array throughput (batching, optional GPU).
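A minimal sketch of the classifier-result cache, assuming the optional Redis instance from the stack above; the key scheme and one-hour TTL are illustrative choices, not project decisions.

```python
# Classifier-result cache sketch using the optional Redis instance. The key
# scheme and one-hour TTL are illustrative choices, not project decisions.
import hashlib
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_classify(prompt: str, classify) -> str:
    key = "route:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit                # cache hit: skip the classifier LLM entirely
    label = classify(prompt)      # e.g. the classify() sketch further above
    r.set(key, label, ex=3600)    # expire after an hour
    return label
```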