# Kubernetes MCP Server
An interactive Kubernetes monitoring system built with Flask and the Model Context Protocol (MCP), with optional OpenAI GPT integration. This project provides an agentic interface for diagnosing cluster issues using natural language queries.
## Features
- **Flask MCP Server**: Exposes Kubernetes cluster data via JSON-RPC endpoints
- **Interactive Client**: Ask questions like "What is the status of the checkout service?"
- **OpenAI Integration**: Uses GPT models to intelligently investigate cluster problems
- **Kubernetes Integration**: Real-time pod monitoring, events, and logs
- **Colorized Output**: Terminal output formatted with ANSI colors
## Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Interactive   │───▶│    Flask MCP     │───▶│   Kubernetes    │
│     Client      │    │      Server      │    │     Cluster     │
│   (client.py)   │    │   (server.py)    │    │   (KIND/etc)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌──────────────────┐
│   OpenAI GPT    │    │ Static Fixtures  │
│   (optional)    │    │  (metrics, etc)  │
└─────────────────┘    └──────────────────┘
```
## Setup
### Prerequisites
- Python 3.9+
- Kubernetes cluster (KIND recommended for local development)
- OpenAI API key (optional, fallback mode available)
### Installation
1. Clone the repository:
```bash
git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
cd YOUR_REPO_NAME
```
2. Install dependencies:
```bash
pip install flask kubernetes openai requests
```
3. Set up environment variables:
```bash
export OPENAI_API_KEY="your-api-key-here" # Optional
export KUBECONFIG="path/to/your/kubeconfig" # If not using default
```
### Running the Server
```bash
cd mcp
python3 server.py
```
The server starts on `http://localhost:5050`.
### Running the Interactive Client
```bash
cd mcp
python3 client.py
```
## Usage
### Interactive Mode
Start the client and ask natural language questions:
```
> what is the status of my checkout service?
> show failing pods in namespace staging
> summarize errors for service payments in the last 45 minutes
```
### One-shot Mode
```bash
python3 client.py --ask "what pods are failing in default namespace?"
```
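The two modes can be told apart with a single optional flag. The following is a minimal sketch of that idea, not the actual `client.py`: only the `--ask` flag comes from this README, while `parse_args` and `choose_mode` are hypothetical helper names.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --ask is the only flag documented in this README."""
    parser = argparse.ArgumentParser(description="Kubernetes MCP client (sketch)")
    parser.add_argument(
        "--ask",
        metavar="QUESTION",
        default=None,
        help="answer one question and exit instead of starting the prompt",
    )
    return parser.parse_args(argv)


def choose_mode(args):
    """Return 'one-shot' when a question was passed, otherwise 'interactive'."""
    return "one-shot" if args.ask else "interactive"
```

In this sketch, a missing or empty `--ask` value falls through to the interactive prompt.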
### Available Tools
- `k8s.listProblemPods` - Find problematic pods
- `k8s.getPodDetails` - Get detailed pod information
- `deployments.listRecentChanges` - Recent deployment history
- `metrics.getErrors` - Error rate analysis
- `traces.sampleErrors` - Sample failing traces
- `config.getDiff` - Configuration changes
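
Because the server exposes these tools over JSON-RPC, they can also be called without the interactive client. The sketch below uses only the standard library and rests on two assumptions that this README does not confirm: that the tool name is used directly as the JSON-RPC `method`, and that `k8s.listProblemPods` accepts a `namespace` parameter. Check `tools_catalog.json` for the real parameter names.

```python
import json
import os
import urllib.request

# Default taken from the Configuration section; override with RPC_URL.
RPC_URL = os.environ.get("RPC_URL", "http://127.0.0.1:5050/rpc")


def build_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request envelope for one tool call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,  # assumption: tool name doubles as the RPC method
        "params": params or {},
    }


def call_tool(method, params=None, timeout=10):
    """POST the request to the server and return the decoded JSON response."""
    body = json.dumps(build_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        RPC_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs a running server; the "namespace" parameter is an assumption):
# print(call_tool("k8s.listProblemPods", {"namespace": "default"}))
```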
## Example Output
```
=== 🧩 FINAL ANSWER ===
📋 Summary:
The pod 'demo-fail-5df44cbf79-tqg6l' is experiencing CrashLoopBackOff
🔍 Evidence:
• Pod: demo-fail-5df44cbf79-tqg6l
Status: Running
Restarts: 115
Reason: CrashLoopBackOff
⚠️ Probable Cause:
Application failing to start successfully due to exit code 1
🛠️ Safe Next Step:
Investigate application logs and configuration
✅ Confidence: High
```
## Configuration
Environment variables:
- `RPC_URL` - MCP server URL (default: `http://127.0.0.1:5050/rpc`)
- `OPENAI_API_KEY` - OpenAI API key for LLM features
- `OPENAI_MODEL` - Model to use (default: `gpt-4o-mini`)
- `SERVICE` - Default service name (default: `checkout`)
- `NAMESPACE` - Default K8s namespace (default: `default`)
- `SINCE_MINS` - Time window for queries, in minutes (default: `120`)
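
One way to read these variables with their documented defaults is sketched below. The variable names and default values come from the list above; the `Settings` class and `load_settings` function are hypothetical names used for illustration and may not match how `client.py` actually loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    rpc_url: str
    openai_api_key: Optional[str]
    openai_model: str
    service: str
    namespace: str
    since_mins: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        rpc_url=env.get("RPC_URL", "http://127.0.0.1:5050/rpc"),
        # An unset or empty key is treated as "no key" (assumed to mean fallback mode).
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        service=env.get("SERVICE", "checkout"),
        namespace=env.get("NAMESPACE", "default"),
        since_mins=int(env.get("SINCE_MINS", "120")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.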
## Development
### Project Structure
```
mcp-demo/
├── mcp/
│   ├── server.py           # Flask MCP server
│   ├── client.py           # Interactive client
│   ├── tools_catalog.json  # Tool definitions
│   └── fixtures/           # Static test data
├── k8s/
│   └── deployment.yaml     # Sample K8s resources
└── README.md
```
### Adding New Tools
1. Add the tool definition to `tools_catalog.json`
2. Implement the matching handler in `server.py`
3. Test the new tool with the client
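
Step 2 could follow a simple dispatch-table pattern, sketched below in plain Python without Flask. This is an illustration under stated assumptions, not the structure of the real `server.py`: the `HANDLERS` registry, the `tool` decorator, the `dispatch` function, and the `k8s.countPods` tool are all hypothetical. Only the error code `-32601` ("Method not found") is fixed by the JSON-RPC 2.0 specification.

```python
# Registry mapping tool names to handler functions (hypothetical structure).
HANDLERS = {}


def tool(name):
    """Decorator that registers a handler under a tool name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@tool("k8s.countPods")  # hypothetical tool, not part of the catalog above
def count_pods(params):
    namespace = params.get("namespace", "default")
    # Placeholder result; a real handler would query the cluster here.
    return {"namespace": namespace, "count": 0}


def dispatch(request):
    """Route one JSON-RPC 2.0 request dict to its handler and wrap the result."""
    request_id = request.get("id")
    handler = HANDLERS.get(request.get("method"))
    if handler is None:
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {"code": -32601, "message": "Method not found"},
        }
    result = handler(request.get("params") or {})
    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```

With this layout, adding a tool means writing one decorated function, and a Flask route would only need to pass the parsed request body to `dispatch`.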
### Demo
https://github.com/user-attachments/assets/e30a7a69-ff7a-46f1-a2ff-e75eff79334b
## License
MIT License