# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a **Model Context Protocol (MCP) server** for managing PostgreSQL clusters using the CloudNativePG operator in Kubernetes. It provides a bridge between LLMs and CloudNativePG resources, enabling natural language interaction with PostgreSQL cluster lifecycle management.
**Key characteristics:**
- Python-based MCP server using **FastMCP** (simplified MCP SDK with auto-schema generation)
- Kubernetes client interacting with CloudNativePG Custom Resources (CRDs)
- Designed for transport-agnostic architecture (stdio for local, HTTP/SSE for remote)
- **OIDC/OAuth2 authentication** for HTTP mode with JWT bearer token verification
- **DCR proxy support** for IdPs without Dynamic Client Registration
- All operations are async using Python asyncio
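At its core, the server is a FastMCP instance plus decorated async tool functions. A minimal sketch of that shape, assuming the `FastMCP` class from the MCP Python SDK (the import path used in this repo may differ):
```python
# Illustrative skeleton only; the real server lives in src/cnpg_mcp_server.py.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cnpg-mcp-server")

@mcp.tool()
async def ping() -> str:
    """Trivial tool; FastMCP derives its schema from this signature and docstring."""
    return "pong"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```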
## Development Commands
### Running the Server
```bash
# Default stdio transport (for Claude Desktop integration)
python src/cnpg_mcp_server.py
# With specific transport mode
python src/cnpg_mcp_server.py --transport stdio
# HTTP transport with OIDC authentication (recommended for production)
# Requires OIDC_ISSUER and OIDC_AUDIENCE environment variables
export OIDC_ISSUER=https://auth.example.com
export OIDC_AUDIENCE=mcp-api
python src/cnpg_mcp_server.py --transport http --port 3000
# Or use the convenience script
./test/start-http.sh
```
### Testing
```bash
# Syntax check
python -m py_compile src/cnpg_mcp_server.py
# Test Kubernetes connectivity
kubectl get nodes
kubectl get clusters -A # List CloudNativePG clusters
# Deploy test cluster
kubectl apply -f example-cluster.yaml
# Check cluster status
kubectl get cluster example-cluster -w
```
### Dependencies
```bash
# Install all dependencies (includes HTTP/SSE and OIDC auth)
pip install -r requirements.txt
# Dependencies include:
# - FastMCP (MCP server framework)
# - Kubernetes client
# - Uvicorn (ASGI server for HTTP mode)
# - Starlette (ASGI framework for middleware)
# - Authlib (OIDC/JWT verification)
# - httpx (async HTTP client for JWKS fetching)
```
### RBAC Setup
**Important:** The CloudNativePG Helm chart automatically creates the required ClusterRoles; you only need to create a ServiceAccount and the corresponding RoleBindings.
**Option 1: Using Python script (recommended):**
```bash
# Install dependencies (if not already installed)
pip install -r requirements.txt
# Create ServiceAccount and bind to edit role
python rbac/bind_cnpg_role.py --namespace default --service-account cnpg-mcp-server
# For read-only access
python rbac/bind_cnpg_role.py --namespace default --service-account cnpg-mcp-server --role view
# Dry run to see what would be created
python rbac/bind_cnpg_role.py --dry-run
```
**Option 2: Using kubectl:**
```bash
# Apply RBAC configuration (creates ServiceAccount and binds to existing cnpg roles)
kubectl apply -f rbac.yaml
```
**Verify setup:**
```bash
# Verify the helm-created roles exist
kubectl get clusterroles | grep cnpg
# Should show: cnpg-cloudnative-pg, cnpg-cloudnative-pg-edit, cnpg-cloudnative-pg-view
# Verify permissions for the service account
kubectl auth can-i get clusters.postgresql.cnpg.io --as=system:serviceaccount:default:cnpg-mcp-server
kubectl auth can-i create clusters.postgresql.cnpg.io --as=system:serviceaccount:default:cnpg-mcp-server
```
**Available CloudNativePG roles:**
- `cnpg-cloudnative-pg-edit`: Full edit access (recommended, used by default)
- `cnpg-cloudnative-pg-view`: Read-only access
- `cnpg-cloudnative-pg`: Full admin access
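Conceptually, the binding script creates a ServiceAccount and a namespaced RoleBinding to one of the Helm-created ClusterRoles. A hedged sketch of that idea using the Kubernetes Python client (the actual `rbac/bind_cnpg_role.py` adds argument parsing, dry-run support, and error handling):
```python
# Sketch only: bind a ServiceAccount to the helm-created cnpg-cloudnative-pg-edit ClusterRole.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
rbac = client.RbacAuthorizationV1Api()

namespace = "default"
sa_name = "cnpg-mcp-server"

# ServiceAccount the MCP server will run as.
core.create_namespaced_service_account(namespace, {"metadata": {"name": sa_name}})

# RoleBinding that grants the ClusterRole's permissions within this namespace.
rbac.create_namespaced_role_binding(
    namespace,
    {
        "metadata": {"name": f"{sa_name}-edit"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cnpg-cloudnative-pg-edit",
        },
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
    },
)
```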
## Architecture
### Transport-Agnostic Design
The server architecture separates transport concerns from business logic:
```
MCP Tools (@mcp.tool() decorated functions)
        ↓  (transport-agnostic)
Transport Layer (stdio or HTTP/SSE)
        ↓
Kubernetes API (CustomObjectsApi + CoreV1Api)
        ↓
CloudNativePG Operator
```
**Key architectural points:**
- **FastMCP auto-generates schemas** from function signatures and docstrings - no manual schema definitions needed
- All tool functions work with any transport mode (just add `@mcp.tool()` decorator)
- Transport selection happens at startup via `main()` → `run_stdio_transport()` or `run_http_transport()`
- Kubernetes clients initialized lazily on first use: `custom_api` (CustomObjectsApi) and `core_api` (CoreV1Api)
- All I/O operations use `asyncio.to_thread()` to prevent blocking the event loop
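A condensed sketch of the lazy-initialization and `asyncio.to_thread()` pattern described above (names mirror the description; the actual helpers in src/cnpg_mcp_server.py may be structured differently):
```python
import asyncio
from kubernetes import client

custom_api = None  # CustomObjectsApi, created on first use
core_api = None    # CoreV1Api, created on first use

def _ensure_clients() -> None:
    """Create API clients lazily (kube config is loaded at startup, see Kubernetes Configuration)."""
    global custom_api, core_api
    if custom_api is None:
        custom_api = client.CustomObjectsApi()
        core_api = client.CoreV1Api()

async def _list_clusters(namespace: str) -> dict:
    _ensure_clients()
    # The kubernetes client is synchronous; run it in a worker thread so the
    # asyncio event loop stays responsive.
    return await asyncio.to_thread(
        custom_api.list_namespaced_custom_object,
        group="postgresql.cnpg.io",
        version="v1",
        namespace=namespace,
        plural="clusters",
    )
```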
### Core Components
**src/cnpg_mcp_server.py** (single-file architecture, ~1,746 lines):
- Lines 1-120: Imports, configuration, Kubernetes client initialization (lazy)
- Lines 121-257: Utility functions and Kubernetes API helpers
- Lines 258-466: Pydantic models for input validation (12 tools)
- Lines 467-1575: MCP tool implementations decorated with `@mcp.tool()`:
  - Cluster management: list, get, create, scale, delete
  - Role management: list, create, update, delete
  - Database management: list, create, delete
- Lines 1576-1667: Transport implementations (`run_stdio_transport`, `run_http_transport`)
- Lines 1668-1746: CLI argument parsing and main entry point
### MCP Tools
The server exposes 12 tools to LLMs:
**Cluster Management:**
1. **list_postgres_clusters**: List all clusters with optional namespace filtering
2. **get_cluster_status**: Get detailed status for a specific cluster
3. **create_postgres_cluster**: Create new PostgreSQL cluster with HA configuration
4. **scale_postgres_cluster**: Scale cluster by adjusting instance count
5. **delete_postgres_cluster**: Delete cluster with safety confirmation
**Role/User Management:**
6. **list_postgres_roles**: List all roles in a cluster
7. **create_postgres_role**: Create role with auto-generated password stored in K8s secret
8. **update_postgres_role**: Update role attributes and password
9. **delete_postgres_role**: Delete role and associated secret
**Database Management:**
10. **list_postgres_databases**: List databases managed by Database CRDs
11. **create_postgres_database**: Create database with reclaim policy
12. **delete_postgres_database**: Delete Database CRD (actual deletion depends on policy)
**Tool implementation pattern (FastMCP simplified):**
- Decorated with `@mcp.tool()` - that's it! No manual schema needed
- FastMCP auto-generates schemas from function signatures and docstrings
- Comprehensive docstrings with Args, Returns, Examples, Error Handling sections
- Type hints (Pydantic models, Literal, Optional) automatically become schema constraints
- Return formatted strings optimized for LLM consumption
- Error handling via `format_error_message()` with actionable suggestions
### CloudNativePG Integration
**Resource structure:**
- Group: `postgresql.cnpg.io`
- Version: `v1`
- Kind: `Cluster`
- Plural: `clusters`
**Key fields in Cluster spec:**
- `spec.instances`: Number of PostgreSQL instances (for HA)
- `spec.imageName`: PostgreSQL version (e.g., `ghcr.io/cloudnative-pg/postgresql:16`)
- `spec.storage.size`: Storage per instance
- `spec.postgresql.parameters`: PostgreSQL configuration parameters
**Key fields in Cluster status:**
- `status.phase`: Overall cluster phase (e.g., "Cluster in healthy state")
- `status.readyInstances`: Count of ready instances
- `status.currentPrimary`: Name of current primary pod
- `status.conditions`: Array of condition objects
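To make the field layout concrete, here is a hedged example of a minimal Cluster manifest built from those spec fields and submitted through `CustomObjectsApi` (the server's `create_postgres_cluster` tool does something similar; defaults and extra fields will differ):
```python
import asyncio
from kubernetes import client, config

async def create_example_cluster() -> dict:
    config.load_kube_config()
    custom_api = client.CustomObjectsApi()
    manifest = {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": "example-cluster", "namespace": "default"},
        "spec": {
            "instances": 3,  # one primary plus two replicas for HA
            "imageName": "ghcr.io/cloudnative-pg/postgresql:16",
            "storage": {"size": "10Gi"},
            "postgresql": {"parameters": {"max_connections": "200"}},
        },
    }
    return await asyncio.to_thread(
        custom_api.create_namespaced_custom_object,
        group="postgresql.cnpg.io",
        version="v1",
        namespace="default",
        plural="clusters",
        body=manifest,
    )

# Once the operator reconciles the resource, the same object exposes
# status["phase"], status["readyInstances"], status["currentPrimary"], status["conditions"].
```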
### Response Formatting
- **Character limit**: 25,000 characters (CHARACTER_LIMIT constant)
- **Truncation**: Applied via `truncate_response()` to prevent context overflow
- **Detail levels**: "concise" (default) vs "detailed" for progressive disclosure
- **Error messages**: Structured with status code, message, and actionable suggestions
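A hedged sketch of the truncation helper implied above (the real `truncate_response()` may word its notice differently):
```python
CHARACTER_LIMIT = 25_000

def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Trim tool output so a single response cannot overflow the model's context."""
    if len(text) <= limit:
        return text
    notice = "\n\n[Output truncated. Narrow the query or use detail_level='concise'.]"
    return text[: limit - len(notice)] + notice
```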
## Code Conventions
### Adding New MCP Tools
Follow this pattern when adding tools:
1. **Create a Pydantic model** for input validation (in the Pydantic models section, lines 258-466)
```python
class MyToolInput(BaseModel):
    """Input for my_tool."""
    param1: str = Field(..., description="Clear description with examples")
```
2. **Implement the tool function** (after the existing tools, around line 1575)
```python
@mcp.tool()
async def my_tool(param1: str, param2: Optional[str] = None) -> str:
    """
    Brief description.

    Detailed explanation of what this tool does and when to use it.

    Args:
        param1: Parameter description with usage guidance
        param2: Optional parameter description

    Returns:
        Description of return value format

    Examples:
        - Example usage 1
        - Example usage 2

    Error Handling:
        - Common error scenarios and resolution steps
    """
    try:
        # Implementation
        result = await some_async_operation(param1, param2)
        return truncate_response(format_result(result))
    except Exception as e:
        return format_error_message(e, "context description")
```
3. **Use async/await for Kubernetes calls**
```python
cluster = await asyncio.to_thread(
    custom_api.get_namespaced_custom_object,
    group=CNPG_GROUP,
    version=CNPG_VERSION,
    namespace=namespace,
    plural=CNPG_PLURAL,
    name=name
)
```
### Error Handling Strategy
- Always use try/except blocks in tool functions
- Format errors via `format_error_message(error, context)`
- Provide actionable suggestions based on HTTP status codes:
- 404: Resource not found → suggest listing or checking namespace
- 403: Permission denied → suggest RBAC verification
- 409: Conflict → suggest resource may already exist
- 422: Invalid spec → suggest checking API documentation
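The mapping above could be implemented roughly like this (illustrative; the real `format_error_message()` may phrase its suggestions differently):
```python
from kubernetes.client.exceptions import ApiException

_SUGGESTIONS = {
    404: "Resource not found. List available resources or double-check the namespace.",
    403: "Permission denied. Verify the ServiceAccount's RBAC bindings.",
    409: "Conflict. A resource with this name may already exist.",
    422: "Invalid spec. Check the CloudNativePG API documentation for required fields.",
}

def format_error_message(error: Exception, context: str) -> str:
    """Turn an exception into an LLM-friendly message with an actionable suggestion."""
    if isinstance(error, ApiException):
        suggestion = _SUGGESTIONS.get(error.status, "Check the operator logs for details.")
        return f"Error while {context} (HTTP {error.status} {error.reason}): {suggestion}"
    return f"Error while {context}: {error}"
```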
### Testing Kubernetes Operations
When testing or debugging Kubernetes operations:
```bash
# Directly inspect resources
kubectl get clusters -A -o yaml
kubectl describe cluster <name> -n <namespace>
# Check operator logs
kubectl logs -n cnpg-system deployment/cnpg-controller-manager
# Test API access
kubectl auth can-i get clusters.postgresql.cnpg.io --as=system:serviceaccount:default:cnpg-mcp-server
# Get connection credentials
kubectl get secret <cluster-name>-app -o jsonpath='{.data.password}' | base64 -d
```
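The same checks can be scripted with the Kubernetes Python client; for example, a hedged snippet that reads the password from the CloudNativePG-generated `<cluster-name>-app` secret:
```python
import base64
from kubernetes import client, config

config.load_kube_config()
core_api = client.CoreV1Api()

def get_app_password(cluster_name: str, namespace: str = "default") -> str:
    """Decode the password from the auto-generated <cluster-name>-app secret."""
    secret = core_api.read_namespaced_secret(f"{cluster_name}-app", namespace)
    return base64.b64decode(secret.data["password"]).decode()

print(get_app_password("example-cluster"))
```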
## Important Notes
### Transport Modes
- **stdio (default)**: Uses stdin/stdout via `mcp.run_stdio_async()`, perfect for Claude Desktop, single client only
- **HTTP (production-ready)**: Full production deployment with OIDC authentication using Streamable HTTP
  - Implemented in `run_http_transport()` (transport implementations, lines 1576-1667); a sketch of the dispatch follows this list
  - **MCP endpoint**: `/mcp` (standard path)
  - **OIDC/OAuth2 authentication** using JWT bearer tokens (src/auth_oidc.py module)
    - JWKS-based public key discovery with automatic rotation
    - Support for non-DCR IdPs via DCR proxy
  - Health check endpoint at `/health` (unauthenticated)
  - OAuth metadata at `/.well-known/oauth-authorization-server`
  - For production: run behind a reverse proxy (nginx/traefik) for TLS
  - Configuration via environment variables (see OIDC Setup section below)
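The transport dispatch looks roughly like the sketch below; treat it as illustrative, since the real `run_http_transport()` also wires the OIDC middleware, `/health`, and the OAuth metadata route, and the `streamable_http_app()` helper assumes a recent MCP SDK:
```python
import argparse
import asyncio

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cnpg-mcp-server")  # in the real server this instance has all 12 tools registered

def main() -> None:
    parser = argparse.ArgumentParser(description="CloudNativePG MCP server")
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio")
    parser.add_argument("--port", type=int, default=3000)
    args = parser.parse_args()

    if args.transport == "stdio":
        asyncio.run(mcp.run_stdio_async())  # single local client, e.g. Claude Desktop
    else:
        import uvicorn
        app = mcp.streamable_http_app()     # Starlette app exposing the /mcp endpoint
        uvicorn.run(app, host="0.0.0.0", port=args.port)

if __name__ == "__main__":
    main()
```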
### OIDC Authentication Setup
For HTTP transport mode, OIDC authentication is **required for production** use:
**Required Environment Variables:**
```bash
export OIDC_ISSUER=https://auth.example.com # Your OIDC provider URL
export OIDC_AUDIENCE=mcp-api # Expected JWT audience claim
```
**Optional Configuration:**
```bash
export OIDC_JWKS_URI=https://auth.example.com/.well-known/jwks.json # Override JWKS URI
export DCR_PROXY_URL=https://dcr-proxy.example.com/register # DCR proxy for non-DCR IdPs
export OIDC_SCOPE=openid # Required OAuth2 scope
```
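To show how these variables drive verification, here is a hedged sketch using Authlib and httpx (the actual logic lives in `src/auth_oidc.py` and will differ in structure, caching, and error handling):
```python
import os

import httpx
from authlib.jose import JsonWebKey, JsonWebToken

ISSUER = os.environ["OIDC_ISSUER"]
AUDIENCE = os.environ["OIDC_AUDIENCE"]
JWKS_URI = os.environ.get("OIDC_JWKS_URI", f"{ISSUER}/.well-known/jwks.json")

async def verify_bearer_token(token: str) -> dict:
    """Verify signature, issuer, audience, and expiration of a JWT bearer token."""
    async with httpx.AsyncClient() as http:
        jwks = (await http.get(JWKS_URI)).json()  # the real module caches and rotates keys
    key_set = JsonWebKey.import_key_set(jwks)
    claims = JsonWebToken(["RS256"]).decode(
        token,
        key_set,
        claims_options={
            "iss": {"essential": True, "value": ISSUER},
            "aud": {"essential": True, "value": AUDIENCE},
        },
    )
    claims.validate()  # raises if expired or if issuer/audience do not match
    return dict(claims)
```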
**Quick Start:**
```bash
# 1. Set OIDC configuration
export OIDC_ISSUER=https://your-idp.example.com
export OIDC_AUDIENCE=mcp-api
# 2. Start server using convenience script
./test/start-http.sh
# 3. Test with inspector tool
./test/test-inspector.sh --transport http --url http://localhost:3000 --token <JWT>
```
**Supported OIDC Providers:**
- Auth0, Keycloak, Okta, Azure AD, Google, any RFC 6749/OpenID Connect compliant IdP
**For detailed setup instructions**, see `docs/OIDC_SETUP.md`
**Files:**
- `src/auth_oidc.py`: OIDC authentication provider implementation
- `docs/OIDC_SETUP.md`: Complete setup guide with IdP-specific examples
- `test/start-http.sh`: Convenience script for HTTP mode startup
- `test/test-inspector.sh`: Testing tool using MCP Inspector (supports stdio and HTTP)
- `kubernetes-deployment-oidc.yaml`: Production Kubernetes deployment manifest
### Kubernetes Configuration
- **In-cluster**: Uses service account tokens automatically
- **Local**: Uses `~/.kube/config` or `KUBECONFIG` environment variable
- Initialization (lines 45-60) attempts in-cluster config first, then falls back to kubeconfig (see the sketch below)
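A minimal sketch of that fallback (the server's startup code follows the same shape):
```python
from kubernetes import config

def load_kubernetes_config() -> None:
    try:
        config.load_incluster_config()  # pod with a mounted ServiceAccount token
    except config.ConfigException:
        config.load_kube_config()       # local: ~/.kube/config or $KUBECONFIG
```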
### Response Optimization
- Responses are optimized for LLM consumption (markdown formatting, concise by default)
- Use `detail_level="detailed"` parameter for comprehensive information
- Always truncate responses to stay within CHARACTER_LIMIT (25,000 chars)
### Security Considerations
- **Authentication (HTTP mode)**:
  - **OIDC/OAuth2 required for production** - enforced via JWT bearer tokens
  - Automatic JWT signature verification using JWKS
  - Token validation: issuer, audience, expiration, signature
  - Health check endpoint excluded from authentication
  - Short-lived tokens recommended (15-60 minutes)
- **RBAC**: Uses CloudNativePG's built-in roles (no custom ClusterRoles needed)
  - rbac.yaml binds to `cnpg-cloudnative-pg-edit` by default
  - For read-only, change to `cnpg-cloudnative-pg-view`
  - Follow principle of least privilege
- **Best Practices**:
  - Never log or expose database credentials
  - All inputs validated via Pydantic models
  - Run HTTP mode behind TLS termination (Ingress with cert-manager)
  - Use network policies to restrict traffic
  - Consider namespace isolation for multi-tenant scenarios
  - Enable access logging and monitoring
## Common Tasks
### Debugging Connection Issues
```bash
# Check Kubernetes connectivity
kubectl cluster-info
kubectl get nodes
# Verify CloudNativePG operator is running
kubectl get deployment -n cnpg-system cnpg-controller-manager
# Check server can load config
python -c "from kubernetes import config; config.load_kube_config(); print('OK')"
```
### Extending Tool Capabilities
**Currently implemented (12 tools):**
- ✅ Cluster lifecycle: list, get, create, scale, delete
- ✅ Role/user management: list, create, update, delete (with K8s secret management)
- ✅ Database operations: list, create, delete (via Database CRDs)
**Natural extensions for future:**
- Backup management (list_backups, create_backup, restore_backup)
- Pod logs retrieval (get_cluster_logs)
- Connection information with automatic secret decoding (get_connection_info)
- Monitoring metrics integration
- Pooler management (PgBouncer)
- Certificate and TLS management
**When adding new tools with FastMCP:**
1. Add `@mcp.tool()` decorator to your async function
2. Use type hints (Pydantic models, Literal, Optional) for parameters
3. Write comprehensive docstring - FastMCP auto-generates schema from it
4. Follow existing patterns for async operations, error handling, and response formatting
5. That's it! No manual schema definition needed.
### Deployment Considerations
- **Development**: Run locally with `python src/cnpg_mcp_server.py`
- **Production**: Use kubernetes-deployment.yaml with proper RBAC
- **Claude Desktop**: Configure in `claude_desktop_config.json` with absolute path
- **Container**: Use provided Dockerfile (Python 3.11-slim base)
## File Organization
### Server Source (`src/`)
- **cnpg_mcp_server.py**: Main server implementation (single file)
- **auth_oidc.py**: OIDC authentication provider
### Testing (`test/`)
- **test-inspector.py**: MCP inspector test tool
- **test-inspector.sh**: Inspector test script
- **start-http.sh**: HTTP mode startup script
- **get-user-token.py**: OIDC token retrieval utility
- **mcp-auth-proxy.py**: Auth proxy for testing
- **test_*.py**: Unit tests
### Utilities (`bin/`)
- **add-user-to-allowed-clients.py**: OIDC user management
- **fix-user-auth-connection.py**: OIDC connection fix utility
- **make_config.py**: Configuration file generator
- **setup-auth0.py**: Auth0 setup automation
- **create_secrets.py**: Kubernetes secret generation
### Documentation (`docs/`)
- **QUICKSTART.md**: Quick start guide
- **OIDC_SETUP.md**: OIDC authentication setup guide
- **HTTP_TRANSPORT_GUIDE.md**: HTTP transport implementation guide
- **REFACTORING_SUMMARY.md**: Transport-agnostic refactoring notes
- Additional guides and release notes
### Configuration
- **requirements.txt**: Python dependencies
- **rbac.yaml**: Kubernetes RBAC configuration
- **example-cluster.yaml**: Sample PostgreSQL cluster manifest
- **kubernetes-deployment.yaml**: K8s Deployment and Service
- **kubernetes-deployment-oidc.yaml**: Production deployment with OIDC
- **Dockerfile**: Container image definition
- **Makefile**: Build and deployment automation
## Related Resources
- CloudNativePG API: https://cloudnative-pg.io/documentation/current/
- MCP Protocol: https://modelcontextprotocol.io/
- Kubernetes Python Client: https://github.com/kubernetes-client/python