DevOps AI Toolkit

287-resource-sync-endpoint.md•21.4 KiB

# PRD #287: Resource Sync Endpoint - Receive and Store Cluster Resources

**GitHub Issue**: [#287](https://github.com/vfarcic/dot-ai/issues/287)
**Status**: Complete
**Priority**: High
**Created**: 2025-12-19

---

## Problem Statement

Users lack efficient visibility into resources within a Kubernetes cluster. The dot-ai-controller (PRD #28) has been built to watch cluster resources and send them to MCP, but **there is no MCP endpoint to receive this data**.

Without this endpoint:

1. **No semantic search** - Can't ask "show me all database-related resources" or "find production services"
2. **No resource inventory** - No way to discover what's deployed across namespaces
3. **Controller has nowhere to send data** - Phase 1 complete, blocked on Phase 2

### Related Work

This PRD implements **Phase 2** of [vfarcic/dot-ai-controller#28](https://github.com/vfarcic/dot-ai-controller/issues/28) (Resource Visibility).

**Phase 1 (Controller - COMPLETE)**:
- Watches all cluster resources via dynamic informers
- Detects changes (labels, annotations)
- Debounces and batches changes
- Sends HTTP requests to MCP endpoint (this PRD)

**Phase 2 (MCP - THIS PRD)**:
- Receive resource data from controller
- Generate embeddings for semantic search
- Store in Qdrant `resources` collection
- Enable resource discovery queries

---

## Solution Overview

Implement the MCP-side infrastructure to receive, embed, and store Kubernetes resource data:

```
┌─────────────────────────────────────────────────────────────────────┐
│                            MCP Server                               │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  POST /api/v1/resources/sync                                 │   │
│  │                                                              │   │
│  │  1. Receive ResourceSyncRequest from controller              │   │
│  │  2. Validate request structure                               │   │
│  │  3. Generate embeddings for each resource                    │   │
│  │  4. Upsert to Qdrant 'resources' collection                  │   │
│  │  5. Handle deletes (idempotent - ignore not found)           │   │
│  │  6. For resync: diff against Qdrant, apply changes           │   │
│  │  7. Return success/failure counts                            │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│                                  ▼                                  │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  Qdrant 'resources' Collection                               │   │
│  │                                                              │   │
│  │  - Semantic search: "database deployments", "prod services"  │   │
│  │  - Keyword filters: namespace, kind, labels                  │   │
│  │  - Resource inventory and discovery                          │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```

### Key Design Principles

1. **Follow capabilities pattern** - Resources collection mirrors capabilities collection architecture
2. **Controller is "dumb"** - sends data; MCP is "smart" - handles embeddings, diffing, Qdrant
3. **Qdrant is search index, not source of truth** - Kubernetes etcd is authoritative
4. **Idempotent operations** - deletes ignore "not found", upserts are safe to retry

---

## API Contract

### Endpoint: `POST /api/v1/resources/sync`

**Request Body** (from controller):

```json
{
  "upserts": [
    {
      "namespace": "default",
      "name": "nginx",
      "kind": "Deployment",
      "apiVersion": "apps/v1",
      "labels": {"app": "nginx", "env": "prod"},
      "annotations": {"description": "Web server"},
      "createdAt": "2025-12-13T10:00:00Z",
      "updatedAt": "2025-12-18T14:30:00Z"
    }
  ],
  "deletes": [
    {
      "namespace": "default",
      "name": "old-nginx",
      "kind": "Deployment",
      "apiVersion": "apps/v1"
    }
  ],
  "isResync": false
}
```

**Note**: Controller sends objects with component fields. MCP constructs IDs internally using format `namespace:apiVersion:kind:name` and hashes to UUIDs for Qdrant storage.
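The ID construction described in the note can be sketched as follows. This is a minimal illustration, not the project's implementation: the PRD does not say which hash is used, so the SHA-256 step, the helper names `buildResourceId` and `resourceIdToUuid`, the `ResourceRef` type, and the fallback to `_cluster` when the namespace is empty are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Component fields sent by the controller (hypothetical type name).
interface ResourceRef {
  namespace?: string; // assumed empty or absent for cluster-scoped resources
  name: string;
  kind: string;
  apiVersion: string;
}

// Builds the readable ID in the documented format: namespace:apiVersion:kind:name.
// Falling back to "_cluster" here is an assumption; the controller may already send it.
function buildResourceId(ref: ResourceRef): string {
  const namespace = ref.namespace || "_cluster";
  return `${namespace}:${ref.apiVersion}:${ref.kind}:${ref.name}`;
}

// Hypothetical hashing step: take the first 16 bytes of a SHA-256 digest
// and lay them out in the 8-4-4-4-12 UUID text format.
function resourceIdToUuid(id: string): string {
  const hex = createHash("sha256").update(id).digest("hex").slice(0, 32);
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

const exampleId = buildResourceId({
  namespace: "default",
  name: "nginx",
  kind: "Deployment",
  apiVersion: "apps/v1",
});
console.log(exampleId, resourceIdToUuid(exampleId));
```

Because the hash is deterministic, the same resource always maps to the same point ID, which is what makes retried upserts and repeated deletes safe.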
**Response Body** (follows `RestApiResponse` pattern):

Success:

```json
{
  "success": true,
  "data": {
    "upserted": 5,
    "deleted": 2
  },
  "meta": {
    "timestamp": "2025-12-18T14:30:00Z",
    "requestId": "abc-123",
    "version": "1.0.0"
  }
}
```

Error (partial or complete failure):

```json
{
  "success": false,
  "error": {
    "code": "SYNC_FAILED",
    "message": "Failed to process some resources",
    "details": {
      "upserted": 4,
      "deleted": 2,
      "failures": [
        {"id": "default:apps/v1:Deployment:nginx", "error": "embedding failed"}
      ]
    }
  },
  "meta": {
    "timestamp": "2025-12-18T14:30:00Z",
    "requestId": "abc-123",
    "version": "1.0.0"
  }
}
```

### ID Format

`namespace:apiVersion:kind:name`

Examples:
- `default:apps/v1:Deployment:nginx`
- `kube-system:v1:ConfigMap:coredns`
- `_cluster:rbac.authorization.k8s.io/v1:ClusterRole:admin` (cluster-scoped)

The `_cluster` prefix is used for cluster-scoped resources (no namespace).

---

## Technical Design

### 1. Resource Vector Service

Create `src/core/resource-vector-service.ts` following the `capability-vector-service.ts` pattern:

```typescript
interface ClusterResource {
  namespace: string;    // Kubernetes namespace or '_cluster' for cluster-scoped
  name: string;
  kind: string;
  apiVersion: string;
  apiGroup?: string;    // Derived from apiVersion
  labels: Record<string, string>;
  annotations?: Record<string, string>;
  createdAt: string;
  updatedAt: string;
}

// Note: ID is constructed by MCP from components (namespace:apiVersion:kind:name)
// and hashed to UUID for Qdrant storage

class ResourceVectorService extends BaseVectorService {
  constructor(config: VectorDBConfig) {
    super({ ...config, collectionName: 'resources' });
  }

  async upsertResource(resource: ClusterResource): Promise<void>;
  async deleteResource(id: string): Promise<void>;
  async listResources(): Promise<ClusterResource[]>;
  async diffAndSync(incoming: ClusterResource[]): Promise<SyncResult>;
}
```

**Note**: Status is intentionally excluded. Kubernetes is the source of truth for live status.
This collection is for resource discovery/inventory, not status monitoring. Status queries should use kubectl or the Kubernetes API directly.

### 2. Embedding Generation

Generate embeddings from a semantic text representation of the resource:

```typescript
function buildEmbeddingText(resource: ClusterResource): string {
  const parts = [
    `${resource.kind} ${resource.name}`,
    `namespace: ${resource.namespace}`,
    `apiVersion: ${resource.apiVersion}`,
  ];

  // Add meaningful labels (skip standard Kubernetes labels)
  if (resource.labels) {
    const labelText = Object.entries(resource.labels)
      .filter(([k]) => !k.startsWith('app.kubernetes.io/'))
      .map(([k, v]) => `${k}=${v}`)
      .join(', ');
    if (labelText) parts.push(`labels: ${labelText}`);

    // Include app name from standard labels if present
    const appName = resource.labels['app.kubernetes.io/name'] || resource.labels['app'];
    if (appName) parts.push(`app: ${appName}`);
  }

  // Add description from annotations if present
  if (resource.annotations?.description) {
    parts.push(`description: ${resource.annotations.description}`);
  }

  return parts.join(' | ');
}
```

### 3. REST Endpoint Handler

Add to `src/interfaces/rest-api.ts`:

```typescript
// In handleRestRequest routing
case 'resources':
  if (method === 'POST' && pathParts[1] === 'sync') {
    return handleResourceSync(requestBody);
  }
  break;

async function handleResourceSync(body: ResourceSyncRequest): Promise<RestApiResponse> {
  const resourceService = new ResourceVectorService(getVectorDBConfig());
  let upserted = 0;
  let deleted = 0;
  const failures: Array<{ id: string; error: string }> = [];

  // The request carries component fields only (no `id`), so derive the ID
  // in the documented format: namespace:apiVersion:kind:name
  const toId = (r: { namespace: string; apiVersion: string; kind: string; name: string }) =>
    `${r.namespace}:${r.apiVersion}:${r.kind}:${r.name}`;
  // Caught values are `unknown` in strict TypeScript; normalize to a message
  const messageOf = (error: unknown) =>
    error instanceof Error ? error.message : String(error);

  // Handle upserts
  for (const resource of body.upserts ?? []) {
    try {
      const embeddingText = buildEmbeddingText(resource);
      const embedding = await embeddingService.embed(embeddingText);
      await resourceService.upsertResource({ ...resource, embedding });
      upserted++;
    } catch (error) {
      failures.push({ id: toId(resource), error: messageOf(error) });
    }
  }

  // Handle deletes (idempotent); each entry is an object with component fields
  for (const ref of body.deletes ?? []) {
    const id = toId(ref);
    try {
      await resourceService.deleteResource(id);
      deleted++;
    } catch (error) {
      // Ignore not found errors - idempotent delete
      if (messageOf(error).includes('not found')) {
        deleted++; // Count as success
      } else {
        failures.push({ id, error: messageOf(error) });
      }
    }
  }

  // Handle resync (diff mode)
  if (body.isResync && body.upserts?.length) {
    const diffResult = await resourceService.diffAndSync(body.upserts);
    // diffResult contains: inserted, updated, deleted counts
  }

  if (failures.length > 0) {
    return {
      success: false,
      error: {
        code: 'SYNC_PARTIAL_FAILURE',
        message: `Failed to process ${failures.length} resources`,
        details: { upserted, deleted, failures }
      }
    };
  }

  return {
    success: true,
    data: { upserted, deleted }
  };
}
```

### 4. MCP Tool: search-resources *(Deferred)*

> **Decision**: Standalone MCP tool deferred to future "cluster intelligence" PRD. The `ResourceVectorService` implements full search capabilities (`searchResources()`, `listResources()` with filters) ready for future AI orchestration layer that combines resources, capabilities, and live kubectl queries.

### 5. Resync Diff Logic

For periodic resyncs, the controller sends all resources with `isResync: true`.
MCP must diff against Qdrant:

```typescript
// Shown as a method of ResourceVectorService, hence the use of `this`
async function diffAndSync(incoming: ClusterResource[]): Promise<SyncResult> {
  // ClusterResource has no `id` field; derive it from the component fields
  // in the documented format: namespace:apiVersion:kind:name
  const idOf = (r: ClusterResource) =>
    `${r.namespace}:${r.apiVersion}:${r.kind}:${r.name}`;

  // Get all existing resources from Qdrant
  const existing = await this.listAllResources();
  const existingMap = new Map(existing.map(r => [idOf(r), r]));
  const incomingMap = new Map(incoming.map(r => [idOf(r), r]));

  const toInsert: ClusterResource[] = [];
  const toUpdate: ClusterResource[] = [];
  const toDelete: string[] = [];

  // Find new and changed resources
  for (const resource of incoming) {
    const current = existingMap.get(idOf(resource));
    if (!current) {
      toInsert.push(resource);
    } else if (hasChanged(current, resource)) {
      toUpdate.push(resource);
    }
  }

  // Find deleted resources
  for (const id of existingMap.keys()) {
    if (!incomingMap.has(id)) {
      toDelete.push(id);
    }
  }

  // Apply changes
  for (const r of [...toInsert, ...toUpdate]) {
    const embedding = await embeddingService.embed(buildEmbeddingText(r));
    await this.upsertResource({ ...r, embedding });
  }

  for (const id of toDelete) {
    await this.deleteResource(id);
  }

  return {
    inserted: toInsert.length,
    updated: toUpdate.length,
    deleted: toDelete.length
  };
}
```

---

## Scope

### In Scope

**M1: Resource Vector Service**
- New `ResourceVectorService` class extending `BaseVectorService`
- Qdrant collection `resources` with appropriate schema
- Upsert, delete, search, list operations
- Embedding text generation from resource data

**M2: REST Sync Endpoint**
- `POST /api/v1/resources/sync` endpoint
- Request validation (Zod schema)
- Upsert and delete handling
- Partial failure reporting
- Idempotent delete (ignore not found)

**M3: Resync Diff Logic**
- Diff algorithm for full resync
- Insert new, update changed, delete missing
- Efficient comparison using resource ID

**M4: MCP Tools** *(Deferred)*
- ~~`search-resources` - semantic search tool~~ → Deferred to future "cluster intelligence" PRD
- Search capabilities built into ResourceVectorService for future orchestration
- Service layer ready for unified AI tool that
  combines resources, capabilities, and live kubectl

**M5: Testing**
- Integration tests for sync endpoint
- Documentation guide removed (no user-facing functionality - sync endpoint is internal, MCP tool deferred)
- Controller docs (ResourceSyncConfig) tracked separately in dot-ai-controller repo

### Out of Scope

- **Standalone MCP tool** - `search-resources` deferred to future "cluster intelligence" PRD that will unify resources, capabilities, and kubectl
- On-demand resource detail fetching (call Kubernetes API) - Future enhancement
- Resource modification through the interface
- Multi-cluster support (single cluster only)
- Events resource syncing (high volume, low signal)

---

## Milestones

- [x] **M1: Resource Vector Service**
  - Create `src/core/resource-vector-service.ts`
  - Qdrant collection initialization with proper schema
  - Upsert, delete, list operations
  - Embedding text generation (`buildEmbeddingText`) from labels/annotations

- [x] **M2: REST Sync Endpoint**
  - Add `/api/v1/resources/sync` to REST router
  - Manual validation for `ResourceSyncRequest` (Zod replaced due to runtime issues)
  - Handle upserts with embedding generation
  - Handle deletes (idempotent)
  - RestApiResponse formatting with partial failure support

- [x] **M3: Resync Diff Logic**
  - Implement `diffAndSync()` method
  - Compare incoming vs Qdrant state
  - Insert new, update changed, delete missing

- [~] **M4: MCP Tools** *(Deferred)*
  - ~~Add `search-resources` tool definition~~ → Deferred
  - Search capabilities deferred to future "cluster intelligence" PRD
  - Future unified tool will combine resources, capabilities, and kubectl

- [x] **M5: Testing**
  - [x] Integration tests for sync endpoint (6 tests passing)
  - Unit tests skipped - integration tests provide sufficient coverage
  - Documentation guide removed - no user-facing functionality (sync endpoint is internal, MCP tool deferred)
  - Controller docs (ResourceSyncConfig) tracked separately in dot-ai-controller repo

- [ ] **M6: Cluster Intelligence PRD** *(Placeholder)*
  - Create new PRD for unified "Cluster Intelligence" tool
  - Natural language interface: "which databases?", "show failing workloads", etc.
  - AI orchestrates multiple data sources: resources (this PRD), capabilities, live kubectl
  - Discuss scope, approach, and priorities when M1-M5 are complete
  - **Note**: This is a placeholder for future discussion, not implementation work

---

## Dependencies

### Internal Dependencies
- `src/core/base-vector-service.ts` - Base class for vector operations
- `src/core/embedding-service.ts` - Embedding generation
- `src/core/vector-db-service.ts` - Qdrant client
- `src/interfaces/rest-api.ts` - REST endpoint router

### External Dependencies
- Qdrant running and accessible
- Embedding provider configured (OpenAI, Google, or Bedrock)
- dot-ai-controller v0.x.x sending sync requests

### Cross-Repository Dependencies
- **vfarcic/dot-ai-controller#28** (Phase 1 complete) - Controller sending data
- **Follow-up**: Update dot-ai-controller docs to uncomment ResourceSyncConfig references (tracked in controller repo)

---

## Success Criteria

1. **Endpoint receives data**: Controller successfully sends resources to `/api/v1/resources/sync`
2. **Resources stored**: Resources appear in Qdrant `resources` collection with embeddings
3. **Semantic search works**: Query "database deployments" or "production services" returns relevant results based on labels/annotations
4. **Resync works**: Hourly resync correctly identifies and removes deleted resources
5. **Idempotent deletes**: Deleting non-existent resources doesn't cause errors
6. **Partial failures handled**: API returns success=false with details on which resources failed

---

## Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| High embedding API costs | Batch embedding requests; only re-embed on actual changes |
| Qdrant overload | Batch writes; use controller-side debouncing (already implemented) |
| Stale data after restart | Periodic resync (hourly) catches missed changes |
| Embedding provider failures | Retry with backoff; fail gracefully with error response |
| Large clusters | Pagination in listAllResources; incremental sync for resync |

---

## Decision Log

| Decision | Rationale |
|----------|-----------|
| **Separate collection from capabilities** | Resources are what IS running; capabilities are what CAN be deployed. Different purposes, different queries. |
| **Follow capabilities pattern** | Consistent architecture; reuse existing base classes and patterns. |
| **Idempotent deletes** | Controller may send duplicate delete requests; simplifies retry logic. |
| **Partial failure response** | Don't fail entire batch for one resource; report what succeeded. |
| **Embedding text from labels+annotations** | Enables semantic search on "production", "database", app names, etc. |
| **Exclude status from storage** | Status changes frequently and Kubernetes is the source of truth. This collection is for resource discovery/inventory, not status monitoring. Status queries should use kubectl directly. |
| **Defer standalone MCP tool** | A standalone `search-resources` MCP tool provides low value since users will interact via a unified "cluster intelligence" AI that orchestrates multiple data sources (resources, capabilities, kubectl). The service layer implements full search capabilities for future orchestration, but premature tool exposure forces users/AI to know implementation details. |
| **Controller sends components, not IDs** | Controller sends objects with `namespace`, `apiVersion`, `kind`, `name`. MCP constructs IDs internally (`namespace:apiVersion:kind:name`) and hashes to UUIDs for Qdrant (which requires UUIDs or integers). This keeps the controller simple and ensures ID format consistency. |
| **Manual validation over Zod** | Zod schema parsing threw runtime exceptions in the container environment. Manual validation provides clearer error messages and avoids dependency issues. |
| **Global collection initialization flag** | Collection is initialized once on first request, with conflict handling for race conditions. Avoids expensive re-initialization on every request while handling MCP restarts gracefully. |

---

## Progress Log

| Date | Update |
|------|--------|
| 2025-12-19 | PRD created |
| 2025-12-19 | Decision: Defer M4 (MCP tool) - standalone `search-resources` tool provides low value; service layer will have search capabilities ready for future unified "cluster intelligence" tool |
| 2025-12-19 | Added M6 placeholder for future "Cluster Intelligence" PRD - unified natural language interface for cluster queries |
| 2025-12-19 | **M1 Complete**: Created `ResourceVectorService` with full CRUD operations, `buildEmbeddingText()`, `diffAndSync()` |
| 2025-12-19 | **M3 Complete**: Implemented resync diff logic with `hasResourceChanged()` comparison |
| 2025-12-19 | **Design Decision**: Excluded `status` from storage - Kubernetes is source of truth for live status; collection is for resource discovery/inventory only. Controller updated to not send status. |
| 2025-12-19 | **Simplification**: Removed `searchResources()`, `ResourceSearchOptions`, `ResourceFilters` - YAGNI, will add when cluster intelligence PRD is implemented |
| 2025-12-19 | **M2 Complete**: REST sync endpoint implemented with manual validation (Zod replaced due to runtime issues). Controller sends objects with component fields; MCP constructs IDs and hashes to UUIDs for Qdrant. Integration tests passing (6 tests). |
| 2025-12-19 | **API Contract Update**: Removed `id` field from request - controller sends only component fields (`namespace`, `apiVersion`, `kind`, `name`). MCP constructs deterministic IDs internally. Deletes array now accepts objects instead of ID strings. |
| 2025-12-19 | **M5 Complete**: Integration tests (6 passing) provide sufficient coverage. Documentation guide removed - sync endpoint is internal infrastructure with no user-facing functionality. Controller docs tracked separately in dot-ai-controller repo. |
| 2025-12-19 | **PRD Complete**: M1-M3, M5 complete. M4 deferred. M6 is a placeholder for future work. |

---

## References

- [dot-ai-controller PRD #28](https://github.com/vfarcic/dot-ai-controller/blob/main/prds/28-resource-visibility.md) - Controller-side implementation (Phase 1)
- [PRD #48: Resource Capabilities](./done/48-resource-capabilities-discovery-integration.md) - Pattern to follow
- [PRD #110: REST API Gateway](./done/110-rest-api-gateway.md) - REST endpoint patterns
- [capability-vector-service.ts](../src/core/capability-vector-service.ts) - Template for ResourceVectorService
- [rest-api.ts](../src/interfaces/rest-api.ts) - REST router to extend
