DevOps AI Toolkit

343-kubectl-plugin-migration.md•19.7 KiB

# PRD #343: kubectl Plugin Migration **GitHub Issue**: [#343](https://github.com/vfarcic/dot-ai/issues/343) **Parent PRD**: [#342 - Modular Plugin Architecture](./342-modular-plugin-architecture.md) **Status**: Complete **Priority**: High **Created**: 2025-01-25 --- ## Problem Statement kubectl tools (`kubectl_get`, `kubectl_apply`, etc.) are embedded in dot-ai core (`src/core/kubectl-tools.ts`), preventing: 1. Modular extension by users 2. Clear separation of concerns 3. Independent testing and development of tools Additionally, Kubernetes interactions are scattered across the codebase: - `src/core/kubectl-tools.ts` - agentic kubectl tools - `src/core/discovery.ts` - `KubernetesDiscovery` class with `executeKubectl()` - `src/tools/version.ts` - direct `@kubernetes/client-node` usage for K8s version - Various files using `KubernetesDiscovery` or direct K8s client This duplication leads to inconsistent error handling, harder testing, and maintenance burden. --- ## Solution Overview 1. Create `agentic-tools` package in monorepo (`packages/agentic-tools/`) 2. Implement kubectl_* tools with standard plugin interface 3. Build as container image with HTTP endpoint 4. Integrate into dot-ai (plugin discovery and invocation) 5. Remove old kubectl-tools code from dot-ai core **Done when**: ALL Kubernetes interactions go through the plugin system and scattered K8s code is removed. --- ## Technical Design ### Plugin Interface **Endpoint**: `POST /execute` **Describe Hook** (returns tool definitions): ```json // Request { "hook": "describe" } // Response { "name": "agentic-tools", "version": "1.0.0", "tools": [ { "name": "kubectl_get", "type": "agentic", "description": "Get Kubernetes resources", "inputSchema": { "type": "object", "properties": { "kind": { "type": "string", "description": "Resource kind" }, "namespace": { "type": "string", "description": "Namespace" }, "name": { "type": "string", "description": "Resource name" }, "output": { "type": "string", "default": "json" } }, "required": ["kind"] } } // ... other tools ] } ``` **Invoke Hook** (executes tool): ```json // Request { "hook": "invoke", "sessionId": "ses_abc123", "payload": { "tool": "kubectl_get", "args": { "kind": "Pod", "namespace": "default" }, "state": {} } } // Response { "sessionId": "ses_abc123", "success": true, "result": { "items": [...] }, "state": { "lastQuery": "pods" } } ``` **Error Response**: ```json { "sessionId": "ses_abc123", "success": false, "error": { "code": "KUBECTL_FAILED", "message": "Failed to get pods: connection refused", "details": {} } } ``` ### Package Structure ``` packages/agentic-tools/ ├── Dockerfile ├── .dockerignore ├── package.json ├── tsconfig.json ├── vitest.config.ts ├── src/ │ ├── index.ts # HTTP server, /execute endpoint │ ├── types.ts # Shared types for HTTP interface │ ├── hooks/ │ │ ├── describe.ts # Returns tool definitions │ │ └── invoke.ts # Routes to tool handlers │ └── tools/ │ ├── index.ts # Tool registry │ ├── kubectl-get.ts │ ├── kubectl-apply.ts │ ├── kubectl-describe.ts │ ├── kubectl-logs.ts │ ├── kubectl-events.ts │ ├── kubectl-api-resources.ts │ └── kubectl-get-crd-schema.ts └── tests/ └── unit/ # Fast tests, no cluster needed ``` ### dot-ai Integration **Helm Configuration** (`values.yaml`): Plugins are configured as a map, allowing users to customize defaults or add their own. Built-in plugins use `dot-ai-` prefix. ```yaml # Plugin configuration (PRD #343) # Two modes: # - Deployed: Provide `image` + `port` → chart creates Deployment + Service # - External: Provide `endpoint` → chart just registers it (you deploy separately) plugins: # Built-in plugin - deployed by chart dot-ai-agentic-tools: enabled: true image: repository: ghcr.io/vfarcic/dot-ai-agentic-tools tag: "0.1.0" port: 8080 # RBAC created automatically by chart (hardcoded for agentic-tools) env: [] # Additional environment variables envFrom: [] # Environment from ConfigMaps/Secrets resources: requests: memory: "128Mi" cpu: "100m" limits: memory: "512Mi" cpu: "500m" # Example: User plugin deployed by chart # my-company-tools: # enabled: true # image: # repository: myregistry/my-tools # tag: "v1.0.0" # port: 8082 # serviceAccountName: "my-tools-sa" # You create this SA with needed permissions # env: # - name: API_KEY # valueFrom: # secretKeyRef: # name: my-tools-secrets # key: api-key # Example: External plugin (you deploy separately) # my-external-plugin: # enabled: true # endpoint: "http://my-plugin.other-namespace.svc:8080" # # NOTE: You are responsible for deploying this plugin and # # ensuring it has appropriate Kubernetes RBAC permissions. ``` **Deployment Model**: Separate Deployments (not sidecars). - Each plugin runs as its own Deployment + Service - Independent scaling and lifecycle - Plugin updates don't restart MCP server - Standard K8s service discovery via DNS - `DOT_AI_PLUGINS_CONFIG` env var auto-built from enabled plugins **Plugin Configuration Modes**: - **Deployed by chart**: Provide `image` + `port` → chart creates Deployment + Service, auto-generates endpoint - **External (user-managed)**: Provide `endpoint` only → chart just registers it, user deploys separately ```yaml plugins: # Deployed by chart (has image + port) dot-ai-agentic-tools: enabled: true image: repository: ghcr.io/vfarcic/dot-ai-agentic-tools tag: "0.1.0" port: 8080 env: [] # Additional environment variables envFrom: [] # Environment from ConfigMaps/Secrets resources: ... # endpoint auto-computed: http://<release>-dot-ai-agentic-tools:8080 # External plugin (user deploys separately) # my-external-plugin: # enabled: true # endpoint: "http://my-plugin.other-namespace.svc:8080" # serviceAccountName: "my-existing-sa" # Optional, user creates this ``` **RBAC Strategy**: - MCP server no longer needs Kubernetes permissions (all K8s ops go through plugins) - Only `agentic-tools` plugin needs K8s RBAC (runs kubectl commands) - Chart creates ServiceAccount + ClusterRole + ClusterRoleBinding specifically for `agentic-tools` (hardcoded in template) - User plugins requiring K8s access specify `serviceAccountName` pointing to their own SA - Removing RBAC from MCP server validates migration completeness (if something breaks, we missed migrating code) **Discovery Flow**: 1. dot-ai starts, reads plugin configuration from `DOT_AI_PLUGINS_CONFIG` env var 2. Plugin Deployments run as separate pods with their own Services 3. dot-ai waits for plugin readiness (health/ready endpoints) 4. dot-ai calls `POST /execute {hook: "describe"}` on each plugin via Service DNS 5. dot-ai registers tools from responses 6. Tools available for both agentic loops AND direct code invocation **Invocation Flow** (two paths): 1. **Agentic** (AI-driven): LLM → toolLoop → `PluginManager.createToolExecutor()` → plugin HTTP 2. **Direct** (code-driven): Code → `PluginManager.invokeTool()` → plugin HTTP --- ## Scope ### In Scope - Create `packages/agentic-tools/` package structure - Implement HTTP server with `/execute` endpoint - Implement `describe` hook returning tool definitions - Implement `invoke` hook with routing to tools - Migrate all kubectl_* agentic tools: - kubectl_get - kubectl_apply - kubectl_describe - kubectl_logs - kubectl_events - kubectl_api_resources - kubectl_get_crd_schema - Add utility kubectl tools for non-agentic use: - kubectl_version (for `version` tool to get K8s version) - Other utility operations as discovered - Create Dockerfile for container image - Integrate plugin discovery into dot-ai - Integrate plugin invocation into dot-ai (both agentic loops AND direct calls) - Consolidate ALL Kubernetes interactions to use plugin: - Update `version.ts` to use plugin for K8s version instead of `KubernetesDiscovery` - Update other files using `KubernetesDiscovery.executeKubectl()` to use plugin - Remove direct `@kubernetes/client-node` usage where plugin tools suffice - Remove old `src/core/kubectl-tools.ts` - Deprecate/remove `KubernetesDiscovery.executeKubectl()` (keep only config/connection logic) - Update Helm chart for plugin deployment - Integration tests ### Out of Scope - Other agentic tools (search_*, mermaid) - future PRDs - MCP tools as plugins - future PRDs - User-provided plugin documentation - future PRDs ### K8s Interaction Consolidation Map Files currently using Kubernetes interactions that need migration to plugin: | File | Current Approach | Migration Action | |------|------------------|------------------| | `src/core/kubectl-tools.ts` | Direct kubectl execution | Remove entirely (M7) | | `src/core/discovery.ts` | `executeKubectl()` method | Remove method, keep connection/config logic | | `src/tools/version.ts` | `KubernetesDiscovery` + `@kubernetes/client-node` | Use `kubectl_version` plugin tool | | `src/tools/query.ts` | `executeKubectlTools` | Use plugin executor | | `src/tools/remediate.ts` | `executeKubectlTools` | Use plugin executor | | `src/tools/operate-analysis.ts` | `executeKubectlTools` | Use plugin executor | | `src/core/capability-scan-workflow.ts` | `KubernetesDiscovery` | Evaluate; may need plugin tools | | `src/core/policy-operations.ts` | `KubernetesDiscovery` | Evaluate; may need plugin tools | | `src/core/deploy-operation.ts` | `KubernetesDiscovery` | Evaluate; may need plugin tools | **Note**: Some files may legitimately need `@kubernetes/client-node` for watch streams or complex operations. Evaluate case-by-case during M5. --- ## Milestones - [x] **M1: Package structure and HTTP server** - Create `packages/agentic-tools/` with basic structure - Implement HTTP server with `/execute` endpoint - Implement `describe` hook (empty tools list initially) - Dockerfile builds and runs - [x] **M2: kubectl tools implementation** - Migrate all kubectl_* tools to new package - Implement `invoke` hook with routing - Tools work when called directly via HTTP - [x] **M3: dot-ai plugin discovery** - dot-ai reads plugin configuration - dot-ai calls `describe` on plugins at startup - dot-ai registers tools from plugin responses - [x] **M4a: Helm chart plugin deployment** - Update `values.yaml` with `plugins` map configuration (image+port OR endpoint) - Create `templates/plugin-deployment.yaml` for plugin Deployments + Services - Create `templates/plugin-configmap.yaml` for plugins.json (mounted at `/etc/dot-ai/`) - Add `dot-ai.pluginsConfig` helper to `_helpers.tpl` for building plugins JSON - Update `deployment.yaml` to mount ConfigMap and remove `serviceAccountName` - Plugin uses existing ServiceAccount (RBAC shared, removed from MCP server) - Update `PluginManager.parsePluginConfig()` to read from file instead of env var - Integration test: Verify plugin Deployment, Service, and ConfigMap created - **Done when**: Plugin deploys as separate Deployment, `/health` and `/ready` respond, MCP server connects via Service - [x] **M4b: `version` tool plugin integration** (prove direct invocation) - Add `kubectl_version` tool to agentic-tools plugin - `version.ts` calls plugin via `PluginManager.invokeTool()` for K8s version - Add registered agentic tools list to `version` output - **Testing strategy**: - Existing tests validate K8s version still returned (no regression) - New test validates registered tools list is included - **Done when**: `version` MCP tool returns K8s version from plugin + lists discovered tools - [x] **M5: `query` tool + REST API plugin integration** (prove agentic invocation) - Add `createToolExecutor()` to PluginManager for agentic loop routing - `query.ts` uses plugin executor in its toolLoop - Remove direct `executeKubectlTools` import from `query.ts` - Migrate REST API endpoints to use plugin (no fallback): - `GET /api/v1/resource` → uses `kubectl_get_resource_json` via plugin - `GET /api/v1/logs` → uses `kubectl_logs` via plugin - `GET /api/v1/events` → uses `kubectl_events` via plugin - `GET /api/v1/resources?includeStatus=true` → fetches status via plugin - Remove direct kubectl imports from `rest-api.ts` - **Testing strategy**: - Existing `query.test.ts` tests cover kubectl tool usage (kubectl_get, semantic bridge pattern) - Since MCP server passes `pluginManager` to query tool, existing tests implicitly validate plugin routing - Existing REST API tests validate endpoint functionality through plugin - **Done when**: `query` tool and REST API endpoints work end-to-end through plugin, existing tests pass - [x] **M6: Remaining agentic consumers** - Migrate `remediate.ts` to use plugin executor - Migrate `operate-analysis.ts` to use plugin executor - Audit and migrate other `KubernetesDiscovery.executeKubectl()` usages - Fixed Helm 4 dry-run behavior (`--dry-run=client`) - Fixed plugin client timeout (30s → 5m for Helm operations) - [x] **M7: Cleanup** - Remove old `src/core/kubectl-tools.ts` - Remove `src/core/kubernetes-utils.ts` (standalone executeKubectl) - Remove unused functions from `resource-tools.ts` (`fetchResource`, `getPodLogs`, `getResourceEvents`) - Note: `KubernetesDiscovery.executeKubectl()` kept as internal method - already migrated to use plugin in M5/M6 - [x] **M8: CI/CD updates** - Restructured release.yml into parallel jobs: `prepare`, `build-dot-ai`, `build-agentic-tools`, `finalize` - `build-dot-ai` and `build-agentic-tools` jobs run in parallel after `prepare` - Both images use same version tags and are published to ghcr.io - Helm chart values.yaml updated during release to sync both image tags - GitHub release notes include both Docker images - **Done when**: CI builds plugin image, release publishes both images --- ## Dependencies ### Internal - dot-ai core (MCP server, agentic loop engine) - Helm chart ### External - Kubernetes cluster for testing - Container registry (ghcr.io) --- ## Success Criteria 1. **Plugin works standalone**: Can call `/execute` directly and get correct responses 2. **Integration works**: MCP tool calls flow through plugin correctly 3. **No regression**: Existing functionality works as before 4. **Old code removed**: `kubectl-tools.ts` deleted, no dead code 5. **K8s consolidated**: ALL Kubernetes interactions go through plugin (no scattered K8s code) 6. **Version tool enhanced**: Reports agentic tools count and uses plugin for K8s version 7. **Tests pass**: Integration tests cover the new flow 8. **Deployable**: Helm chart deploys plugin alongside dot-ai --- ## Risks and Mitigations | Risk | Mitigation | |------|------------| | Latency from HTTP calls | Benchmark; optimize if needed; keep containers warm | | Plugin startup timing | Implement readiness probes and retry logic in dot-ai | | Breaking existing flows | Comprehensive integration tests before removing old code | | Kubernetes auth in plugin | Pass kubeconfig via mount or environment | | Plugin unavailable blocks all K8s ops | Implement health checks; graceful degradation; clear error messages | | Scope creep from consolidation | Document exact files/methods to migrate; stick to list | --- ## Decision Log | Date | Decision | Rationale | |------|----------|-----------| | 2025-01-25 | Single `agentic-tools` package | All agentic tools in one container; simpler deployment | | 2025-01-25 | Two hooks: describe + invoke | Minimal surface to start; expand when needed | | 2025-01-25 | HTTP interface | Language-agnostic, standard, easy to test | | 2025-01-25 | Monorepo package | Can split later; easier to develop together initially | | 2026-01-26 | Self-contained tool modules with `KubectlTool` interface | Combines definition+handler; eliminates switch statement; enables isolated testing | | 2026-01-26 | ALL K8s interactions must go through plugin | Eliminates scattered K8s code (`KubernetesDiscovery.executeKubectl()`, direct `@kubernetes/client-node`); single source of truth for K8s operations; consistent error handling; easier testing | | 2026-01-26 | Incremental integration: `version` first, then `query`, then others | Proves direct invocation (version) before agentic invocation (query); each step validates the pattern before expanding; reduces risk | | 2026-01-26 | Sidecar deployment model | Simple networking (localhost), same lifecycle as MCP server, no service discovery needed | | 2026-01-26 | Plugins as map in values.yaml (not array) | Helm maps merge on override; users can selectively disable or add plugins without redefining all defaults | | 2026-01-26 | `dot-ai-` prefix for built-in plugins | Clear distinction between official and user-provided plugins in `helm get values` output | | 2026-01-26 | Separate M4a (Helm/infra) from M4b (code integration) | Validate infrastructure works before debugging application logic; catch deployment issues early | | 2026-01-26 | Separate Deployments instead of sidecars | Independent scaling, independent lifecycle (plugin updates don't restart MCP), cleaner separation of concerns, plugin failure doesn't crash MCP pod; networking via K8s Service DNS is standard pattern | | 2026-01-26 | Plugin config: `image`+`port` OR `endpoint` (mutually exclusive) | Implicit deploy vs external mode; no redundant `deploy: true/false` flag; validation is simple - presence of `image` implies chart deploys it | | 2026-01-26 | Generic plugin config with `env`, `envFrom`, `serviceAccountName` | Future-proof; plugins may need API keys via env vars or secrets; follows standard K8s container patterns | | 2026-01-26 | RBAC hardcoded for `agentic-tools` only, not generic | Only agentic-tools needs K8s RBAC (kubectl commands); chart shouldn't create arbitrary ClusterRoles for user plugins (security concern); users handle their own RBAC | | 2026-01-26 | Remove RBAC from MCP server | MCP server no longer needs K8s permissions after full migration; serves as validation - if something breaks, we know we missed migrating code to plugins | | 2026-01-26 | Plugin config via ConfigMap file, not env var | Cleaner K8s-native approach; ConfigMap mounted at `/etc/dot-ai/plugins.json`; no env var fallback since plugins only work in-cluster anyway | | 2026-01-26 | Plugin reuses existing ServiceAccount | No separate RBAC for plugin; agentic-tools uses dot-ai ServiceAccount; simpler than duplicating RBAC rules | | 2026-01-26 | Plugin name `agentic-tools` (not `dot-ai-agentic-tools`) | Avoids duplicate prefix in resource names (`dot-ai-dot-ai-agentic-tools`); deployment becomes `dot-ai-agentic-tools` | | 2026-01-26 | REST API endpoints must use plugin for kubectl ops | MCP server has no RBAC; REST endpoints (`/api/v1/resource`, `/logs`, `/events`) were failing; expanded M5 scope to include REST API migration | | 2026-01-26 | No fallback to direct kubectl execution | MCP server intentionally has no RBAC; fallback would never work; cleaner to fail with "plugin unavailable" error than silent failure | | 2026-01-26 | Add M8 for CI/CD updates | Need workflow to build/publish `dot-ai-agentic-tools` image; release workflow must publish both images | --- ## On Completion **When this PRD is complete, update the umbrella PRD [#342](./342-modular-plugin-architecture.md):** 1. **Migration Tracking table**: Mark kubectl_* tools as "Complete" with link to this PRD 2. **Lessons Learned section**: Document what worked, what didn't, and recommendations for future tool migrations 3. **Child PRDs section**: Mark this PRD as complete

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vfarcic/dot-ai'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

343-kubectl-plugin-migration.md•19.7 KiB