---
description: "DevOps and infrastructure: Docker, Kubernetes, Terraform, CI/CD pipelines, and cloud deployment patterns"
globs: ["**/Dockerfile*", "**/*.tf", "**/docker-compose*.yml", "**/docker-compose*.yaml", "**/k8s/**", "**/.github/workflows/**", "**/.gitlab-ci.yml", "**/Jenkinsfile", "**/cloudbuild.yaml", "**/terraform/**", "**/helm/**"]
alwaysApply: false
---
# DevOps & Infrastructure Patterns
Containerization, orchestration, infrastructure as code, and CI/CD best practices.
## CRITICAL: Agentic-First DevOps
### Pre-Development Verification (MANDATORY)
Before writing ANY DevOps configuration:
```
1. CHECK TOOL AVAILABILITY
   → run_terminal_cmd("docker --version")
   → run_terminal_cmd("kubectl version --client")
   → run_terminal_cmd("terraform --version")
   → run_terminal_cmd("helm version")

2. VERIFY CURRENT VERSIONS (use web_search)
   → web_search("Terraform latest version December 2024")
   → web_search("Kubernetes latest stable version 2024")
   → web_search("Docker best practices 2024")

3. CHECK EXISTING INFRASTRUCTURE
   → Read existing Dockerfile, docker-compose.yml, *.tf files
   → Understand current state before modifying
   → Check terraform.tfstate or remote state

4. VALIDATE CONFIGURATIONS BEFORE APPLYING
   → terraform validate
   → docker-compose config
   → kubectl apply --dry-run=client
```
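The tool-availability step above can be scripted so it fails fast and consistently; a minimal sketch (the tool list and output format are illustrative, adjust to your stack):

```shell
#!/bin/sh
# Report whether each required CLI is on PATH before writing any config.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
  fi
}

for tool in docker kubectl terraform helm; do
  check_tool "$tool"
done
```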
### CLI-First DevOps Workflow
**ALWAYS use CLI for validation:**
```bash
# Docker
docker build -t test:latest .
docker-compose config # Validate compose file
docker compose --dry-run up # Preview actions without running (Compose v2.17+; docker-compose v1 has no dry-run)
# Kubernetes
kubectl apply --dry-run=client -f manifest.yaml
kubectl diff -f manifest.yaml # See changes before applying
kubeconform manifest.yaml # Validate against schema (maintained successor to kubeval)
# Terraform
terraform init
terraform fmt -recursive
terraform validate
terraform plan -out=tfplan # ALWAYS plan before apply
# Helm
helm lint ./my-chart
helm template ./my-chart # Render templates locally
helm install --dry-run --debug my-release ./my-chart
```
### Post-Edit Verification
After ANY infrastructure code changes:
```bash
# Docker
docker build --no-cache -t test:latest .
docker run --rm test:latest echo "Build verified"
# Terraform
terraform fmt -check -recursive
terraform validate
terraform plan
# Kubernetes
kubectl apply --dry-run=server -f . # Server-side validation
kubectl get events --sort-by='.lastTimestamp' # Check for issues
```
### Common DevOps Syntax Traps (Avoid These!)
```yaml
# WRONG: YAML indented with tabs
services:
  app:          # If either indent here is a tab character, the parser errors out
    image: nginx

# CORRECT: always indent with spaces (2 spaces is the convention)
services:
  app:
    image: nginx

# WRONG: unquoted values that YAML coerces (mapping form)
environment:
  VERSION: 1.0   # Parsed as the number 1.0
  ENABLED: true  # Parsed as a boolean
# CORRECT: quote values that must stay strings
environment:
  VERSION: "1.0"
  ENABLED: "true"
# (The list form, e.g. `- VERSION=1.0`, already treats everything after `=` as a string.)

# WRONG: hardcoded secrets in config
env:
  - name: DB_PASSWORD
    value: "supersecret123"  # NEVER do this!
# CORRECT: reference a Secret
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secrets
        key: password
```
### Infrastructure Version Pinning
Always pin versions explicitly:
```dockerfile
# WRONG
FROM node:latest
FROM python
# CORRECT - Pin major.minor at minimum
FROM node:20-alpine
FROM python:3.12-slim
```
```hcl
# WRONG
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# CORRECT - Pin provider versions
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```
---
## Docker
### Dockerfile Best Practices
```dockerfile
# Use specific version tags
FROM node:20-alpine AS builder
# Set working directory
WORKDIR /app
# Copy dependency files first (better caching)
COPY package*.json ./
# Install all dependencies (dev dependencies are typically needed to build)
RUN npm ci
# Copy source code
COPY . .
# Build application
RUN npm run build
# Remove dev dependencies before the production stage copies node_modules
RUN npm prune --omit=dev
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
# Copy built assets from builder
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# Run application
CMD ["node", "dist/main.js"]
```
### Multi-Stage Builds
```dockerfile
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server
# Final stage
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /app
COPY --from=builder /app/server .
EXPOSE 8080
ENTRYPOINT ["./server"]
```
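Multi-stage builds still send the entire build context to the daemon, so pair them with a `.dockerignore` to keep builds fast and caches effective; a minimal sketch (entries are typical examples, adjust per project):

```
# .dockerignore
.git
node_modules
dist
*.log
.env
terraform.tfstate*
```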
### Docker Compose
```yaml
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: development
    ports:
      - "3000:3000"
    volumes:
      - .:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:pass@db:5432/mydb
    depends_on:
      db:
        condition: service_healthy
    networks:
      - backend

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - backend

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    networks:
      - backend

volumes:
  postgres_data:

networks:
  backend:
    driver: bridge
```
---
## Kubernetes
### Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: database-url
          volumeMounts:
            - name: config
              mountPath: /app/config
      volumes:
        - name: config
          configMap:
            name: myapp-config
```
### Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx  # Replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```
### ConfigMap and Secrets
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  APP_ENV: production
  LOG_LEVEL: info
  config.yaml: |
    server:
      port: 8080
    features:
      cache: true
---
# Note: don't commit literal Secret values to git; prefer an external/sealed secrets workflow
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
type: Opaque
stringData:
  database-url: postgres://user:pass@db:5432/mydb
  api-key: your-secret-api-key
```
### Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
---
## Terraform
### Provider Configuration
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}
```
### Variables and Outputs
```hcl
# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

variable "db_config" {
  description = "Database configuration"
  type = object({
    instance_class = string
    storage_gb     = number
    multi_az       = bool
  })
  default = {
    instance_class = "db.t3.micro"
    storage_gb     = 20
    multi_az       = false
  }
}

# outputs.tf
output "api_endpoint" {
  description = "API Gateway endpoint URL"
  value       = aws_apigatewayv2_api.main.api_endpoint
}

output "database_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}
```
### Modules
```hcl
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project}-vpc"
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.cidr_block, 4, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project}-public-${count.index + 1}"
    Type = "public"
  }
}

# Usage in root module
module "vpc" {
  source             = "./modules/vpc"
  project            = var.project_name
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b"]
}

module "ecs" {
  source     = "./modules/ecs"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnet_ids

  depends_on = [module.vpc]
}
```
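The root-module call above implies a matching interface inside `modules/vpc`; a minimal sketch of its `variables.tf` and `outputs.tf`, with names inferred from the usage shown (treat them as assumptions):

```hcl
# modules/vpc/variables.tf
variable "project" {
  description = "Project name used for tagging"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "availability_zones" {
  description = "AZs to spread public subnets across"
  type        = list(string)
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}
```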
---
## GitHub Actions
### CI Pipeline
```yaml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Lint
        run: npm run lint
      - name: Test
        run: npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=semver,pattern={{version}}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
### CD Pipeline
```yaml
name: Deploy

on:
  push:
    tags:
      - 'v*'

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster staging-cluster \
            --service myapp \
            --force-new-deployment

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster production-cluster \
            --service myapp \
            --force-new-deployment
```
---
## Monitoring & Observability
### Prometheus Metrics
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
```
### Application Logging
```yaml
# Fluent Bit configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             1
        Log_Level         info
        Daemon            off
        Parsers_File      parsers.conf
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
    [FILTER]
        Name              kubernetes
        Match             kube.*
        Kube_URL          https://kubernetes.default.svc:443
        Kube_Tag_Prefix   kube.var.log.containers.
    [OUTPUT]
        Name              es
        Match             *
        Host              elasticsearch
        Port              9200
        Index             logs
```
---
## Security Best Practices
### Container Security
```dockerfile
# Use distroless or minimal base images
FROM gcr.io/distroless/nodejs20-debian12
# Never run as root
USER nonroot:nonroot
# Use specific versions, not latest
FROM node:20.10.0-alpine3.18
# Scan images in CI
# - trivy image myapp:latest
# - grype myapp:latest
```
### Secrets Management
```yaml
# External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: myapp-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-url
      remoteRef:
        key: prod/myapp/database
        property: url
```
### Network Policies
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-network-policy
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: nginx-ingress
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
---
## Common Commands
### Docker
```bash
# Build and run
docker build -t myapp .
docker run -p 3000:3000 myapp
# Debug container
docker exec -it <container_id> sh
docker logs -f <container_id>
# Clean up
docker system prune -af
```
### Kubernetes
```bash
# Apply resources
kubectl apply -f k8s/
# Debug pod
kubectl logs -f <pod_name>
kubectl exec -it <pod_name> -- sh
kubectl describe pod <pod_name>
# Rollout management
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp
```
### Terraform
```bash
# Initialize and plan
terraform init
terraform plan -out=tfplan
# Apply changes
terraform apply tfplan
# Destroy resources
terraform destroy
# Format and validate
terraform fmt -recursive
terraform validate
```