# DevOps & Infrastructure Automation 2025

**Updated**: 2025-11-24 | **Focus**: CI/CD, Docker, Kubernetes, IaC, Monitoring

---

## CI/CD Pipelines

```yaml
# GITHUB ACTIONS (CI/CD workflow)
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm test

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json

  build:
    needs: test  # Only run if tests pass
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          # Note: pushes to Docker Hub need a namespace, e.g. <username>/myapp
          tags: |
            myapp:latest
            myapp:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'  # Only deploy from main
    steps:
      - name: Deploy to production
        uses: appleboy/ssh-action@v0.1.10
        with:
          host: ${{ secrets.PROD_SERVER }}
          username: ${{ secrets.SSH_USERNAME }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            docker pull myapp:${{ github.sha }}
            docker stop myapp || true
            docker rm myapp || true
            docker run -d --name myapp -p 80:3000 myapp:${{ github.sha }}
```

```groovy
// JENKINS (Alternative CI/CD)
pipeline {
    agent any

    environment {
        DOCKER_IMAGE = "myapp"
        DOCKER_TAG   = "${env.BUILD_NUMBER}"
    }

    stages {
        stage('Checkout') {
            steps { checkout scm }
        }
        stage('Test') {
            steps {
                sh 'npm ci'
                sh 'npm run lint'
                sh 'npm test'
            }
        }
        stage('Build') {
            steps {
                sh "docker build -t ${DOCKER_IMAGE}:${DOCKER_TAG} ."
            }
        }
        stage('Push') {
            steps {
                withCredentials([usernamePassword(
                    credentialsId: 'docker-hub',
                    usernameVariable: 'DOCKER_USER',
                    passwordVariable: 'DOCKER_PASS'
                )]) {
                    // --password-stdin keeps the password out of process lists
                    sh 'echo "$DOCKER_PASS" | docker login -u "$DOCKER_USER" --password-stdin'
                    sh "docker push ${DOCKER_IMAGE}:${DOCKER_TAG}"
                }
            }
        }
        stage('Deploy') {
            when { branch 'main' }
            steps {
                sh """
                    kubectl set image deployment/myapp \
                        myapp=${DOCKER_IMAGE}:${DOCKER_TAG}
                """
            }
        }
    }

    post {
        failure {
            mail to: 'team@example.com',
                 subject: "Pipeline Failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                 body: "Check ${env.BUILD_URL}"
        }
    }
}
```

**CI/CD best practices:**

1. **Fast feedback**
   - Keep the total pipeline under 10 minutes (developers are waiting on it)
   - Parallelize tests by splitting them into chunks (see the matrix sketch after this list)
   - Cache dependencies (prefer `npm ci` with a warm cache over `npm install`)
2. **Fail fast**
   - Run fast checks first (linting, unit tests)
   - Run slow tests later (integration, e2e)
   - Stop on first failure (don't waste CI time)
3. **Artifact versioning**
   - Tag images with the commit SHA or build number
   - Treat artifacts as immutable (never overwrite `latest` in prod)
4. **Environment parity**
   - Keep dev, staging, and prod alike (avoid "works on my machine")
   - Use the same Docker images across environments
5. **Rollback strategy**
   - Blue-green deployment (instant rollback)
   - Canary releases (gradual rollout, detect issues early)
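Practice 1 calls for parallelizing tests. A minimal sketch of test sharding with a GitHub Actions matrix; the three-way split and the Jest `--shard` flag (available since Jest 28) are assumptions about the project's test runner:

```yaml
# Sketch: split the test suite across 3 parallel jobs (shard count is arbitrary)
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - run: npm ci
      # Jest 28+ runs only the given 1/3 slice of the suite per job
      - run: npx jest --shard=${{ matrix.shard }}/3
```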
---

## Docker

```dockerfile
# DOCKERFILE (Multi-stage build, optimized)

# STAGE 1: Build
FROM node:18-alpine AS builder
WORKDIR /app

# Copy dependency files first (layer caching)
COPY package*.json ./
# Full install: dev dependencies (TypeScript, bundlers, etc.) are needed to build
RUN npm ci

# Copy source code
COPY . .

# Build (if TypeScript, Next.js, etc.)
RUN npm run build

# Drop dev dependencies before copying node_modules into the final image
RUN npm prune --omit=dev

# STAGE 2: Production
FROM node:18-alpine

# Security: Non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001

WORKDIR /app

# Copy built artifacts from builder
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

**Why multi-stage?**

```dockerfile
# ❌ Single-stage (bloated):
FROM node:18
WORKDIR /app
COPY . .
RUN npm install   # Includes dev dependencies (build tools, etc.)
CMD ["node", "index.js"]
# Result: ~1.2 GB image

# ✅ Multi-stage (above): only production dependencies, no build tools
# Result: ~150 MB image (8× smaller!)
```

```yaml
# DOCKER COMPOSE (Local development, multiple services)
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:pass@db:5432/mydb
    volumes:
      - .:/app              # Hot reload (code changes reflect immediately)
      - /app/node_modules   # Don't overwrite node_modules with the host copy
    depends_on:
      - db
      - redis

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:
```

Commands:

```bash
# Build and start all services
docker-compose up --build

# Start in background (detached)
docker-compose up -d

# View logs
docker-compose logs -f app

# Stop and remove containers
docker-compose down

# Stop and remove volumes (data) too
docker-compose down -v
```

**Docker best practices:**

1. **Layer caching**

   ```dockerfile
   # ❌ Invalidates the cache on any code change
   COPY . .
   RUN npm install

   # ✅ Cache dependencies (they change infrequently)
   COPY package*.json ./
   RUN npm ci
   COPY . .
   ```

2. **Alpine images**: `node:18` is ~1 GB; `node:18-alpine` is ~170 MB (minimal, faster pulls).

3. **`.dockerignore`** (exclude files from the build context):

   ```
   node_modules
   .git
   .env
   *.md
   .dockerignore
   ```

4. **Security**
   - Run as a non-root user (`RUN adduser` + `USER`)
   - Scan images for vulnerabilities (`docker scan`/Snyk; see the Trivy sketch after this list)
   - Pin versions (`node:18.17.0`, not `node:latest`)
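One way to act on the scanning advice, assuming Trivy (an open-source scanner) is installed; the image name matches this section's examples:

```bash
# Scan the built image for known CVEs
trivy image myapp:latest

# In CI: fail the step only on serious findings
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest
```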
---

## Kubernetes

```yaml
# DEPLOYMENT (App replicas, rolling updates)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3  # 3 copies (high availability)
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
          resources:
            requests:
              cpu: "100m"      # 0.1 CPU (min)
              memory: "128Mi"  # 128 MB (min)
            limits:
              cpu: "500m"      # 0.5 CPU (max)
              memory: "512Mi"  # 512 MB (max)
          livenessProbe:   # Restart if unhealthy
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:  # Don't send traffic if not ready
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
# SERVICE (Load balancer, routes traffic to pods)
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp          # Routes to pods with this label
  ports:
    - protocol: TCP
      port: 80          # External port
      targetPort: 3000  # Container port
  type: LoadBalancer    # Exposed externally (cloud provider creates a load balancer)
---
# INGRESS (HTTP routing, SSL/TLS)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - myapp.com
      secretName: myapp-tls
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 80
---
# CONFIGMAP (Configuration, non-sensitive)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_NAME: "MyApp"
  LOG_LEVEL: "info"
  config.json: |
    {
      "feature_flag": true
    }
---
# SECRET (Sensitive data, base64 encoded)
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  # base64 of: postgres://user:pass@db.example.com/mydb
  url: cG9zdGdyZXM6Ly91c2VyOnBhc3NAZGIuZXhhbXBsZS5jb20vbXlkYg==
```

Create the secret from the command line instead (auto-encodes):

```bash
kubectl create secret generic db-secret \
  --from-literal=url='postgres://user:pass@db.example.com/mydb'
```

Commands:

```bash
# Apply all YAML files
kubectl apply -f k8s/

# Get resources
kubectl get pods
kubectl get services
kubectl get deployments

# Describe (detailed info)
kubectl describe pod myapp-xyz123

# Logs
kubectl logs myapp-xyz123
kubectl logs -f myapp-xyz123   # Follow (tail)

# Execute a command in a pod
kubectl exec -it myapp-xyz123 -- /bin/sh

# Scale replicas
kubectl scale deployment myapp --replicas=5

# Rolling update with a new image (tunable; see the strategy sketch below)
kubectl set image deployment/myapp myapp=myapp:2.0.0

# Rollback
kubectl rollout undo deployment/myapp

# Port forward (local testing) - access at localhost:8080
kubectl port-forward svc/myapp-service 8080:80
```
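The Deployment above gets rolling updates by default; how aggressive they are is tunable through the update strategy. A minimal sketch; the surge/unavailable values are illustrative choices, not from the original:

```yaml
# Sketch: add under the Deployment's spec to tune rolling updates
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod above `replicas` during a rollout
      maxUnavailable: 0  # Never drop below the desired replica count
```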
---

## Infrastructure as Code (IaC)

```hcl
# TERRAFORM (Provision cloud resources)
# AWS example: VPC, EC2, RDS

provider "aws" {
  region = "us-east-1"
}

# VPC (Virtual Private Cloud)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = { Name = "main-vpc" }
}

# Subnet (Public)
resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  tags = { Name = "public-subnet" }
}

# Internet Gateway
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

# Security Group (Firewall rules)
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Allow HTTP/HTTPS traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # Allow from anywhere
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"  # All protocols
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# EC2 Instance
resource "aws_instance" "web" {
  ami                    = "ami-0c55b159cbfafe1f0"  # Amazon Linux 2 (AMI IDs are region-specific)
  instance_type          = "t3.micro"
  subnet_id              = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.web.id]

  user_data = <<-EOF
    #!/bin/bash
    yum update -y
    yum install -y docker
    systemctl start docker
    docker run -d -p 80:3000 myapp:latest
  EOF

  tags = { Name = "web-server" }
}

# RDS Database
# (assumes aws_security_group.db and aws_db_subnet_group.main are defined elsewhere)
resource "aws_db_instance" "postgres" {
  allocated_storage      = 20
  engine                 = "postgres"
  engine_version         = "15.3"
  instance_class         = "db.t3.micro"
  db_name                = "mydb"
  username               = "admin"
  password               = var.db_password  # From a variable (don't hardcode!)
  skip_final_snapshot    = true
  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}

# Variables
variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}

# Outputs
output "instance_ip" {
  value = aws_instance.web.public_ip
}

output "db_endpoint" {
  value = aws_db_instance.postgres.endpoint
}
```

Commands:

```bash
# Initialize (download providers)
terraform init

# Plan (preview changes, dry run)
terraform plan

# Apply (create resources)
terraform apply

# Destroy (delete all resources)
terraform destroy

# State (view current infrastructure)
terraform state list
terraform state show aws_instance.web
```

**Terraform best practices:**

1. **Remote state**: store state in S3, not locally, for team collaboration:

   ```hcl
   terraform {
     backend "s3" {
       bucket = "my-terraform-state"
       key    = "prod/terraform.tfstate"
       region = "us-east-1"
     }
   }
   ```

2. **Modules** (reusable): define once in `modules/vpc/main.tf`, reuse everywhere (see the sketch after this list):

   ```hcl
   module "vpc" {
     source     = "./modules/vpc"
     cidr_block = "10.0.0.0/16"
   }
   ```

3. **Workspaces** (separate environments):

   ```bash
   terraform workspace new dev
   terraform workspace new prod
   terraform workspace select prod
   ```
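What the `./modules/vpc` module referenced above might contain, as a minimal sketch; the file layout, variable, and output names are hypothetical:

```hcl
# modules/vpc/main.tf (hypothetical module body)
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
}

output "vpc_id" {
  value = aws_vpc.this.id  # Callers reference it as module.vpc.vpc_id
}
```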
---

## Monitoring & Logging

```yaml
# PROMETHEUS (Metrics collection)
# prometheus.yml (scrape config)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['localhost:3000']
```

**Grafana (dashboards)**: add Prometheus as a data source:

1. Configuration → Data Sources → Add → Prometheus
2. URL: `http://prometheus:9090`

Example dashboard panels (PromQL queries):

```
# Panel 1: CPU usage
rate(process_cpu_seconds_total[1m])

# Panel 2: Memory usage
process_resident_memory_bytes

# Panel 3: HTTP requests per second
rate(http_requests_total[1m])

# Panel 4: Error rate
rate(http_requests_total{status=~"5.."}[1m])
```

(These queries assume the app exports `http_requests_total`; a sketch of exporting it closes this section.)

**Alerting**: the rules below belong in a Prometheus rule file (loaded via `rule_files:` in `prometheus.yml`); Alertmanager then routes the fired alerts to receivers.

```yaml
# ALERTS (Prometheus rule file, e.g. rules.yml)
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} requests/sec"
```

```yaml
# ELK STACK (Logging: Elasticsearch, Logstash, Kibana)
# docker-compose.yml
services:
  elasticsearch:
    image: elasticsearch:8.9.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false  # local demo only: skips TLS/password setup
    ports:
      - "9200:9200"

  logstash:
    image: logstash:8.9.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: kibana:8.9.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

# App logs → Logstash → Elasticsearch → Kibana (visualize)
```

```javascript
// DISTRIBUTED TRACING (Jaeger, OpenTelemetry)
// Track a request across microservices
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('myapp');

app.get('/api/users', async (req, res) => {
  const span = tracer.startSpan('get-users');
  try {
    const users = await db.users.findAll();
    res.json(users);
  } finally {
    span.end();  // Always close the span, even on errors
  }
});

// View the trace in the Jaeger UI (shows the latency of each service call)
```
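The dashboard queries and alert above assume the app exports an `http_requests_total` counter. A minimal sketch using the `prom-client` library for Node.js; the middleware and label set are illustrative, not part of the original setup:

```javascript
// Sketch: expose Prometheus metrics from an Express app (assumes prom-client is installed)
import client from 'prom-client';

client.collectDefaultMetrics();  // process_cpu_seconds_total, memory, etc.

const httpRequests = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'status'],
});

// Count every response by method and status code
app.use((req, res, next) => {
  res.on('finish', () => {
    httpRequests.inc({ method: req.method, status: res.statusCode });
  });
  next();
});

// Prometheus scrapes this endpoint (matches the scrape config above)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});
```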
---

## Key Takeaways

1. **Automate everything** - manual deployments mean errors (CI/CD pipelines, IaC)
2. **Immutable infrastructure** - replace, don't modify (ship new Docker images; don't SSH in and edit files)
3. **Observability** - you can't fix what you can't see (metrics, logs, traces)
4. **High availability** - multiple replicas, health checks, a rollback strategy
5. **Security** - scan images, manage secrets properly (never hardcoded), least privilege

---

## References

- *The DevOps Handbook* - Gene Kim
- *Kubernetes: Up and Running* - Kelsey Hightower
- Docker Documentation, Kubernetes Documentation

**Related**: `kubernetes-patterns.md`, `terraform-modules.md`, `monitoring-best-practices.md`