# DevOps, CI/CD & Infrastructure 2025

**Updated**: 2025-11-24 | **Focus**: CI/CD Pipelines, Docker, Kubernetes, Infrastructure as Code, Monitoring

---

## CI/CD Fundamentals

```yaml
# CONTINUOUS INTEGRATION/CONTINUOUS DEPLOYMENT

CI (Continuous Integration):
  - Developers commit code frequently (multiple times per day)
  - Automated build & test on each commit
  - Fast feedback (catch bugs early, before merge to main)
  - Tools: Jenkins, GitLab CI, GitHub Actions, CircleCI, Travis CI

CD (Continuous Deployment/Delivery):
  - Delivery: Automated to staging (manual approval to production)
  - Deployment: Fully automated to production (no manual step)
  - Deploy small changes frequently (less risk than big releases)
```

```yaml
# GITHUB ACTIONS CI/CD PIPELINE
# .github/workflows/ci-cd.yml

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Run linter
        run: npm run lint

      - name: Run tests
        run: npm test

      - name: Run coverage
        run: npm run coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  build:
    needs: test
    runs-on: ubuntu-latest
    # Only push images for branch pushes; registry secrets are not
    # available to pull requests opened from forks.
    if: github.event_name == 'push'
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: myuser/myapp:${{ github.sha }},myuser/myapp:latest
          cache-from: type=registry,ref=myuser/myapp:latest
          cache-to: type=inline

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster my-cluster \
            --service my-service \
            --force-new-deployment
```

```yaml
# GITLAB CI/CD PIPELINE
# .gitlab-ci.yml

stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: registry.gitlab.com/$CI_PROJECT_PATH:$CI_COMMIT_SHA

test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run lint
    - npm test
  coverage: '/Lines\s*:\s*(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
  only:
    - main
    - develop

deploy_staging:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache curl
  script:
    - |
      curl -X POST "https://api.render.com/deploy/srv-xxxxx?key=$RENDER_API_KEY"
  environment:
    name: staging
    url: https://staging.myapp.com
  only:
    - develop

deploy_production:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache curl
  script:
    - |
      curl -X POST "https://api.render.com/deploy/srv-yyyyy?key=$RENDER_API_KEY"
  environment:
    name: production
    url: https://myapp.com
  when: manual  # Manual approval for production
  only:
    - main
```

---

## Docker

```dockerfile
# DOCKERFILE (Multi-stage build, optimized)

# Stage 1: Build
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files first (leverage Docker cache)
COPY package*.json ./

# Install all dependencies (devDependencies are needed for the build step)
RUN npm ci

# Copy source code
COPY . .

# Build application (if needed, e.g., TypeScript compilation)
RUN npm run build

# Drop devDependencies so only production modules reach the final image
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:18-alpine

WORKDIR /app

# Create non-root user (security)
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001 -G nodejs

# Copy only necessary files from builder
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

# Switch to non-root user
USER nodejs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

# Start application
CMD ["node", "dist/index.js"]
```

```bash
# DOCKER COMMANDS

# Build image
docker build -t myapp:latest .

# Tag image
docker tag myapp:latest myuser/myapp:v1.0.0

# Push to Docker Hub
docker push myuser/myapp:v1.0.0

# Run container
docker run -d \
  --name myapp \
  -p 3000:3000 \
  -e DATABASE_URL=postgres://user:pass@db:5432/mydb \
  --restart unless-stopped \
  myapp:latest

# View logs
docker logs myapp
docker logs -f myapp  # Follow logs

# Execute command in running container
docker exec -it myapp sh

# Stop and remove container
docker stop myapp
docker rm myapp

# Remove image
docker rmi myapp:latest

# Prune unused resources (clean up)
# Caution: this also deletes all unused images and volumes
docker system prune -a --volumes
```

```yaml
# DOCKER COMPOSE (Multi-container applications)
# docker-compose.yml
# The top-level `version` key is obsolete in Compose v2 and is omitted here.

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://postgres:password@db:5432/mydb
      REDIS_URL: redis://redis:6379
    depends_on:
      - db
      - redis
    restart: unless-stopped
    networks:
      - app-network

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mydb
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: unless-stopped
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - app-network

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    depends_on:
      - app
    restart: unless-stopped
    networks:
      - app-network

volumes:
  postgres-data:
  redis-data:

networks:
  app-network:
    driver: bridge

# Commands (Compose v2 syntax; the standalone `docker-compose` v1 binary is end-of-life):
# docker compose up -d          # Start all services (detached)
# docker compose down           # Stop and remove all services
# docker compose logs -f app    # View logs for specific service
# docker compose ps             # List running services
# docker compose exec app sh    # Execute command in service
```

---

## Kubernetes

```yaml
# KUBERNETES DEPLOYMENT
# deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myuser/myapp:v1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: database-url
            - name: NODE_ENV
              value: "production"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
      imagePullSecrets:
        - name: dockerhub-secret

---
# SERVICE (Expose deployment)
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  namespace: production
spec:
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP

---
# INGRESS (External access, HTTPS)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.com
      secretName: myapp-tls
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 80

---
# CONFIGMAP (Non-sensitive configuration)
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: production
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"

---
# SECRET (Sensitive data, base64 encoded)
# Base64 is an encoding, not encryption. Encode without a trailing newline (echo -n).
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
  namespace: production
type: Opaque
data:
  database-url: cG9zdGdyZXM6Ly91c2VyOnBhc3NAaG9zdDo1NDMyL2Ri  # Base64 of postgres://user:pass@host:5432/db

---
# HORIZONTAL POD AUTOSCALER (Auto-scale based on CPU)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

```bash
# KUBECTL COMMANDS

# Apply manifests
kubectl apply -f deployment.yaml
kubectl apply -f .  # Apply all files in directory

# Get resources
kubectl get pods -n production
kubectl get services -n production
kubectl get deployments -n production
kubectl get ingress -n production

# Describe resource (detailed info)
kubectl describe pod myapp-xxxxx -n production

# View logs
kubectl logs myapp-xxxxx -n production
kubectl logs -f myapp-xxxxx -n production  # Follow logs
kubectl logs myapp-xxxxx -n production --previous  # Previous container logs (if crashed)

# Execute command in pod
kubectl exec -it myapp-xxxxx -n production -- sh

# Port forward (access pod locally)
kubectl port-forward pod/myapp-xxxxx 3000:3000 -n production

# Scale deployment
kubectl scale deployment myapp --replicas=5 -n production

# Rolling update (new image)
kubectl set image deployment/myapp myapp=myuser/myapp:v1.1.0 -n production

# Rollback deployment
kubectl rollout undo deployment/myapp -n production
kubectl rollout status deployment/myapp -n production

# Delete resources
kubectl delete pod myapp-xxxxx -n production
kubectl delete deployment myapp -n production
kubectl delete -f deployment.yaml

# Create secret from literal
kubectl create secret generic myapp-secrets \
  --from-literal=database-url=postgres://user:pass@host:5432/db \
  -n production

# Base64 encode/decode (for secrets)
echo -n 'my-secret-value' | base64
echo 'bXktc2VjcmV0LXZhbHVl' | base64 --decode
```

---

## Infrastructure as Code

```hcl
# TERRAFORM (AWS ECS + RDS + Load Balancer)
# main.tf
#
# Not shown here and assumed to be defined elsewhere in the configuration:
# aws_iam_role.ecs_execution_role, aws_secretsmanager_secret.db_url,
# aws_security_group.ecs_tasks, aws_security_group.rds,
# aws_lb_target_group.app, aws_db_subnet_group.main

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "my-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "main-vpc"
  }
}

# Subnets
resource "aws_subnet" "public_1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "${var.aws_region}a"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-1"
  }
}

resource "aws_subnet" "public_2" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "${var.aws_region}b"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-2"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

# Route table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}

# Associate the public subnets with the route table; without this they
# fall back to the VPC's main route table, which has no internet route
resource "aws_route_table_association" "public_1" {
  subnet_id      = aws_subnet.public_1.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public_2" {
  subnet_id      = aws_subnet.public_2.id
  route_table_id = aws_route_table.public.id
}

# Security Group (Load Balancer)
resource "aws_security_group" "alb" {
  name        = "alb-sg"
  description = "Security group for ALB"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "main-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = [aws_subnet.public_1.id, aws_subnet.public_2.id]

  tags = {
    Name = "main-alb"
  }
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "main-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# ECS Task Definition
resource "aws_ecs_task_definition" "app" {
  family                   = "myapp"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "myapp"
      image     = "myuser/myapp:latest"
      essential = true

      portMappings = [
        {
          containerPort = 3000
          protocol      = "tcp"
        }
      ]

      environment = [
        {
          name  = "NODE_ENV"
          value = "production"
        }
      ]

      secrets = [
        {
          name      = "DATABASE_URL"
          valueFrom = aws_secretsmanager_secret.db_url.arn
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/myapp"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}

# ECS Service
resource "aws_ecs_service" "app" {
  name            = "myapp-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = [aws_subnet.public_1.id, aws_subnet.public_2.id]
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = true
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "myapp"
    container_port   = 3000
  }
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier     = "myapp-db"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = "db.t3.micro"

  allocated_storage = 20
  storage_type      = "gp3"

  db_name  = "mydb"
  username = "postgres"
  password = var.db_password

  # Convenient for demos; set to false for databases you cannot afford to lose
  skip_final_snapshot = true
  publicly_accessible = false

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  tags = {
    Name = "myapp-db"
  }
}

# Variables
variable "aws_region" {
  default = "us-east-1"
}

variable "db_password" {
  type      = string
  sensitive = true
}

# Outputs
output "alb_dns_name" {
  value = aws_lb.main.dns_name
}

output "rds_endpoint" {
  value = aws_db_instance.main.endpoint
}
```

```bash
# TERRAFORM COMMANDS

# Initialize (download providers)
terraform init

# Format code
terraform fmt

# Validate configuration
terraform validate

# Plan (preview changes)
terraform plan

# Apply (create resources)
terraform apply
terraform apply -auto-approve  # Skip confirmation

# Destroy (delete resources)
terraform destroy

# Show state
terraform show

# List resources
terraform state list

# Import existing resource
terraform import aws_instance.example i-1234567890abcdef0

# Output values
terraform output
terraform output alb_dns_name
```

---

## Monitoring & Logging

```yaml
# PROMETHEUS + GRAFANA (Kubernetes)
# prometheus-values.yaml (Helm chart)
#
# Key names differ between charts: the `server.*` keys below follow the
# standalone `prometheus` chart, while kube-prometheus-stack nests the
# equivalent settings under `prometheus.prometheusSpec`. Check the chart
# you install with `helm show values <chart>` before applying.
# Passwords below are placeholders; supply real ones from a secret.

server:
  persistentVolume:
    enabled: true
    size: 50Gi
  retention: "30d"
  global:
    scrape_interval: 15s
    evaluation_interval: 15s

alertmanager:
  enabled: true
  config:
    global:
      smtp_smarthost: 'smtp.gmail.com:587'
      smtp_from: 'alerts@myapp.com'
      smtp_auth_username: 'alerts@myapp.com'
      smtp_auth_password: 'password'
    route:
      receiver: 'email'
      group_by: ['alertname', 'cluster']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
    receivers:
      - name: 'email'
        email_configs:
          - to: 'team@myapp.com'
            send_resolved: true

grafana:
  enabled: true
  adminPassword: 'admin'
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          url: http://prometheus-server
          access: proxy
          isDefault: true
```

```bash
# Install Prometheus + Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml
```

```python
# APPLICATION METRICS (Python, Prometheus client)
import time

from flask import Flask, Response, g, request
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
    start_http_server,
)

# Define metrics
request_count = Counter('app_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
request_duration = Histogram('app_request_duration_seconds', 'Request duration', ['method', 'endpoint'])
active_users = Gauge('app_active_users', 'Active users')

# Flask example
app = Flask(__name__)


def get_active_user_count():
    """Placeholder: replace with a real lookup (e.g., a session store query)."""
    return 0


@app.before_request
def before_request():
    # Store the start time on the per-request context object
    g.start_time = time.time()


@app.after_request
def after_request(response):
    duration = time.time() - g.start_time
    request_duration.labels(method=request.method, endpoint=request.endpoint).observe(duration)
    request_count.labels(method=request.method, endpoint=request.endpoint, status=response.status_code).inc()
    return response


@app.route('/metrics')
def metrics():
    # Serve metrics from the Flask app itself (alternative to the :8000 server below)
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)


@app.route('/')
def index():
    active_users.set(get_active_user_count())
    return "Hello World!"


if __name__ == '__main__':
    start_http_server(8000)  # Metrics endpoint on :8000/metrics
    app.run(host='0.0.0.0', port=5000)
```

---

## Key Takeaways

1. **Automate everything** - CI/CD pipelines (no manual deployments, reduce human error)
2. **Immutable infrastructure** - Containers, IaC (don't modify running servers, replace them)
3. **Monitor proactively** - Metrics, logs, alerts (know about issues before users report them)
4. **Security in pipelines** - Secrets management, vulnerability scanning, least-privilege IAM
5. **Small, frequent deployments** - Less risk (easy to roll back, isolate issues)

---

## References

- "The DevOps Handbook" - Gene Kim
- "Kubernetes in Action" - Marko Lukša
- "Terraform: Up & Running" - Yevgeniy Brikman

**Related**: `kubernetes-production-best-practices.md`, `terraform-aws-infrastructure.md`, `ci-cd-pipeline-optimization.md`
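
---

## Worked Example: Encoding Secret Values

The Kubernetes section stores `database-url` as base64 in a Secret and encodes values with `echo -n ... | base64`. A common slip is forgetting `-n`, which bakes a trailing newline into the stored value. The following is a minimal Python sketch of the same encode/decode round trip plus a check for that newline. The helper names (`encode_secret_value`, `decode_secret_value`, `has_trailing_newline`) are hypothetical and introduced only for illustration; they are not part of kubectl or any Kubernetes client library.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a value for a Secret's `data` field (like `echo -n ... | base64`)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 `data` value back to text (like `base64 --decode`)."""
    return base64.b64decode(encoded).decode("utf-8")


def has_trailing_newline(encoded: str) -> bool:
    """Return True if the decoded value ends with a newline (typical when `-n` was omitted)."""
    return decode_secret_value(encoded).endswith("\n")


if __name__ == "__main__":
    encoded = encode_secret_value("my-secret-value")
    print(encoded)                        # same string used in the kubectl section
    print(decode_secret_value(encoded))   # round-trips to the original text
    # Simulate `echo` without -n: the newline becomes part of the secret
    print(has_trailing_newline(encode_secret_value("my-secret-value\n")))
```

Running a check like `has_trailing_newline` over the values in a manifest before `kubectl apply` is one way to catch connection strings that would otherwise fail in hard-to-diagnose ways. Using `kubectl create secret generic ... --from-literal=...`, as shown in the kubectl commands, avoids manual encoding altogether.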
