# DevOps, CI/CD & Infrastructure 2025
**Updated**: 2025-11-23 | **Stack**: Docker, Kubernetes, GitHub Actions, GitLab CI, Terraform, Prometheus/Grafana
---
## Docker Containerization
```dockerfile
# Multi-stage Dockerfile (Node.js app)
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install all dependencies (dev dependencies are needed for the build step)
RUN npm ci
# Copy source code
COPY . .
# Build app (if using TypeScript)
RUN npm run build
# Drop devDependencies so only runtime deps are copied into the final stage
RUN npm prune --omit=dev
# Stage 2: Production
FROM node:20-alpine
WORKDIR /app
# Copy only necessary files from builder
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
# Build: docker build -t myapp:1.0 .
# Run: docker run -p 3000:3000 myapp:1.0
---
# Docker Compose (Multi-container)
# (the top-level "version" key is obsolete with Compose v2 and omitted here)
services:
# Web app
web:
build: .
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgresql://postgres:password@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
- db
- redis
restart: unless-stopped
# PostgreSQL database
db:
image: postgres:16-alpine
environment:
POSTGRES_DB: myapp
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
restart: unless-stopped
# Redis cache
redis:
image: redis:7-alpine
ports:
- "6379:6379"
restart: unless-stopped
volumes:
postgres_data:
# Run: docker compose up -d
# Stop: docker compose down
# Logs: docker compose logs -f web
---
# Best Practices
MULTI-STAGE BUILDS:
- Separate build and runtime
- Smaller final image (exclude build tools)
- Example: 1.2GB → 150MB
.dockerignore:
node_modules
.git
.env
*.log
dist
coverage
SECURITY:
- Use official images (node:20-alpine)
- Non-root user (RUN adduser)
- Scan for vulnerabilities (docker scout cves myapp:1.0, or Trivy)
- Pin versions (node:20.5.0, not node:latest)
LAYERS:
- Order matters (least changing → most changing)
- Copy package.json first (cache dependencies)
- Copy source last (changes frequently)
```
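A caveat on the Compose file above: `depends_on` as written only waits for the `db` and `redis` containers to start, not for Postgres to accept connections. A minimal sketch of gating `web` on a database healthcheck (the `pg_isready` check and timings are illustrative, not part of the original file):

```yaml
# Optional refinement to docker-compose.yml: wait for a *healthy* database
services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy   # requires the healthcheck below
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d myapp"]
      interval: 5s
      timeout: 3s
      retries: 5
```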
---
## Kubernetes Orchestration
```yaml
# Deployment (app.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
replicas: 3 # 3 pods for high availability
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myrepo/myapp:1.0.0
ports:
- containerPort: 3000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-secret
key: url
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
---
# Service (expose pods)
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: LoadBalancer # or ClusterIP, NodePort
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Commands
# Apply: kubectl apply -f app.yaml
# Get pods: kubectl get pods
# Logs: kubectl logs -f <pod-name>
# Exec: kubectl exec -it <pod-name> -- /bin/sh
# Describe: kubectl describe pod <pod-name>
# Delete: kubectl delete -f app.yaml
---
# ConfigMap (non-sensitive config)
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
API_URL: "https://api.example.com"
LOG_LEVEL: "info"
---
# Secret (sensitive data)
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
data:
  url: cG9zdGdyZXNxbDovLy4uLg== # base64-encoded (encoding, not encryption)
# Create: kubectl create secret generic db-secret --from-literal=url='postgresql://...'
```
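The Deployment consumes the Secret via `secretKeyRef`, but nothing above reads `app-config`. A minimal sketch of wiring the ConfigMap into the same container, either one key at a time or wholesale with `envFrom` (added to the container spec in the Deployment):

```yaml
# In the myapp container spec (Deployment above)
env:
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: LOG_LEVEL
envFrom:
  - configMapRef:
      name: app-config   # injects API_URL and LOG_LEVEL as env vars
```

Environment variables sourced from a ConfigMap refresh only when the pod restarts; mounting the ConfigMap as a volume picks up changes without a restart.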
---
## CI/CD Pipeline
```yaml
# GitHub Actions (.github/workflows/deploy.yml)
name: CI/CD Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
# Job 1: Test
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
- name: Run tests
run: npm test
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage/lcov.info
# Job 2: Build & Push Docker image
build:
needs: test
runs-on: ubuntu-latest
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
            type=sha,format=long,prefix={{branch}}-
type=semver,pattern={{version}}
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
# Job 3: Deploy to Kubernetes
deploy:
needs: build
runs-on: ubuntu-latest
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup kubectl
uses: azure/setup-kubectl@v3
      - name: Configure Kubernetes
        run: |
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig.yaml
          echo "KUBECONFIG=$PWD/kubeconfig.yaml" >> "$GITHUB_ENV"   # persists to later steps (a plain export would not)
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/myapp \
myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main-${{ github.sha }}
kubectl rollout status deployment/myapp
---
# GitLab CI (.gitlab-ci.yml)
stages:
- test
- build
- deploy
variables:
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: "/certs"
test:
stage: test
image: node:20
script:
- npm ci
- npm run lint
- npm test
coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
build:
stage: build
image: docker:24
services:
- docker:24-dind
before_script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
script:
- docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
- docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
only:
- main
deploy:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl config use-context $KUBE_CONTEXT
- kubectl set image deployment/myapp myapp=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
- kubectl rollout status deployment/myapp
only:
- main
when: manual # Require manual approval
```
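The GitLab `deploy` stage is gated with `when: manual`; the closest GitHub Actions equivalent is a protected environment with required reviewers. A sketch of the addition to the `deploy` job above (the environment name and URL are assumptions; reviewers are configured under Settings → Environments, not in YAML):

```yaml
# Addition to the existing deploy job in .github/workflows/deploy.yml
deploy:
  needs: build
  runs-on: ubuntu-latest
  environment:
    name: production                  # approval rules attach to this environment
    url: https://myapp.example.com    # hypothetical URL, shown on the workflow run
  # steps unchanged
```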
---
## Infrastructure as Code (Terraform)
```hcl
# main.tf (AWS EKS cluster)
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "eks/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = var.aws_region
}
# VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "${var.cluster_name}-vpc"
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
enable_dns_hostnames = true
tags = {
Environment = var.environment
ManagedBy = "Terraform"
}
}
# EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 19.0"
cluster_name = var.cluster_name
cluster_version = "1.28"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Managed node groups
eks_managed_node_groups = {
general = {
min_size = 2
max_size = 10
desired_size = 3
instance_types = ["t3.medium"]
capacity_type = "ON_DEMAND"
labels = {
role = "general"
}
}
}
tags = {
Environment = var.environment
}
}
# RDS Database (the referenced security group and DB subnet group are defined elsewhere, not shown)
resource "aws_db_instance" "postgres" {
identifier = "${var.cluster_name}-db"
engine = "postgres"
engine_version = "16.1"
instance_class = "db.t3.micro"
allocated_storage = 20
max_allocated_storage = 100
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
vpc_security_group_ids = [aws_security_group.db.id]
db_subnet_group_name = aws_db_subnet_group.db.name
backup_retention_period = 7
skip_final_snapshot = false
final_snapshot_identifier = "${var.cluster_name}-db-final"
tags = {
Environment = var.environment
}
}
# Variables (variables.tf)
variable "aws_region" {
  default = "us-east-1"
}
variable "cluster_name" {
  default = "my-eks-cluster"
}
variable "environment" {
  default = "production"
}
variable "db_name" {
  default = "myapp"
}
variable "db_username" {
  default = "postgres"
}
variable "db_password" {
  sensitive = true # supply via TF_VAR_db_password; never commit it
}
# Outputs (outputs.tf)
output "cluster_endpoint" {
value = module.eks.cluster_endpoint
}
output "cluster_name" {
  value = module.eks.cluster_name
}
output "db_endpoint" {
value = aws_db_instance.postgres.endpoint
}
# Commands:
# terraform init # Initialize
# terraform plan # Preview changes
# terraform apply # Apply changes
# terraform destroy # Destroy infrastructure
# terraform fmt # Format code
# terraform validate # Validate syntax
```
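These commands are normally run from CI as well as locally. A minimal sketch of a GitHub Actions job that checks formatting, validates, and plans the configuration on pull requests (workflow name, trigger paths, and secret names are assumptions):

```yaml
# .github/workflows/terraform.yml (sketch)
name: Terraform Plan
on:
  pull_request:
    paths: ['**.tf']
jobs:
  plan:
    runs-on: ubuntu-latest
    env:                                   # assumed secret names
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      TF_VAR_db_password: ${{ secrets.DB_PASSWORD }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check
      - run: terraform init
      - run: terraform validate
      - run: terraform plan -input=false
```

OIDC federation via `aws-actions/configure-aws-credentials` avoids long-lived keys; static secrets are used here only to keep the sketch short.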
---
## Monitoring & Observability
```yaml
# Prometheus (metrics)
# prometheus-config.yaml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
---
# Grafana Dashboard (JSON)
{
"dashboard": {
"title": "Application Metrics",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(http_requests_total[5m])"
}
]
},
{
"title": "Error Rate",
"targets": [
{
"expr": "rate(http_requests_total{status=~\"5..\"}[5m])"
}
]
},
{
"title": "Latency (p95)",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
}
]
}
]
}
}
---
# Application Instrumentation (Node.js)
const express = require('express');
const client = require('prom-client');
const app = express();
// Create metrics
const httpRequestsTotal = new client.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status']
});
const httpRequestDuration = new client.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status']
});
// Middleware to collect metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestsTotal.inc({
method: req.method,
route: req.route?.path || req.path,
status: res.statusCode
});
httpRequestDuration.observe({
method: req.method,
route: req.route?.path || req.path,
status: res.statusCode
}, duration);
});
next();
});
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', client.register.contentType);
res.end(await client.register.metrics());
});
app.listen(3000);
```
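The relabel rules keep only pods annotated with `prometheus.io/scrape`, but the Deployment in the Kubernetes section never sets those annotations. A sketch of the pod-template annotations that would make the `/metrics` endpoint above discoverable (only the annotations are new; the keys match the relabel config, which maps dots and slashes to underscores in the `__meta_...` labels):

```yaml
# Added to the Deployment's pod template (spec.template.metadata)
metadata:
  labels:
    app: myapp
  annotations:
    prometheus.io/scrape: "true"    # kept by the "keep" relabel rule
    prometheus.io/path: "/metrics"  # becomes __metrics_path__
    prometheus.io/port: "3000"      # merged into __address__
```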
---
## Key Takeaways
1. **Immutable infrastructure** - Treat servers as cattle, not pets (replace, don't patch)
2. **Everything as code** - Infrastructure, config, pipelines (version control all)
3. **Automate everything** - Manual = error-prone (CI/CD, auto-scaling)
4. **Observability** - You can't fix what you can't see (metrics, logs, traces)
5. **Security first** - Secrets management, least privilege, vulnerability scanning
---
## References
- "The Phoenix Project" - Gene Kim
- "Site Reliability Engineering" - Google
- Kubernetes Documentation
**Related**: `kubernetes-advanced.md`, `terraform-best-practices.md`, `monitoring-observability.md`