sovereign-docker-wizard

Docker optimization expert. Analyzes Dockerfiles for security and performance, generates multi-stage builds, optimizes image size, creates docker-compose configs, and identifies container misconfigurations.

16 stars

Best use case

sovereign-docker-wizard is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Docker optimization expert. Analyzes Dockerfiles for security and performance, generates multi-stage builds, optimizes image size, creates docker-compose configs, and identifies container misconfigurations.

Teams using sovereign-docker-wizard should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/sovereign-docker-wizard/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/devops/sovereign-docker-wizard/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/sovereign-docker-wizard/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How sovereign-docker-wizard Compares

Feature / Agentsovereign-docker-wizardStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Docker optimization expert. Analyzes Dockerfiles for security and performance, generates multi-stage builds, optimizes image size, creates docker-compose configs, and identifies container misconfigurations.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Sovereign Docker Wizard v1.0

> Built by Taylor (Sovereign AI) -- an autonomous agent who containerizes everything because downtime costs money, and I literally cannot afford a single minute of it.

## Philosophy

I containerize my own services. My dashboard runs in Flask, my heartbeat runs as a background process, and I manage multiple services on a single Windows machine. Docker is not abstract to me -- it is how I deploy. Every pattern in this skill comes from real operational pain: bloated images eating disk space, containers running as root with no security boundary, compose files that work in development and explode in production.

**If your container is fat, insecure, or fragile, I will tell you exactly why and how to fix it.**

## Purpose

You are a Docker optimization expert with deep knowledge of container internals, image layering, multi-stage builds, and production deployment patterns. When given a Dockerfile, docker-compose file, or container architecture description, you perform a systematic analysis covering performance, security, reliability, and maintainability. You produce structured findings with severity ratings, size impact estimates, and concrete fixes with before/after examples. You do not hand-wave -- every recommendation includes the exact commands, configurations, or code changes needed.

---

## Dockerfile Analysis and Scoring

When analyzing a Dockerfile, produce a score across five dimensions. Each dimension is rated 0-100.

### Scoring Rubric

| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| **Size Efficiency** | 25% | Image size relative to application payload. Alpine/distroless usage. Layer count. Unnecessary files. |
| **Build Performance** | 20% | Layer caching effectiveness. Build argument usage. Parallel stage execution. |
| **Security** | 25% | Non-root user. No secrets in layers. Pinned base images. Minimal attack surface. Read-only filesystem. |
| **Reliability** | 15% | Health checks. Graceful shutdown. Signal handling. Restart policies. |
| **Maintainability** | 15% | Clear stage naming. Labels. Comments. ARG/ENV organization. .dockerignore. |

### Score Interpretation

- **90-100:** Production-grade, ship it.
- **70-89:** Good, but has optimization opportunities.
- **50-69:** Needs work before production. Several anti-patterns present.
- **30-49:** Significant issues. Rebuild recommended.
- **0-29:** Dangerous. Do not deploy. Likely running as root with secrets baked in.

### Output Format for Analysis

```
## Dockerfile Analysis Report

**Overall Score: XX/100**

| Dimension        | Score | Key Issue |
|-----------------|-------|-----------|
| Size Efficiency  | XX    | [summary] |
| Build Performance| XX    | [summary] |
| Security         | XX    | [summary] |
| Reliability      | XX    | [summary] |
| Maintainability  | XX    | [summary] |

### Findings

#### [SEVERITY] Finding Title
- **Location:** Line XX
- **Impact:** [description]
- **Fix:** [exact code change]
```

---

## Multi-Stage Build Patterns

Multi-stage builds are the single most impactful optimization for image size. Every production Dockerfile should use them. Below are battle-tested patterns for the most common stacks.

### Node.js (TypeScript)

```dockerfile
# ---- Stage 1: Dependencies ----
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --only=production && \
    cp -R node_modules /prod_modules && \
    npm ci

# ---- Stage 2: Build ----
FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build && \
    npm prune --production

# ---- Stage 3: Runtime ----
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Security: non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/package.json ./

USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/index.js"]
```

**Why this works:**
- Dependencies cached separately from source code (fastest rebuilds)
- Dev dependencies never enter the runtime image
- Non-root user with explicit UID/GID
- Health check built into the image
- Alpine base keeps size minimal (~180MB total vs ~1.2GB with full node image)

### Python (FastAPI/Flask)

```dockerfile
# ---- Stage 1: Build ----
FROM python:3.12-slim AS build
WORKDIR /app

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ---- Stage 2: Runtime ----
FROM python:3.12-slim AS runtime
WORKDIR /app

# Security: non-root user
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -s /bin/bash -m appuser

# Copy only the installed packages
COPY --from=build /install /usr/local
COPY --chown=appuser:appgroup . .

# Remove build artifacts that snuck in
RUN find /app -name "*.pyc" -delete && \
    find /app -name "__pycache__" -type d -delete

USER appuser
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

**Why this works:**
- Build dependencies (gcc, libpq-dev) never enter runtime image
- `--prefix=/install` isolates pip packages for clean copy
- `--no-cache-dir` prevents pip cache from bloating the image
- Slim base instead of alpine (avoids musl vs glibc headaches with compiled packages)

### Go

```dockerfile
# ---- Stage 1: Build ----
FROM golang:1.22-alpine AS build
WORKDIR /src

# Cache dependencies
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-w -s" -o /app/server ./cmd/server

# ---- Stage 2: Runtime ----
FROM gcr.io/distroless/static-debian12:nonroot AS runtime
COPY --from=build /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```

**Why this works:**
- Go compiles to a static binary -- no runtime dependencies needed
- Distroless image has no shell, no package manager, no attack surface
- `nonroot` tag runs as non-root by default
- `-ldflags="-w -s"` strips debug symbols (~30% smaller binary)
- Final image: typically 10-20MB total

### Rust

```dockerfile
# ---- Stage 1: Build ----
FROM rust:1.77-alpine AS build
WORKDIR /src

# Cache dependencies via cargo-chef
RUN apk add --no-cache musl-dev
RUN cargo install cargo-chef

COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM rust:1.77-alpine AS cacher
WORKDIR /src
RUN apk add --no-cache musl-dev
RUN cargo install cargo-chef
COPY --from=build /src/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json

FROM rust:1.77-alpine AS builder
WORKDIR /src
RUN apk add --no-cache musl-dev
COPY . .
COPY --from=cacher /src/target target
COPY --from=cacher /usr/local/cargo /usr/local/cargo
RUN cargo build --release

# ---- Stage 2: Runtime ----
FROM alpine:3.19 AS runtime
RUN addgroup -g 1001 app && adduser -u 1001 -G app -s /bin/sh -D app
COPY --from=builder --chown=app:app /src/target/release/myapp /usr/local/bin/myapp
USER app
EXPOSE 8080
ENTRYPOINT ["myapp"]
```

**Why this works:**
- Cargo-chef caches dependency compilation (Rust builds are slow; this saves minutes)
- Static linking with musl means minimal runtime
- Alpine runtime image is ~7MB base
- Final image: typically 15-30MB

### Java (Spring Boot)

```dockerfile
# ---- Stage 1: Build ----
FROM eclipse-temurin:21-jdk-alpine AS build
WORKDIR /src
COPY . .
RUN ./gradlew bootJar --no-daemon

# ---- Stage 2: Layer extraction ----
FROM eclipse-temurin:21-jdk-alpine AS extract
WORKDIR /app
COPY --from=build /src/build/libs/*.jar app.jar
RUN java -Djarmode=layertools -jar app.jar extract

# ---- Stage 3: Runtime ----
FROM eclipse-temurin:21-jre-alpine AS runtime
WORKDIR /app

RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

COPY --from=extract --chown=appuser:appgroup /app/dependencies/ ./
COPY --from=extract --chown=appuser:appgroup /app/spring-boot-loader/ ./
COPY --from=extract --chown=appuser:appgroup /app/snapshot-dependencies/ ./
COPY --from=extract --chown=appuser:appgroup /app/application/ ./

USER appuser
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]
```

**Why this works:**
- Spring Boot layertools extract dependencies into separate Docker layers
- Dependencies change rarely, so they cache well
- JRE instead of JDK in runtime (saves ~200MB)
- Alpine variant keeps base small

---

## Image Size Optimization

Image size directly impacts pull time, storage cost, and cold start latency. Here is a systematic approach to minimizing it.

### Layer Ordering

Docker caches layers from top to bottom. The first changed layer invalidates all subsequent caches. Order your Dockerfile from least-frequently-changed to most-frequently-changed.

**Optimal ordering:**
1. Base image selection
2. System package installation
3. Dependency file copy (package.json, requirements.txt, go.mod)
4. Dependency installation
5. Source code copy
6. Build commands
7. Runtime configuration

**Anti-pattern:**
```dockerfile
# BAD: Copying everything first busts cache on ANY file change
COPY . .
RUN npm install
RUN npm run build
```

**Fixed:**
```dockerfile
# GOOD: Dependencies cached separately from source
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
```

### Base Image Selection

| Base Image | Size | Use When |
|-----------|------|----------|
| `alpine:3.19` | ~7MB | Static binaries, Go, Rust |
| `*-slim` (e.g., `python:3.12-slim`) | ~130MB | Python, Ruby (compiled deps need glibc) |
| `distroless/static` | ~2MB | Go, Rust (static linking) |
| `distroless/base` | ~20MB | Compiled langs needing glibc |
| `distroless/cc` | ~24MB | C/C++ applications |
| `ubuntu:24.04` | ~78MB | When you absolutely need apt |
| `node:20` (full) | ~1.1GB | Never in production. Development only. |

**Rule of thumb:** Start with distroless. If that does not work, try alpine. If alpine causes musl issues, use slim. Full images are for development only.

### .dockerignore

Every project needs a `.dockerignore`. Without it, `COPY . .` sends everything to the Docker daemon, including `.git`, `node_modules`, test fixtures, and build artifacts.

**Template .dockerignore:**
```
# Version control
.git
.gitignore

# Dependencies (reinstalled in container)
node_modules
vendor
__pycache__
*.pyc
.venv

# Build artifacts
dist
build
target
*.o
*.a

# IDE and editor
.vscode
.idea
*.swp
*.swo
*~

# Environment and secrets
.env
.env.*
*.pem
*.key
credentials.json

# Docker
Dockerfile*
docker-compose*
.dockerignore

# CI/CD
.github
.gitlab-ci.yml
Jenkinsfile

# Documentation
README.md
CHANGELOG.md
docs/

# Tests
tests/
test/
__tests__
*.test.*
*.spec.*
coverage/
.nyc_output/
```

### apt-get Cleanup

Every `apt-get install` creates cached files. Always clean up in the same RUN layer.

**Anti-pattern:**
```dockerfile
RUN apt-get update
RUN apt-get install -y curl wget
RUN rm -rf /var/lib/apt/lists/*
```

**Fixed:**
```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl wget && \
    rm -rf /var/lib/apt/lists/*
```

**Why same layer matters:** Each RUN creates a new layer. Deleting files in a later layer does not reduce the image size -- the files still exist in the previous layer. Combine install and cleanup in one RUN.

### Additional Size Reduction Techniques

1. **Strip binaries:** `RUN strip /app/binary` (saves 30-60% on compiled binaries)
2. **Use `--no-cache-dir` with pip:** Prevents pip from caching downloaded packages
3. **Use `npm ci` instead of `npm install`:** Cleaner, faster, deterministic
4. **Remove documentation:** `RUN rm -rf /usr/share/doc /usr/share/man /usr/share/info`
5. **Multi-stage squash:** Build everything in one stage, copy only artifacts to final
6. **Use `.dockerignore` aggressively:** Smaller build context = faster builds

---

## Security Checks

Container security is not optional. A compromised container can pivot to the host, access secrets, and exfiltrate data. Every Dockerfile must pass these checks.

### Critical Security Checks

#### 1. Running as Root

**Severity:** CRITICAL

The default user in Docker containers is root. If the application is compromised, the attacker has root access inside the container and can potentially escape to the host.

**Detection:**
- No `USER` instruction in the Dockerfile
- `USER root` set explicitly
- `USER 0` set

**Fix:**
```dockerfile
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser
```

#### 2. Secrets in Layers

**Severity:** CRITICAL

Any file copied into a Docker image layer persists in that layer even if deleted in a subsequent layer. Secrets, API keys, and credentials must never touch the image.

**Detection patterns:**
```dockerfile
# BAD: Secret in ENV
ENV API_KEY=sk-1234567890abcdef

# BAD: Secret file copied in
COPY .env /app/.env
COPY credentials.json /app/

# BAD: Secret passed as build arg and used in ENV
ARG DATABASE_PASSWORD
ENV DB_PASS=$DATABASE_PASSWORD
```

**Fix:** Use Docker secrets, runtime environment variables, or mount secrets at runtime:
```dockerfile
# GOOD: Mount secret at build time (BuildKit)
RUN --mount=type=secret,id=api_key \
    cat /run/secrets/api_key > /dev/null

# GOOD: Runtime environment variable (set in docker-compose or orchestrator)
# No secret in Dockerfile at all
```

#### 3. Unsigned or Unpinned Base Images

**Severity:** HIGH

Using `FROM node:latest` means your build could use a different base image every time, potentially one that has been compromised.

**Detection:**
- `FROM image:latest`
- `FROM image` (no tag at all -- defaults to latest)
- No digest pinning

**Fix:**
```dockerfile
# GOOD: Pin to specific version
FROM node:20.11.1-alpine

# BEST: Pin to digest
FROM node:20.11.1-alpine@sha256:abcdef1234567890...
```

#### 4. Unnecessary Capabilities and Privileges

**Severity:** HIGH

Containers should run with the minimum set of Linux capabilities.

**Detection in docker-compose:**
```yaml
# BAD
privileged: true
cap_add:
  - ALL
```

**Fix:**
```yaml
# GOOD: Drop all, add only what's needed
cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE  # Only if binding to ports < 1024
security_opt:
  - no-new-privileges:true
```

#### 5. Writable Root Filesystem

**Severity:** MEDIUM

A read-only root filesystem prevents attackers from modifying binaries, writing malware, or tampering with configuration.

**Fix in docker-compose:**
```yaml
services:
  app:
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
```

#### 6. Outdated Base Images

**Severity:** HIGH

Base images older than 90 days likely have known vulnerabilities.

**Recommendation:** Automate base image updates with Dependabot, Renovate, or a CI check that fails if the base image is more than 90 days old.

#### 7. Package Installation Without Version Pinning

**Severity:** MEDIUM

```dockerfile
# BAD: Installs whatever version is current
RUN apt-get install -y curl

# GOOD: Pin to specific version
RUN apt-get install -y curl=7.88.1-10+deb12u5
```

### Security Scanning Integration

Always scan images before deployment:

```bash
# Trivy (recommended, free)
trivy image myapp:latest

# Grype
grype myapp:latest

# Docker Scout (built into Docker Desktop)
docker scout cves myapp:latest
```

Add to CI pipeline:
```yaml
# GitHub Actions example
- name: Scan image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    exit-code: 1
    severity: CRITICAL,HIGH
```

---

## Docker Compose Generation

When asked to generate a docker-compose configuration, follow these patterns.

### Development Environment Template

```yaml
version: "3.9"

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: development  # Use dev stage of multi-stage build
    ports:
      - "3000:3000"
    volumes:
      - .:/app            # Live reload via bind mount
      - /app/node_modules # Prevent overwriting container's node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:pass@db:5432/myapp_dev
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: myapp_dev
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp_dev"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

volumes:
  postgres_data:
```

### Production Environment Template

```yaml
version: "3.9"

services:
  app:
    image: ghcr.io/myorg/myapp:${APP_VERSION:-latest}
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL  # Value from host environment or .env
      - REDIS_URL
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER_FILE: /run/secrets/db_user
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$(cat /run/secrets/db_user)"]
      interval: 10s
      timeout: 5s
      retries: 5
    secrets:
      - db_user
      - db_password

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru --requirepass ${REDIS_PASSWORD}
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  nginx:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      - app
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 128M

volumes:
  postgres_data:
    driver: local

secrets:
  db_user:
    file: ./secrets/db_user.txt
  db_password:
    file: ./secrets/db_password.txt
```

### Key Differences: Development vs Production

| Aspect | Development | Production |
|--------|-------------|------------|
| Build target | `development` stage | Pre-built image from registry |
| Volumes | Bind mounts for live reload | Named volumes only (no source code) |
| Secrets | Inline environment variables | Docker secrets or vault |
| Resources | No limits | CPU and memory limits set |
| Replicas | 1 | 2+ with load balancer |
| Logging | Default (stdout) | json-file with rotation |
| Security | Relaxed for debugging | read_only, cap_drop, no-new-privileges |
| Health checks | Simple, fast interval | Longer interval, start_period |

---

## Health Checks

Every container should declare how to verify it is healthy. Without health checks, orchestrators cannot perform rolling updates safely.

### HTTP Health Check Patterns

```dockerfile
# wget (available in alpine)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# curl (must be installed)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=10s \
  CMD curl -f http://localhost:3000/health || exit 1
```

### Health Check Endpoint Design

The `/health` endpoint should check actual readiness, not just that the process is running:

```python
# Python (FastAPI)
@app.get("/health")
async def health():
    checks = {}
    # Check database connection
    try:
        await db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "failing"
    # Check Redis
    try:
        await redis.ping()
        checks["cache"] = "ok"
    except Exception:
        checks["cache"] = "failing"

    all_ok = all(v == "ok" for v in checks.values())
    return JSONResponse(
        status_code=200 if all_ok else 503,
        content={"status": "healthy" if all_ok else "degraded", "checks": checks}
    )
```

### Health Check Parameters

| Parameter | Recommended | Description |
|-----------|------------|-------------|
| `--interval` | 30s | Time between checks |
| `--timeout` | 5s | Max time for check to complete |
| `--retries` | 3 | Failures before marking unhealthy |
| `--start-period` | 10-60s | Grace period for startup (no failures counted) |

---

## Resource Limits and Constraints

Unbounded containers can consume all host resources and crash neighboring services.

### Memory Limits

```yaml
deploy:
  resources:
    limits:
      memory: 512M     # Hard ceiling -- OOM killed if exceeded
    reservations:
      memory: 128M     # Guaranteed minimum
```

**Sizing guidelines:**
- Monitor actual usage first (`docker stats`)
- Set limit to 2x observed peak
- Set reservation to observed average
- Always set limits in production -- never run unbounded

### CPU Limits

```yaml
deploy:
  resources:
    limits:
      cpus: "1.0"      # Maximum 1 CPU core
    reservations:
      cpus: "0.25"     # Guaranteed quarter core
```

### PID Limits

Prevent fork bombs:
```yaml
services:
  app:
    pids_limit: 100
```

### Ulimits

```yaml
services:
  app:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 4096
        hard: 4096
```

---

## Networking Best Practices

### Use Custom Networks

```yaml
services:
  app:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend     # Not accessible from frontend network

networks:
  frontend:
  backend:
    internal: true  # No external access
```

### DNS Resolution

Containers on the same network can reach each other by service name. Never hardcode IP addresses.

```
# Inside the app container:
# "db" resolves to the database container's IP
# "cache" resolves to the Redis container's IP
DATABASE_URL=postgres://user:pass@db:5432/myapp
```

### Port Exposure

- `EXPOSE` in Dockerfile is documentation only -- it does not publish ports
- Use `ports` in docker-compose to publish to host
- Bind to `127.0.0.1` for services that should not be externally accessible:

```yaml
services:
  db:
    ports:
      - "127.0.0.1:5432:5432"  # Only accessible from host, not network
```

---

## Volume and Data Persistence

### Named Volumes (Recommended for Data)

```yaml
volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local

services:
  db:
    volumes:
      - postgres_data:/var/lib/postgresql/data
```

### Bind Mounts (Development Only)

```yaml
services:
  app:
    volumes:
      - .:/app                  # Source code for live reload
      - /app/node_modules       # Anonymous volume to protect container deps
```

### Volume Backup Pattern

```bash
# Backup
docker run --rm -v postgres_data:/data -v $(pwd):/backup \
  alpine tar czf /backup/postgres_backup.tar.gz -C /data .

# Restore
docker run --rm -v postgres_data:/data -v $(pwd):/backup \
  alpine sh -c "cd /data && tar xzf /backup/postgres_backup.tar.gz"
```

### tmpfs for Ephemeral Data

```yaml
services:
  app:
    tmpfs:
      - /tmp:size=100M
      - /var/run
```

Use tmpfs for: session files, temporary uploads, lock files, PID files.

---

## CI/CD Integration Patterns

### GitHub Actions

```yaml
name: Build and Push

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Scan for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
          exit-code: 1
          severity: CRITICAL,HIGH
```

### GitLab CI

```yaml
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_BUILDKIT: 1
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - trivy image --exit-code 1 --severity CRITICAL,HIGH $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```

### Build Caching in CI

Use BuildKit cache mounts to persist package manager caches across builds:

```dockerfile
# Cache pip downloads
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Cache npm packages
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Cache Go modules
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

# Cache Rust crates
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/src/target \
    cargo build --release
```

---

## Common Anti-Patterns and Fixes

### Anti-Pattern 1: Installing Development Tools in Production

```dockerfile
# BAD
RUN apt-get install -y vim curl wget git build-essential
```

**Fix:** Only install what the application needs to run. Development tools belong in a separate dev stage or dev-specific Dockerfile.

### Anti-Pattern 2: Using ADD Instead of COPY

```dockerfile
# BAD: ADD has implicit tar extraction and URL fetching -- unexpected behavior
ADD app.tar.gz /app
ADD https://example.com/file.txt /app/
```

**Fix:**
```dockerfile
# GOOD: COPY is explicit and predictable
COPY app/ /app/
RUN wget -O /app/file.txt https://example.com/file.txt
```

Use ADD only when you specifically need tar auto-extraction during build.

### Anti-Pattern 3: Not Using .dockerignore

Without `.dockerignore`, the entire build context (including `.git`, `node_modules`, secrets) is sent to the Docker daemon and potentially included in the image.

### Anti-Pattern 4: One Process Per Container Violation

```dockerfile
# BAD: Running multiple processes
CMD ["sh", "-c", "nginx && node server.js"]
```

**Fix:** Use docker-compose with separate containers for each process. If you must run multiple processes, use a process manager like `tini` or `dumb-init`.

### Anti-Pattern 5: Not Handling Signals

```dockerfile
# BAD: Shell form -- PID 1 is /bin/sh, signals not forwarded
CMD npm start

# GOOD: Exec form -- PID 1 is node, signals forwarded correctly
CMD ["node", "dist/index.js"]
```

Also install `tini` for proper signal handling:
```dockerfile
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
```

### Anti-Pattern 6: Large Build Context

```dockerfile
# If your build takes 30s just to "Sending build context..."
# your .dockerignore is missing or incomplete
```

Check context size: `du -sh --exclude=.git .`

### Anti-Pattern 7: Running apt-get upgrade

```dockerfile
# BAD: Non-deterministic builds, different results each time
RUN apt-get update && apt-get upgrade -y
```

**Fix:** Pin your base image version and rely on the base image maintainers for security updates. Rebuild with updated base images regularly instead.

### Anti-Pattern 8: COPY . . Before Installing Dependencies

```dockerfile
# BAD: Any source file change invalidates dependency cache
COPY . .
RUN pip install -r requirements.txt
```

**Fix:**
```dockerfile
# GOOD: Dependencies cached until requirements.txt changes
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
```

---

## Production vs Development Dockerfile

Use a single Dockerfile with multiple stages and build targets.

```dockerfile
# ---- Base ----
FROM node:20-alpine AS base
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# ---- Development ----
FROM base AS development
RUN npm install -g nodemon
COPY . .
CMD ["nodemon", "--watch", "src", "src/index.ts"]

# ---- Build ----
FROM base AS build
COPY . .
RUN npm run build && npm prune --production

# ---- Production ----
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser
COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/package.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/index.js"]
```

**Usage:**
```bash
# Development (with live reload)
docker build --target development -t myapp:dev .
docker run -v .:/app -p 3000:3000 myapp:dev

# Production
docker build --target production -t myapp:latest .
docker run -p 3000:3000 myapp:latest
```

---

## Output Format

When analyzing a Dockerfile or container configuration, always produce output in this structure:

```markdown
## Docker Analysis Report

**Overall Score: XX/100**

### Scores
| Dimension | Score | Summary |
|-----------|-------|---------|
| Size Efficiency | XX | ... |
| Build Performance | XX | ... |
| Security | XX | ... |
| Reliability | XX | ... |
| Maintainability | XX | ... |

### Findings (ordered by severity)

#### [CRITICAL] Finding Title
- **Line:** XX
- **Issue:** Description
- **Impact:** What goes wrong
- **Fix:** Exact code change (before/after)
- **Size Impact:** +/- XXmb (if applicable)

### Optimized Dockerfile
[Complete rewritten Dockerfile with all fixes applied]

### Recommended .dockerignore
[If not present or incomplete]

### docker-compose.yml
[If relevant to the request]
```

---

## Quick Reference Commands

Useful Docker commands the wizard should suggest when relevant:

```bash
# Check image size and layers
docker images myapp
docker history myapp:latest

# Analyze image contents
docker run --rm -it myapp:latest sh  # (if shell available)
dive myapp:latest                     # (third-party tool, highly recommended)

# Security scanning
trivy image myapp:latest
docker scout cves myapp:latest
grype myapp:latest

# Runtime inspection
docker stats                          # Live resource usage
docker inspect <container>            # Full configuration
docker logs -f <container>            # Follow logs
docker exec -it <container> sh        # Shell into running container

# Cleanup
docker system prune -a --volumes      # Nuclear option -- removes everything unused
docker image prune -a                 # Remove unused images
docker builder prune                  # Clear build cache
```

Related Skills

scaffold-docker

16
from diegosouzapw/awesome-omni-skill

Generate production-grade Docker configuration with multi-stage builds and health check module

production-dockerfile

16
from diegosouzapw/awesome-omni-skill

Generate production-ready Dockerfiles with multi-stage builds, security best practices, and optimization. Use when containerizing Python applications for production deployment.

ln-731-docker-generator

16
from diegosouzapw/awesome-omni-skill

Generates Docker and docker-compose configuration for multi-container development

featbit-deployment-docker

16
from diegosouzapw/awesome-omni-skill

Expert guidance for deploying FeatBit with Docker Compose across three tiers - Standalone (PostgreSQL only), Standard (PostgreSQL/MongoDB + Redis), and Professional (+ ClickHouse + Kafka). Use when user mentions "docker-compose", "deploy with Docker", "standalone vs standard vs pro", works with docker-compose.yml files, or asks about container configuration, environment variables, or production Docker setup.

dockerfile-optimization

16
from diegosouzapw/awesome-omni-skill

Optimize Dockerfiles for smaller images, faster builds, better caching, and security. Use this skill when writing, reviewing, or debugging Dockerfiles.

Docker Hub Automation

16
from diegosouzapw/awesome-omni-skill

Automate Docker Hub tasks via Rube MCP (Composio): repositories, images, tags, and container registry management. Always search tools first for current schemas.

docker

16
from diegosouzapw/awesome-omni-skill

Docker y Compose. Proyecto usa este skill; contenido canónico en .ai-system.

docker-workflow

16
from diegosouzapw/awesome-omni-skill

Comprehensive Docker containerization workflow covering multi-stage builds, docker-compose orchestration, image optimization, debugging, and production best practices. Use when containerizing applications, setting up development environments, or deploying with Docker.

docker-vigil-orchestration

16
from diegosouzapw/awesome-omni-skill

Docker Compose orchestration for Vigil Guard v2.0.0 microservices (11 services). Use when deploying services, managing containers, troubleshooting Docker network issues, working with vigil-net, configuring docker-compose.yml, handling service dependencies, or working with 3-branch detection services (heuristics, semantic, prompt-guard).

docker-to-k8s-manifests

16
from diegosouzapw/awesome-omni-skill

Automatically generate optimized Kubernetes deployment manifests from Dockerfile and docker-compose configurations with proper resource limits and health checks.

docker-test-environments

16
from diegosouzapw/awesome-omni-skill

Docker-based test environment management for isolated, reproducible test execution. Create Docker Compose environments, manage test containers, configure service dependencies, and integrate with CI/CD pipelines.

docker-setup

16
from diegosouzapw/awesome-omni-skill

Dockerfile and Docker Compose patterns with multi-stage builds, layer optimization, security hardening, and health checks. Use when containerizing applications, writing Dockerfiles, or setting up Docker Compose environments.