production-readiness-checklist

Comprehensive production readiness verification, code quality gates, deployment checks, and production standards compliance for platform-go

16 stars

Best use case

production-readiness-checklist is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using production-readiness-checklist should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/production-readiness-checklist/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/production-readiness-checklist/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/production-readiness-checklist/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How production-readiness-checklist Compares

| Feature / Agent | production-readiness-checklist | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Comprehensive production readiness verification, code quality gates, deployment checks, and production standards compliance for platform-go

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Production Readiness Checklist

This skill provides comprehensive checklists to ensure all code meets production-grade standards before deployment.

## When to Use

Apply this skill when:
- Preparing code for production deployment
- Conducting final code review before merge
- Verifying system readiness for release
- Implementing quality gates in CI/CD
- Auditing existing production code
- Planning major feature releases
- Setting up new environments
- Establishing deployment procedures

## Pre-Commit Checklist (Code Level)

### Code Quality (10 items)

- [ ] Code follows golang-production-standards skill
- [ ] All functions have clear documentation comments
- [ ] No hardcoded values (use constants or config)
- [ ] No print statements (use structured logging)
- [ ] No commented-out code (delete or explain)
- [ ] Variable names are meaningful (no single letters except i, j)
- [ ] Function names describe exact behavior
- [ ] No TODO/FIXME comments without issue reference
- [ ] Imports are organized (stdlib, third-party, internal)
- [ ] File does not exceed 200 lines (except special cases)

### Error Handling (8 items)

- [ ] All errors are wrapped with context (fmt.Errorf %w)
- [ ] No ignored errors (no blank _ assignment)
- [ ] Custom error types defined for domain errors
- [ ] Error messages are user-friendly
- [ ] No error information leaks (no secrets in messages)
- [ ] Panic only in main, never in libraries
- [ ] Error recovery implemented where needed
- [ ] Goroutine errors are properly handled

### Testing Coverage (7 items)

- [ ] Unit tests exist for all public functions
- [ ] Test coverage >= 70%
- [ ] Edge cases and error scenarios tested
- [ ] Tests use table-driven pattern where applicable
- [ ] Mocking used appropriately (not over-mocked)
- [ ] Concurrent code tested with race detector
- [ ] Tests pass locally with `go test -race -cover ./...`

### Security (8 items)

- [ ] No hardcoded secrets or credentials
- [ ] Passwords hashed with bcrypt (cost >= 12)
- [ ] All inputs validated at API boundary
- [ ] SQL injection prevented (parameterized queries)
- [ ] Path traversal prevented (validated file paths)
- [ ] No sensitive data logged (passwords, tokens, PII)
- [ ] TLS used for all external communications
- [ ] Authentication/authorization implemented

## Pre-Review Checklist (Integration Level)

### API Design (6 items)

- [ ] RESTful endpoints follow naming conventions
- [ ] Request/response DTOs used (not domain models)
- [ ] Error responses standardized
- [ ] Pagination implemented for large result sets
- [ ] API versioning clear (v1, v2, etc.)
- [ ] Documentation present (Swagger/OpenAPI)

### Database (7 items)

- [ ] Migrations are versioned and tested
- [ ] Indexes created for frequently queried columns
- [ ] Foreign keys properly defined
- [ ] No N+1 queries (use preloading)
- [ ] Transactions used for multi-step operations
- [ ] Connection pool configured correctly
- [ ] Performance tested (<100ms typical queries)

### Concurrency (5 items)

- [ ] Goroutine leaks prevented
- [ ] Race detector passes (`go test -race`)
- [ ] Context used correctly (not leaked)
- [ ] Timeouts set for all blocking operations
- [ ] Resource cleanup guaranteed (defer cleanup)

### Kubernetes (6 items)

- [ ] K8s client nil-checked for test environments
- [ ] Resources labeled properly
- [ ] Retry logic for transient failures
- [ ] Graceful shutdown implemented
- [ ] Resource requests/limits defined
- [ ] Probes configured (startup, readiness, liveness)

## Pre-Deployment Checklist (System Level)

### Configuration Management (8 items)

- [ ] All config from environment variables
- [ ] No secrets in version control
- [ ] Config validation on startup
- [ ] Defaults sensible but explicit
- [ ] Config documented in README
- [ ] Multiple environment configs tested
- [ ] Feature flags implemented where needed
- [ ] Config hot-reload tested if supported

### Logging and Monitoring (8 items)

- [ ] Structured JSON logging configured
- [ ] Log levels appropriate (Debug, Info, Warn, Error)
- [ ] Request IDs tracked through request lifecycle
- [ ] Metrics exposed at /metrics endpoint
- [ ] Health checks implemented (/health endpoint)
- [ ] Readiness/liveness probes work correctly
- [ ] Error rates monitored
- [ ] Performance metrics baseline established

### Performance (6 items)

- [ ] API response time < 200ms (p95)
- [ ] Database queries < 100ms typical
- [ ] K8s API calls < 500ms
- [ ] Memory usage < 512MB per pod
- [ ] Startup time < 30 seconds
- [ ] Load tested with expected traffic

### Security (10 items)

- [ ] Authentication implemented
- [ ] Authorization (RBAC) enforced
- [ ] Rate limiting enabled
- [ ] CORS configured correctly
- [ ] Security headers present
- [ ] Input validation enforced
- [ ] SQL injection prevention verified
- [ ] Secrets management configured
- [ ] TLS certificates valid
- [ ] Vulnerability scan passed (gosec)

### Operations (8 items)

- [ ] Runbooks written for common issues
- [ ] Alert thresholds set and tested
- [ ] Rollback procedures documented
- [ ] Backup/restore tested
- [ ] Disaster recovery plan exists
- [ ] On-call documentation complete
- [ ] Incident response procedures defined
- [ ] Service dependencies documented

## Pre-Release Checklist (Quality Gate)

### Code Review Completion (5 items)

- [ ] Minimum 2 reviewers approved
- [ ] All review comments addressed
- [ ] No blocking comments remain
- [ ] Security review completed
- [ ] Architecture review completed

### Testing Completion (8 items)

- [ ] Unit tests pass (100%)
- [ ] Integration tests pass (100%)
- [ ] Smoke tests pass
- [ ] Load tests pass
- [ ] Security tests pass
- [ ] No test skips without reason
- [ ] Coverage report reviewed
- [ ] Race detector clean

### CI/CD Pipeline (8 items)

- [ ] All GitHub Actions workflows pass
- [ ] Build completes in < 5 minutes
- [ ] Docker image builds successfully
- [ ] Linting passes (golangci-lint)
- [ ] Format check passes (gofmt)
- [ ] Vet passes (go vet)
- [ ] Dependency check passes
- [ ] License check passes (if applicable)

### Documentation (7 items)

- [ ] README.md updated
- [ ] API documentation updated
- [ ] Migration guide written (if applicable)
- [ ] Changelog entry added
- [ ] Code comments added for complex logic
- [ ] Architecture decision recorded
- [ ] Performance benchmarks updated

### Deployment Readiness (8 items)

- [ ] Deployment plan documented
- [ ] Rollback plan documented
- [ ] Communication plan ready
- [ ] Stakeholders notified
- [ ] Maintenance window scheduled (if needed)
- [ ] Monitoring configured
- [ ] Logging configured
- [ ] Alerting configured

## Production Deployment Checklist

### Pre-Deployment (10 items)

- [ ] Backup taken
- [ ] Deployment plan reviewed with team
- [ ] Rollback procedure tested
- [ ] Database migrations tested in staging
- [ ] Feature flags disabled by default
- [ ] Circuit breakers configured
- [ ] Rate limits tested
- [ ] Load balancing configured
- [ ] DNS propagation planned
- [ ] Communication channels open

### Deployment Execution (8 items)

- [ ] Deployment performed during planned window
- [ ] Deployment leader assigned
- [ ] Changes deployed incrementally
- [ ] Health checks passing after each step
- [ ] Logs monitored during deployment
- [ ] Metrics monitored during deployment
- [ ] Incidents tracked if any occur
- [ ] All steps documented in runbook

### Post-Deployment (10 items)

- [ ] All services healthy
- [ ] No error rate spike
- [ ] Performance metrics normal
- [ ] User-facing features working
- [ ] Database queries responsive
- [ ] API latency acceptable
- [ ] Memory/CPU usage normal
- [ ] All probes returning healthy
- [ ] Alerts not triggering
- [ ] Team on standby for 1 hour

### Post-Release (8 items)

- [ ] Feature monitored for 24 hours
- [ ] Performance metrics stable
- [ ] Error rates normal
- [ ] User feedback positive
- [ ] No critical issues found
- [ ] Documentation updated with lessons learned
- [ ] Monitoring alerts tuned if needed
- [ ] Success communicated to stakeholders

## Production Code Compliance

### Skills Compliance Verification

Ensure code follows all applicable skills:

```
Mandatory Skills for All Code:
- golang-production-standards (required)
- error-handling-guide (required)
- security-best-practices (if handling user data)

Feature-Specific Skills:
- api-design-patterns (for API endpoints)
- database-best-practices (for database operations)
- kubernetes-integration (for K8s operations)
- testing-best-practices (for test code)
- package-organization (for new packages)
- file-structure-guidelines (for file organization)

Operations Skills:
- monitoring-observability (for logging/metrics)
- cicd-pipeline-optimization (for CI/CD)
```

### Automated Checks

```bash
# Code quality checks
go vet ./...
golangci-lint run
test -z "$(gofmt -l .)"  # gofmt -l exits 0 even when it lists unformatted files

# Security checks
gosec ./...
trufflehog filesystem ./

# Testing
go test -race -cover -timeout 30m ./...

# Build
go build ./cmd/api
go build ./cmd/scheduler

# Docker
docker build -t platform-go:latest .

# Compliance
grep -r "TODO\|FIXME" --include="*.go" internal/ cmd/ || true
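# Match fmt.Print*/print/println calls only; a bare "print" pattern would also flag fmt.Sprintf and fmt.Fprintf
grep -rnE 'fmt\.Print(f|ln)?\(|\bprint(ln)?\(' --include="*.go" internal/ cmd/ || true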
```

## Common Failure Scenarios

### API Latency High (> 200ms p95)

Checklist:
- [ ] Database queries analyzed (use slow query log)
- [ ] N+1 queries identified and fixed
- [ ] Indexes verified on queried columns
- [ ] Connection pool size verified
- [ ] Caching strategy reviewed
- [ ] Load test results analyzed
- [ ] Network latency checked
- [ ] Third-party API latency checked

### Memory Usage High (> 512MB)

Checklist:
- [ ] Goroutine leaks detected with pprof
- [ ] Memory profiling run
- [ ] Large object allocations identified
- [ ] Cache eviction policies checked
- [ ] Database connection pool reviewed
- [ ] Resource cleanup verified
- [ ] GC tuning reviewed
- [ ] Heap snapshot analyzed

### Error Rate Spike (> 1%)

Checklist:
- [ ] Error logs analyzed for pattern
- [ ] Dependencies health checked
- [ ] Database connectivity verified
- [ ] Rate limit triggers checked
- [ ] Circuit breaker states checked
- [ ] Resource exhaustion checked
- [ ] Configuration changes reviewed
- [ ] Network connectivity tested

### Build Failure

Checklist:
- [ ] Compilation errors cleared
- [ ] Linting errors resolved
- [ ] Test failures investigated
- [ ] Docker build logs analyzed
- [ ] Dependency versions compatible
- [ ] Go version compatible
- [ ] CGO dependencies installed
- [ ] Build cache cleaned

## Metrics to Monitor Post-Deployment

### Availability Metrics

```
- Uptime percentage (target: 99.9%)
- Health check pass rate (target: 100%)
- Pod crash rate (target: 0%)
- Service availability (target: 99.9%)
```

### Performance Metrics

```
- API response time p50 (target: <50ms)
- API response time p95 (target: <200ms)
- API response time p99 (target: <500ms)
- Database query time (target: <100ms)
- K8s API call time (target: <500ms)
```

### Error Metrics

```
- Error rate (target: <0.1%)
- 5xx error rate (target: <0.01%)
- Timeout rate (target: <0.01%)
- Panic rate (target: 0%)
```

### Resource Metrics

```
- CPU usage (target: <70%)
- Memory usage (target: <70%)
- Disk usage (target: <80%)
- Network I/O (monitor trends)
```

## Production Standards Verification

All code must satisfy:

```
Code Quality:
- golangci-lint: all checks pass
- gofmt: all files formatted
- go vet: no issues
- Coverage: >= 70%

Security:
- gosec: no high/critical issues
- trufflehog: no secrets found
- Dependencies: no known vulnerabilities

Performance:
- API: <200ms p95
- Database: <100ms
- Memory: <512MB per pod
- Startup: <30s

Testing:
- All tests pass
- Race detector clean
- Integration tests pass
- Load tests pass
```

## Sign-Off Process

Before deployment, require sign-off from:

- [ ] **Code Owner**: Reviewed code changes
- [ ] **Security Lead**: Security review passed
- [ ] **QA Lead**: Testing complete
- [ ] **DevOps Lead**: Deployment plan reviewed
- [ ] **Product Manager**: Feature readiness confirmed

## Emergency Rollback

If deployment issues occur:

1. **Immediate Actions** (< 5 minutes)
   - [ ] Alert team immediately
   - [ ] Stop deployment if in progress
   - [ ] Assess impact scope
   - [ ] Decide rollback or fix forward

2. **Rollback Execution** (< 30 minutes)
   - [ ] Execute rollback procedure
   - [ ] Verify previous version healthy
   - [ ] Monitor metrics return to normal
   - [ ] Document incident

3. **Post-Incident** (< 24 hours)
   - [ ] Root cause analysis
   - [ ] Prevention steps documented
   - [ ] Team retro/learning session
   - [ ] Updates to deployment procedure

---

**Note**: This checklist is comprehensive. Not all items apply to every release.
Customize based on your risk profile and service criticality.
