production-principles

Production-ready development principles balancing simplicity with reliability for 10-100 MSP scale

16 stars

Best use case

production-principles is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using production-principles should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/production-principles/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/product/production-principles/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/production-principles/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How production-principles Compares

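| Feature / Agent | production-principles | Standard Approach |
|-----------------|-----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |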

Frequently Asked Questions

What does this skill do?

Production-ready development principles balancing simplicity with reliability for 10-100 MSP scale

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub; the curl command under Installation downloads SKILL.md directly from it.

SKILL.md Source

# Production Development Principles

These are universal production development principles for any project.

> **Philosophy**: Simple, scalable, maintainable. Not MVP shortcuts, not enterprise bloat.

## Golden Rules (ALWAYS Follow)

1. **If it works reliably, ship it** - Perfect is still the enemy of done
2. **YAGNI until you need it** - Don't build for hypothetical futures
3. **Simple files > Complex architecture** - Start simple, extract when needed
4. **Direct > Abstract** - Prefer direct solutions, abstract when patterns emerge
5. **Start hardcoded, extract when needed** - Make it configurable after the 3rd use
6. **Quality matters now** - We have paying customers, but over-engineering still hurts
7. **200 lines before extracting** - Functions/features can be larger now, but extract at 200 lines

## Reality Check (Where We Are)

- **Current customer base**: Optimize for the scale you actually have, not millions
- **Production beta**: Real customers, but still learning
- **Multi-tenant**: Each customer has unique needs
- **Speed + Reliability**: Ship fast, but don't break things
- **Technical debt payback**: Fix issues that impact customers NOW

## Patterns to Avoid (Nuanced for Our Scale)

### ❌ Avoid Unless Justified

These patterns add complexity. Only use if you meet the criteria:

- **Factory patterns**: Avoid unless you have 5+ different implementations
- **Dependency injection frameworks**: Avoid unless team size >5 developers (Go interfaces are fine)
- **Abstract base classes**: Avoid unless you have 3+ concrete implementations
- **Event sourcing**: Avoid unless you have audit requirements or >10,000 events/day
- **CQRS**: Avoid unless read/write performance is measured as a bottleneck
- **Microservices**: Avoid unless monolith is >100k LOC or team >10 developers
- **Complex repository patterns**: Avoid unless you have 5+ data sources (direct queries + transactions are fine)
- **Service meshes**: Avoid unless you have >20 services
- **API gateways**: Avoid unless you have >10 backend services (nginx is enough)
- **Custom frameworks**: Avoid unless you're doing the same thing 10+ times

### 🚫 Still Completely Banned

- **Premature optimization**: Never optimize before measuring
- **Speculative generality**: Never build for "what if" scenarios
- **Gold plating**: Never add features "because it's cool"
- **Resume-driven development**: Never use tech "to learn it"

## Patterns to Use (Production-Ready)

✅ **Strongly Encouraged**:
- Simple functions with clear names
- Direct database queries with transactions for multi-step operations
- Configuration files/env vars (not hardcoded secrets)
- Defensive coding (validation, error handling, retries)
- Logging and monitoring (errors, performance, business metrics)
- Inline code when <3 uses, extract when 3+ uses (Rule of Three)
- Database migrations (not ad-hoc SQL changes applied by hand)
- Basic caching when queries are measured as slow (>500ms)
- Polling with smart intervals (not webhooks unless push is required)
- Functions up to 200 lines (extract at 200, not 50)
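
A minimal sketch of "direct database queries with transactions for multi-step operations", using Python's built-in `sqlite3` module. The `tickets` and `ticket_audit` tables and the `reassign_ticket` function are hypothetical; the point is that both writes commit together or neither does, with no repository layer in between.

```python
import sqlite3


def reassign_ticket(conn: sqlite3.Connection, ticket_id: int, new_owner_id: int) -> None:
    """Reassign a ticket and write an audit row as one atomic unit (hypothetical schema)."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        cur = conn.execute(
            "UPDATE tickets SET owner_id = ? WHERE id = ?",
            (new_owner_id, ticket_id),
        )
        if cur.rowcount != 1:
            raise ValueError(f"ticket {ticket_id} not found")
        # If this insert fails, the UPDATE above is rolled back as well.
        conn.execute(
            "INSERT INTO ticket_audit (ticket_id, action) VALUES (?, ?)",
            (ticket_id, f"reassigned to {new_owner_id}"),
        )
```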

## Production Concerns

### 🚨 Must Haves for Production

1. **Error Handling**
   - All external calls have explicit error handling (try/catch, or checked error returns in Go)
   - Errors logged with context (user, customer, operation)
   - User-friendly error messages
   - Retry logic for transient failures (network, rate limits)

2. **Data Integrity**
   - Use database transactions for multi-step operations
   - Validate inputs before writing to database
   - Backups run daily
   - Soft deletes for critical data (tickets, users)

3. **Observability**
   - Log all errors with stack traces
   - Log slow operations (>2s)
   - Monitor API response times
   - Track business metrics relevant to your product

4. **Security**
   - Never log secrets/API keys in full (log at most the last 4 characters to identify a key)
   - Validate + sanitize user inputs
   - Rate limiting on public endpoints
   - Keep dependencies updated (monthly review)

5. **Multi-tenancy** (if applicable)
   - Every query includes a tenant ID filter
   - Test with multiple tenants
   - No cross-tenant data leaks
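
For the multi-tenancy rules, one low-ceremony approach is to make the tenant ID a required argument of every query helper, so an unscoped query cannot be written by accident. A minimal sketch with `sqlite3`; the `tickets` table, its `tenant_id` column, and `list_tickets` are hypothetical.

```python
import sqlite3


def list_tickets(conn: sqlite3.Connection, tenant_id: int, status: str) -> list[tuple[int, str]]:
    """Return (id, title) of one tenant's tickets with the given status."""
    # tenant_id is a required positional argument and None is rejected, so no
    # caller can accidentally run this query across all tenants.
    if tenant_id is None:
        raise ValueError("tenant_id is required")
    return conn.execute(
        "SELECT id, title FROM tickets WHERE tenant_id = ? AND status = ? ORDER BY id",
        (tenant_id, status),
    ).fetchall()
```

To cover "test with multiple tenants", seed at least two tenants in test fixtures and assert that each one sees only its own rows.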

### ⚖️ Production vs Speed Balance

#### Ship Fast (do these)
- Inline validation (no validation framework)
- Direct SQL queries (no ORM)
- Environment variables for config
- Simple retry logic (3 attempts, exponential backoff)
- File-based logs (rotate daily)
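
A minimal sketch of "simple retry logic (3 attempts, exponential backoff)". The `TransientError` class and the 0.5 s base delay are assumptions; in real code you would catch whatever your HTTP or database client raises for timeouts and rate limits. Other errors (bad input, auth failures) are deliberately not retried. The `sleep` parameter exists only so tests can skip the real waits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/503)."""


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits base, 2*base, 4*base, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```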

#### Take Time (do these right)
- Database migrations (use a migration tool such as `migrate`)
- Authentication/authorization (test thoroughly)
- Data export/import (customers depend on this)
- Email delivery (use queue + retries)
- Payment processing (never cut corners)

## Decision Framework

Before ANY architectural decision, ask:

### Question 1: Is this reliable for production?
- **If yes**: Proceed
- **If no**: What's missing? (error handling, validation, logging)

### Question 2: Will 100 customers break this?
- **If no**: Ship it
- **If yes**: What's the bottleneck? Add specific fix (caching, indexing, pagination)
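
Pagination is often the cheapest of these fixes. A minimal sketch of keyset (cursor) pagination with `sqlite3`: instead of `OFFSET`, each page asks for rows after the last `id` it saw, so the database does not re-scan skipped rows (assuming an index on `(tenant_id, id)`). The table and `tickets_page` are hypothetical.

```python
import sqlite3


def tickets_page(conn: sqlite3.Connection, tenant_id: int,
                 after_id: int = 0, limit: int = 50) -> list[tuple[int, str]]:
    """Keyset pagination: the next `limit` tickets for a tenant with id > after_id."""
    return conn.execute(
        "SELECT id, title FROM tickets WHERE tenant_id = ? AND id > ? ORDER BY id LIMIT ?",
        (tenant_id, after_id, limit),
    ).fetchall()
```

The client passes the last `id` of the previous page as `after_id`; an empty list means there are no more pages.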

### Question 3: Can another dev maintain this in 6 months?
- **If yes**: Good complexity level
- **If no**: Add comments, extract functions, simplify

### Question 4: What's the blast radius if this fails?
- **One user**: Ship it, fix if it breaks
- **One customer**: Add error handling + logging
- **All customers**: Add retry logic, monitoring, fallbacks

## When to Add Abstraction

### Triggers for Abstraction

Extract to function/class when:
- **Rule of Three**: Same logic used 3+ times
- **Domain complexity**: Business logic gets complicated (AI logic, ticket routing)
- **Testing**: Hard to test without extraction
- **Multiple implementations**: 3+ ways to do something (Zendesk, Jira, email)
- **File size**: Function/feature exceeds 200 lines

### Extraction Examples

#### ✅ Good Abstractions (Justified)

- **Extract after 3rd duplicate**: A validation function used by 3+ handlers
- **Extract complex business logic**: When a single function exceeds 200 lines with conditional logic
- **Extract when 3+ implementations exist**: e.g., EmailProvider, SlackProvider, TeamsProvider — three implementations justify an interface
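
The three-provider case can be sketched with a structural `typing.Protocol`, roughly the Python analogue of a Go interface. The class names come from the example above, but `NotificationProvider`, `send`, and `notify` are hypothetical and the `send` bodies are stubs. Choosing an implementation is a plain dict lookup rather than a factory, in line with the 5+ implementation threshold for factories.

```python
from typing import Protocol


class NotificationProvider(Protocol):
    """Interface extracted only once a third implementation appeared."""

    def send(self, recipient: str, message: str) -> str: ...


class EmailProvider:
    def send(self, recipient: str, message: str) -> str:
        # Stub: real code would call an SMTP server or email API here.
        return f"email to {recipient}: {message}"


class SlackProvider:
    def send(self, recipient: str, message: str) -> str:
        return f"slack to {recipient}: {message}"


class TeamsProvider:
    def send(self, recipient: str, message: str) -> str:
        return f"teams to {recipient}: {message}"


# A dict is enough to pick an implementation; no factory class needed.
PROVIDERS: dict[str, NotificationProvider] = {
    "email": EmailProvider(),
    "slack": SlackProvider(),
    "teams": TeamsProvider(),
}


def notify(channel: str, recipient: str, message: str) -> str:
    return PROVIDERS[channel].send(recipient, message)
```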

#### ❌ Still Over-Engineering

- **Abstract factories** when you only have 1 implementation
- **Generic repository patterns** when direct queries work fine
- **Configuration managers** when environment variables are enough
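
"Environment variables are enough" can be as small as one function that reads and validates them at startup, failing fast when something required is missing. A minimal sketch; the variable names and `Settings` fields are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    api_key: str
    request_timeout_s: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables; raise at startup if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        api_key=env["API_KEY"],
        # Optional setting with a hardcoded default.
        request_timeout_s=float(env.get("REQUEST_TIMEOUT_S", "10")),
    )
```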

## Simplicity Checkpoints

### Before Starting
- [ ] Is this the simplest RELIABLE approach?
- [ ] Do we need this for our current customer base (not 10,000)?
- [ ] Can this be 1-5 files?
- [ ] Is error handling included?
- [ ] Is this easily testable?

### During Implementation
- [ ] Am I adding abstraction before 3rd use?
- [ ] Am I creating >10 files? (Consolidate related logic)
- [ ] Did I add error handling + logging?
- [ ] Would another dev understand this in 6 months?
- [ ] Is this function >200 lines? (Extract if yes)
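
The ">200 lines" check can be automated rather than eyeballed. A minimal sketch for Python sources using the standard `ast` module (`end_lineno` exists on AST nodes since Python 3.8); other languages would need their own parser or a linter rule. Wiring it into a pre-commit hook or CI step is an assumption about your tooling.

```python
import ast


def long_functions(source: str, limit: int = 200) -> list[tuple[str, int]]:
    """Return (name, line_count) for every function or method longer than `limit` lines."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the `def` line through the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > limit:
                results.append((node.name, length))
    return results
```

Usage: `long_functions(open(path).read())` returns an empty list when every function is within the limit.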

### Before Committing
- [ ] Does this handle failures gracefully?
- [ ] Are errors logged with context?
- [ ] Is complex logic tested (unit tests)?
- [ ] Can I deploy this without breaking existing customers?
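
Logging errors with context and never logging full keys (from the Security must-haves) fit in one small helper. A minimal sketch with Python's standard `logging` module; the logger name, `mask_secret`, and `log_failure` are hypothetical, and structured (JSON) logging would carry the same fields.

```python
import logging

logger = logging.getLogger("app")  # hypothetical logger name


def mask_secret(value: str, visible: int = 4) -> str:
    """Keep only the last `visible` characters, without revealing the secret's length."""
    if len(value) <= visible:
        return "****"
    return "****" + value[-visible:]


def log_failure(exc: BaseException, *, operation: str, customer_id: str,
                user_id: str, api_key: str) -> None:
    """Log an error with who/what context, a masked key, and the exception traceback."""
    logger.error(
        "%s failed (customer=%s user=%s key=%s): %s",
        operation, customer_id, user_id, mask_secret(api_key), exc,
        exc_info=exc,
    )
```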

## Scaling Triggers (When to Refactor)

### Refactor When You Hit These Limits

1. **Performance** (actual, not hypothetical)
   - API responses >2s consistently
   - Database queries >500ms
   - Memory usage growing unbounded
   - CPU consistently >70%

2. **Maintainability** (team pain)
   - Same bug appears 3+ times (extract + fix once)
   - Code duplicated 5+ times (extract + reuse)
   - New feature takes 2x longer than expected
   - Onboarding new dev takes >1 week

3. **Scale** (customer impact)
   - Customer count exceeding what your current architecture handles
   - Request volume exceeding what your database/server can handle
   - Database size requiring optimization or sharding

4. **Customer complaints** (real problems)
   - Specific feature requested by 5+ customers
   - Same issue reported 3+ times
   - Security concern raised by customer
   - Competitor has feature we don't
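
The performance triggers only work if slow calls are actually measured. A minimal sketch of a timing decorator built on `time.perf_counter` and the standard `logging` module; `warn_if_slow` and the `perf` logger name are hypothetical, and a metrics library or APM could take over this job later.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # hypothetical logger name


def warn_if_slow(threshold_s: float):
    """Decorator: log a warning when the wrapped call takes longer than threshold_s."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so slow failures are logged too.
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    logger.warning("%s took %.2fs (threshold %.2fs)",
                                   fn.__name__, elapsed, threshold_s)
        return wrapper
    return decorator
```

Applied as `@warn_if_slow(0.5)` on query helpers and `@warn_if_slow(2.0)` on request handlers, it produces the measured numbers the triggers above call for.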

### Don't Refactor For

- "Clean code" principles (if it works reliably)
- Hypothetical scale (until you're at 80% of limit)
- Latest framework/library (unless security fix)
- Personal preferences (consistency > perfection)

## Mantras

- "Simple + Reliable beats complex + perfect"
- "Scale when you hit limits, not before"
- "Make it work, make it right, make it fast - IN THAT ORDER"
- "Abstract after 3rd duplicate, not before"
- "Add what you need, remove what you don't"
- "Customers don't care about architecture"
- "200 lines before extracting, not 50"

## When to Add "Enterprise" Patterns

Use an enterprise pattern **ONLY when you meet its minimum requirement**:

| Pattern | Minimum Requirements |
|---------|---------------------|
| Factory Pattern | 5+ different implementations |
| DI Framework | Team of 5+ developers |
| Microservices | Monolith >100k LOC OR team >10 developers |
| Event Sourcing | Audit requirement OR >10k events/day |
| CQRS | Read/write performance measured as bottleneck |
| Service Mesh | 20+ microservices |
| API Gateway | 10+ backend services |
| Repository Pattern | 5+ different data sources |

**Until you hit these thresholds**: Keep it simple

## The Prime Directive

> **Build the simplest reliable thing that works for your current customer base. Then ship it.**

If you find yourself:
- Creating >10 files for a feature
- Writing >200 lines without extracting
- Thinking about "1000+ customer scalability"
- Adding abstraction before 3rd use
- Building generic frameworks

**STOP and ask**:

> **"What's the simplest RELIABLE way to make this work for 100 customers?"**

## Remember

You're not building for:
- ❌ Millions of users (unless you actually have them)
- ❌ Fortune 500 enterprise (unless you are one)
- ❌ Infinite scale (you need finite, measured scale)

You're building for:
- ✅ Your actual current user/customer count
- ✅ Fast iteration based on real feedback
- ✅ Reliable service for paying customers
- ✅ Maintainable codebase that your team can work on

**Ship working, reliable code. Ship it fast. Iterate based on customer feedback.**

Related Skills

production-scheduling

from diegosouzapw/awesome-omni-skill

Codified expertise for production scheduling, job sequencing, line balancing, changeover optimisation, and bottleneck resolution in discrete and batch manufacturing.

android-workflow-production

from diegosouzapw/awesome-omni-skill

Generate GitHub Actions workflows for production deployment with staged rollout

schwartz_principles_surgery_qa_rationalization

from diegosouzapw/awesome-omni-skill

Answer medical and surgery questions using Schwartz Principles of Surgery as the primary reference, providing detailed rationales for each multiple-choice option and explicit source attribution.

production-dockerfile

from diegosouzapw/awesome-omni-skill

Generate production-ready Dockerfiles with multi-stage builds, security best practices, and optimization. Use when containerizing Python applications for production deployment.

gitops-principles-skill

from diegosouzapw/awesome-omni-skill

Comprehensive GitOps methodology and principles skill for cloud-native operations. Use when (1) Designing GitOps architecture for Kubernetes deployments, (2) Implementing declarative infrastructure with Git as single source of truth, (3) Setting up continuous deployment pipelines with ArgoCD/Flux/Kargo, (4) Establishing branching strategies and repository structures, (5) Troubleshooting drift, sync failures, or reconciliation issues, (6) Evaluating GitOps tooling decisions, (7) Teaching or explaining GitOps concepts and best practices, (8) Deploying ArgoCD on Azure Arc-enabled Kubernetes or AKS with workload identity. Covers the 4 pillars of GitOps (OpenGitOps), patterns, anti-patterns, tooling ecosystem, Azure Arc integration, and operational guidance.

deploy-production

from diegosouzapw/awesome-omni-skill

Deploy to production environments with safety checks. Use when releasing code to production. Not for staging deploys or local builds unless specifically requested.

cloudflare-d1-migrations-and-production-seeding

from diegosouzapw/awesome-omni-skill

Use this skill whenever the user wants to design, run, or refine Cloudflare D1 schema management, migrations, and data seeding for dev/staging/production environments, especially in conjunction with Hono/Workers apps.

testing-principles

from diegosouzapw/awesome-omni-skill

Language-agnostic testing principles including TDD, test quality, coverage standards, and test design patterns. Use when writing tests, designing test strategies, or reviewing test quality.

production-code-audit

from diegosouzapw/awesome-omni-skill

Autonomously deep-scan entire codebase line-by-line, understand architecture and patterns, then systematically transform it to production-grade, corporate-level professional quality with optimizations

principles

from diegosouzapw/awesome-omni-skill

Provides development principles, guidelines, and VibeCoder guidance. Use when user mentions 原則, principles, ガイドライン, guidelines, VibeCoder, 安全性, safety, 差分編集, diff-aware. Triggers: 原則, principles, ガイドライン, VibeCoder, 安全性, 差分編集. Do not use for actual implementation - use impl skill instead.

nextjs-production-debugger

from diegosouzapw/awesome-omni-skill

Advanced debugging guide for Next.js App Router production issues including SSR/CSR bugs, hydration errors, runtime mismatches, performance, and caching.

new-saas-production

from diegosouzapw/awesome-omni-skill

Production-ready SaaS boilerplate generator. Creates complete project with public landing pages, client dashboard, admin panel, T-Bank payment integration, and v12-style design system. FastAPI + Jinja2 + PostgreSQL stack.