ClaudeSoftware Engineering

agentic-engineering

Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing. Use when AI agents perform most implementation work and humans enforce quality and risk controls.

144,923 stars

byaffaan-m

Complexity: easy

View on GitHub Installation ↓

About this skill

This skill provides a foundational operating framework for AI agents to perform complex software engineering tasks with high quality and efficiency. It instills an "agentic engineering" mindset, guiding the AI to autonomously manage implementation work while facilitating human oversight for quality and risk. Key tenets include defining clear completion criteria upfront, breaking down large problems into manageable, agent-sized units, strategically selecting AI model tiers based on task complexity and cost, and rigorously validating outputs through a continuous eval-first and regression testing loop. This approach ensures that agents produce reliable, production-ready code and systems, minimizing errors and maximizing developer productivity.

Best use case

Ideal for software development projects where AI agents are expected to handle a significant portion of the implementation, from drafting code and designing components to executing and validating solutions. It's particularly useful for building robust APIs, refactoring codebases, or developing new features where systematic, quality-driven development is paramount.

More robust, higher-quality code and system implementations from AI agents. Reduced need for human intervention in repetitive coding tasks, freeing up engineers for higher-level work. Optimized computational costs by intelligently routing tasks to appropriate AI models. Faster iteration cycles due to the integrated eval-first approach, catching issues early. Improved adherence to software engineering best practices within AI-driven development workflows.

Practical example

Example input

You are an agentic engineer. Your task is to implement a RESTful API endpoint for user profile management. Begin by defining clear completion criteria, then decompose the task, and use eval-first development. Ensure robust error handling and cost-aware model routing.

Example output

// Agent's internal thought process and actions:
THINK: Applying agentic engineering principles.
1. Define Completion Criteria:
   - API endpoint /users/{id} supports GET, PUT, DELETE.
   - Data validation for PUT requests.
   - Proper error handling (400, 404, 500).
   - Unit tests cover all endpoints and edge cases.
   - Performance within 100ms response time.
2. Decompose Task:
   - Subtask 1: Design API schema for User.
   - Subtask 2: Implement GET /users/{id} endpoint.
   - Subtask 3: Implement PUT /users/{id} endpoint with validation.
   - Subtask 4: Implement DELETE /users/{id} endpoint.
   - Subtask 5: Write unit tests for all endpoints.
   - Subtask 6: Integrate with database (mock initially).
3. Eval-First Loop (for Subtask 2 - GET endpoint):
   - Define Eval: Test if GET /users/1 returns user data correctly, and 404 for non-existent.
   - Run Baseline: (Attempt to run before implementation, capture 'failure signatures' - e.g., endpoint not found).
   - Execute Implementation: Generate code for GET endpoint. (Route to Claude-3-Opus for complex logic, Claude-3-Sonnet for boilerplate).
   - Re-run Eval: Verify GET /users/1 passes. If not, debug and iterate.
... (continue for all subtasks with appropriate model routing and evals)
Final Output: Deployed API endpoint with test suite and detailed implementation report.

When to use this skill

When AI agents are tasked with performing most of the coding and implementation work.
For projects requiring high-quality, production-grade software development practices (e.g., robust API design, error handling, versioning).
When human engineers need to focus on architecture, quality assurance, and risk control rather than granular implementation.
To optimize AI model usage by routing tasks to the most cost-effective and capable models.

When not to use this skill

For trivial, one-off scripting tasks that don't require extensive decomposition or rigorous evaluation.
In highly exploratory or research-oriented coding where the primary goal is rapid prototyping without immediate production concerns.
When the human operator prefers direct, step-by-step control over the agent's execution rather than an autonomous, agentic workflow.
In environments where access to multiple model tiers or evaluation frameworks is not available or practical.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agentic-engineering/SKILL.md --create-dirs "https://raw.githubusercontent.com/affaan-m/everything-claude-code/main/.kiro/skills/agentic-engineering/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agentic-engineering/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How agentic-engineering Compares

Feature / Agent	agentic-engineering	Standard Approach
Platform Support	Claude	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.

**Example workflow:**
```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```

## Task Decomposition

Apply the 15-minute unit rule:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition

**Good decomposition:**
```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**
```
Task: Add user authentication (2 hours, multiple risks)
```

## Model Routing

Choose model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
- Example: Rename variable, add type annotation, format code

- **Sonnet**: Implementation and refactors
- Example: Implement feature, refactor module, write tests

- **Opus**: Architecture, root-cause analysis, multi-file invariants
- Example: Design system, debug complex issue, review architecture

**Cost discipline:** Escalate model tier only when lower tier fails with a clear reasoning gap.

## Session Strategy

- **Continue session** for closely-coupled units
- Example: Implementing related functions in same module

- **Start fresh session** after major phase transitions
- Example: Moving from implementation to testing

- **Compact after milestone completion**, not during active debugging
- Example: After feature complete, before starting next feature

## Review Focus for AI-Generated Code

Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.

**Review checklist:**
- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

## Cost Discipline

Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**
```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```

## When to Use This Skill

144923

from affaan-m/everything-claude-code

Build reusable Manim explainers for technical concepts, graphs, system diagrams, and product walkthroughs, then hand off to the wider ECC video stack if needed. Use when the user wants a clean animated explainer rather than a generic talking-head script.

DevelopmentClaude