ai-engineering-skill

Practical guide for building production ML systems based on Chip Huyen's AI Engineering book. Use when users ask about model evaluation, deployment strategies, monitoring, data pipelines, feature engineering, cost optimization, or MLOps. Covers metrics, A/B testing, serving patterns, drift detection, and production best practices.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

ai-engineering-skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-engineering-skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ai-engineering-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/ai-engineering-skill/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ai-engineering-skill/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-engineering-skill Compares

Feature / Agent	ai-engineering-skill	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AI Engineering Guide

## Overview

**What this skill covers**

- Designing and scaling AI applications, including system architecture and user feedback mechanisms.
- Building and maintaining high-quality datasets for model training and finetuning.
- Evaluating AI systems with appropriate criteria and methods.
- Implementing effective evaluation methodologies for open-ended AI systems.
- Deciding between prompting, RAG, and finetuning, and configuring finetuning methods.
- Selecting and deploying foundation models, and optimizing model outputs.
- Optimizing LLM inference systems for latency and cost efficiency.
- Designing effective prompts and implementing safety measures in prompt engineering.
- Implementing RAG systems and designing AI agents.

**When to use this skill**

- When designing, scaling, and monitoring AI applications.
- When building, verifying, and maintaining datasets for model training.
- When selecting or evaluating AI systems and defining success metrics.
- When designing evaluations for open-ended AI systems.
- When deciding on and configuring finetuning methods.
- When selecting foundation models and planning compute resources.
- When optimizing inference systems for performance and cost.
- When constructing prompts and managing context in AI applications.
- When implementing RAG systems and designing AI agents.

## Reference Files Guide

### ai-engineering-architecture-and-user-feedback.md

**When to use:** Use this file for queries related to designing, scaling, and monitoring AI applications, implementing system architectures, and collecting user feedback. It is particularly useful for questions about decision frameworks, architecture progression, and feedback mechanisms in AI systems.

**Key topics:**

- System Architecture: Progression and components
- Context Enhancement: RAG and tools
- Guardrails: Input and output safety
- Model Routing: Routers and gateways
- Caching: Latency and cost reduction

**Example queries:**

- "How do I decide when to add new components to my AI system architecture?"
- "What are best practices for implementing guardrails in AI applications?"
- "How can I optimize caching to reduce latency and cost in AI models?"
- "What strategies should I use for collecting and utilizing user feedback in AI systems?"
- "When should I introduce an orchestrator in my AI pipeline?"

### dataset-engineering.md

**When to use:** Use this file when you need guidance on building, verifying, and maintaining high-quality datasets for pretraining or finetuning models. It is particularly useful for decisions involving dataset composition, synthetic data use, and data quality verification.

**Key topics:**

- Dataset Composition: Coverage strategy
- Finetuning Approaches: PEFT vs full finetuning
- Synthetic Data: Best practices and risks
- Data Quality: Verification and filtering methods
- Annotation Guidelines: Best practices

**Example queries:**

- "How do I decide between PEFT and full finetuning?"
- "What are best practices for using synthetic data?"
- "How can I verify the quality of my dataset?"
- "What should I include in annotation guidelines?"
- "How do I ensure dataset coverage for multiple languages?"

### evaluate-ai-systems.md

**When to use:** Use this file when designing, selecting, or evaluating AI systems, particularly when needing guidance on evaluation criteria, methods, and workflows for AI models.

**Key topics:**

- Evaluation Criteria: Defining success metrics
- Automation: Verification and monitoring
- Model Selection: Build vs buy decisions
- Safety: Screening for harmful content
- Instruction-Following: Adherence checks

**Example queries:**

- "How do I evaluate the success of my AI model?"
- "What criteria should I use to select an AI system?"
- "How can I automate the verification of AI outputs?"
- "What are the best practices for monitoring AI models in production?"
- "How do I ensure my AI system follows instructions accurately?"

### evaluation-methodology.md

**When to use:** Use this reference when designing, implementing, or operating evaluations for open-ended AI systems, especially when selecting evaluation methods, computing language modeling metrics, or using AI as a judge.

**Key topics:**

- Evaluation Methods: Choosing techniques
- Language Modeling: Perplexity and metrics
- Functional Correctness: Execution accuracy
- Similarity Evaluation: Lexical and semantic
- AI-as-Judge: LLM-based evaluation

**Example queries:**

- "How do I choose the right evaluation method for my AI model?"
- "What is perplexity and how do I compute it?"
- "How can I evaluate the functional correctness of generated code?"
- "What are the best practices for using AI as a judge in evaluations?"
- "How do I measure semantic similarity for text outputs?"

### finetuning.md

**When to use:** Use this file when you need guidance on deciding between prompting, retrieval-augmented generation (RAG), and finetuning, or when configuring and deploying finetuning methods like LoRA and QLoRA.

**Key topics:**

- Finetuning decision-making
- Hardware sizing and memory
- Method selection (LoRA, QLoRA)
- Hyperparameter configuration
- Model merging techniques

**Example queries:**

- "Should I use RAG or finetuning for my model?"
- "How do I configure LoRA for finetuning?"
- "What hardware do I need for finetuning a large model?"
- "How can I merge multiple finetuned models?"
- "What are the best practices for hyperparameter tuning in finetuning?"

### foundation-models.md

**When to use:** Use this file when you need guidance on selecting and deploying foundation models, planning compute resources, curating training data, or optimizing model outputs for specific tasks and domains.

**Key topics:**

- Model selection criteria
- Compute resource planning
- Training data strategy
- Post-training alignment
- Sampling and decoding techniques

**Example queries:**

- "How do I choose a model for non-English text processing?"
- "What are the best practices for planning compute resources for model training?"
- "How can I curate training data for a domain-specific model?"
- "What techniques can I use to align model outputs with human preferences?"
- "How do I optimize sampling parameters for reliable model outputs?"

### inference-optimization.md

**When to use:** Use this file when optimizing LLM inference systems for lower latency and cost, diagnosing bottlenecks, or implementing specific optimization techniques for model serving.

**Key topics:**

- Optimization Order: Step-by-step process
- Core Metrics: Latency and throughput
- Bottleneck Diagnosis: Compute vs bandwidth
- High-ROI Optimizations: Quantization, batching
- Parallelism Strategies: Scaling techniques

**Example queries:**

- "How can I reduce latency in LLM inference?"
- "What are the best practices for optimizing model throughput?"
- "How do I diagnose bottlenecks in my inference pipeline?"
- "What quantization techniques should I use for LLMs?"
- "How can I implement parallelism to scale my model serving?"

### introduction.md

**When to use:** Use this reference file when you need guidance on building AI applications using foundation models, including decision-making frameworks, evaluation techniques, and deployment strategies. It is particularly useful for queries related to adapting foundation models, selecting AI techniques, and optimizing AI application performance.

**Key topics:**

- Decision Frameworks: AI application viability
- Technique Selection: Prompting vs RAG vs Finetuning
- Evaluation Essentials: Metrics and datasets
- Deployment Strategies: Internal vs external-facing
- Inference Optimization: Latency and cost reduction

**Example queries:**

- "How do I decide whether to build or buy an AI solution?"
- "What are the best practices for evaluating AI model performance?"
- "How can I optimize the latency of my AI application?"
- "When should I use RAG over finetuning for my AI model?"
- "What metrics should I track for AI deployment success?"

### prompt-engineering.md

**When to use:** Use this file for queries related to designing effective prompts, optimizing prompt structures, and implementing safety measures in prompt engineering. It is particularly useful for tasks involving prompt construction, context management, and defensive strategies against prompt injection.

**Key topics:**

- Prompt Anatomy: Core components
- Chat Templates: Correctness and guardrails
- In-Context Learning: Zero-shot vs few-shot
- Context Efficiency: Length and structure
- Defensive Engineering: Threats and defenses

**Example queries:**

- "How do I structure a prompt for few-shot learning?"
- "What are best practices for managing long context in prompts?"
- "How can I prevent prompt injection attacks?"
- "What is the correct chat template format for my model?"
- "How do I implement chain-of-thought reasoning in prompts?"

### rag-and-agents.md

**When to use:** Use this reference file when dealing with queries related to implementing Retrieval-Augmented Generation (RAG) systems, optimizing retrieval algorithms, or designing and deploying AI agents. It is particularly useful for questions about retrieval strategies, agent architectures, and memory management in AI systems.

**Key topics:**

- RAG Pipeline: Essential components
- Retrieval Algorithms: Hybrid search
- Chunking Strategy: Defaults and variants
- Vector Search: Index selection
- Agent Architecture: Planning and execution

**Example queries:**

- "How do I implement a RAG system to reduce hallucinations?"
- "What are the best practices for hybrid search using BM25 and vector search?"
- "How should I design an AI agent to handle multi-step tasks?"
- "What chunking strategy should I use for optimal retrieval performance?"
- "How can I manage memory effectively in a RAG system?"

Related Skills

data-engineering-data-pipeline

from diegosouzapw/awesome-omni-skill

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

context-engineering

from diegosouzapw/awesome-omni-skill

Use when designing agent system prompts, optimizing RAG retrieval, or when context is too expensive or slow. Reduces tokens while maintaining quality through strategic positioning and attention-aware design.

Build Your Data Engineering Skill

from diegosouzapw/awesome-omni-skill

Create your LLMOps data engineering skill in one prompt, then learn to improve it throughout the chapter

ai-data-engineering

from diegosouzapw/awesome-omni-skill

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

Data Engineering Data Driven Feature

from diegosouzapw/awesome-omni-skill

World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication.

ai-marketing-engineering

from diegosouzapw/awesome-omni-skill

AI-powered marketing engineering skill based on Alon Huri's framework. Transforms marketing from copywriting to engineering discipline through 10 agentic mechanisms: infinite creative generation, adaptive budget management, LTV signal hunting, contextual data layers, AEO optimization, dynamic quizzes, behavior-driven activation, personalized video at scale, competitor weakness targeting, and active churn prevention. Use when building marketing automation systems, designing growth engineering workflows, creating AI-powered marketing agents, optimizing ad creatives at scale, implementing AEO (Answer Engine Optimization), or architecting data-driven marketing infrastructure.

u0542-engineering-multi-agent-negotiation-mediator

from diegosouzapw/awesome-omni-skill

Operate the "Engineering Multi-Agent Negotiation Mediator" capability in production for workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.

prompt-engineering

from diegosouzapw/awesome-omni-skill

Write effective prompts for AI coding agents. Use when crafting system prompts, implementing chain-of-thought reasoning, building few-shot examples, adding guardrails, configuring tool use, or designing agentic prompt patterns. Covers CoT, few-shot, guardrails, and function calling.

Prompt Engineering Skill

from diegosouzapw/awesome-omni-skill

Craft effective prompts that get the best results from language models.

prompt-engineering-patterns

from diegosouzapw/awesome-omni-skill

Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.

engineering-ai

from diegosouzapw/awesome-omni-skill

Builds the AI intelligence layer including LLM integrations, agentic workflows, MCP servers, and intelligent automation in Python. Activates when adding AI features, building agents, implementing MCP servers, integrating LLMs, creating prompts, or adding intelligence to the app. Does not handle core business logic or APIs (backend-developer), frontend UI (frontend-developer), infrastructure (devops), or security review (security).

context-engineering-collection

from diegosouzapw/awesome-omni-skill

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems. Use when building, optimizing, or debugging agent systems that require effective context management.