eval

25 stars

Best use case

eval is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using eval should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/eval/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/alirezarezvani/claude-skills/eval/SKILL.md"
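
To verify the download, check that the file now exists at the path the agent scans for skills (a quick sanity check using standard shell commands):

$ ls ~/.claude/skills/eval/SKILL.md
$ head ~/.claude/skills/eval/SKILL.md   # show the first lines of the downloaded file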

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/eval/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill
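
The same manual steps as a terminal sketch, run from the project root and reusing the raw GitHub URL from the command above:

$ mkdir -p .claude/skills/eval
$ curl -o .claude/skills/eval/SKILL.md "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/alirezarezvani/claude-skills/eval/SKILL.md"

Unlike the user-level ~/.claude install, this keeps the skill inside the repository so it is versioned with the project.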

How eval Compares

Feature                    eval              Standard Approach
Platform Support           Not specified     Limited / Varies
Context Awareness          High              Baseline
Installation Complexity    Unknown           N/A

Frequently Asked Questions

What does this skill do?

This skill provides specific capabilities for your AI agent; see the Best use case and When to use sections above for details.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

../../../engineering/agenthub/skills/eval/SKILL.md

Related Skills

evaluating-machine-learning-models

25
from ComeOnOliver/skillshub

This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics. Trigger this skill when the user mentions "evaluate model", "model performance", "testing metrics", "validation results", or requests a comprehensive "model evaluation".

model-evaluation-metrics

25
from ComeOnOliver/skillshub

Model Evaluation Metrics - Auto-activating skill for ML Training. Triggers on: model evaluation metrics. Part of the ML Training skill category.

engineering-features-for-machine-learning

25
from ComeOnOliver/skillshub

This skill empowers Claude to perform feature engineering tasks for machine learning. It creates, selects, and transforms features to improve model performance. Use this skill when the user requests feature creation, feature selection, feature transformation, or any request that involves improving the features used in a machine learning model. Trigger terms include "feature engineering", "feature selection", "feature transformation", "create features", "select features", "transform features", "improve model performance", and similar phrases related to feature manipulation.

feature-engineering-helper

25
from ComeOnOliver/skillshub

Feature Engineering Helper - Auto-activating skill for ML Training. Triggers on: feature engineering helper. Part of the ML Training skill category.

conducting-chaos-engineering

25
from ComeOnOliver/skillshub

This skill enables Claude to design and execute chaos engineering experiments to test system resilience. It is used when the user requests help with failure injection, latency simulation, resource exhaustion testing, or resilience validation. The skill is triggered by discussions of chaos experiments (GameDays), failure injection strategies, resilience testing, and validation of recovery mechanisms like circuit breakers and retry logic. It leverages tools like Chaos Mesh, Gremlin, Toxiproxy, and AWS FIS to simulate real-world failures and assess system behavior.

skill-evaluator

25
from ComeOnOliver/skillshub

Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.

suggest-awesome-github-copilot-skills

25
from ComeOnOliver/skillshub

Suggest relevant GitHub Copilot skills from the awesome-copilot repository based on current repository context and chat history, avoiding duplicates with existing skills in this repository, and identifying outdated skills that need updates.

eval-driven-dev

25
from ComeOnOliver/skillshub

Add instrumentation, build golden datasets, write eval-based tests, run them, root-cause failures, and iterate to ensure your Python LLM application works correctly. Use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM. Use it for making sure an LLM application works correctly, catching regressions after prompt changes, fixing unexpected behavior, or validating output quality before shipping.

Testing Handbook Skills

25
from ComeOnOliver/skillshub

Comprehensive security testing toolkit generated from the Trail of Bits Application Security Testing Handbook (https://appsec.guide/).

ROS 2 Engineering Skills

25
from ComeOnOliver/skillshub

A progressive-disclosure skill for ROS 2 development — from first workspace to

using-dbt-for-analytics-engineering

25
from ComeOnOliver/skillshub

Builds and modifies dbt models, writes SQL transformations using ref() and source(), creates tests, and validates results with dbt show. Use when doing any dbt work - building or modifying models, debugging errors, exploring unfamiliar data sources, writing tests, or evaluating impact of changes.
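
As a rough sketch of the loop this description refers to, using standard dbt CLI commands (the model name orders is a hypothetical placeholder):

$ dbt run --select orders              # build or rebuild the model
$ dbt test --select orders             # run the tests defined for it
$ dbt show --select orders --limit 5   # preview the resulting rows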

auditing-skills

25
from ComeOnOliver/skillshub

Use when checking skills for security or quality issues, reviewing audit results from skills.sh or Tessl, or remediating findings across published skills.