benchmark-suite-manager

Manage benchmarks for algorithm engineering experiments and evaluations

509 stars

Best use case

benchmark-suite-manager is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using benchmark-suite-manager can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/benchmark-suite-manager/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/domains/science/computer-science/skills/benchmark-suite-manager/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/benchmark-suite-manager/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How benchmark-suite-manager Compares

Feature / Agent	benchmark-suite-manager	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

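It provides guidance on managing benchmark suites for algorithm engineering experiments and evaluations: choosing a suite and representative instances, running experiments systematically, analyzing the results statistically, and reporting comparison tables and plots.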

Where can I find the source code?

The source code is in the a5c-ai/babysitter repository on GitHub. The raw SKILL.md URL appears in the curl command under Installation.

SKILL.md Source

# Benchmark Suite Manager

## Purpose

Provides expert guidance on managing benchmark suites for algorithm engineering and experimental evaluation.

## Capabilities

- Standard benchmark suite access (DIMACS, TSPLIB, etc.)
- Instance generation for specific problem classes
- Statistical analysis of results
- Performance comparison tables
- Visualization of scaling behavior
- Reproducibility support
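
As an illustration of the instance-generation and reproducibility capabilities, here is a minimal sketch that writes a random graph in the DIMACS edge format (`p edge <n> <m>` followed by `e <u> <v>` lines). The function name `generate_dimacs_graph` and its parameters are hypothetical and are not part of the skill; the random model (Erdos-Renyi G(n, p)) is an assumption chosen for brevity.

```python
import random


def generate_dimacs_graph(num_vertices, edge_probability, seed):
    """Return a G(n, p) random graph as DIMACS edge-format text.

    Hypothetical helper for illustration only; not part of the skill.
    """
    rng = random.Random(seed)  # a fixed seed makes the instance reproducible
    edges = [
        (u, v)
        for u in range(1, num_vertices + 1)        # DIMACS vertices are 1-based
        for v in range(u + 1, num_vertices + 1)
        if rng.random() < edge_probability
    ]
    # Record the generator parameters in a comment line so the instance
    # can be regenerated later from the file alone.
    lines = [f"c G(n={num_vertices}, p={edge_probability}), seed={seed}"]
    lines.append(f"p edge {num_vertices} {len(edges)}")
    lines.extend(f"e {u} {v}" for u, v in edges)
    return "\n".join(lines) + "\n"


instance_text = generate_dimacs_graph(num_vertices=6, edge_probability=0.5, seed=42)
print(instance_text)
```

Storing the seed and parameters inside the instance file is one simple way to keep generated benchmarks reproducible across machines.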

## Usage Guidelines

1. **Suite Selection**: Choose a benchmark suite appropriate to the problem class
2. **Instance Selection**: Select representative instances from the suite
3. **Execution**: Run experiments systematically
4. **Analysis**: Perform statistical analysis of the results
5. **Reporting**: Generate comparison tables and plots
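
The execution, analysis, and reporting steps above can be sketched as follows. This is a minimal example under stated assumptions, not the skill's own implementation: the helper names (`time_algorithm`, `run_suite`, `format_table`), the two sorting algorithms being compared, and the random list instances are all hypothetical stand-ins for a real suite.

```python
import random
import statistics
import time


def insertion_sort(values):
    """Simple quadratic sort used here only as a benchmark subject."""
    for i in range(1, len(values)):
        key = values[i]
        j = i - 1
        while j >= 0 and values[j] > key:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = key
    return values


def time_algorithm(algorithm, instance, repetitions=5):
    """Run the algorithm several times and return wall-clock timings in seconds."""
    timings = []
    for _ in range(repetitions):
        data = list(instance)  # fresh copy so in-place algorithms see the same input
        start = time.perf_counter()
        algorithm(data)
        timings.append(time.perf_counter() - start)
    return timings


def run_suite(algorithms, instances, repetitions=5):
    """Execute every algorithm on every instance and summarize the timings."""
    results = {}
    for instance_name, instance in instances.items():
        for algorithm_name, algorithm in algorithms.items():
            timings = time_algorithm(algorithm, instance, repetitions)
            results[(instance_name, algorithm_name)] = {
                "median": statistics.median(timings),
                "stdev": statistics.stdev(timings),  # needs repetitions >= 2
            }
    return results


def format_table(results):
    """Render the summary as a fixed-width comparison table."""
    rows = [f"{'instance':<12}{'algorithm':<16}{'median [s]':>12}{'stdev [s]':>12}"]
    for (instance_name, algorithm_name), stats in sorted(results.items()):
        rows.append(
            f"{instance_name:<12}{algorithm_name:<16}"
            f"{stats['median']:>12.6f}{stats['stdev']:>12.6f}"
        )
    return "\n".join(rows)


rng = random.Random(0)  # seeded so the instances are the same on every run
instances = {f"random-{n}": [rng.random() for _ in range(n)] for n in (200, 400, 800)}
algorithms = {"insertion_sort": insertion_sort, "builtin_sorted": sorted}
print(format_table(run_suite(algorithms, instances)))
```

Reporting the median together with a spread measure, rather than a single run, makes comparisons less sensitive to timing noise. The absolute numbers depend on the machine, so none are shown here.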

## Tools/Libraries

- DIMACS
- TSPLIB
- SuiteSparse Matrix Collection
- Statistical tools
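
To show what working with one of these collections can look like, the sketch below parses a TSPLIB-style instance with `EDGE_WEIGHT_TYPE : EUC_2D` and evaluates a tour. The embedded instance `toy4` is a made-up example, not a real TSPLIB instance, and the function names are hypothetical. The parser handles only the `NODE_COORD_SECTION` layout used here.

```python
import math

# Made-up four-city instance in TSPLIB layout (not from the real library).
SAMPLE = """NAME : toy4
TYPE : TSP
DIMENSION : 4
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 0 0
2 3 0
3 3 4
4 0 4
EOF
"""


def parse_tsplib_euc2d(text):
    """Parse header fields and node coordinates from an EUC_2D instance."""
    header, coords = {}, {}
    in_coords = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "EOF":
            continue
        if line == "NODE_COORD_SECTION":
            in_coords = True
        elif in_coords:
            node_id, x, y = line.split()
            coords[int(node_id)] = (float(x), float(y))
        else:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    if header.get("EDGE_WEIGHT_TYPE") != "EUC_2D":
        raise ValueError("this sketch only handles EUC_2D instances")
    return header, coords


def euc2d(a, b):
    """EUC_2D distance: Euclidean distance rounded to the nearest integer."""
    return int(math.hypot(a[0] - b[0], a[1] - b[1]) + 0.5)


def tour_length(coords, tour):
    """Length of the closed tour visiting the node ids in the given order."""
    return sum(
        euc2d(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


header, coords = parse_tsplib_euc2d(SAMPLE)
print(header["NAME"], tour_length(coords, [1, 2, 3, 4]))  # prints "toy4 14"
```

The rounding in `euc2d` matters when comparing against published optimal tour lengths, which are computed with integer distances.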
