llamacpp-bench
Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM models, compare model performance, test inference speed, or run llama-bench on GGUF files. Supports Vulkan, CUDA, ROCm, and CPU backends.
Best use case
llamacpp-bench is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using llamacpp-bench should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/llamacpp-bench/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
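The manual steps above can be sketched as a small shell helper. This is only an illustration: the function name `install_skill` is hypothetical, and it assumes SKILL.md has already been downloaded to a local path, since the page does not state the download URL.

```bash
# Hypothetical helper: copy a downloaded SKILL.md into a project's skill folder.
# Usage: install_skill /path/to/SKILL.md [project_dir]
install_skill() {
  local src="$1" project="${2:-.}"
  local dest="$project/.claude/skills/llamacpp-bench"

  # Refuse to continue if the downloaded file is missing.
  [ -f "$src" ] || { echo "SKILL.md not found: $src" >&2; return 1; }

  # Create the skill directory (and any missing parents), then copy the file.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill ~/Downloads/SKILL.md .` would place the file under the current project, after which the agent needs a restart to pick it up.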
How llamacpp-bench Compares
| Feature / Agent | llamacpp-bench | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM models, compare model performance, test inference speed, or run llama-bench on GGUF files. Supports Vulkan, CUDA, ROCm, and CPU backends.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# llamacpp-bench

Run standardized benchmarks on GGUF models using llama.cpp's `llama-bench` tool.

## Quick Start

```bash
# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

# With specific backend
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
```

## Benchmark Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `-m` | Model path (GGUF file) | required |
| `-p` | Prompt sizes to test | 512 |
| `-n` | Generation lengths to test | 128 |
| `-ngl` | GPU layers to offload | 99 |
| `-t` | CPU threads | auto |
| `-dev` | Device selection | auto |

## Standard Test Suite

For consistent comparisons across models, use:

```bash
-p 512,1024,2048 -n 128,256 -ngl 99
```

This tests:

- **Prompt processing**: 512, 1024, 2048 tokens
- **Token generation**: 128, 256 tokens

## Interpreting Results

| Metric | Meaning | Good Performance |
|--------|---------|------------------|
| `pp512` | Prompt processing speed at 512 tokens | >1000 t/s |
| `pp1024` | Prompt processing speed at 1024 tokens | >1000 t/s |
| `pp2048` | Prompt processing speed at 2048 tokens | >1000 t/s |
| `tg128` | Token generation speed (128 tokens) | >50 t/s |
| `tg256` | Token generation speed (256 tokens) | >50 t/s |

## Backend Selection

llama-bench auto-detects available backends. Priority order:

1. CUDA (NVIDIA GPUs)
2. ROCm (AMD GPUs)
3. Vulkan (cross-platform GPU)
4. CPU (fallback)

To force a backend, set environment variable or check build:

```bash
# Check available backends
llama-bench --help | grep -i "backend\|cuda\|rocm\|vulkan"
```

## Batch Benchmarking

Use the provided script for benchmarking multiple models:

```bash
./scripts/benchmark_models.sh /path/to/models/*.gguf
```

## Saving Results

Output can be redirected to a file:

```bash
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt
```

Or use the benchmark script which auto-saves to timestamped files.

## Common Issues

1. **Out of memory**: Reduce `-ngl` (GPU layers) or test smaller prompt sizes
2. **Slow CPU performance**: Ensure `-t` matches CPU core count
3. **Backend not found**: Check llama.cpp was built with the desired backend

## Building / Updating llama.cpp

### Check Current Version

```bash
./scripts/build_llamacpp.sh -v
```

Shows:

- Current Git commit and branch
- Build date
- Whether behind upstream
- Available backends

### Build or Update

```bash
# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u

# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan  # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda    # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm    # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu     # CPU only

# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan

# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
```

### Build Options

| Flag | Description |
|------|-------------|
| `-v` | Show version info and exit |
| `-u` | Update to latest from GitHub |
| `-c` | Clean build (remove existing) |
| `-b` | Backend: vulkan, cuda, rocm, cpu |
| `-d` | Build directory path |
| `-j` | Parallel jobs (default: CPU count) |

## Finding llama-bench

The benchmark script auto-detects llama-bench in these locations:

- `/DATA/Benchmark/llama.cpp/build/bin/llama-bench`
- `~/Repo/llama.cpp/build/bin/llama-bench`
- `~/lab/build/bin/llama-bench`

If not found, it will search your home directory or you can build it using the script above.
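The "Good Performance" thresholds from the Interpreting Results table (pp above 1000 t/s, tg above 50 t/s) can be checked mechanically. The following is a minimal sketch, not part of the skill's own scripts: the function name `classify_bench` is hypothetical, and it assumes llama-bench's default markdown table output, in which the last two columns are `test` (for example `pp512`) and `t/s` (for example `2400.01 ± 15.20`). Rows whose test name is not a plain `ppN` or `tgN` are skipped.

```bash
# Hypothetical helper: read a llama-bench markdown table on stdin and label
# each pp/tg row as "good" or "below" using the thresholds from this document.
classify_bench() {
  awk -F'|' '
    NF >= 4 {
      # With "|" as separator, the last field is empty, so the
      # t/s column is NF-1 and the test column is NF-2.
      test = $(NF-2)
      rate = $(NF-1)
      gsub(/^[ \t]+|[ \t]+$/, "", test)
      gsub(/^[ \t]+|[ \t]+$/, "", rate)

      # Skip the header, the separator row, and any non pp/tg tests.
      if (test !~ /^(pp|tg)[0-9]+$/) next

      # Keep only the mean; drop the "± stddev" part.
      split(rate, parts, " ")
      value = parts[1] + 0

      # Thresholds taken from the Interpreting Results table.
      limit = (test ~ /^pp/) ? 1000 : 50
      verdict = (value > limit) ? "good" : "below"
      printf "%s %.2f %s\n", test, value, verdict
    }
  '
}
```

A typical use would be to pipe a run straight into it, for example `llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 | classify_bench`. Treat the thresholds as rough guides: what counts as good depends heavily on model size, quantization, and hardware.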