llamacpp-bench

Run llama.cpp benchmarks on GGUF models to measure prompt processing (pp) and token generation (tg) performance. Use when the user wants to benchmark LLM models, compare model performance, test inference speed, or run llama-bench on GGUF files. Supports Vulkan, CUDA, ROCm, and CPU backends.

by openclaw

Best use case

llamacpp-bench is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using llamacpp-bench should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/llamacpp-bench/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/alexhegit/llamacpp-bench/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/llamacpp-bench/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How llamacpp-bench Compares

Feature / Agent	llamacpp-bench	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

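What does this skill do?

It runs llama.cpp's llama-bench tool on GGUF models and reports prompt processing (pp) and token generation (tg) speed.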

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# llamacpp-bench

Run standardized benchmarks on GGUF models using llama.cpp's `llama-bench` tool.

## Quick Start

```bash
# Basic benchmark
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99

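# With a specific backend (LLAMA_BACKEND is not an upstream llama-bench option;
# it only has an effect if your setup or wrapper script reads it)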
LLAMA_BACKEND=vulkan llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99
```

## Benchmark Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `-m` | Model path (GGUF file) | required |
| `-p` | Prompt sizes to test, in tokens (comma-separated list) | 512 |
| `-n` | Generation lengths to test, in tokens (comma-separated list) | 128 |
| `-ngl` | Number of model layers to offload to the GPU | 99 |
| `-t` | CPU threads | auto |
| `-dev` | Device selection | auto |

## Standard Test Suite

For consistent comparisons across models, use:

```bash
-p 512,1024,2048 -n 128,256 -ngl 99
```

This tests:
- **Prompt processing**: 512, 1024, 2048 tokens
- **Token generation**: 128, 256 tokens
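
The mapping from the `-p` and `-n` lists to the test names that appear in the results can be sketched as follows. This is a minimal illustration; the helper name `expand_tests` is hypothetical and is not part of llama.cpp or of this skill's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the test names produced by a -p / -n combination.
# Each -p value yields a "pp<N>" row and each -n value yields a "tg<N>" row.
expand_tests() {
  local prompts="$1" gens="$2" n
  local IFS=','   # split the comma-separated lists inside this function only
  for n in $prompts; do printf 'pp%s\n' "$n"; done
  for n in $gens; do printf 'tg%s\n' "$n"; done
}

# The standard suite expands to pp512, pp1024, pp2048, tg128, tg256 (one per line).
expand_tests "512,1024,2048" "128,256"
```

The standard suite therefore produces five result rows per model, which is what makes runs comparable across models.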

## Interpreting Results

| Metric | Meaning | Good Performance |
|--------|---------|------------------|
| `pp512` | Prompt processing speed at 512 tokens | >1000 t/s |
| `pp1024` | Prompt processing speed at 1024 tokens | >1000 t/s |
| `pp2048` | Prompt processing speed at 2048 tokens | >1000 t/s |
| `tg128` | Token generation speed (128 tokens) | >50 t/s |
| `tg256` | Token generation speed (256 tokens) | >50 t/s |
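
The thresholds in the table are rough guides; actual throughput depends on model size, quantization, and hardware. As a sketch of how a measured value could be checked against them, the hypothetical helper `rate_result` below (not part of llama.cpp or this skill) compares a speed in t/s with the threshold for its test type.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a measured speed (t/s) with the table's thresholds.
# pp* tests use the 1000 t/s threshold, tg* tests use the 50 t/s threshold.
rate_result() {
  local test_name="$1" speed="$2" threshold
  case "$test_name" in
    pp*) threshold=1000 ;;
    tg*) threshold=50 ;;
    *)   echo "unknown"; return 1 ;;
  esac
  # awk performs the floating-point comparison that shell arithmetic cannot.
  if awk -v s="$speed" -v t="$threshold" 'BEGIN { exit !(s + 0 > t + 0) }'; then
    echo "good"
  else
    echo "below threshold"
  fi
}

rate_result pp512 1350.2   # prints "good"
rate_result tg128 31.7     # prints "below threshold"
```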

## Backend Selection

llama-bench auto-detects available backends. Priority order:
1. CUDA (NVIDIA GPUs)
2. ROCm (AMD GPUs)
3. Vulkan (cross-platform GPU)
4. CPU (fallback)

To force a backend, set an environment variable, or check which backends your build includes:
```bash
# Check available backends
llama-bench --help 2>&1 | grep -iE "backend|cuda|rocm|vulkan"
```

## Batch Benchmarking

Use the provided script for benchmarking multiple models:

```bash
./scripts/benchmark_models.sh /path/to/models/*.gguf
```
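
The contents of `benchmark_models.sh` are not shown here, so the following is only a sketch of what such a batch loop might look like. The function name `bench_all` and the `BENCH_CMD` variable are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a batch loop; the real benchmark_models.sh may differ.
# BENCH_CMD can be overridden (for example with "echo") to preview the commands.
BENCH_CMD="${BENCH_CMD:-llama-bench}"

bench_all() {
  local model
  for model in "$@"; do
    if [ ! -f "$model" ]; then
      echo "skipping missing file: $model" >&2
      continue
    fi
    # Run the standard test suite on each model in turn.
    "$BENCH_CMD" -m "$model" -p 512,1024,2048 -n 128,256 -ngl 99
  done
}

# Example (commented out so the sketch has no side effects when sourced):
# bench_all /path/to/models/*.gguf
```

Setting `BENCH_CMD=echo` before calling `bench_all` prints the arguments for each model instead of running the benchmark, which is a cheap way to check the file list first.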

## Saving Results

Output can be redirected to a file:
```bash
llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > results.txt
```

Alternatively, use the benchmark script, which saves results to timestamped files automatically.
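
A timestamped file name can also be built by hand. The naming scheme below is an assumption for illustration (the script's actual scheme is not documented here), and `results_file` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical naming scheme; the skill's script may use a different one.
# Builds "<model-name>_<YYYYmmdd_HHMMSS>.txt" from a model path.
results_file() {
  local model="$1" stamp="${2:-$(date +%Y%m%d_%H%M%S)}"
  local name
  name="$(basename "$model" .gguf)"   # drop the directory and the .gguf suffix
  printf '%s_%s.txt\n' "$name" "$stamp"
}

# Example (commented out because it needs llama-bench and a model file):
# llama-bench -m model.gguf -p 512,1024,2048 -n 128,256 -ngl 99 > "$(results_file model.gguf)"
```

Including the model name and a timestamp in the file name keeps repeated runs from overwriting each other.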

## Common Issues

1. **Out of memory**: Reduce `-ngl` (GPU layers) or test smaller prompt sizes
2. **Slow CPU performance**: Ensure `-t` matches CPU core count
3. **Backend not found**: Check llama.cpp was built with the desired backend
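
For the thread-count issue, the number of CPUs can be looked up before choosing a `-t` value. The sketch below uses a hypothetical helper, `cpu_count`, and reports logical CPUs, which may exceed the number of physical cores on systems with simultaneous multithreading, so treat the result as a starting point.

```bash
#!/usr/bin/env bash
# Sketch: report the number of logical CPUs so it can be passed to -t.
# nproc exists on most Linux systems; sysctl covers macOS and the BSDs.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif command -v sysctl >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    echo 1   # conservative fallback when neither tool is available
  fi
}

echo "suggested: -t $(cpu_count)"
```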

## Building / Updating llama.cpp

### Check Current Version

```bash
./scripts/build_llamacpp.sh -v
```

Shows:
- Current Git commit and branch
- Build date
- Whether behind upstream
- Available backends

### Build or Update

```bash
# Interactive mode (prompts for backend selection)
./scripts/build_llamacpp.sh -u

# Specify backend directly
./scripts/build_llamacpp.sh -u -b vulkan   # Vulkan (AMD/Intel GPUs)
./scripts/build_llamacpp.sh -u -b cuda     # CUDA (NVIDIA GPUs)
./scripts/build_llamacpp.sh -u -b rocm     # ROCm (AMD GPUs)
./scripts/build_llamacpp.sh -u -b cpu      # CPU only

# Clean rebuild
./scripts/build_llamacpp.sh -c -b vulkan

# Custom build directory
./scripts/build_llamacpp.sh -u -b cuda -d /custom/path
```

### Build Options

| Flag | Description |
|------|-------------|
| `-v` | Show version info and exit |
| `-u` | Update to latest from GitHub |
| `-c` | Clean build (remove existing) |
| `-b` | Backend: vulkan, cuda, rocm, cpu |
| `-d` | Build directory path |
| `-j` | Parallel jobs (default: CPU count) |
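
How `build_llamacpp.sh` translates the `-b` value into build settings is not shown here. One plausible approach, sketched under that assumption, is to map each backend name to a llama.cpp CMake option. The helper `backend_cmake_flag` is hypothetical, and the option names follow recent upstream build documentation; older checkouts used different names.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from the -b value to a llama.cpp CMake option.
# Option names follow recent upstream build docs; older checkouts used other names.
backend_cmake_flag() {
  case "$1" in
    vulkan) echo "-DGGML_VULKAN=ON" ;;
    cuda)   echo "-DGGML_CUDA=ON" ;;
    rocm)   echo "-DGGML_HIP=ON" ;;
    cpu)    echo "" ;;   # CPU-only builds need no extra backend option
    *)      echo "unsupported backend: $1" >&2; return 1 ;;
  esac
}

backend_cmake_flag vulkan   # prints "-DGGML_VULKAN=ON"
```

Rejecting unknown names up front avoids starting a long build with a misspelled backend.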

## Finding llama-bench

The benchmark script auto-detects llama-bench in these locations:
- `/DATA/Benchmark/llama.cpp/build/bin/llama-bench`
- `~/Repo/llama.cpp/build/bin/llama-bench`
- `~/lab/build/bin/llama-bench`

If llama-bench is not found in any of these locations, the script searches your home directory. Alternatively, build it with the build script described above.
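
The lookup over the listed locations can be sketched as follows. This is an illustration only: `find_llama_bench` is a hypothetical helper, and the skill's script may search differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the lookup; the skill's script may search differently.
# Extra candidate paths passed as arguments are checked first.
find_llama_bench() {
  local candidate
  for candidate in "$@" \
      "/DATA/Benchmark/llama.cpp/build/bin/llama-bench" \
      "$HOME/Repo/llama.cpp/build/bin/llama-bench" \
      "$HOME/lab/build/bin/llama-bench"; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  # Fall back to PATH; this returns a non-zero status if nothing is found.
  command -v llama-bench
}

find_llama_bench || echo "llama-bench not found; build it first" >&2
```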
