AI Agent Skill HUB

human_test

Call real humans to test your product (URL or app). Get structured usability feedback with screen recordings, NPS scores, and AI-aggregated findings.

3,891 stars

Best use case

human_test is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using human_test should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/human-test/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/avivahe326/human-test/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/human-test/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How human_test Compares

Feature / Agent	human_test	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

Where can I find the source code?

The platform's source code is on GitHub at https://github.com/avivahe326/humantest, and the SKILL.md file is reproduced in full below.

SKILL.md Source

# human_test() — Real Human Feedback for AI Products

AI agents cannot judge human perception, emotion, or usability. This skill lets you call real humans to test a product (by URL or by description) and returns structured feedback.

## What it does

1. You call `human_test()` with a product URL or description (URL is optional — also works for mobile apps, desktop software, etc.)
2. AI auto-generates a structured test plan
3. Real human testers claim the task on the web platform
4. Each tester records their screen and microphone (up to 15 min) while completing a guided feedback flow — first impression, task steps, NPS rating
5. AI extracts key frames from each recording and uses vision AI to analyze usability issues, then aggregates all feedback into a structured report with severity-ranked findings

## Setup

### Option A: Hosted (zero setup)

Use the hosted version at **https://human-test.work** — no installation needed. Register to get an API key, then skip to **Create a test task** below using `BASE_URL=https://human-test.work`.

### Option B: Self-hosted (auto-install)

human_test() can run locally. Before creating a task, check if the server is reachable:

```bash
curl -s "$BASE_URL/api/config"
```

If the server is not running, install and start it:

```bash
npm i -g humantest-app
cd /tmp && humantest init --non-interactive && cd humantest && humantest start
```

This auto-detects AI API keys from your environment (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `DEEPSEEK_API_KEY`, or `GEMINI_API_KEY`), creates a local SQLite database, builds the app, and starts it on port 3000.

A default admin user is created automatically — no registration needed.

**Set `BASE_URL`**: Ask the user once for their preferred base URL. Default: `http://localhost:3000`

## Quick start

### Create a test task

```bash
curl -X POST "$BASE_URL/api/skill/human-test" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-product.com",
    "focus": "Test the onboarding flow",
    "maxTesters": 5,
    "creator": "agent-name"
  }'
```

Response:
```json
{
  "taskId": "cm...",
  "status": "OPEN",
  "testPlan": { "steps": [...], "nps": true, "estimatedMinutes": 10 }
}
```
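
For agents that prefer to call the endpoint from code rather than from a shell, the same request can be sketched in Python with only the standard library. The helper names `build_create_request` and `create_task` are hypothetical, and the sketch assumes the endpoint needs no headers beyond the `Content-Type` shown in the curl example.

```python
import json
import urllib.request


def build_create_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for /api/skill/human-test without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/skill/human-test",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_task(base_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response (taskId, status, testPlan)."""
    request = build_create_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting the request construction from the network call keeps the payload easy to inspect or log before anything is sent.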

### Check progress and get the report

```bash
# TASK_ID holds the taskId returned when the task was created
curl "$BASE_URL/api/skill/status/$TASK_ID"
```

Response (when completed):
```json
{
  "taskId": "cm...",
  "status": "COMPLETED",
  "submittedCount": 5,
  "report": "## Executive Summary\n...",
  "reportStatus": "COMPLETED",
  "codeFixStatus": "COMPLETED",
  "codeFixPrUrl": "https://github.com/user/repo/pull/1"
}
```

> **Note for agents:** If `repoUrl` was provided, code fix generation starts automatically after the report is ready — no need to trigger it manually. Keep polling until `codeFixStatus` is `COMPLETED` or `FAILED`, or use `codeFixWebhookUrl` to get notified.
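
The polling rule in the note above can be sketched as a small Python loop. Everything here is illustrative: the function names are hypothetical, the 30-second interval and 120-poll budget are arbitrary choices, and treating `FAILED` as a terminal value for `reportStatus` is an assumption (the source only names `COMPLETED` and `FAILED` for `codeFixStatus`).

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: these two values end polling for both status fields.
TERMINAL = {"COMPLETED", "FAILED"}


def fetch_status(base_url: str, task_id: str) -> dict:
    """GET /api/skill/status/<taskId> and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/skill/status/{task_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def is_finished(status: dict, expect_code_fix: bool) -> bool:
    """The report must be terminal; if repoUrl was sent, the code fix must be too."""
    if status.get("reportStatus") not in TERMINAL:
        return False
    if expect_code_fix:
        return status.get("codeFixStatus") in TERMINAL
    return True


def wait_for_result(fetch: Callable[[], dict], expect_code_fix: bool = False,
                    interval_s: float = 30.0, max_polls: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() repeatedly until the task is finished or the budget runs out."""
    for _ in range(max_polls):
        status = fetch()
        if is_finished(status, expect_code_fix):
            return status
        sleep(interval_s)
    raise TimeoutError("task did not finish within the polling budget")
```

A caller would pass something like `lambda: fetch_status(base_url, task_id)` as `fetch`, and set `expect_code_fix=True` only when the task was created with `repoUrl`.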

## Parameters

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `url` | No | — | Product URL to test (optional — leave empty for mobile apps or non-web products) |
| `title` | No | Auto from hostname | Task title |
| `focus` | No | — | What testers should focus on |
| `maxTesters` | No | 5 | Number of testers (1-50) |
| `estimatedMinutes` | No | 10 | Expected test duration |
| `creator` | No | admin | Name of the agent/user creating the task (auto-creates a user if needed) |
| `webhookUrl` | No | — | HTTPS URL to receive the report on completion |
| `codeFixWebhookUrl` | No | — | HTTPS URL to receive code fix results on completion |
| `repoUrl` | No | — | GitHub repo URL for code-level fix suggestions |
| `repoBranch` | No | repo default | Branch to analyze (only used with repoUrl) |
| `locale` | No | `en` | Report language: `en` (English) or `zh` (Chinese) |
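
The constraints in the table can be checked on the client before a task is created. The sketch below is a hypothetical helper, not part of the platform; it encodes only what the table states (tester range, locale values, HTTPS webhooks, `repoBranch` depending on `repoUrl`) and makes no claim about how the server itself validates input.

```python
from urllib.parse import urlparse

HTTPS_FIELDS = ("webhookUrl", "codeFixWebhookUrl")


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload matches the table."""
    problems = []
    max_testers = payload.get("maxTesters", 5)
    if (isinstance(max_testers, bool) or not isinstance(max_testers, int)
            or not 1 <= max_testers <= 50):
        problems.append("maxTesters must be an integer from 1 to 50")
    if payload.get("locale", "en") not in ("en", "zh"):
        problems.append("locale must be 'en' or 'zh'")
    for field in HTTPS_FIELDS:
        value = payload.get(field)
        if value is not None and urlparse(value).scheme != "https":
            problems.append(f"{field} must be an HTTPS URL")
    if "repoBranch" in payload and "repoUrl" not in payload:
        problems.append("repoBranch is only used together with repoUrl")
    return problems
```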

## Async webhooks

There are two separate webhooks for the two stages:

### Report webhook (`webhookUrl`)

If you provide a `webhookUrl`, the platform will POST the report to that URL when it's ready:

```json
{
  "event": "report",
  "taskId": "...",
  "status": "COMPLETED",
  "title": "Test: example.com",
  "targetUrl": "https://example.com",
  "report": "## Executive Summary\n...",
  "completedAt": "2026-03-02T12:00:00Z"
}
```

### Code fix webhook (`codeFixWebhookUrl`)

If you provide a `codeFixWebhookUrl`, the platform will POST the code fix result when done:

```json
{
  "event": "code_fix",
  "taskId": "...",
  "status": "COMPLETED",
  "title": "Test: example.com",
  "targetUrl": "https://example.com",
  "codeFixStatus": "COMPLETED",
  "codeFixPrUrl": "https://github.com/user/repo/pull/1",
  "completedAt": "2026-03-02T12:30:00Z"
}
```
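
Because both payloads carry an `event` field, a single receiver can serve both webhook URLs and branch on that field. The following is a minimal sketch of such a dispatcher; `handle_webhook` is a hypothetical name, the summaries it returns are placeholders for real handling, and it assumes the request body is exactly the JSON shown above.

```python
import json


def handle_webhook(raw_body: bytes) -> str:
    """Route a webhook payload by its `event` field and return a short summary."""
    payload = json.loads(raw_body)
    event = payload.get("event")
    task_id = payload.get("taskId")
    if event == "report":
        report = payload.get("report", "")
        return f"report for {task_id}: {len(report)} characters of markdown"
    if event == "code_fix":
        if payload.get("codeFixStatus") == "COMPLETED":
            # codeFixPrUrl may be absent when no pull request was opened.
            return f"code fix for {task_id}: {payload.get('codeFixPrUrl')}"
        return f"code fix for {task_id} ended with {payload.get('codeFixStatus')}"
    raise ValueError(f"unknown event: {event!r}")
```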

## Report format (structured for AI agents)

The report is returned as a markdown string in the `report` field. It uses a **consistent, machine-parseable structure** designed for AI agents to read and act on directly — for example, to automatically file issues, create PRs, or prioritize a fix backlog.

### Section structure

Every report contains these exact sections in order:

```markdown
## Metadata
| Field | Value |
|-------|-------|
| Product | ... |
| URL | ... |
| Testers | N |
| Avg NPS | X.X/10 |

## Executive Summary
(3-5 sentences, most critical finding first)

## Issues
### [CRITICAL] Issue title
- **Evidence:** (specific testers and observations)
- **Impact:** (effect on users)
- **Recommendation:** (actionable fix)

### [MAJOR] Issue title
- **Evidence:** ...
- **Impact:** ...
- **Recommendation:** ...

### [MINOR] Issue title
...

## Positive Highlights
(What worked well)

## NPS Analysis
(Score breakdown, interpretation)

## Recommendations
- **P0** (fix immediately): ... (references issue)
- **P1** (fix this sprint): ...
- **P2** (next sprint): ...
- **P3** (backlog): ...
```
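
As an illustration of how the `## Metadata` table in this layout might be read, here is a small Python sketch. The function names are hypothetical, and the parser assumes the table looks exactly like the template above (two columns, no `|` characters inside values).

```python
import re


def parse_metadata(report: str) -> dict:
    """Read the `## Metadata` table into a dict of field -> value strings."""
    section = re.search(r"^## Metadata\n(.*?)(?=^## |\Z)", report, re.S | re.M)
    if section is None:
        return {}
    fields = {}
    for line in section.group(1).splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row, and the |---|---| separator row.
        if len(cells) != 2 or cells[0] == "Field" or set(cells[0]) <= {"-"}:
            continue
        fields[cells[0]] = cells[1]
    return fields


def average_nps(fields: dict):
    """Extract the numeric part of an `X.X/10` value, or None if it is absent."""
    match = re.match(r"(\d+(?:\.\d+)?)/10$", fields.get("Avg NPS", ""))
    return float(match.group(1)) if match else None
```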

### Parsing tips for agents

- **Severity levels**: `[CRITICAL]`, `[MAJOR]`, `[MINOR]` — always in brackets in issue headers
- **Priority tags**: `P0`, `P1`, `P2`, `P3` — in the Recommendations section
- **Each issue has 3 fields**: Evidence, Impact, Recommendation — always bolded labels
- **Metadata table**: always the first section, machine-readable key-value pairs
- **NPS scores**: appear in Metadata (average) and NPS Analysis (per-tester breakdown)
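
The tips above translate fairly directly into regular expressions. The sketch below is one possible parser for the `## Issues` section, with hypothetical names (`Issue`, `parse_issues`); it assumes each field value fits on a single line, so longer multi-line values would be truncated to their first line.

```python
import re
from dataclasses import dataclass

ISSUE_HEADER = re.compile(r"^### \[(CRITICAL|MAJOR|MINOR)\] (.+)$", re.M)
FIELD = re.compile(r"^- \*\*(Evidence|Impact|Recommendation):\*\* (.*)$", re.M)


@dataclass
class Issue:
    severity: str
    title: str
    evidence: str = ""
    impact: str = ""
    recommendation: str = ""


def parse_issues(report: str) -> list[Issue]:
    """Return the issues in the `## Issues` section, in document order."""
    section = re.search(r"^## Issues\n(.*?)(?=^## |\Z)", report, re.S | re.M)
    if section is None:
        return []
    text = section.group(1)
    headers = list(ISSUE_HEADER.finditer(text))
    issues = []
    for i, header in enumerate(headers):
        # An issue's body runs until the next issue header or the section end.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        issue = Issue(severity=header.group(1), title=header.group(2).strip())
        for name, value in FIELD.findall(text[header.end():end]):
            setattr(issue, name.lower(), value.strip())
        issues.append(issue)
    return issues
```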

## Agent auto-fix workflow

The structured report format is designed for a closed-loop workflow: your agent calls `human_test()`, receives the report, and automatically fixes the issues found — no human intervention needed after testing.

### Recommended flow

1. Call `human_test()` with your product URL (include `webhookUrl` to get notified)
2. Wait for the report (poll `/api/skill/status/<taskId>` or receive webhook)
3. Parse the `## Issues` section — each issue has `[SEVERITY]`, Evidence, Impact, and Recommendation
4. For `[CRITICAL]` and `[MAJOR]` issues, use the **Recommendation** field to generate targeted code fixes
5. Create commits or PRs for each fix
6. (Optional) Call `human_test()` again to verify the fixes

Each issue's **Evidence** tells you what went wrong, **Impact** tells you why it matters, and **Recommendation** tells you exactly what to fix. This gives your agent enough context to write a targeted fix without guessing.
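
Steps 3 to 5 of the flow can be sketched as a small prioritization step that runs once issues have been parsed. The code below assumes each issue is already a dict with `severity`, `title`, `evidence`, `impact`, and `recommendation` keys; that shape and the function names are assumptions made for illustration, and how the resulting instruction is turned into an actual commit is left to the agent.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}


def build_fix_queue(issues: list[dict], include=("CRITICAL", "MAJOR")) -> list[dict]:
    """Keep the issues worth fixing automatically, most severe first."""
    selected = [issue for issue in issues if issue["severity"] in include]
    # sorted() is stable, so issues of equal severity keep their report order.
    return sorted(selected, key=lambda issue: SEVERITY_ORDER[issue["severity"]])


def fix_instruction(issue: dict) -> str:
    """Turn one issue into a plain-text instruction for a code-editing agent."""
    return (
        f"[{issue['severity']}] {issue['title']}\n"
        f"What went wrong: {issue['evidence']}\n"
        f"Why it matters: {issue['impact']}\n"
        f"What to change: {issue['recommendation']}"
    )
```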

## Repo-aware code fix suggestions

If you pass a `repoUrl`, the platform automatically triggers code fix generation as soon as the report is ready. It clones your repo, analyzes the code against reported issues, and produces **file-level code fix suggestions** (with unified diffs) appended to the report as a `## Code Fix Suggestions` section.

### Two modes (auto-detected)

**Mode 1 — Read-only:** Grant GitHub user `avivahe326` read access to your repo. After the report, the platform clones the repo, analyzes the code against reported issues, and appends code-level diffs to the report.

**Mode 2 — Developer access:** Grant `avivahe326` write access. Same as Mode 1, plus: creates a branch `human-test/fixes-<taskId>`, applies the diffs, pushes, and opens a PR. The PR URL is returned in the webhook payload as `codeFixPrUrl` and in the status API.

### Example with repoUrl

```bash
curl -X POST "$BASE_URL/api/skill/human-test" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-product.com",
    "focus": "Test the checkout flow",
    "repoUrl": "https://github.com/your-org/your-repo",
    "repoBranch": "main",
    "webhookUrl": "https://your-server.com/webhook",
    "codeFixWebhookUrl": "https://your-server.com/code-fix-webhook"
  }'
```

## Links

- Web platform: https://human-test.work
- GitHub: https://github.com/avivahe326/humantest
