prediction-tracking

Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.

16 stars

by diegosouzapw

Best use case

prediction-tracking is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using prediction-tracking should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/prediction-tracking/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/prediction-tracking/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/prediction-tracking/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How prediction-tracking Compares

| Feature / Agent | prediction-tracking | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/backend/prediction-tracking.

SKILL.md Source

# Prediction Tracking Skill

Track predictions made by AI researchers and critics, and evaluate their accuracy over time.

## Prediction Recording

When recording a new prediction, capture:

### Required Fields
- **text**: The prediction as stated
- **author**: Who made it
- **madeAt**: When it was made
- **timeframe**: When they expect it to happen
- **topic**: What area of AI
- **confidence**: How confident they seemed

### Optional Fields
- **sourceUrl**: Where the prediction was made
- **targetDate**: Specific date if mentioned
- **conditions**: Any caveats or conditions
- **metrics**: How to measure success
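
The field list above could be held in a small record type. The following is a minimal sketch, not part of the skill itself: the `Prediction` class, the `missing_required` helper, and the choice of plain strings for dates and confidence are all assumptions, since the skill does not specify types or formats.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Prediction:
    # Required fields, named as in the list above.
    text: str
    author: str
    madeAt: str       # assumed to be an ISO 8601 date string
    timeframe: str
    topic: str
    confidence: str   # assumed free text such as "high" or "low"
    # Optional fields stay None when the source does not provide them.
    sourceUrl: Optional[str] = None
    targetDate: Optional[str] = None
    conditions: Optional[str] = None
    metrics: Optional[str] = None

    def missing_required(self) -> list[str]:
        """Return the names of required fields that are empty or blank."""
        required = ("text", "author", "madeAt", "timeframe", "topic", "confidence")
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical example record; none of these values come from the skill.
example = Prediction(
    text="Hypothetical: model X will pass benchmark Y",
    author="Example Author",
    madeAt="2024-01-15",
    timeframe="within 2 years",
    topic="reasoning",
    confidence="high",
)
record = asdict(example)           # plain dict, ready for JSON serialization
print(example.missing_required())  # []
```

Keeping the camelCase names from the field list means `asdict` produces keys that match the skill's JSON output without a renaming step.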

## Evaluation Status

When evaluating predictions, assign one of:

### `verified`
Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described

### `falsified`
Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim

### `partially-verified`
Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct

### `too-early`
Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way

### `unfalsifiable`
Cannot be objectively assessed, even in principle.
- No measurable outcome is stated
- No clear success criteria
- Author moved the goalposts after the fact

### `ambiguous`
The wording supports more than one reading, so no single verdict applies.
- Multiple interpretations possible
- Success criteria unclear
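
To keep status labels consistent across evaluations, the six values above can be treated as a closed set. This is a sketch under assumptions: the `Status` enum, `parse_status`, and the `SCORED` set are hypothetical names, and which statuses receive an accuracy score is inferred from the "If verifiable" wording in the scoring step rather than stated by the skill.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"
    FALSIFIED = "falsified"
    PARTIALLY_VERIFIED = "partially-verified"
    TOO_EARLY = "too-early"
    UNFALSIFIABLE = "unfalsifiable"
    AMBIGUOUS = "ambiguous"


def parse_status(raw: str) -> Status:
    """Normalize a status string and reject anything outside the six labels."""
    try:
        return Status(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(s.value for s in Status)
        raise ValueError(f"unknown status {raw!r}; expected one of: {allowed}") from None


# Assumption: only these statuses carry an accuracy score.
SCORED = {Status.VERIFIED, Status.FALSIFIED, Status.PARTIALLY_VERIFIED}

print(parse_status(" Verified "))  # Status.VERIFIED
```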

## Evaluation Process

For each prediction being evaluated:

### 1. Restate the prediction
What exactly was claimed?

### 2. Identify timeframe
Has enough time passed to evaluate?

### 3. Gather evidence
What has happened since?
- Relevant releases or announcements
- Benchmark results
- Real-world deployments
- Counter-evidence

### 4. Assess status
Which evaluation status applies?

### 5. Score accuracy
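If the prediction is verifiable, rate its accuracy from 0.0 to 1.0. The bands below are anchor points; a score that falls between two bands (for example 0.95) is a judgment call: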
- 1.0: Exactly as predicted
- 0.7-0.9: Substantially correct
- 0.4-0.6: Partially correct
- 0.1-0.3: Mostly wrong
- 0.0: Completely wrong
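
The bands above leave small gaps (for example between 0.9 and 1.0), so any code that labels a score has to pick a rule for them. The sketch below is one possible choice, not the skill's own: the hypothetical `accuracy_band` function assigns each score to the band whose lower bound it meets, which places gap values in the band just below them.

```python
def accuracy_band(score: float) -> str:
    """Map a 0.0-1.0 accuracy score to the rubric label.

    Assumption: scores in the gaps between listed bands fall into the
    band below (0.95 -> "substantially correct", 0.05 -> "mostly wrong").
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within 0.0-1.0, got {score}")
    if score == 1.0:
        return "exactly as predicted"
    if score >= 0.7:
        return "substantially correct"
    if score >= 0.4:
        return "partially correct"
    if score > 0.0:
        return "mostly wrong"
    return "completely wrong"


print(accuracy_band(0.85))  # substantially correct
```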

### 6. Note lessons
What does this tell us about:
- The author's forecasting ability
- The topic's predictability
- Common prediction pitfalls

## Output Format

For evaluation:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
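
An evaluation entry in this shape could be produced by a small helper. This is a sketch under assumptions: `make_evaluation` is a hypothetical function, the ISO 8601 timestamp format is a guess (the skill only says "timestamp"), and leaving `accuracyScore` as `None` (JSON `null`) for unscored statuses is a design choice rather than something the skill specifies.

```python
import json
from datetime import datetime, timezone
from typing import Optional

ALLOWED_STATUSES = {
    "verified", "falsified", "partially-verified",
    "too-early", "unfalsifiable", "ambiguous",
}


def make_evaluation(prediction_id: str, status: str, evidence: str,
                    accuracy_score: Optional[float] = None, notes: str = "",
                    evaluated_at: Optional[datetime] = None) -> dict:
    """Build one entry for the "evaluations" array, validating its inputs."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if accuracy_score is not None and not 0.0 <= accuracy_score <= 1.0:
        raise ValueError(f"accuracy_score out of range: {accuracy_score}")
    # Default to the current UTC time when no timestamp is supplied.
    when = evaluated_at or datetime.now(timezone.utc)
    return {
        "predictionId": prediction_id,
        "status": status,
        "accuracyScore": accuracy_score,
        "evidence": evidence,
        "notes": notes,
        "evaluatedAt": when.isoformat(),
    }


# Hypothetical usage with placeholder values.
entry = make_evaluation(
    "pred-001", "verified", "Hypothetical evidence", accuracy_score=0.85,
    evaluated_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
report = {"evaluations": [entry]}
print(json.dumps(report, indent=2))
```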

For accuracy statistics:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
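
The counts in this object can be derived from a list of evaluation entries. The sketch below makes two assumptions that the skill does not confirm: that the `pending` count corresponds to the `too-early` status, and that `ambiguous` predictions are included in `totalPredictions` but have no count key of their own. The `summarize` function is hypothetical, and `topicBreakdown` and `calibration` are left out for brevity.

```python
from collections import Counter

# Assumed mapping from status labels to the count keys in the statistics
# object. "too-early" -> "pending" is a guess; "ambiguous" has no key in the
# sample output, so it only contributes to totalPredictions here.
STATUS_TO_KEY = {
    "verified": "verified",
    "falsified": "falsified",
    "partially-verified": "partiallyVerified",
    "too-early": "pending",
    "unfalsifiable": "unfalsifiable",
}


def summarize(author: str, evaluations: list[dict]) -> dict:
    """Aggregate evaluation entries into per-author accuracy statistics."""
    counts = Counter(
        STATUS_TO_KEY[e["status"]] for e in evaluations if e["status"] in STATUS_TO_KEY
    )
    # Average only over entries that actually carry a score.
    scores = [e["accuracyScore"] for e in evaluations if e.get("accuracyScore") is not None]
    summary = {"author": author, "totalPredictions": len(evaluations)}
    for key in ("verified", "falsified", "partiallyVerified", "pending", "unfalsifiable"):
        summary[key] = counts[key]
    summary["averageAccuracy"] = round(sum(scores) / len(scores), 2) if scores else None
    return summary


# Hypothetical input data for illustration only.
stats = summarize("Example Author", [
    {"status": "verified", "accuracyScore": 0.9},
    {"status": "falsified", "accuracyScore": 0.0},
    {"status": "partially-verified", "accuracyScore": 0.5},
    {"status": "too-early", "accuracyScore": None},
])
print(stats["averageAccuracy"])  # 0.47
```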

## Calibration Assessment

Evaluate whether predictors are well-calibrated:

### Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately

### Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence

### Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative

### Inconsistent
- Confidence doesn't correlate with accuracy
- Random relationship between stated and actual accuracy
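
One rough way to apply these four labels is to compare hit rates for high-confidence and low-confidence predictions. The following is only a heuristic sketch: `classify_calibration` is a hypothetical function, and the 0.5, 0.7, and 0.1 thresholds are illustrative assumptions standing in for "often fail", "often come true", and "doesn't correlate". The skill itself gives no numeric cutoffs.

```python
def hit_rate(outcomes: list[bool]) -> float:
    """Fraction of predictions that came true."""
    return sum(outcomes) / len(outcomes)


def classify_calibration(high_conf: list[bool], low_conf: list[bool]) -> str:
    """Label a predictor from outcomes of high- and low-confidence predictions.

    All thresholds below are illustrative assumptions, not values from the skill.
    """
    if not high_conf or not low_conf:
        return "insufficient data"
    high, low = hit_rate(high_conf), hit_rate(low_conf)
    if high < 0.5:        # high-confidence predictions often fail
        return "overconfident"
    if low > 0.7:         # low-confidence predictions often come true
        return "underconfident"
    if high - low < 0.1:  # stated confidence carries little signal
        return "inconsistent"
    return "well-calibrated"


# Hypothetical outcomes: 8 of 10 high-confidence and 5 of 10 low-confidence hits.
label = classify_calibration([True] * 8 + [False] * 2, [True] * 5 + [False] * 5)
print(label)  # well-calibrated
```

With only a handful of predictions per bucket, these rates are noisy, so the label is better read as a prompt for review than as a verdict.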

## Tracking Notable Predictors

Keep a running assessment of key voices in a table such as:

| Predictor | Total | Accuracy | Calibration | Notes |
|-----------|-------|----------|-------------|-------|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |

## Red Flags

Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything

Related Skills

asset-tracking

from diegosouzapw/awesome-omni-skill

Use when managing asset metadata, dependencies, and delivery workflows across teams.

analytics-tracking

from diegosouzapw/awesome-omni-skill

(Chinese) When the user wants to set up, improve, or audit analytics tracking and measurement. Also use when the user mentions "set up tracking," "GA4," "Google Analytics," "conversion tracking," "event tracking," "UTM parameters," "tag manager," "GTM," "analytics implementation," or "tracking plan." For A/B test measurement, see ab-test-setup.

aiwf:error-tracking

from diegosouzapw/awesome-omni-skill

Add Sentry v8 error tracking and performance monitoring to your project services. Use this skill when adding error handling, creating new controllers, instrumenting cron jobs, or tracking database performance. ALL ERRORS MUST BE CAPTURED TO SENTRY - no exceptions.

artifact-tracking

from diegosouzapw/awesome-omni-skill

Token-efficient tracking for AI orchestration. CLI-first for status updates (~50 tokens), agent fallback for complex ops (~1KB). Use when: updating task status, querying blockers, creating progress files, validating phases.

agentic-kpi-tracking

from diegosouzapw/awesome-omni-skill

Track and measure agentic coding KPIs for ZTE progression. Use when measuring workflow effectiveness, tracking Size/Attempts/Streak/Presence metrics, or assessing readiness for autonomous operation.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

mcp-create-declarative-agent

from diegosouzapw/awesome-omni-skill

Skill converted from mcp-create-declarative-agent.prompt.md

MCP Architecture Expert

from diegosouzapw/awesome-omni-skill

Design and implement Model Context Protocol servers for standardized AI-to-data integration with resources, tools, prompts, and security best practices

mathem-shopping

from diegosouzapw/awesome-omni-skill

Automates logging in to Mathem.se, searching for and adding items from a list or recipe, handling substitutions according to policy, and reserving a delivery slot, but leaves the cart ready for manual checkout.

math-modeling

from diegosouzapw/awesome-omni-skill

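Use this skill when the user asks for "mathematical modeling," "modeling competition," "math modeling paper," "mathematical modeling contest," "modeling analysis," or "model solving," or mentions tasks related to mathematical modeling. Suitable for mathematical modeling competitions such as the China Undergraduate Mathematical Contest in Modeling (CUMCM) and the Mathematical Contest in Modeling / Interdisciplinary Contest in Modeling (MCM/ICM).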

matchms

from diegosouzapw/awesome-omni-skill

Mass spectrometry analysis. Process mzML/MGF/MSP, spectral similarity (cosine, modified cosine), metadata harmonization, compound ID, for metabolomics and MS data processing.

managing-traefik

from diegosouzapw/awesome-omni-skill

Manages Traefik reverse proxy for local development. Use when routing domains to local services, configuring CORS, checking service health, or debugging connectivity issues.