prediction-tracking
Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
Best use case
prediction-tracking is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using prediction-tracking should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/prediction-tracking/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How prediction-tracking Compares
| Feature / Agent | prediction-tracking | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Prediction Tracking Skill
Track predictions made by AI researchers and critics, and evaluate their accuracy over time.
## Prediction Recording
When recording a new prediction, capture:
### Required Fields
- **text**: The prediction as stated
- **author**: Who made it
- **madeAt**: When it was made
- **timeframe**: When they expect it to happen
- **topic**: What area of AI
- **confidence**: How confident they seemed
### Optional Fields
- **sourceUrl**: Where the prediction was made
- **targetDate**: Specific date if mentioned
- **conditions**: Any caveats or conditions
- **metrics**: How to measure success
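As a minimal sketch of the field list above, a record can be checked for completeness before it is stored. The helper name `make_prediction`, the plain-dict representation, and the example values are assumptions for illustration; the skill itself does not prescribe a storage format.

```python
# Field names come from the lists above; everything else is hypothetical.
REQUIRED_FIELDS = ("text", "author", "madeAt", "timeframe", "topic", "confidence")
OPTIONAL_FIELDS = ("sourceUrl", "targetDate", "conditions", "metrics")


def make_prediction(**fields):
    """Return a prediction record, rejecting missing or unknown fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    unknown = sorted(set(fields) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return dict(fields)


# Placeholder values only; this is not a real prediction.
example = make_prediction(
    text="Hypothetical prediction text",
    author="Example Author",
    madeAt="2024-01-15",
    timeframe="within two years",
    topic="agents",
    confidence="high",
    targetDate="2026-01-15",
)
```

Rejecting unknown keys is a design choice in this sketch: it catches misspelled field names early, at the cost of needing an update whenever a field is added.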
## Evaluation Status
When evaluating predictions, assign one of:
### `verified`
Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described
### `falsified`
Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim
### `partially-verified`
Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct
### `too-early`
Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way
### `unfalsifiable`
Cannot be objectively assessed.
- Too vague to measure
- No clear success criteria
- Moved goalposts
### `ambiguous`
The wording supports more than one reasonable reading, so evaluators cannot agree on what was claimed.
- Multiple interpretations possible
- Success criteria unclear
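One way to keep status values consistent is to treat the six statuses above as a closed set. The sketch below assumes a Python enum; the names `EvaluationStatus` and `parse_status` are hypothetical and not part of the skill.

```python
from enum import Enum


class EvaluationStatus(str, Enum):
    """The six statuses listed above; the class name is hypothetical."""

    VERIFIED = "verified"
    FALSIFIED = "falsified"
    PARTIALLY_VERIFIED = "partially-verified"
    TOO_EARLY = "too-early"
    UNFALSIFIABLE = "unfalsifiable"
    AMBIGUOUS = "ambiguous"


def parse_status(raw: str) -> EvaluationStatus:
    """Normalize case and whitespace; raise ValueError for any other value."""
    return EvaluationStatus(raw.strip().lower())
```

A misspelled status such as `"partially verified"` then fails immediately instead of ending up in later statistics as a separate category.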
## Evaluation Process
For each prediction being evaluated:
### 1. Restate the prediction
What exactly was claimed?
### 2. Identify timeframe
Has enough time passed to evaluate?
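When a prediction carries the optional `targetDate`, the timeframe question reduces to a date comparison. This is a sketch under the assumption that `targetDate` is stored as an ISO 8601 date string; the function name is hypothetical.

```python
from datetime import date


def timeframe_has_passed(target_date: str, today: date) -> bool:
    """True once `today` is strictly after the ISO-format target date."""
    return today > date.fromisoformat(target_date)


print(timeframe_has_passed("2025-06-30", date(2025, 7, 1)))  # True
```

A `False` result points toward `too-early`. A `True` result does not decide the status by itself; it only means the evidence can now be weighed.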
### 3. Gather evidence
What has happened since?
- Relevant releases or announcements
- Benchmark results
- Real-world deployments
- Counter-evidence
### 4. Assess status
Which evaluation status applies?
### 5. Score accuracy
If verifiable, rate 0.0-1.0:
- 1.0: Exactly as predicted
- 0.7-0.9: Substantially correct
- 0.4-0.6: Partially correct
- 0.1-0.3: Mostly wrong
- 0.0: Completely wrong
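The rubric above can be turned into a lookup, sketched below. The listed bands leave gaps (for example, 0.95 or 0.65), so this sketch assumes each band extends upward to the start of the next one; that boundary rule and the function name `describe_score` are assumptions, not part of the skill.

```python
def describe_score(score: float) -> str:
    """Map an accuracy score to the rubric label, using each band's lower bound."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score == 1.0:
        return "Exactly as predicted"
    if score >= 0.7:
        return "Substantially correct"
    if score >= 0.4:
        return "Partially correct"
    if score > 0.0:
        return "Mostly wrong"
    return "Completely wrong"


print(describe_score(0.85))  # Substantially correct
```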
### 6. Note lessons
What does this tell us about:
- The author's forecasting ability
- The topic's predictability
- Common prediction pitfalls
## Output Format
For evaluation:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
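A small builder can produce records in the evaluation format shown above. The sketch assumes `evaluatedAt` is an ISO 8601 string, since the format only says "timestamp", and that `accuracyScore` may be `None` when a prediction is not yet scorable. The function name `build_evaluation` is hypothetical.

```python
import json
from datetime import datetime, timezone


def build_evaluation(prediction_id, status, evidence, notes="",
                     accuracy_score=None, evaluated_at=None):
    """Return one entry for the "evaluations" list."""
    if accuracy_score is not None and not 0.0 <= accuracy_score <= 1.0:
        raise ValueError("accuracy_score must be between 0.0 and 1.0")
    if evaluated_at is None:
        evaluated_at = datetime.now(timezone.utc)
    return {
        "predictionId": prediction_id,
        "status": status,
        "accuracyScore": accuracy_score,
        "evidence": evidence,
        "notes": notes,
        # Assumption: timestamps are serialized as ISO 8601.
        "evaluatedAt": evaluated_at.isoformat(),
    }


report = {
    "evaluations": [
        build_evaluation(
            "id", "verified", "Description of evidence", "Additional context",
            accuracy_score=0.85,
            evaluated_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
        )
    ]
}
print(json.dumps(report, indent=2))
```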
For accuracy statistics:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
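The counts and averages in the statistics format can be derived from a list of evaluations. The sketch below makes several assumptions that the skill does not state: `pending` counts `too-early` evaluations, averages skip entries without a score, and each evaluation has already been joined with its prediction's `topic`. The `calibration` text is left out because it is a written judgment. The `summarize` name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Assumption: "pending" in the statistics corresponds to the too-early status.
STATUS_TO_KEY = {
    "verified": "verified",
    "falsified": "falsified",
    "partially-verified": "partiallyVerified",
    "too-early": "pending",
    "unfalsifiable": "unfalsifiable",
}


def _average(scores):
    """Mean of the non-missing scores, rounded to two places, or None."""
    present = [s for s in scores if s is not None]
    return round(mean(present), 2) if present else None


def summarize(author, evaluations):
    """Build the per-author statistics record from scored evaluations."""
    counts = Counter(e["status"] for e in evaluations)
    by_topic = defaultdict(list)
    for e in evaluations:
        by_topic[e["topic"]].append(e.get("accuracyScore"))
    stats = {"author": author, "totalPredictions": len(evaluations)}
    for status, key in STATUS_TO_KEY.items():
        stats[key] = counts.get(status, 0)
    stats["averageAccuracy"] = _average(e.get("accuracyScore") for e in evaluations)
    stats["topicBreakdown"] = {
        topic: {"predictions": len(scores), "accuracy": _average(scores)}
        for topic, scores in by_topic.items()
    }
    return stats


# Made-up sample data for illustration.
sample = [
    {"status": "verified", "accuracyScore": 0.9, "topic": "reasoning"},
    {"status": "falsified", "accuracyScore": 0.1, "topic": "agents"},
    {"status": "too-early", "accuracyScore": None, "topic": "agents"},
]
stats = summarize("Author name", sample)
```

The source format has no counter for `ambiguous`, so in this sketch such evaluations add to `totalPredictions` without appearing in any per-status count.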
## Calibration Assessment
Evaluate whether predictors are well-calibrated:
### Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately
### Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence
### Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative
### Inconsistent
- Confidence doesn't correlate with accuracy
- Random relationship between stated and actual accuracy
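The four categories above can be approximated with a rough heuristic that compares average accuracy on high-confidence and low-confidence predictions. The thresholds (0.7 for "usually comes true", 0.5 for "often fails"), the `"high"`/`"low"` confidence labels, and the function name are all assumptions chosen for illustration; a real assessment would also weigh the qualitative signals listed above.

```python
from statistics import mean


def assess_calibration(results, usually=0.7, often_fail=0.5):
    """Classify calibration from (confidence, accuracy_score) pairs.

    `results` holds tuples such as ("high", 0.9). Thresholds are illustrative.
    """
    high = [score for confidence, score in results if confidence == "high"]
    low = [score for confidence, score in results if confidence == "low"]
    if not high or not low:
        raise ValueError("need both high- and low-confidence predictions")
    high_avg, low_avg = mean(high), mean(low)
    if high_avg < often_fail:
        return "overconfident"    # high-confidence predictions often fail
    if low_avg >= usually:
        return "underconfident"   # low-confidence predictions often come true
    if high_avg >= usually:
        return "well-calibrated"  # high usually true, low mixed
    return "inconsistent"         # confidence does not track accuracy


print(assess_calibration([("high", 0.9), ("high", 0.8), ("low", 0.5)]))  # well-calibrated
```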
## Tracking Notable Predictors
Keep running assessments of key voices:
| Predictor | Total | Accuracy | Calibration | Notes |
|-----------|-------|----------|-------------|-------|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |
## Red Flags
Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything