karpathy-jobs-bls-visualizer
Research tool for visually exploring BLS Occupational Outlook Handbook data with an interactive treemap, LLM-powered scoring pipeline, and data scraping/parsing utilities.
Best use case
karpathy-jobs-bls-visualizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using karpathy-jobs-bls-visualizer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/karpathy-jobs-bls-visualizer/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How karpathy-jobs-bls-visualizer Compares
| Feature / Agent | karpathy-jobs-bls-visualizer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Research tool for visually exploring BLS Occupational Outlook Handbook data with an interactive treemap, LLM-powered scoring pipeline, and data scraping/parsing utilities.
Where can I find the source code?
The underlying project's source code is on GitHub at https://github.com/karpathy/jobs.
SKILL.md Source
# karpathy/jobs — BLS Job Market Visualizer
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
A research tool for visually exploring Bureau of Labor Statistics [Occupational Outlook Handbook](https://www.bls.gov/ooh/) data across 342 occupations. The interactive treemap sizes each rectangle by employment (area) and colors it by a chosen metric: BLS growth outlook, median pay, education requirements, or LLM-scored AI exposure. The pipeline is fully forkable: write a new prompt, re-run scoring, and get a new color layer.
**Live demo:** [karpathy.ai/jobs](https://karpathy.ai/jobs/)
---
## Installation & Setup
```bash
# Clone the repo
git clone https://github.com/karpathy/jobs
cd jobs
# Install dependencies (uses uv)
uv sync
uv run playwright install chromium
```
Create a `.env` file with your OpenRouter API key (required only for LLM scoring):
```bash
OPENROUTER_API_KEY=your_openrouter_key_here
```
---
## Full Pipeline — Key Commands
Run these in order for a complete fresh build:
```bash
# 1. Scrape BLS pages (non-headless Playwright; BLS blocks bots)
# Results cached in html/ — only needed once
uv run python scrape.py
# 2. Convert raw HTML → clean Markdown in pages/
uv run python process.py
# 3. Extract structured fields → occupations.csv
uv run python make_csv.py
# 4. Score AI exposure via LLM (uses OpenRouter API, saves scores.json)
uv run python score.py
# 5. Merge CSV + scores → site/data.json for the frontend
uv run python build_site_data.py
# 6. Serve the visualization locally
cd site && python -m http.server 8000
# Open http://localhost:8000
```
---
## Key Files Reference
| File | Description |
|------|-------------|
| `occupations.json` | Master list of 342 occupations (title, URL, category, slug) |
| `occupations.csv` | Summary stats: pay, education, job count, growth projections |
| `scores.json` | AI exposure scores (0–10) + rationales for all 342 occupations |
| `prompt.md` | All data in one ~45K-token file for pasting into an LLM |
| `html/` | Raw HTML pages from BLS (~40MB, source of truth) |
| `pages/` | Clean Markdown versions of each occupation page |
| `site/index.html` | The treemap visualization (single HTML file) |
| `site/data.json` | Compact merged data consumed by the frontend |
| `score.py` | LLM scoring pipeline — fork this to write custom prompts |
---
## Writing a Custom LLM Scoring Layer
The most powerful feature: write any scoring prompt, run `score.py`, get a new treemap color layer.
### 1. Edit the prompt in `score.py`
```python
# score.py (simplified structure)
SYSTEM_PROMPT = """
You are evaluating occupations for exposure to humanoid robotics over the next 10 years.
Score each occupation from 0 to 10:
- 0 = no meaningful exposure (e.g., requires fine social judgment, non-physical)
- 5 = moderate exposure (some tasks automatable, but humans still central)
- 10 = high exposure (repetitive physical tasks, predictable environments)
Consider: physical task complexity, environment predictability, dexterity requirements,
cost of robot vs human, regulatory barriers.
Respond ONLY with JSON: {"score": <int 0-10>, "rationale": "<1-2 sentences>"}
"""
```
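The prompt asks for a bare JSON object, but models sometimes wrap replies in Markdown fences or drift outside the 0 to 10 range. A minimal sketch of a defensive parser, assuming the reply text has already been extracted from the API response; `parse_score_reply` is a hypothetical helper, not necessarily what `score.py` does:

```python
import json
import re

# Matches an optional ```json ... ``` fence around the reply (hypothetical cleanup step)
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_score_reply(text: str) -> dict:
    """Parse an LLM reply of the form {"score": <int 0-10>, "rationale": "..."}."""
    text = text.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)  # keep only the JSON inside the fence
    data = json.loads(text)
    score = data["score"]
    # Reject bools, floats, and out-of-range values so bad replies surface
    # instead of silently skewing the color layer
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError(f"score out of range or not an int: {score!r}")
    rationale = str(data.get("rationale", "")).strip()
    return {"score": score, "rationale": rationale}


reply = '```json\n{"score": 7, "rationale": "Repetitive physical work."}\n```'
print(parse_score_reply(reply))
```

Raising on a bad score (rather than clamping) makes it easy to collect failed slugs and re-run only those.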
### 2. Run the scoring pipeline
Run `uv run python score.py`. The pipeline reads each occupation's Markdown from `pages/`, sends it to the LLM, and writes the results to `scores.json`, which has this structure (abridged; 342 occupations in total):
```json
{
  "software-developers": {
    "score": 1,
    "rationale": "Software development is digital and cognitive; humanoid robots provide no advantage."
  },
  "construction-laborers": {
    "score": 7,
    "rationale": "Physical, repetitive outdoor tasks are targets for humanoid robotics, though unstructured environments remain challenging."
  }
}
```
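The scoring loop (read each occupation's Markdown from `pages/`, send it to the LLM, record the result in `scores.json`) can be sketched as follows. This is an illustrative outline, not `score.py` itself: `score_all` and the injected `score_fn` callable are hypothetical, and it assumes pages are named `pages/<slug>.md`. Writing `scores.json` after every occupation means an interrupted run can resume without paying again for finished calls.

```python
import json
from pathlib import Path
from typing import Callable


def score_all(pages_dir: Path, scores_path: Path,
              score_fn: Callable[[str], dict]) -> dict:
    """Score every <slug>.md in pages_dir, skipping slugs already in scores_path.

    score_fn stands in for the LLM call (hypothetical): it takes the page
    Markdown and returns {"score": int, "rationale": str}.
    """
    scores = json.loads(scores_path.read_text()) if scores_path.exists() else {}
    for md_file in sorted(pages_dir.glob("*.md")):
        slug = md_file.stem  # pages/software-developers.md -> "software-developers"
        if slug in scores:
            continue  # already scored; keeps re-runs cheap
        scores[slug] = score_fn(md_file.read_text())
        # Persist after each occupation so a crash loses at most one result
        scores_path.write_text(json.dumps(scores, indent=2))
    return scores
```

Injecting `score_fn` keeps the caching logic testable with a fake scorer and lets you swap models without touching the loop.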
### 3. Rebuild site data
```bash
uv run python build_site_data.py
cd site && python -m http.server 8000
```
---
## Data Structures
### `occupations.json` entry
```json
{
"title": "Software Developers",
"url": "https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm",
"category": "Computer and Information Technology",
"slug": "software-developers"
}
```
### `occupations.csv` columns
```
slug,title,category,median_pay,education,job_count,growth_percent,growth_outlook
```
Example row:
```
software-developers,Software Developers,Computer and Information Technology,130160,Bachelor's degree,1847900,17,Much faster than average
```
### `site/data.json` entry (merged frontend data)
```json
{
"slug": "software-developers",
"title": "Software Developers",
"category": "Computer and Information Technology",
"median_pay": 130160,
"education": "Bachelor's degree",
"job_count": 1847900,
"growth_percent": 17,
"growth_outlook": "Much faster than average",
"ai_score": 9,
"ai_rationale": "AI is deeply transforming software development workflows..."
}
```
---
## Frontend Treemap (`site/index.html`)
The visualization is a single self-contained HTML file using D3.js.
### Color layers (toggle in UI)
| Layer | What it shows |
|-------|---------------|
| BLS Outlook | BLS projected growth category (green = fast growth) |
| Median Pay | Annual median wage (color gradient) |
| Education | Minimum education required |
| Digital AI Exposure | LLM-scored 0–10 AI impact estimate |
### Adding a new color layer to the frontend
```html
<!-- In site/index.html, find the layer toggle buttons -->
<button onclick="setLayer('ai_score')">Digital AI Exposure</button>
<!-- Add your new layer button -->
<button onclick="setLayer('robotics_score')">Humanoid Robotics</button>
```
```javascript
// In the color-mapping function (named getColor here; the actual name in index.html may differ), add a case for your new field:
function getColor(d, layer) {
if (layer === 'robotics_score') {
// scores 0-10, blue = low exposure, red = high
return d3.interpolateRdYlBu(1 - d.robotics_score / 10);
}
// ... existing cases
}
```
Then update `build_site_data.py` to include your new score field in `data.json`.
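A minimal sketch of that merge step, assuming `site/data.json` is a JSON list of entries like the one shown under Data Structures and that the new scores were saved in a separate file with the same `{slug: {"score", "rationale"}}` shape as `scores.json`. The file name `scores_robotics.json` and the helper `add_score_layer` are hypothetical. If `build_site_data.py` regenerates `data.json` from scratch, run this after it or fold the loop into that script.

```python
import json
from pathlib import Path


def add_score_layer(data_path: Path, scores_path: Path, prefix: str) -> int:
    """Copy scores into data.json entries as <prefix>_score / <prefix>_rationale.

    Occupations without a score get None, so the frontend can treat them as "no data".
    Returns how many entries received a score.
    """
    entries = json.loads(data_path.read_text())   # assumed: list of per-occupation dicts
    scores = json.loads(scores_path.read_text())  # {slug: {"score": int, "rationale": str}}
    matched = 0
    for entry in entries:
        record = scores.get(entry["slug"], {})
        entry[f"{prefix}_score"] = record.get("score")
        entry[f"{prefix}_rationale"] = record.get("rationale")
        if "score" in record:
            matched += 1
    data_path.write_text(json.dumps(entries))
    return matched


# Example (hypothetical file name for the new layer's scores):
# add_score_layer(Path("site/data.json"), Path("scores_robotics.json"), "robotics")
```

Using a prefix keeps the naming parallel to the existing `ai_score` / `ai_rationale` fields, so `robotics_score` matches the `setLayer('robotics_score')` button.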
---
## Generating the LLM-Ready Prompt File
Package all 342 occupations + aggregate stats into a single file for LLM chat:
```bash
uv run python make_prompt.py
# Produces prompt.md (~45K tokens)
# Paste into Claude, GPT-4, Gemini, etc. for data-grounded conversation
```
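For a similar single-file bundle covering only a subset (say, one category), one rough approach is to concatenate the Markdown pages and estimate the size with the common rule of thumb of about 4 characters per token. This is not `make_prompt.py`'s actual logic: `bundle_pages` is hypothetical, it assumes pages are named `pages/<slug>.md`, and a real tokenizer will give different counts.

```python
from pathlib import Path


def bundle_pages(pages_dir: Path, out_path: Path, slugs=None) -> int:
    """Concatenate <slug>.md pages into one Markdown file.

    If slugs is given, only those occupations are included.
    Returns a rough token estimate (~4 characters per token, heuristic only).
    """
    parts = []
    for md_file in sorted(pages_dir.glob("*.md")):
        if slugs is not None and md_file.stem not in slugs:
            continue
        # One heading per occupation so the LLM can tell pages apart
        parts.append(f"## {md_file.stem}\n\n{md_file.read_text().strip()}\n")
    text = "\n".join(parts)
    out_path.write_text(text)
    return len(text) // 4


# Example (hypothetical slug set):
# bundle_pages(Path("pages"), Path("subset.md"), slugs={"software-developers"})
```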
---
## Scraping Notes
The BLS blocks automated bots, so `scrape.py` uses **non-headless** Playwright (real visible browser window):
```python
# scrape.py key behavior
browser = await p.chromium.launch(headless=False) # Must be visible
# Pages saved to html/<slug>.html
# Already-scraped pages are skipped (cached)
```
If scraping fails or is rate-limited:
- The `html/` directory already contains cached pages in the repo
- You can skip scraping entirely and run from `process.py` onward
- If re-scraping, add delays between requests to avoid blocks
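The skip-if-cached and add-delays advice can be combined in a small loop. In this sketch, `fetch_html` stands in for whatever loads a page in the visible Playwright browser (for instance `await page.goto(url)` followed by `await page.content()`); it is injected as a plain callable so the caching and pacing logic stay independent of Playwright. The helper name `scrape_missing` and the 2 to 6 second delay window are assumptions, not values from the project.

```python
import random
import time
from pathlib import Path
from typing import Callable, Iterable


def scrape_missing(occupations: Iterable[dict], html_dir: Path,
                   fetch_html: Callable[[str], str],
                   min_delay: float = 2.0, max_delay: float = 6.0,
                   sleep: Callable[[float], None] = time.sleep) -> list:
    """Fetch only occupations whose html/<slug>.html is missing, pausing between requests.

    Returns the slugs that were actually fetched.
    """
    html_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for occ in occupations:
        out = html_dir / f"{occ['slug']}.html"
        if out.exists():
            continue  # cached from an earlier run; no request made
        if fetched:
            # Random pause between consecutive requests (assumed window)
            sleep(random.uniform(min_delay, max_delay))
        out.write_text(fetch_html(occ["url"]))
        fetched.append(occ["slug"])
    return fetched
```

Passing `sleep` as a parameter lets a test run instantly with a recorder instead of real pauses.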
---
## Common Patterns
### Re-score only missing occupations
```python
import json
with open("scores.json") as f:
existing = json.load(f)
with open("occupations.json") as f:
all_occupations = json.load(f)
# Find gaps
missing = [o for o in all_occupations if o["slug"] not in existing]
print(f"Missing scores: {len(missing)}")
# Then run score.py with a filter for missing slugs
```
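Beyond missing slugs, a quick audit can also flag scores left over from renamed occupations and values outside the 0 to 10 range before they reach the treemap. `audit_scores` is a hypothetical helper operating on the same two structures loaded above (the `occupations.json` list and the `scores.json` mapping):

```python
def _valid_score(record) -> bool:
    """True if record is a dict holding an int score in [0, 10] (bools rejected)."""
    score = record.get("score") if isinstance(record, dict) else None
    return isinstance(score, int) and not isinstance(score, bool) and 0 <= score <= 10


def audit_scores(occupations: list, scores: dict) -> dict:
    """Return sorted missing, stale, and invalid slugs for a scores.json mapping."""
    known = {o["slug"] for o in occupations}
    scored = set(scores)
    return {
        "missing": sorted(known - scored),  # in occupations.json but never scored
        "stale": sorted(scored - known),    # scored but no longer an occupation
        "invalid": sorted(s for s, rec in scores.items() if not _valid_score(rec)),
    }
```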
### Parse a single occupation page manually
```python
from parse_detail import parse_occupation_page
from pathlib import Path
html = Path("html/software-developers.html").read_text()
data = parse_occupation_page(html)
print(data["median_pay"]) # e.g. 130160
print(data["job_count"]) # e.g. 1847900
print(data["growth_outlook"]) # e.g. "Much faster than average"
```
### Load and query occupations.csv
```python
import pandas as pd
df = pd.read_csv("occupations.csv")
# Top 10 highest paying occupations
top_pay = df.nlargest(10, "median_pay")[["title", "median_pay", "growth_outlook"]]
print(top_pay)
# Filter: fast growth + high pay
high_value = df[
(df["growth_percent"] > 10) &
(df["median_pay"] > 80000)
].sort_values("median_pay", ascending=False)
```
### Combine CSV with AI scores for analysis
```python
import pandas as pd, json
df = pd.read_csv("occupations.csv")
with open("scores.json") as f:
scores = json.load(f)
df["ai_score"] = df["slug"].map(lambda s: scores.get(s, {}).get("score"))
df["ai_rationale"] = df["slug"].map(lambda s: scores.get(s, {}).get("rationale"))
# High AI exposure, high pay — reshaping, not disappearing
high_exposure_high_pay = df[
(df["ai_score"] >= 8) &
(df["median_pay"] > 100000)
][["title", "median_pay", "ai_score", "growth_outlook"]]
print(high_exposure_high_pay)
```
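Because the treemap sizes each rectangle by employment, a plain average of `ai_score` across occupations can differ from what the picture shows; a job-weighted mean matches the visual weighting. A stdlib-only sketch, assuming rows shaped like `occupations.csv` with `ai_score` attached as above (with the DataFrame, `df.to_dict("records")` produces such rows); `weighted_exposure` is a hypothetical helper:

```python
import math
from collections import defaultdict


def weighted_exposure(rows, weight="job_count", value="ai_score"):
    """Return (overall, by_category) employment-weighted mean scores.

    Rows with a missing, non-numeric, or NaN score or weight are skipped.
    """
    total_w = total_wv = 0.0
    cat_w = defaultdict(float)
    cat_wv = defaultdict(float)
    for row in rows:
        try:
            w, v = float(row[weight]), float(row[value])
        except (KeyError, TypeError, ValueError):
            continue  # e.g. ai_score is None for an unscored occupation
        if math.isnan(w) or math.isnan(v):
            continue  # pandas represents missing values as NaN
        total_w += w
        total_wv += w * v
        cat_w[row["category"]] += w
        cat_wv[row["category"]] += w * v
    overall = total_wv / total_w if total_w else float("nan")
    by_category = {c: cat_wv[c] / cat_w[c] for c in cat_w if cat_w[c]}
    return overall, by_category
```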
---
## Troubleshooting
**`playwright install` fails**
```bash
uv run playwright install --with-deps chromium
```
**BLS scraping blocked / returns empty pages**
- Ensure `headless=False` in `scrape.py` (already the default)
- Add manual delays; do not run in CI
- The cached `html/` directory in the repo can be used directly
**`score.py` OpenRouter errors**
- Verify `OPENROUTER_API_KEY` is set in `.env`
- Check your OpenRouter account has credits
- Default model is Gemini Flash — change `model` in `score.py` for a different LLM
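To confirm the key is actually visible, a small stdlib check can look in the process environment and then in `.env`. This is a simplified parser (plain `KEY=VALUE` lines, optional quotes, `#` comments) and not a replacement for whatever loader `score.py` uses; `find_api_key` is a hypothetical helper:

```python
import os
from pathlib import Path


def find_api_key(env_path: Path = Path(".env"), name: str = "OPENROUTER_API_KEY"):
    """Return (source, key): checks the environment first, then a simple .env file."""
    if os.environ.get(name):
        return "environment", os.environ[name]
    if env_path.exists():
        for line in env_path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            if key.strip() == name:
                value = value.strip().strip('"').strip("'")
                if value:
                    return ".env", value
    return None, None  # key not found anywhere
```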
**`site/data.json` not updating after re-scoring**
```bash
# Always rebuild site data after changing scores.json
uv run python build_site_data.py
```
**Treemap shows blank / no data**
- Confirm `site/data.json` exists and is valid JSON
- Serve with `python -m http.server` (not `file://` — CORS blocks local JSON fetch)
- Check browser console for fetch errors
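A quick way to run the first check, and to catch entries the frontend cannot size or label, is to load the file and look for missing keys. A sketch, assuming `data.json` is a list of entries with the fields shown under Data Structures; `check_site_data` is hypothetical and the `REQUIRED` tuple is an assumption to adjust to whatever `index.html` actually reads:

```python
import json
from pathlib import Path

# Assumed minimum fields: identity, grouping, and job_count for rectangle area
REQUIRED = ("slug", "title", "category", "job_count")


def check_site_data(path: Path) -> list:
    """Return human-readable problems; an empty list means the file looks usable."""
    try:
        entries = json.loads(path.read_text())
    except FileNotFoundError:
        return [f"{path} does not exist; run build_site_data.py"]
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(entries, list) or not entries:
        return [f"{path} should be a non-empty JSON list of occupations"]
    problems = []
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED if entry.get(k) in (None, "")]
        if missing:
            problems.append(f"entry {i} ({entry.get('slug', '?')}): missing {', '.join(missing)}")
    return problems


# Example: print(check_site_data(Path("site/data.json")) or "site/data.json looks OK")
```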
---
## Important Caveats (from the project)
- **AI Exposure ≠ job disappearance.** A score of 9/10 means AI is *transforming* the work, not eliminating demand. Software developers score 9/10 but demand is growing.
- **Scores are rough LLM estimates** (Gemini Flash via OpenRouter), not rigorous economic predictions.
- The tool does **not** account for demand elasticity, latent demand, regulatory barriers, or social preferences for human workers.
- This is a **development/research tool**, not an economic publication.