education-data-source-pseo
PSEO — Census data linking graduates to employment via LEHD wage records. Earnings percentiles at 1/5/10 years post-graduation by institution, degree, CIP. Use for graduate earnings analysis. Coverage: ~29% of graduates from ~31 states.
Best use case
education-data-source-pseo is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using education-data-source-pseo should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/education-data-source-pseo/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How education-data-source-pseo Compares
| Feature / Agent | education-data-source-pseo | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# PSEO Data Source Reference
Postsecondary Employment Outcomes (PSEO) — Census Bureau experimental statistics linking college graduates to employment outcomes via UI wage records (LEHD program). Covers earnings (25th/50th/75th percentile, measured 1, 5, and 10 years post-graduation) and employment flows by institution, degree level, and CIP field. Use when comparing graduate earnings across programs or institutions, analyzing industry entry patterns, or studying geographic migration of graduates. Coverage limited to ~29% of graduates from ~31 participating states.
Postsecondary Employment Outcomes (PSEO) is an experimental data product from the U.S. Census Bureau that links college graduate records to national employment data, providing earnings and employment outcomes by institution, degree level, and field of study.
> **CRITICAL: Value Encoding**
>
> This document describes **Education Data Portal** integer encodings, which differ from Census API string codes. The Portal converts categorical variables to integers for consistency.
>
> | Context | Baccalaureate | Associates | Masters | Census Division Pacific |
> |---------|---------------|------------|---------|-------------------------|
> | **Portal (integers)** | `5` | `3` | `7` | `9` |
> | Census API (strings) | `05` | `03` | `07` | `9` |
>
> **Key differences:** Degree level uses simple integers (1-10), not string codes like "1C", "05". CIP codes are 2-digit integers (11 for Computer Science), not strings like "11.01".
>
> See `./references/variable-definitions.md` for complete encoding tables.
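
When a value arrives in Census string form, the encoding difference above can be handled with a small conversion step. The following is a minimal sketch, not part of the Portal tooling: the function names are hypothetical, and it only converts zero-padded numeric codes such as `"05"`. Non-numeric codes such as `"1C"` are rejected rather than guessed, because their Portal equivalents should come from `variable-definitions.md`.

```python
def census_degree_to_portal(code: str) -> int:
    """Convert a zero-padded numeric Census degree code ("05") to a Portal integer (5).

    Hypothetical helper: non-numeric codes such as "1C" are not mapped here.
    """
    if not code.isdigit():
        raise ValueError(
            f"Non-numeric Census code {code!r}: look it up in variable-definitions.md"
        )
    return int(code)


def cip_family(cip: str) -> int:
    """Reduce a dotted CIP string such as "11.01" to its 2-digit family integer (11)."""
    return int(cip.split(".")[0])


print(census_degree_to_portal("05"))  # Bachelor's in Portal encoding
print(cip_family("11.01"))            # Computer Science family
```

Raising on unknown codes is a deliberate choice in this sketch: a silent fallback would hide exactly the encoding mismatch this section warns about.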
## What is PSEO?
- **Producer**: U.S. Census Bureau, LEHD program (Longitudinal Employer-Household Dynamics)
- **Coverage**: ~29% of all U.S. college graduates from 31 states + D.C. + Western Governors University
- **Content**: Links university transcript data with national UI wage records to track graduate employment outcomes
- **Two data types**: Graduate Earnings (percentile earnings) and Employment Flows (industry/geography)
- **Frequency**: Updated periodically; cohorts span 3-year (Bachelor's) or 5-year (all others) windows
- **Primary identifiers**: `unitid` (IPEDS Unit ID, integer), `opeid` (integer in Portal data)
- **Privacy method**: Differential privacy mechanisms protect individual data
- **Available through**: Education Data Portal mirrors (restructured from Census Bureau LEHD format with integer encodings and lowercase variable names)
## Reference File Structure
| File | Purpose | When to Read |
|------|---------|--------------|
| `lehd-methodology.md` | How LEHD produces tabulations, data matching process | Understanding data creation |
| `earnings-data.md` | Percentile earnings, cohort definitions, labor attachment | Analyzing graduate earnings |
| `geographic-flows.md` | Where graduates work by Census Division | Studying migration patterns |
| `industry-flows.md` | What industries graduates enter by NAICS sector | Career pathway analysis |
| `variable-definitions.md` | All variables, codes, and status flags | Building queries or interpreting values |
| `state-coverage.md` | Participating states, coverage rates, data partners | Understanding limitations |
## Decision Trees
### What type of outcome am I researching?
```
Graduate outcomes research?
├─ Earnings by program/institution
│ ├─ Median earnings → `p50_earnings` column, filter by `years_after_grad`
│ ├─ Earnings distribution → `p25_earnings`/`p50_earnings`/`p75_earnings`
│ └─ See ./references/earnings-data.md
├─ Where graduates work (geography)
│ ├─ Census Division of employment → `census_division` column
│ ├─ In-state vs out-of-state → `employed_instate_grads_count`
│ └─ See ./references/geographic-flows.md
├─ What industries graduates enter
│ ├─ NAICS sector employment → `industry` column (String)
│ └─ See ./references/industry-flows.md
└─ How many graduates are employed
  ├─ Employment counts → `employed_grads_count_f`
  ├─ Non-employed/marginal → `jobless_m_emp_grads_count`
  └─ See ./references/variable-definitions.md
```
### What degree level am I researching?
```
Degree level?
├─ Certificate (<1 year) → degree_level=1
├─ Certificate (1-2 years) → degree_level=2
├─ Certificate (2-4 years) → degree_level=4
├─ Associate's → degree_level=3
├─ Bachelor's → degree_level=5 (default, 3-year cohorts)
├─ Post-Bacc Certificate → degree_level=6
├─ Master's → degree_level=7 (2-digit CIP only)
├─ Post-Masters Certificate → degree_level=8
├─ Doctoral-Research → degree_level=9 (2-digit CIP only)
└─ Doctoral-Professional Practice → degree_level=10
```
> **Note:** Portal uses integers 1-10. Census Bureau source data uses string codes like "05", "1C" -- these do not appear in Portal data.
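
For labeling output or validating a filter value, the degree-level tree can be expressed as a lookup table. This is an illustrative sketch: the dictionary is copied from the tree above, while the `degree_label` helper is a hypothetical name introduced here.

```python
# Portal degree_level integers, copied from the decision tree above.
DEGREE_LEVELS = {
    1: "Certificate (<1 year)",
    2: "Certificate (1-2 years)",
    3: "Associate's",
    4: "Certificate (2-4 years)",
    5: "Bachelor's",
    6: "Post-Bacc Certificate",
    7: "Master's",
    8: "Post-Masters Certificate",
    9: "Doctoral-Research",
    10: "Doctoral-Professional Practice",
}


def degree_label(degree_level: int) -> str:
    """Return the label for a Portal degree_level, rejecting anything outside 1-10."""
    try:
        return DEGREE_LEVELS[degree_level]
    except KeyError:
        raise ValueError(f"Unknown Portal degree_level: {degree_level!r}") from None


print(degree_label(5))
```

Passing a Census string such as `"05"` raises `ValueError`, which surfaces an encoding mix-up early instead of returning an empty filter result.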
### Is my institution/state covered?
```
Checking data availability?
├─ Which states participate → ./references/state-coverage.md
├─ Which institutions have data → Check PSEO Explorer or mirror data
├─ Coverage rate for state → ./references/state-coverage.md
└─ Why data might be missing
  ├─ Institution not partnered
  ├─ Cell suppressed (count < 30)
  └─ Insufficient labor force attachment
```
## Quick Reference: PSEO Variables
### Earnings Variables
| Portal Variable | Description |
|-----------------|-------------|
| `p25_earnings` | 25th percentile earnings (2022 dollars) |
| `p50_earnings` | Median earnings (2022 dollars) |
| `p75_earnings` | 75th percentile earnings (2022 dollars) |
| `years_after_grad` | Years post-graduation: `1`, `5`, or `10` |
| `employed_grads_count_e` | Graduate count with earnings data |
| `total_grads_count` | Total IPEDS-reported graduates |
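
Two derived measures often follow from these columns: the spread between the 25th and 75th percentiles, and the share of graduates who have earnings data. The sketch below assumes each row is a plain dictionary keyed by the Portal variable names; the helper names are hypothetical, and treating any non-positive earnings value as unusable is an assumption that mirrors the `> 0` filter used elsewhere in this document.

```python
from typing import Optional


def earnings_spread(row: dict) -> Optional[int]:
    """Return p75 - p25, or None when either percentile is not a valid value."""
    p25, p75 = row["p25_earnings"], row["p75_earnings"]
    if p25 <= 0 or p75 <= 0:
        return None
    return p75 - p25


def earnings_coverage(row: dict) -> Optional[float]:
    """Share of IPEDS-reported graduates that have earnings data, or None if unusable."""
    with_earnings = row["employed_grads_count_e"]
    total = row["total_grads_count"]
    if with_earnings < 0 or total <= 0:
        return None
    return with_earnings / total


# Hypothetical values for illustration only, not real PSEO data.
example = {
    "p25_earnings": 30000,
    "p75_earnings": 60000,
    "employed_grads_count_e": 150,
    "total_grads_count": 200,
}
print(earnings_spread(example), earnings_coverage(example))
```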
### Flows Variables
| Portal Variable | Description |
|-----------------|-------------|
| `employed_grads_count_f` | Employed graduates count |
| `employed_instate_grads_count` | Employed in institution's state |
| `jobless_m_emp_grads_count` | Non-employed or marginally employed |
| `industry` | 2-digit NAICS sector (String, e.g., `"54"`, `"31-33"`) |
| `census_division` | Census Division of employment (1-9, 99) |
> **Note:** Portal uses restructured schema with `years_after_grad` column instead of Census API's `Y1_*/Y5_*/Y10_*` naming. The `industry` column is String type because some NAICS sectors span ranges (e.g., `"31-33"` for Manufacturing, `"44-45"` for Retail Trade).
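
The difference between the two layouts can be pictured as a wide-to-long reshape. The sketch below is illustrative only and is not the Portal's actual transformation: the column suffixes after `Y1_`, `Y5_`, and `Y10_` (for example `P50_EARNINGS`) and the sample values are assumptions.

```python
import re

# Matches Census-style prefixed columns such as Y1_..., Y5_..., Y10_...
WIDE_PATTERN = re.compile(r"^Y(1|5|10)_(.+)$")


def wide_to_long(record: dict) -> list:
    """Split one wide record into one row per years_after_grad value."""
    shared = {}
    by_year = {}
    for key, value in record.items():
        match = WIDE_PATTERN.match(key)
        if match:
            years = int(match.group(1))
            by_year.setdefault(years, {})[match.group(2).lower()] = value
        else:
            shared[key.lower()] = value
    return [
        {**shared, "years_after_grad": years, **columns}
        for years, columns in sorted(by_year.items())
    ]


# Hypothetical wide record; the column names after the Y*_ prefix are assumed.
wide = {
    "CIPCODE": "11",
    "Y1_P50_EARNINGS": 40000,
    "Y5_P50_EARNINGS": 55000,
    "Y10_P50_EARNINGS": 70000,
}
for row in wide_to_long(wide):
    print(row)
```

In the long layout a single filter on `years_after_grad` replaces the need to pick a different column for each time horizon.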
### Key Identifiers
| ID | Format | Level | Example | Notes |
|----|--------|-------|---------|-------|
| `unitid` | Integer | Institution | `100751` | IPEDS Unit ID (University of Alabama) |
| `opeid` | Integer | Institution | `105100` | Portal stores as integer (Census uses 8-digit zero-padded string) |
| `fips` | Integer | State | `48` | State of institution (Texas) |
| `cipcode` | 2-digit integer | Field of study | `11` | Computer Science; Portal uses integers, not "11.01" |
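
Joining Portal rows to Census-sourced files requires converting `opeid` between the two representations in the table above. A minimal sketch, with hypothetical helper names:

```python
def census_opeid_to_portal(opeid: str) -> int:
    """Convert an 8-digit zero-padded Census OPEID string to the Portal integer."""
    return int(opeid)


def portal_opeid_to_census(opeid: int) -> str:
    """Convert a Portal integer OPEID back to an 8-digit zero-padded string."""
    return f"{opeid:08d}"


print(census_opeid_to_portal("00105100"))
print(portal_opeid_to_census(105100))
```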
### Key Filters (Portal Integer Encoding)
| Parameter | Description | Example |
|-----------|-------------|---------|
| `degree_level` | Degree type integer | `5` (Bachelor's) |
| `pseo_cohort` | Graduation cohort | `"2016-2020"` or `"2019-2021"` (string format, full year range) |
| `years_after_grad` | Years post-graduation | `1`, `5`, or `10` |
### Cohort Definitions
| Degree Level | Cohort Years | Example Cohorts |
|--------------|--------------|-----------------|
| Bachelor's | 3-year | `"2001-2003"`, `"2004-2006"`, `"2007-2009"`, `"2010-2012"`, `"2013-2015"`, `"2016-2018"`, `"2019-2021"` |
| All others | 5-year | `"2001-2005"`, `"2006-2010"`, `"2011-2015"`, `"2016-2020"` |
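
Because the cohort string encodes its own span, a query can check that a `pseo_cohort` value is consistent with the requested `degree_level` before filtering. The helper names below are hypothetical; the 3-year and 5-year rule comes from the table above.

```python
BACHELORS = 5  # Portal degree_level for Bachelor's


def cohort_span(pseo_cohort: str) -> int:
    """Number of graduation years covered by a cohort string such as "2019-2021"."""
    start, end = (int(part) for part in pseo_cohort.split("-"))
    return end - start + 1


def expected_span(degree_level: int) -> int:
    """Bachelor's cohorts span 3 years; all other degree levels span 5."""
    return 3 if degree_level == BACHELORS else 5


def cohort_matches_degree(pseo_cohort: str, degree_level: int) -> bool:
    """True when the cohort's span is the one expected for the degree level."""
    return cohort_span(pseo_cohort) == expected_span(degree_level)


print(cohort_matches_degree("2019-2021", 5))  # Bachelor's with a 3-year cohort
print(cohort_matches_degree("2016-2020", 5))  # Bachelor's with a 5-year cohort
```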
### Missing Data Codes
| Code | Meaning | When Used |
|------|---------|-----------|
| `-1` | Missing/not reported | Primary missing data indicator; very common in earnings and flows columns |
| `-2` | Not applicable | Item doesn't apply to this entity (Portal convention) |
| `-3` | Suppressed | Cell count < 30 graduates |
> **Note:** PSEO parquet files contain **no null values**. Missing, not-applicable, and suppressed data all use integer codes (`-1`, `-2`, `-3`), so filter with `pl.col("p50_earnings") > 0` to get valid earnings, not `.is_not_null()`. PSEO protects published values with differential privacy; in addition, cells with fewer than 30 graduates are suppressed entirely (coded `-3`). Earnings values coded `-1` may indicate insufficient labor force attachment.
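
The sentinel codes can be separated from real values with a small classifier. This sketch uses plain Python so it runs without the mirror helpers. The function name and sample rows are hypothetical, and labeling zero or other negative values as `"unexpected"` is an assumption rather than documented Portal behavior.

```python
MISSING_CODES = {-1: "missing", -2: "not_applicable", -3: "suppressed"}


def classify_value(value: int) -> str:
    """Label a PSEO numeric value as valid or as one of the documented codes."""
    if value in MISSING_CODES:
        return MISSING_CODES[value]
    if value > 0:
        return "valid"
    return "unexpected"  # zero or an undocumented negative code


# Hypothetical rows for illustration only, not real PSEO data.
rows = [
    {"cipcode": 11, "p50_earnings": 58000},
    {"cipcode": 23, "p50_earnings": -1},
    {"cipcode": 52, "p50_earnings": -3},
]

# Same rule as the polars filter `pl.col("p50_earnings") > 0`.
valid = [row for row in rows if row["p50_earnings"] > 0]

for row in rows:
    print(row["cipcode"], classify_value(row["p50_earnings"]))
```

Counting rows per label before filtering shows how much of a result set is lost to suppression versus missing data.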
## Data Access
Datasets for PSEO are available via the Education Data Portal mirror system. See `datasets-reference.md` for canonical paths, `mirrors.yaml` for mirror configuration, and `fetch-patterns.md` for fetch code patterns.
| Dataset | Type | Path | Codebook |
|---------|------|------|----------|
| Earnings and Flows | Yearly (2001-2021) | `pseo/colleges_pseo_{year}` | `pseo/codebook_colleges_pseo` |
Codebooks are `.xls` files co-located with data in all mirrors. Use `get_codebook_url()` from `fetch-patterns.md` to construct download URLs.
> **Truth Hierarchy:** When interpreting variable values, apply this priority:
> 1. **Actual data file** (what you observe in the parquet/CSV) -- this IS the truth
> 2. **Live codebook** (.xls in mirror) -- authoritative documentation, may lag
> 3. **This skill documentation** -- convenient summary, may drift from codebook
>
> If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
### Fetching PSEO Data
```python
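import polars as pl

# fetch_yearly_from_mirrors and fetch_from_mirrors are the fetch helpers;
# see fetch-patterns.md and make them available before running this snippet.

# PSEO is a yearly dataset -- fetch individual years
df = fetch_yearly_from_mirrors(
    path_template="pseo/colleges_pseo_{year}",
    years=[2018, 2019, 2020],
)

# Or fetch a single year
df = fetch_from_mirrors("pseo/colleges_pseo_2020")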
```
### Filtering
```python
# Filter by institution
df.filter(pl.col("unitid") == 100751) # University of Alabama
# Filter by field of study
df.filter(pl.col("cipcode") == 11) # Computer Science
# Filter by cohort (note: full year range format)
df.filter(pl.col("pseo_cohort") == "2019-2021")
# Earnings rows only (exclude missing/suppressed)
df.filter(pl.col("p50_earnings") > 0)
# Filter by industry (String column, not integer)
df.filter(pl.col("industry") == "54") # Professional Services
```
### Additional Access Methods (Census Bureau Source)
1. **PSEO Explorer**: Interactive visualization tool at `https://lehd.ces.census.gov/data/pseo_explorer.html`
2. **Census bulk download**: CSV/XLS files at `https://lehd.ces.census.gov/data/pseo/`
3. **Census API**: `https://api.census.gov/data/timeseries/pseo/earnings` and `.../flows` (uses different variable naming and string codes; not used in this system)
## Common Pitfalls
| Pitfall | Issue | Solution |
|---------|-------|----------|
| Using Census string codes | Portal uses integers (e.g., `5` for Bachelor's), not Census strings (`"05"`) | Always check encoding; see variable-definitions.md |
| Ignoring suppression | Cells with <30 graduates are suppressed; missing data looks like no program exists | Check `total_grads_count` to confirm cell exists; earnings coded `-3` mean suppression (the data contains no nulls) |
| Cross-institution comparison without controlling degree/CIP | Institutions offer different program mixes; aggregate comparison is misleading | Always filter to same `degree_level` and `cipcode` when comparing institutions |
| Treating PSEO as comprehensive | Only ~29% of graduates covered; participating states differ systematically | Acknowledge selection bias; do not generalize to all U.S. graduates |
| Ignoring labor attachment | Workers need 3+ quarters above minimum wage threshold to appear in earnings data | Some graduates are employed but excluded; note this limitation |
| Treating Portal opeid as string | Portal stores `opeid` as integer (e.g., `105100`), not Census's 8-digit zero-padded string (`"00105100"`) | Use integer comparison in Portal data; only Census API uses string format |
| Mixing cohort spans | Bachelor's uses 3-year cohorts; all others use 5-year | Filter by `degree_level` first, then verify cohort format matches |
| Re-adjusting for inflation | All earnings are already expressed in 2022 dollars (CPI-U) | Do not apply a further inflation adjustment when comparing cohorts; values are already real dollars |
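
The cross-institution pitfall above comes down to holding `degree_level`, `cipcode`, and `years_after_grad` fixed before ranking. The following plain-Python sketch shows that rule; the function name, the placeholder `unitid` values, and the earnings figures are all hypothetical.

```python
def comparable_rows(rows, degree_level, cipcode, years_after_grad):
    """Keep rows for one degree level, CIP family, and horizon with valid earnings,
    ranked by median earnings (highest first)."""
    matches = [
        row
        for row in rows
        if row["degree_level"] == degree_level
        and row["cipcode"] == cipcode
        and row["years_after_grad"] == years_after_grad
        and row["p50_earnings"] > 0  # drops -1 / -2 / -3 codes
    ]
    return sorted(matches, key=lambda row: row["p50_earnings"], reverse=True)


# Hypothetical rows; unitid values are placeholders, not real institutions.
sample = [
    {"unitid": 1, "degree_level": 5, "cipcode": 11, "years_after_grad": 5, "p50_earnings": 52000},
    {"unitid": 2, "degree_level": 5, "cipcode": 11, "years_after_grad": 5, "p50_earnings": -3},
    {"unitid": 3, "degree_level": 5, "cipcode": 11, "years_after_grad": 5, "p50_earnings": 61000},
    {"unitid": 3, "degree_level": 5, "cipcode": 52, "years_after_grad": 5, "p50_earnings": 70000},
    {"unitid": 1, "degree_level": 7, "cipcode": 11, "years_after_grad": 5, "p50_earnings": 90000},
]

ranked = comparable_rows(sample, degree_level=5, cipcode=11, years_after_grad=5)
print([row["unitid"] for row in ranked])
```

The suppressed row and the rows for other programs or degree levels drop out, so the ranking compares like with like.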
## PSEO vs Other Data Sources
| Feature | PSEO | College Scorecard | State Systems |
|---------|------|-------------------|---------------|
| Coverage | Graduates only | All enrollees | Graduates only |
| Geographic scope | National (cross-state) | National | In-state only |
| Sample | All graduates from partners | Federal aid recipients | All graduates |
| Earnings detail | 25th/50th/75th percentile | Median only | Varies |
| Industry data | Yes (NAICS sector) | No | Varies |
| Geographic flows | Yes (Census Division) | No | No |
| Privacy method | Differential privacy | Traditional suppression | Varies |
## Common Use Cases
| Use Case | Data Needed | Key Considerations |
|----------|-------------|-------------------|
| Compare programs within institution | Earnings by `cipcode` | Check cell counts for suppression |
| Compare institutions for same program | Earnings by `unitid` | Ensure same degree level and CIP |
| Analyze brain drain/retention | Flows by division + in-state | Only 9 Census Divisions |
| Career pathway analysis | Flows by NAICS sector | 2-digit NAICS only |
| ROI by degree level | Earnings across `degree_level` | Different cohort spans |
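
For the brain drain/retention use case, one simple measure is the share of employed graduates who work in the institution's state. The sketch below assumes that dividing `employed_instate_grads_count` by `employed_grads_count_f` is a meaningful ratio for the rows being analyzed. That definition and the helper name are assumptions made here, so confirm the level at which these counts are reported in `geographic-flows.md` before relying on it.

```python
from typing import Optional


def instate_retention(employed_instate: int, employed_total: int) -> Optional[float]:
    """Share of employed graduates working in-state, or None for coded/empty cells."""
    if employed_instate < 0 or employed_total <= 0:
        return None
    return employed_instate / employed_total


# Hypothetical counts for illustration only.
print(instate_retention(120, 200))
print(instate_retention(-3, 200))  # suppressed in-state count
```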
## Important Limitations
1. **Experimental status**: Not official Census statistics; methodology may change
2. **Partial coverage**: Only ~29% of all U.S. graduates are covered, all of them from participating institutions
3. **Selection bias**: Participating states/institutions may differ systematically
4. **Employment coverage**: Excludes self-employed, independent contractors, military, some federal
5. **Labor attachment requirement**: Workers must have 3+ quarters of earnings above minimum wage threshold
6. **Suppression**: Cells with fewer than 30 graduates are suppressed
7. **Earnings inflation-adjusted**: All earnings in 2022 dollars (CPI-U)
## Related Data Sources
| Source | Relationship | When to Use |
|--------|--------------|-------------|
| `education-data-source-scorecard` | Alternative earnings source (median only, all enrollees) | When PSEO coverage is insufficient or need non-graduate outcomes |
| `education-data-source-ipeds` | Institution characteristics, enrollment, graduation rates | Contextualizing PSEO institutions; join on `unitid` |
| `education-data-explorer` | Parent discovery skill | Finding available endpoints |
| `education-data-query` | Data fetching | Downloading parquet/CSV files |
## Topic Index
| Topic | Reference File |
|-------|---------------|
| LEHD program overview | `./references/lehd-methodology.md` |
| Data matching process | `./references/lehd-methodology.md` |
| Differential privacy | `./references/lehd-methodology.md` |
| Percentile earnings | `./references/earnings-data.md` |
| Labor force attachment | `./references/earnings-data.md` |
| Cohort definitions | `./references/earnings-data.md` |
| Census Division employment | `./references/geographic-flows.md` |
| In-state employment | `./references/geographic-flows.md` |
| NAICS sector employment | `./references/industry-flows.md` |
| Industry code reference | `./references/industry-flows.md` |
| Variable names and codes | `./references/variable-definitions.md` |
| Status flags | `./references/variable-definitions.md` |
| State participation | `./references/state-coverage.md` |
| Coverage rates | `./references/state-coverage.md` |
| Data partners | `./references/state-coverage.md` |
| Mirror-based data download | Data Access section above |
| Bulk data download | Data Access section above |

Related Skills
election-data-source-countypres
County Presidential Returns 2000-2024 (MIT MEDSL). Vote shares, party trends, turnout by county_fips (joins census/education data). Requires HARVARD_DATAVERSE_API_KEY. Critical: mode='TOTAL' drops ~1K counties post-2020 — use 3-pattern reconstruction
education-data-source-scorecard
College Scorecard — post-enrollment outcomes linking aid records to IRS/Treasury earnings. Earnings, loan repayment, debt via six Portal sub-datasets. Use when tax-record-based earnings needed. Tracks only Title IV aid recipients, not all students.
education-data-source-saipe
SAIPE — annual Census poverty estimates for school districts (Portal; county/state not in Portal). Use for district poverty, Title I context, or trends. ~18-month lag. No race/ethnicity disaggregation at district level — use ACS 5-year for that.
education-data-source-nhgis
NHGIS — census geography crosswalks via Portal: links schools (ncessch) and colleges (unitid) to tracts, block groups, CBSAs (1990-2020). Census demographics NOT in Portal — access NHGIS directly. Use for linking education data to census geography.
education-data-source-nccs
NCCS — Form 990 data for private nonprofit colleges (Portal: IPEDS-matched, 1993-2016). Revenue, expenses, assets, endowment, governance beyond IPEDS. Use when IRS financial depth needed. Portal ends 2016; public institutions excluded (no Form 990).
education-data-source-nacubo
NACUBO endowment data (~650 institutions, 2012-2022). Portal: 7 columns only (total endowment, per-FTE, YoY change). Use for endowment size/trends. Full investment/spending needs direct NACUBO access. For all-institution coverage use IPEDS finance.
education-data-source-meps
MEPS — Urban Institute modeled school-level poverty (% at 100% FPL), from CCD + SAIPE (public schools, 2009-2022, 2-3yr lag). Use when FRPL is unreliable due to CEP. Consistent cross-state measurement. Public schools only.
education-data-source-ipeds
IPEDS — primary federal postsecondary data (~6,500 institutions, 1980-present): enrollment, completions, graduation rates, finance, aid, admissions, HR. For college/university analysis. Grad rates = first-time full-time; finance needs GASB/FASB care.
education-data-source-fsa
FSA — Title IV aid at institution level (~5,500 institutions, 1999-2021). Pell Grants, Direct/PLUS loans, campus-based aid, financial responsibility scores, 90/10 metrics. Use for aid distribution, loan volume, or for-profit analysis. By unitid.
education-data-source-edfacts
EDFacts — K-12 outcomes: assessment proficiency, ACGR graduation rates, ESSA accountability at school/district level (2009-2020). Within-state trends and subgroup gaps. Complements CCD with outcome data. Cannot compare across states — use NAEP.
education-data-source-eada
EADA — college athletics gender equity (~2,000+ institutions, 2002-2021). Participation, coaching, salaries, expenses, revenues, athletic aid by gender. Not Title IX compliance data. No sector column; join IPEDS on unitid for institution type.
education-data-source-crdc
CRDC — biennial OCR survey of all U.S. public schools (2011-2021). Discipline, course access, harassment, restraint/seclusion by race/sex/disability/EL. Use for civil rights and equity analysis. 2020-21 COVID-impacted; 2011-14 sampled, not universe.