education-data-source-edfacts

EDFacts — K-12 outcomes: assessment proficiency, ACGR graduation rates, ESSA accountability at school/district level (2009-2020). Within-state trends and subgroup gaps. Complements CCD with outcome data. Cannot compare across states — use NAEP.


by DAAF-Contribution-Community


Best use case

education-data-source-edfacts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using education-data-source-edfacts can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/education-data-source-edfacts/SKILL.md --create-dirs "https://raw.githubusercontent.com/DAAF-Contribution-Community/daaf/main/.claude/skills/education-data-source-edfacts/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/education-data-source-edfacts/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How education-data-source-edfacts Compares

Feature / Agent	education-data-source-edfacts	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It gives an AI agent a reference for EDFacts K-12 outcome data: state assessment proficiency rates, ACGR graduation rates, and ESSA accountability indicators at the school and district level, along with guidance on which comparisons are valid.

Where can I find the source code?

The source code is on GitHub in the DAAF-Contribution-Community/daaf repository, under .claude/skills/education-data-source-edfacts/.

SKILL.md Source

# EDFacts Data Source Reference

EDFacts — federal K-12 outcome data from State Education Agencies, covering state assessment proficiency rates, ACGR graduation rates, and ESSA accountability indicators at school and district level (assessments 2009-2020, graduation rates 2010-2019). Use when analyzing within-state achievement trends, subgroup proficiency gaps, or adjusted cohort graduation rates. Complements CCD (school characteristics) with outcome data. State assessment scores CANNOT be compared across states; use NAEP for cross-state comparisons.

EDFacts is the U.S. Department of Education's centralized data collection system for pre-K through grade 12 education data from State Education Agencies (SEAs). It provides state assessment proficiency rates, graduation rates, and accountability indicators — the authoritative federal source for state-level K-12 outcome data.

> **CRITICAL: Value Encoding**
>
> The Urban Institute Education Data Portal converts NCES string codes (e.g., `ALL`, `CWD`, `LEP`) to **integer codes**. Always verify actual data values before filtering — do not rely on documentation labels alone.
>
> | Context | Subgroup "All" | English Learner | Sex "Male" |
> |---------|----------------|-----------------|------------|
> | **Portal integer** | `99` | `1` | `1` |
> | NCES string | `ALL` | `LEP` | `M` |
>
> See `./references/variable-definitions.md` for complete encoding tables.

## What is EDFacts?

- **Collector**: U.S. Department of Education, via State Education Agencies (SEAs)
- **Coverage**: All public schools and districts in 50 states + DC
- **Content**: State assessment proficiency rates, ACGR graduation rates, participation rates, accountability indicators
- **Frequency**: Annual collection
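- **Available years**: Assessments from 2009-10 and graduation rates from 2010-11. The Portal files listed under Key Datasets cover assessments for 2009-2018 and 2020, and graduation rates for 2010-2019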
- **Primary identifiers**: `ncessch` (school ID, Int64), `leaid` (district ID, Int64), `fips` (state FIPS code, Int64)
- **Key limitation**: State assessment scores CANNOT be compared across states (different tests, different cut scores)
- **Available through**: Education Data Portal mirrors

## Reference File Structure

| File | Purpose | When to Read |
|------|---------|--------------|
| `accountability-context.md` | ESSA, NCLB history, accountability systems | Understanding policy context |
| `assessment-data.md` | Proficiency levels, test scores, limitations | Working with assessment data |
| `graduation-rates.md` | ACGR methodology, cohort definitions | Analyzing graduation data |
| `variable-definitions.md` | Key variables, suppression codes, special values | Interpreting specific variables |
| `data-quality.md` | Known issues, state variations, COVID impacts | Data cleaning, limitations |
| `subgroup-reporting.md` | Special populations, disaggregation | Analyzing by student groups |

## Decision Trees

### What type of analysis?

```
What EDFacts data do you need?
├─ Assessment/proficiency data
│   ├─ Within-state trends → Valid analysis
│   ├─ Cross-state comparison → INVALID - use NAEP instead
│   └─ Subgroup gaps → See ./references/subgroup-reporting.md
├─ Graduation rates (ACGR)
│   ├─ Understand methodology → See ./references/graduation-rates.md
│   ├─ Extended rates (5-year, 6-year) → See ./references/graduation-rates.md
│   └─ Subgroup rates → See ./references/subgroup-reporting.md
├─ Understanding variables
│   ├─ Missing/suppressed values → See ./references/variable-definitions.md
│   ├─ Range vs. exact values → See ./references/variable-definitions.md
│   └─ Subgroup codes → See ./references/subgroup-reporting.md
└─ Data quality concerns
    ├─ COVID-19 impacts (2019-20) → See ./references/data-quality.md
    ├─ State reporting changes → See ./references/data-quality.md
    └─ Suppression rates → See ./references/data-quality.md
```

### Is my comparison valid?

```
What are you comparing?
├─ Same state, different years
│   ├─ Same assessment system? → Valid
│   └─ Different tests? → Break in time series
├─ Schools within same state → Valid
├─ Districts within same state → Valid
├─ Subgroups within same school → Valid (check suppression)
├─ Different states
│   ├─ Proficiency rates → INVALID
│   ├─ Graduation rates (ACGR) → More comparable
│   └─ Use NAEP instead → Valid
└─ National ranking by proficiency → INVALID
```

## Quick Reference: EDFacts Data Elements

### Assessment Data

| Data Element | Description | Available Years |
|--------------|-------------|-----------------|
| Proficiency rates | % meeting state standards in reading/math | 2009-10 to present |
| Participation rates | % of students assessed | 2012-13 to present |
| Achievement levels | Below Basic, Basic, Proficient, Advanced | Varies by state |
| Grade levels | Grades 3-8, high school (varies) | 2009-10 to present |

### Graduation Data

| Data Element | Description | Available Years |
|--------------|-------------|-----------------|
| 4-year ACGR | Adjusted Cohort Graduation Rate | 2010-11 to present |
| 5-year ACGR | Extended graduation rate | 2011-12 to present |
| 6-year ACGR | Further extended rate | 2012-13 to present |
| Diploma types | Regular diploma only in ACGR | All years |

### Key Identifiers

> **Portal Data Types:** All identifiers are **Int64** in the Portal parquet files. The NCES source format (zero-padded strings) is shown for reference only. When joining with other Portal datasets, join on the integer columns directly.

| ID | Portal Type | NCES Source Format | Level | Example (Int64) |
|----|-------------|-------------------|-------|-----------------|
| `ncessch` | Int64 | 12-char zero-padded | School | `10000500870` |
| `ncessch_num` | Int64 | Same as ncessch | School | `10000500870` |
| `leaid` | Int64 | 7-char zero-padded | District/LEA | `100005` |
| `leaid_num` | Int64 | Same as leaid | District/LEA | `100005` |
| `fips` | Int64 | 2-digit | State | `1` (Alabama) |

### Data Levels

| Level | Identifier | Dataset Path Pattern |
|-------|------------|---------------------|
| School | `ncessch` (Int64) | `edfacts/schools_edfacts_*` |
| District/LEA | `leaid` (Int64) | `edfacts/districts_edfacts_*` |
| State | `fips` (Int64) | Aggregate from lower levels |

### Subgroups Reported

> **Note:** Not all subgroup columns are present in every dataset. Grad rates data does NOT have `sex`, `migrant`, or `military_connected` columns.

| Subgroup | NCES Code | Portal Integer | Column | Available In |
|----------|-----------|----------------|--------|--------------|
| All students | `ALL` | `99` | race, sex, lep, disability | Assessments, Grad Rates |
| Economically disadvantaged | `ECODIS` | `1` | econ_disadvantaged | Assessments, Grad Rates |
| Students with disabilities | `CWD` | `1` | disability | Assessments, Grad Rates |
| English learners | `LEP` | `1` | lep | Assessments, Grad Rates |
| Homeless | `HOM` | `1` | homeless | Assessments, Grad Rates |
| Foster care | `FCS` | `1` | foster_care | Assessments, Grad Rates |
| Migrant | `MIG` | `1` | migrant | Assessments only |
| Military connected | `MIL` | `1` | military_connected | Assessments only |
| Race/ethnicity | Multiple | `1-7, 99` | race | Assessments, Grad Rates |
| Sex | `M/F` | `1, 2, 99` | sex | Assessments only |

**EDFacts Filter Column Pattern:**
- Special population columns (lep, disability, homeless, etc.) use `1` = subgroup, `99` = total
- Race column uses integer codes (1=White, 2=Black, etc.)
- Sex column uses `1` = Male, `2` = Female, `99` = Total (assessments only)

### Grade Codes (grade_edfacts)

| Code | Grade Level |
|------|-------------|
| `3`-`8` | Grades 3-8 (individual) |
| `9` | Grades 9-12 combined |
| `99` | Total (all grades) |

### Race Codes

> **Empirically verified** from 2018 school assessment data. Only these values appear in the `race` column:

| Code | Category |
|------|----------|
| `1` | White |
| `2` | Black |
| `3` | Hispanic |
| `4` | Asian |
| `5` | American Indian/Alaska Native |
| `7` | Two or More Races |
| `99` | Total |

> **Note:** Code `6` (Native Hawaiian/Pacific Islander) is NOT observed in the data. Codes `8` (Nonresident alien), `9` (Unknown), `20` (Other), `-1`, `-2`, `-3` are also not observed in the race column. These codes may exist in other Portal sources but are absent from EDFacts.

### Sex Codes

| Code | Category |
|------|----------|
| `1` | Male |
| `2` | Female |
| `9` | Unknown |
| `99` | Total |

> **Note:** The subgroup and column-difference tables in this document list only `1`, `2`, and `99` for `sex`. Check the data for `9` before relying on it.

### Disability Codes

> **Empirically verified** from 2018 school assessment and 2019 grad rate data. Only `1` and `99` are observed in the `disability` column. The expanded codes (0-4) documented in other Portal sources are NOT present in EDFacts datasets.

| Code | Category |
|------|----------|
| `1` | Students with disabilities (IDEA-eligible) |
| `99` | Total (all students) |

### LEP Codes

| Code | Category |
|------|----------|
| `1` | Students who are limited English proficient |
| `99` | All students (total) |

### Special Population Columns

For `homeless`, `migrant`, `econ_disadvantaged`, `foster_care`, `military_connected`:

| Code | Category |
|------|----------|
| `1` | Yes (in subgroup) |
| `99` | Total (all students) |

### Missing Data Codes

| Code | Meaning | When Used |
|------|---------|-----------|
| `-1` | Missing/not reported | Data not reported |
| `-2` | Not applicable | Item doesn't apply to this entity |
| `-3` | Suppressed for privacy | Data suppressed for small N-size |
| `-9` | Rounds to zero | Value rounds to zero |
| Range values | Exact value suppressed | Range provided instead of exact value |
| `_midpt` suffix | Calculated midpoint of suppressed range | Use for analysis when exact values are suppressed |

**Always use `_midpt` variables for analysis when exact values are suppressed.**

## Data Access

All EDFacts data is fetched via the **Education Data Portal mirror system**. There is no API access.

**Key references:**
- **`mirrors.yaml`** -- Mirror definitions, URL templates, read strategies
- **`datasets-reference.md`** -- Canonical dataset paths (one path works for all mirrors)
- **`fetch-patterns.md`** -- `fetch_from_mirrors()` and `fetch_yearly_from_mirrors()` patterns

> **Truth Hierarchy:** When interpreting variable values, apply this priority:
> 1. **Actual data file** (what you observe in the parquet/CSV) — this IS the truth
> 2. **Live codebook** (.xls in mirror) — authoritative documentation, may lag
> 3. **This skill documentation** — convenient summary, may drift from codebook
>
> If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.

### Key Datasets

| Dataset | Path | Type | Columns |
|---------|------|------|---------|
| School Assessments | `edfacts/schools_edfacts_assessments_{year}` | Yearly (2009-2018, 2020) | 26 cols |
| School Grad Rates | `edfacts/schools_edfacts_grad_rates_{year}` | Yearly (2010-2019) | 18 cols |
| District Assessments | `edfacts/districts_edfacts_assessments_{year}` | Yearly (2009-2018, 2020) | 23 cols |
| District Grad Rates | `edfacts/districts_edfacts_grad_rates_{year}` | Yearly (2010-2019) | 15 cols |

> **Note:** 2019 assessment data is NOT available (at any level) due to COVID testing waivers.
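
The path patterns and year coverage in the table can be expanded into concrete dataset paths. This is a sketch only: `edfacts_paths` is a hypothetical helper, and the resulting paths would still be passed to the fetch functions documented in `fetch-patterns.md`.

```python
# Year coverage taken from the Key Datasets table: no 2019 assessment files.
ASSESSMENT_YEARS = [year for year in range(2009, 2021) if year != 2019]
GRAD_RATE_YEARS = list(range(2010, 2020))


def edfacts_paths(level: str, kind: str) -> list[str]:
    """Build one dataset path per available year.

    level: "schools" or "districts"; kind: "assessments" or "grad_rates".
    """
    if level not in {"schools", "districts"}:
        raise ValueError(f"unknown level: {level!r}")
    years_by_kind = {"assessments": ASSESSMENT_YEARS, "grad_rates": GRAD_RATE_YEARS}
    if kind not in years_by_kind:
        raise ValueError(f"unknown kind: {kind!r}")
    return [f"edfacts/{level}_edfacts_{kind}_{year}" for year in years_by_kind[kind]]


school_assessment_paths = edfacts_paths("schools", "assessments")
district_grad_paths = edfacts_paths("districts", "grad_rates")
print(school_assessment_paths[0], district_grad_paths[-1])
```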

### Codebooks

Codebook `.xls` files are available for both assessment and graduation rate datasets. Use `get_codebook_url()` from `fetch-patterns.md`:

```python
# Assessment codebooks:
url = get_codebook_url("edfacts/codebook_schools_edfacts_assessments")
url = get_codebook_url("edfacts/codebook_districts_edfacts_assessments")

# Graduation rate codebooks:
url = get_codebook_url("edfacts/codebook_schools_edfacts_graduation")
url = get_codebook_url("edfacts/codebook_districts_edfacts_graduation")
```

> **Codebook naming note:** Graduation rate codebooks use `_graduation` (not `_grad_rates`), while the data files use `_grad_rates`. This follows the same pattern as other Portal sources where codebook names differ from data file names. See `datasets-reference.md` for the authoritative path mapping.

### Dataset Column Differences

Assessment and graduation rate datasets have **different column sets**:

| Column | Assessments | Grad Rates |
|--------|-------------|------------|
| `sex` | Yes (1, 2, 99) | **No** |
| `migrant` | Yes (1, 99) | **No** |
| `military_connected` | Yes (1, 99) | **No** |
| `grade_edfacts` | Yes (3-9, 99) | **No** |
| `read_test_*` / `math_test_*` | Yes | **No** |
| `grad_rate_*` | **No** | Yes |
| `cohort_num` | **No** | Yes |
| `school_name` / `lea_name` | Yes | Yes |

### Filtering

```python
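import polars as pl

# df: a polars DataFrame loaded from an EDFacts assessment dataset.
# Each filter gets its own name; reassigning `df` each time would chain the
# filters and return no rows (a row cannot be both grade 4 and grade 99).

# Grade filtering: grade_edfacts uses integer codes
df_grade4 = df.filter(pl.col("grade_edfacts") == 4)  # Grade 4
df_all_grades = df.filter(pl.col("grade_edfacts") == 99)  # All grades combined

# Subgroup filtering: 1 = subgroup, 99 = total
df_total = df.filter(pl.col("sex") == 99)  # Both sexes combined (assessments only)
df_econ = df.filter(pl.col("econ_disadvantaged") == 1)  # Economically disadvantaged only

# Race filtering: integer codes
df_black = df.filter(pl.col("race") == 2)  # Black students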
```

## Common Pitfalls

| Pitfall | Issue | Solution |
|---------|-------|----------|
| Ranking states by proficiency | Different tests, different cut scores make comparisons meaningless | Use NAEP for cross-state comparisons |
| Comparing 2019-20 to other years | COVID testing waivers created data gaps | Note data gap, exclude year |
| Ignoring suppression | Results biased toward larger schools/subgroups | Document suppression rates, use `_midpt` variables |
| Assuming proficiency = same thing | State definitions of "proficient" vary widely | Clarify each state's definition |
| Pre/post ESSA comparison | Different accountability systems (NCLB vs ESSA) | Note policy change at 2015 boundary |
| Using string codes for filtering | Portal uses integer encoding, not NCES strings | Always check actual data values; see encoding tables above |

## Key Policy Context

| Law | Years | Key Features |
|-----|-------|--------------|
| NCLB | 2002-2015 | AYP, 100% proficiency goal, HQT |
| ESSA | 2015-present | State flexibility, multiple indicators |

- **AYP (Adequate Yearly Progress)**: NCLB requirement eliminated by ESSA
- **ESSA Accountability**: States design own systems with federal guardrails
- **N-size**: Minimum students required for reporting (varies by state, typically 10-30)

## CRITICAL WARNING: Cross-State Comparisons

**State assessment proficiency rates CANNOT be compared across states.**

| Factor | Why It Varies |
|--------|---------------|
| Assessment content | Each state creates its own tests |
| Proficiency cut scores | Each state sets own thresholds |
| Standards alignment | State academic standards differ |
| Test difficulty | Not calibrated nationally |

A student rated "proficient" in one state might be rated "below basic" in another state whose test is harder or whose cut scores are higher. **Rankings of states by proficiency rates are meaningless.**

Use NAEP (National Assessment of Educational Progress) for valid cross-state comparisons.

### Valid vs. Invalid Analysis Examples

**Valid Analysis:**

```python
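import polars as pl

# Within-state trend analysis (df: EDFacts assessment rows)
state_df = df.filter(
    (pl.col("fips") == 6)  # California only
    & (pl.col("read_test_pct_prof_midpt") >= 0)  # drop negative missing-data codes
)
trend = state_df.group_by("year").agg(
    pl.col("read_test_pct_prof_midpt").mean()
).sort("year")
# Valid: same state; confirm the test system did not change across these years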
```

**INVALID Analysis:**

```python
# DO NOT DO THIS - Cross-state comparison
# This comparison is MEANINGLESS
state_comparison = df.group_by("fips").agg(
    pl.col("read_test_pct_prof_midpt").mean()
).sort("read_test_pct_prof_midpt", descending=True)
# INVALID: Different tests, different standards
```

## Related Data Sources

| Source | Relationship | When to Use |
|--------|--------------|-------------|
| `education-data-source-ccd` | CCD provides school/district demographics | Combining outcome data with school characteristics |
| `education-data-source-crdc` | CRDC has discipline, AP, school climate data | Analyzing school equity alongside achievement |
| `education-data-source-saipe` | SAIPE provides district poverty estimates | Linking poverty to achievement |
| `education-data-source-meps` | MEPS provides school poverty estimates | School-level poverty and assessment analysis |
| `education-data-explorer` | Parent discovery skill | Finding available endpoints |
| `education-data-query` | Data fetching | Downloading via mirrors |

## Topic Index

| Topic | Reference File |
|-------|---------------|
| NCLB to ESSA transition | `./references/accountability-context.md` |
| State accountability systems | `./references/accountability-context.md` |
| Federal reporting requirements | `./references/accountability-context.md` |
| Proficiency levels | `./references/assessment-data.md` |
| Why states can't be compared | `./references/assessment-data.md` |
| NAEP comparison | `./references/assessment-data.md` |
| Assessment system changes | `./references/assessment-data.md` |
| ACGR calculation | `./references/graduation-rates.md` |
| Cohort adjustments | `./references/graduation-rates.md` |
| Extended graduation rates | `./references/graduation-rates.md` |
| Diploma types | `./references/graduation-rates.md` |
| Suppression codes | `./references/variable-definitions.md` |
| Missing data values | `./references/variable-definitions.md` |
| Range/midpoint variables | `./references/variable-definitions.md` |
| Participation rates | `./references/variable-definitions.md` |
| COVID-19 data gaps | `./references/data-quality.md` |
| State reporting variations | `./references/data-quality.md` |
| Known data issues | `./references/data-quality.md` |
| Time series breaks | `./references/data-quality.md` |
| Students with disabilities | `./references/subgroup-reporting.md` |
| English learners | `./references/subgroup-reporting.md` |
| Economically disadvantaged | `./references/subgroup-reporting.md` |
| Race/ethnicity reporting | `./references/subgroup-reporting.md` |
| Homeless/foster/migrant | `./references/subgroup-reporting.md` |
| N-size requirements | `./references/subgroup-reporting.md` |

Related Skills

election-data-source-countypres

160

from DAAF-Contribution-Community/daaf

County Presidential Returns 2000-2024 (MIT MEDSL). Vote shares, party trends, turnout by county_fips (joins census/education data). Requires HARVARD_DATAVERSE_API_KEY. Critical: mode='TOTAL' drops ~1K counties post-2020 — use 3-pattern reconstruction

education-data-source-scorecard

160

from DAAF-Contribution-Community/daaf

College Scorecard — post-enrollment outcomes linking aid records to IRS/Treasury earnings. Earnings, loan repayment, debt via six Portal sub-datasets. Use when tax-record-based earnings needed. Tracks only Title IV aid recipients, not all students.

education-data-source-saipe

160

from DAAF-Contribution-Community/daaf

SAIPE — annual Census poverty estimates for school districts (Portal; county/state not in Portal). Use for district poverty, Title I context, or trends. ~18-month lag. No race/ethnicity disaggregation at district level — use ACS 5-year for that.

education-data-source-pseo

160

from DAAF-Contribution-Community/daaf

PSEO — Census data linking graduates to employment via LEHD wage records. Earnings percentiles at 1/5/10 years post-graduation by institution, degree, CIP. Use for graduate earnings analysis. Coverage: ~29% of graduates from ~31 states.

education-data-source-nhgis

160

from DAAF-Contribution-Community/daaf

NHGIS — census geography crosswalks via Portal: links schools (ncessch) and colleges (unitid) to tracts, block groups, CBSAs (1990-2020). Census demographics NOT in Portal — access NHGIS directly. Use for linking education data to census geography.

education-data-source-nccs

160

from DAAF-Contribution-Community/daaf

NCCS — Form 990 data for private nonprofit colleges (Portal: IPEDS-matched, 1993-2016). Revenue, expenses, assets, endowment, governance beyond IPEDS. Use when IRS financial depth needed. Portal ends 2016; public institutions excluded (no Form 990).

education-data-source-nacubo

160

from DAAF-Contribution-Community/daaf

NACUBO endowment data (~650 institutions, 2012-2022). Portal: 7 columns only (total endowment, per-FTE, YoY change). Use for endowment size/trends. Full investment/spending needs direct NACUBO access. For all-institution coverage use IPEDS finance.

education-data-source-meps

160

from DAAF-Contribution-Community/daaf

MEPS — Urban Institute modeled school-level poverty (% at 100% FPL), from CCD + SAIPE (public schools, 2009-2022, 2-3yr lag). Use when FRPL is unreliable due to CEP. Consistent cross-state measurement. Public schools only.

education-data-source-ipeds

160

from DAAF-Contribution-Community/daaf

IPEDS — primary federal postsecondary data (~6,500 institutions, 1980-present): enrollment, completions, graduation rates, finance, aid, admissions, HR. For college/university analysis. Grad rates = first-time full-time; finance needs GASB/FASB care.

education-data-source-fsa

160

from DAAF-Contribution-Community/daaf

FSA — Title IV aid at institution level (~5,500 institutions, 1999-2021). Pell Grants, Direct/PLUS loans, campus-based aid, financial responsibility scores, 90/10 metrics. Use for aid distribution, loan volume, or for-profit analysis. By unitid.

education-data-source-eada

160

from DAAF-Contribution-Community/daaf

EADA — college athletics gender equity (~2,000+ institutions, 2002-2021). Participation, coaching, salaries, expenses, revenues, athletic aid by gender. Not Title IX compliance data. No sector column; join IPEDS on unitid for institution type.

education-data-source-crdc

160

from DAAF-Contribution-Community/daaf

CRDC — biennial OCR survey of all U.S. public schools (2011-2021). Discipline, course access, harassment, restraint/seclusion by race/sex/disability/EL. Use for civil rights and equity analysis. 2020-21 COVID-impacted; 2011-14 sampled, not universe.