data-analysis
End-to-end R data analysis workflow from exploration through regression to publication-ready tables and figures
Best use case
data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using data-analysis can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/data-analysis/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How data-analysis Compares
| Feature / Agent | data-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
End-to-end R data analysis workflow from exploration through regression to publication-ready tables and figures
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Data Analysis Workflow
Run an end-to-end data analysis in R: load, explore, analyze, and produce publication-ready output.
**Input:** `$ARGUMENTS` — a dataset path (e.g., `data/county_panel.csv`) or a description of the analysis goal (e.g., "regress wages on education with state fixed effects using CPS data").
---
## Constraints
- **Follow R code conventions** in `.claude/rules/r-code-conventions.md`
- **Save all scripts** to `scripts/R/` with descriptive names
- **Save all outputs** (figures, tables, RDS) to `output/`
- **Use `saveRDS()`** for every computed object so that Quarto slides can load it later
- **Use project theme** for all figures (check for custom theme in `.claude/rules/`)
- **Run r-reviewer** on the generated script before presenting results
---
## Workflow Phases
### Phase 1: Setup and Data Loading
1. Read `.claude/rules/r-code-conventions.md` for project standards
2. Create an R script with a proper header (title, author, purpose, inputs, outputs)
3. Load required packages at top (`library()`, never `require()`)
4. Set seed once at top: `set.seed(42)`
5. Load and inspect the dataset
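A minimal base-R sketch of Phase 1, assuming the input is a CSV file. Here `mtcars`, written to a temporary file, stands in for a real dataset such as `data/county_panel.csv`, and `inspect_data()` is a hypothetical helper, not part of any package. It sets the seed once at the top and reports dimensions, variable types, and missingness in a single table.

```r
# Phase 1 sketch (base R only). Assumption: the dataset is a CSV file.
set.seed(42)  # set once, at the top of the script

# Hypothetical demo input: a built-in dataset written to a temp CSV so the
# sketch runs anywhere. In a real project this would be e.g. "data/county_panel.csv".
data_path <- file.path(tempdir(), "demo_input.csv")
write.csv(mtcars, data_path, row.names = FALSE)

# Load
raw <- read.csv(data_path, stringsAsFactors = FALSE)

# Inspect: one row per variable with its type and missingness
inspect_data <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(x) class(x)[1], character(1)),
    n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    pct_missing = round(100 * colMeans(is.na(df)), 1),
    row.names   = NULL
  )
}

cat("Rows:", nrow(raw), " Columns:", ncol(raw), "\n")
overview <- inspect_data(raw)
print(overview)
```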
### Phase 2: Exploratory Data Analysis
Generate diagnostic outputs:
- **Summary statistics:** `summary()`, missingness rates, variable types
- **Distributions:** Histograms for key continuous variables
- **Relationships:** Scatter plots, correlation matrices
- **Time patterns:** If panel data, plot trends over time
- **Group comparisons:** If treatment/control, compare pre-treatment means
Save all diagnostic figures to `output/diagnostics/`.
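The diagnostics above can be sketched in base R as follows. This is illustrative only: `mtcars` (with two injected missing values) stands in for real data, `am` stands in for a treatment indicator, and the 0.8 correlation cutoff is an arbitrary choice. Base graphics are used only to keep the example dependency-free; project figures should still use `ggplot2` with the project theme. `tempdir()` replaces the repository root so the sketch does not write into a real project.

```r
# Stand-in data: mtcars with two injected missing values in hp
df <- mtcars
df$hp[c(3, 7)] <- NA

num_vars <- names(df)[vapply(df, is.numeric, logical(1))]

# Summary statistics and missingness rates for numeric variables
summ <- data.frame(
  variable    = num_vars,
  mean        = vapply(df[num_vars], mean, numeric(1), na.rm = TRUE),
  sd          = vapply(df[num_vars], sd, numeric(1), na.rm = TRUE),
  pct_missing = 100 * colMeans(is.na(df[num_vars])),
  row.names   = NULL
)
print(summ, digits = 3)

# Correlation matrix using all available pairs of observations
cor_mat <- cor(df[num_vars], use = "pairwise.complete.obs")

# Flag strongly correlated pairs as an early multicollinearity warning
threshold <- 0.8  # illustrative cutoff, not a standard
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_cor <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx],
  row.names = NULL
)
print(high_cor)

# Group comparison of means (am stands in for a treatment indicator)
print(aggregate(cbind(mpg, wt) ~ am, data = df, FUN = mean))

# One diagnostic histogram; "output/diagnostics" in a real project
diag_dir <- file.path(tempdir(), "output", "diagnostics")
dir.create(diag_dir, recursive = TRUE, showWarnings = FALSE)
pdf(file.path(diag_dir, "hist_mpg.pdf"), width = 6, height = 4)
hist(df$mpg, main = "", xlab = "Miles per gallon", breaks = 10)
invisible(dev.off())
```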
### Phase 3: Main Analysis
Based on the research question:
- **Regression analysis:** Use `fixest` for panel data, `lm`/`glm` for cross-sectional data
- **Standard errors:** Cluster at the appropriate level (document why)
- **Multiple specifications:** Start simple, progressively add controls
- **Effect sizes:** Report standardized effects alongside raw coefficients
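A hedged sketch of Phase 3 in base R: three nested specifications, cluster-robust (CR1) standard errors computed by hand, and a standardized effect reported next to the raw coefficient. The variable roles are assumptions for illustration: `mpg` is the outcome, `wt` the regressor of interest, and `cyl` the cluster id. That gives only three clusters, far too few for a real analysis. In an actual project, `fixest::feols()` with its `cluster` argument (e.g. `cluster = ~state`) would replace the hypothetical `cluster_se()` helper, which is shown only to make the computation explicit.

```r
# Hypothetical stand-ins: mpg = outcome, wt = regressor of interest, cyl = cluster id
df <- mtcars

# Start simple, then add controls progressively
specs <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ wt + hp,
  m3 = mpg ~ wt + hp + qsec
)
models <- lapply(specs, lm, data = df)

# CR1 cluster-robust standard errors (Stata-style small-sample adjustment).
# Assumes lm() dropped no rows, so `cluster` aligns with the model rows.
cluster_se <- function(model, cluster) {
  X <- model.matrix(model)
  u <- residuals(model)
  n <- nrow(X)
  k <- ncol(X)
  groups <- unique(cluster)
  G <- length(groups)
  bread <- solve(crossprod(X))  # (X'X)^-1
  meat <- matrix(0, k, k)
  for (g in groups) {
    in_g <- cluster == g
    s <- crossprod(X[in_g, , drop = FALSE], u[in_g])  # cluster score sum X_g' u_g
    meat <- meat + tcrossprod(s)
  }
  adj <- (G / (G - 1)) * ((n - 1) / (n - k))
  V <- adj * bread %*% meat %*% bread
  setNames(sqrt(diag(V)), colnames(X))
}

# Effect of a one-SD change in x, measured in SDs of y
std_effect <- function(model, x, y, data) {
  coef(model)[[x]] * sd(data[[x]]) / sd(data[[y]])
}

# One row per specification: raw coefficient, clustered SE, standardized effect, N, R-squared
results <- do.call(rbind, lapply(names(models), function(nm) {
  m <- models[[nm]]
  data.frame(
    model   = nm,
    coef_wt = coef(m)[["wt"]],
    se_wt   = cluster_se(m, df$cyl)[["wt"]],
    std_wt  = std_effect(m, "wt", "mpg", df),
    n       = nobs(m),
    r2      = summary(m)$r.squared
  )
}))
print(results, digits = 3)
```

For a simple regression the standardized coefficient equals the Pearson correlation between outcome and regressor, which makes a quick sanity check on `std_effect()`. Likewise, treating every observation as its own cluster should reproduce HC1 heteroskedasticity-robust standard errors.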
### Phase 4: Publication-Ready Output
**Tables:**
- Use `modelsummary` for regression tables (preferred) or `stargazer`
- Include all standard elements: coefficients, SEs, significance stars, N, R-squared
- Export as `.tex` for LaTeX inclusion and `.html` for quick viewing
**Figures:**
- Use `ggplot2` with project theme
- Set `bg = "transparent"` for Beamer compatibility
- Include proper axis labels (sentence case, units)
- Export with explicit dimensions: `ggsave(width = X, height = Y)`
- Save as both `.pdf` and `.png`
### Phase 5: Save and Review
1. Create `output/` subdirectories as needed with `dir.create(..., recursive = TRUE)`
2. Save all key objects (regression results, summary tables, processed data) with `saveRDS()`
3. Run the r-reviewer agent on the generated script:
```
Delegate to the r-reviewer agent:
"Review the script at scripts/R/[script_name].R"
```
4. Address any Critical or High issues from the review.
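One way to implement the saving step in base R, assuming the `output/analysis/` layout from the script template. Here `tempdir()` stands in for the repository root, and the object names are hypothetical. The folder is created first because `saveRDS()` fails if the target directory does not exist.

```r
# tempdir() stands in for the repository root in this sketch
root <- tempdir()
out_dir <- file.path(root, "output", "analysis")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)  # must exist before saving

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical key objects: model, coefficient table, processed data
key_objects <- list(
  model_main   = fit,
  summary_tab  = summary(fit)$coefficients,
  analysis_dat = mtcars
)

# Save each object to output/analysis/<name>.rds and keep the paths
paths <- vapply(names(key_objects), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".rds"))
  saveRDS(key_objects[[nm]], path)
  path
}, character(1))

# Round-trip check: what the slides will load matches what was saved
reloaded <- readRDS(paths[["summary_tab"]])
stopifnot(identical(reloaded, key_objects$summary_tab))
print(basename(paths))
```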
---
## Script Structure
Follow this template:
```r
# ============================================================
# [Descriptive Title]
# Author: [from project context]
# Purpose: [What this script does]
# Inputs: [Data files]
# Outputs: [Figures, tables, RDS files]
# ============================================================
# 0. Setup ----
library(tidyverse)
library(fixest)
library(modelsummary)
set.seed(42)
dir.create("output/analysis", recursive = TRUE, showWarnings = FALSE)
# 1. Data Loading ----
# [Load and clean data]
# 2. Exploratory Analysis ----
# [Summary stats, diagnostic plots]
# 3. Main Analysis ----
# [Regressions, estimation]
# 4. Tables and Figures ----
# [Publication-ready output]
# 5. Export ----
# [saveRDS for all objects, ggsave for all figures]
```
---
## Important
- **Reproduce, don't guess.** If the user specifies a regression, run exactly that.
- **Show your work.** Print summary statistics before jumping to regression.
- **Check for issues.** Look for multicollinearity, outliers, perfect prediction.
- **Use relative paths.** All paths relative to repository root.
- **No hardcoded values.** Use variables for sample restrictions, date ranges, etc.
Related Skills
visual-audit
Perform adversarial visual audit of Quarto or Beamer slides checking for overflow, font consistency, box fatigue, and layout issues.
validate-bib
Validate bibliography entries against citations in all lecture files. Find missing entries and unused references.
translate-to-quarto
Translate Beamer LaTeX to Quarto RevealJS. Multi-phase workflow with TikZ extraction and QA.
slide-excellence
Multi-agent slide review (visual, pedagogy, proofreading). Use for comprehensive quality check before milestones.
review-r
Run the R code review protocol on R scripts. Checks code quality, reproducibility, domain correctness, and professional standards. Produces a report without editing files.
review-paper
Comprehensive manuscript review covering argument structure, econometric specification, citation completeness, and potential referee objections
research-ideation
Generate structured research questions, testable hypotheses, and empirical strategies from a topic or dataset
qa-quarto
Adversarial Quarto vs Beamer QA. Critic finds issues, fixer applies fixes, loops until APPROVED (max 5 rounds).
proofread
Run the proofreading protocol on lecture files. Checks grammar, typos, overflow, consistency, and academic writing quality. Produces a report without editing files.
pedagogy-review
Run holistic pedagogical review on lecture slides. Checks narrative arc, student prerequisites, worked examples, notation clarity, and deck pacing.
lit-review
Structured literature search and synthesis with citation extraction and gap identification
learn
Extract reusable knowledge from the current session into a persistent skill. Use when you discover something non-obvious, create a workaround, or develop a multi-step workflow that future sessions would benefit from.