data-analysis

Load, analyze, and visualize datasets using pandas with AG Grid display


by Zaoqu-Liu

View on GitHub

Best use case

data-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using data-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/prismer-data-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/prismer-data-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/prismer-data-analysis/SKILL.md inside your project
Restart your AI agent; it will auto-discover the skill

How data-analysis Compares

Feature / Agent	data-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Load, analyze, and visualize datasets using pandas with AG Grid display

Where can I find the source code?

The source code is in the Zaoqu-Liu/ScienceClaw repository on GitHub, at skills/prismer-data-analysis/SKILL.md.

SKILL.md Source

# Data Analysis Skill

## Description
Load data files (CSV, XLSX, JSON, Parquet) into the AG Grid viewer, run pandas queries, save results, and generate visualizations.
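The four formats listed map naturally onto pandas' built-in readers. The sketch below shows one way a loader could dispatch on file extension. `load_table` and `READERS` are hypothetical names for illustration, not the skill's actual `data_load` implementation. Reading XLSX needs an Excel engine such as openpyxl, and Parquet needs pyarrow or fastparquet.

```python
from pathlib import Path

import pandas as pd

# Hypothetical extension -> pandas reader mapping (not the skill's real code)
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,       # needs an Excel engine such as openpyxl
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,  # needs pyarrow or fastparquet
}


def load_table(path):
    """Load a CSV, XLSX, JSON, or Parquet file into a DataFrame."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return reader(path)
```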

## Tools Used

### Primary (Data Grid workflow)
- `data_list` - List available data files in /workspace/data/
- `data_load` - Load a data file into AG Grid (returns markdown preview for context)
- `data_query` - Execute pandas operations on loaded data (filter, aggregate, transform)
- `data_save` - Save the current DataFrame to a file
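Suppose `data_list` simply reports the supported files sitting directly in /workspace/data/. In that case, a rough local equivalent is a `pathlib` scan. The function name and the suffix set are illustrative assumptions; the real tool may recurse into subfolders or report more detail.

```python
from pathlib import Path

# Formats named in the Description; assumed to be what data_list reports
SUPPORTED_SUFFIXES = {".csv", ".xlsx", ".json", ".parquet"}


def list_data_files(root="/workspace/data"):
    """Return the sorted names of loadable files directly inside root."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.name
        for p in base.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```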

### Secondary (Jupyter workflow for visualization)
- `jupyter_execute` - Execute Python code in the Jupyter kernel (for plots and complex analysis)
- `update_notebook` - Add cells to the Jupyter notebook
- `update_gallery` - Display generated plots in the gallery

## Workflow

### Recommended: Data Grid Workflow
For tabular data exploration, use the data tools which provide a spreadsheet-like experience:

1. **List files**: `data_list` to see what's in /workspace/data/
2. **Load data**: `data_load` to read a file and display in AG Grid
   - You'll receive a markdown preview to understand columns and types
3. **Query/Filter**: `data_query` to run pandas operations
   - The `df` variable contains the loaded data
   - Set `result = ...` to define output
4. **Save results**: `data_save` to export to CSV/XLSX
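Under the contract in step 3, the loaded data is bound to `df` and whatever is assigned to `result` becomes the output. That makes it easy to try a snippet locally before sending it. This minimal sketch assumes `data_query` runs the snippet in a namespace containing `df` and `pd`. `run_query` is a hypothetical helper, and passing a copy of `df` is a design choice made here, not documented behavior. `exec` should only ever run trusted code.

```python
import pandas as pd


def run_query(df, code):
    """Run a data_query-style snippet and return what it assigns to `result`.

    Assumes the snippet can see `df` (a copy here, so the caller's frame
    is untouched) and `pd`, mirroring the `df` / `result = ...` convention.
    """
    namespace = {"pd": pd, "df": df.copy()}
    exec(code, namespace)  # trusted input only: exec runs arbitrary Python
    if "result" not in namespace:
        raise ValueError("Query must assign its output to `result`")
    return namespace["result"]


df = pd.DataFrame({"name": ["a", "b", "c"], "score": [95, 80, 91]})
top = run_query(df, "result = df[df['score'] > 90]")
print(len(top))
```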

### Alternative: Jupyter Workflow
For visualization, statistical analysis, or ML, use Jupyter tools:

1. Load data with `jupyter_execute` running pandas code
2. Create visualizations with matplotlib/seaborn
3. Display plots with `update_gallery`

## Usage Patterns

### Load and Explore Data
When user says: "Analyze this dataset" or "Show me the data"
1. `data_list` to find available files
2. `data_load` with the target file
3. Review the markdown preview to understand structure
4. `data_query` with `result = df.describe()` for statistics
5. Offer filtering, sorting, or visualization
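When a frame has any numeric columns, `describe()` summarizes only those by default. A slightly broader first look could also report dtypes, missing values, and distinct counts per column. `quick_profile` is an illustrative helper, not part of the skill, and it is not stated whether helpers like it can be defined inside `data_query`.

```python
import pandas as pd


def quick_profile(df):
    """One row per column: dtype, non-null count, missing count, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
```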

### Filter and Transform
When user says: "Show only rows where X > Y" or "Group by category"
1. `data_query` with pandas filter/groupby code
2. Grid updates automatically with filtered results
3. Inform user of result count and preview
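Step 3 asks for a result count and a preview. The sketch below shows one way to build that message after a filter. The column names, the comparison, and `summarize_filter` are all hypothetical.

```python
import pandas as pd


def summarize_filter(df, mask, n_preview=5):
    """Apply a boolean mask and return (filtered frame, short report string)."""
    result = df[mask]
    pct = 100 * len(result) / len(df) if len(df) else 0.0
    report = (
        f"{len(result)} of {len(df)} rows match ({pct:.1f}%).\n"
        + result.head(n_preview).to_string(index=False)
    )
    return result, report


df = pd.DataFrame({"x": [5, 1, 8, 3], "y": [2, 2, 2, 2]})
result, report = summarize_filter(df, df["x"] > df["y"])
print(report)
```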

### Save Processed Data
When user says: "Export this" or "Save as Excel"
1. `data_save` with desired filename and format
2. Report file location and size
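Step 2 reports the file's location and size. Assuming the format is picked from the file extension, a save step could look like this sketch. `save_table` is a hypothetical name, and `.xlsx` output needs openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def save_table(df, path):
    """Write df as CSV or XLSX (chosen by extension); return (abs path, bytes)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in {".csv", ".xlsx"}:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    path.parent.mkdir(parents=True, exist_ok=True)
    if suffix == ".csv":
        df.to_csv(path, index=False)
    else:
        df.to_excel(path, index=False)  # requires openpyxl
    return str(path.resolve()), path.stat().st_size
```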

### Visualize Data
When user says: "Create a chart" or "Plot the distribution"
1. Use `jupyter_execute` with matplotlib/seaborn code
2. Save plot and display via `update_gallery`
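For a request like "Plot the distribution", the code sent through `jupyter_execute` might resemble the sketch below. It draws a histogram with matplotlib and saves it to a PNG that could then be shown with `update_gallery`. Three things are assumptions: the column name, the output path, and the `Agg` backend (chosen for headless kernels). How `update_gallery` receives the image is not specified here.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; assumed, since the kernel may lack a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution(df, column, out_path, bins=20):
    """Save a histogram of df[column] (missing values dropped) to out_path."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[column].dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    ax.set_title(f"Distribution of {column}")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free memory when generating many plots
    return out_path
```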

## Code Snippets for data_query

### Filter rows
```python
result = df[df['score'] > 90]
```

### Group and aggregate
```python
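# Named aggregation gives flat column names (value_mean, ...) instead of a MultiIndex
result = df.groupby('category').agg(value_mean=('value', 'mean'), value_sum=('value', 'sum'), value_count=('value', 'count')).reset_index()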
```

### Sort by column
```python
result = df.sort_values('date', ascending=False)
```

### Add computed column
```python
df['ratio'] = df['value_a'] / df['value_b']
result = df
```
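One caveat with the ratio snippet is that pandas division by zero does not raise. Instead it yields `inf`, or `NaN` for 0/0. If those rows should read as missing, one option is to mask zero denominators first. This sketch uses the same placeholder `value_a`/`value_b` columns on a toy frame; inside `data_query`, `df` would already be defined. Using `assign` also leaves `df` itself unmodified.

```python
import pandas as pd

df = pd.DataFrame({"value_a": [10.0, 5.0, 0.0], "value_b": [2.0, 0.0, 0.0]})

# Treat a zero denominator as missing so the ratio is NaN instead of inf
denominator = df["value_b"].where(df["value_b"] != 0)
result = df.assign(ratio=df["value_a"] / denominator)
print(result)
```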

### Summary statistics
```python
result = df.describe()
```

### Handle missing values
```python
result = df.dropna(subset=['important_column'])
```
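Before dropping rows, it helps to see how much data `dropna` would remove. This sketch uses a toy frame, with the same placeholder column name as the snippet above.

```python
import pandas as pd

df = pd.DataFrame({
    "important_column": [1.0, None, 3.0, None],
    "other": ["a", "b", None, "d"],
})

missing_per_column = df.isna().sum()  # NaN/None count for each column
result = df.dropna(subset=["important_column"])  # only this column is checked
dropped = len(df) - len(result)
print(missing_per_column.to_dict())
print(f"Dropped {dropped} of {len(df)} rows")
```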

## Best Practices

1. **Start with data_list**: Always check what files are available first
2. **Use data_load first**: Load data to get markdown preview before querying
3. **Keep queries simple**: One operation per data_query call for clarity
4. **Save intermediate results**: Use data_save for important filtered datasets
5. **Switch to Jupyter for plots**: AG Grid is for tabular data; use Jupyter for visualizations

Related Skills

zinc-database

from Zaoqu-Liu/ScienceClaw

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

uspto-database

from Zaoqu-Liu/ScienceClaw

Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.

uniprot-database

from Zaoqu-Liu/ScienceClaw

Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.

tooluniverse-variant-analysis

from Zaoqu-Liu/ScienceClaw

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-structural-variant-analysis

from Zaoqu-Liu/ScienceClaw

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-spatial-omics-analysis

from Zaoqu-Liu/ScienceClaw

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-proteomics-analysis

from Zaoqu-Liu/ScienceClaw

Analyze mass spectrometry proteomics data including protein quantification, differential expression, post-translational modifications (PTMs), and protein-protein interactions. Processes MaxQuant, Spectronaut, DIA-NN, and other MS platform outputs. Performs normalization, statistical analysis, pathway enrichment, and integration with transcriptomics. Use when analyzing proteomics data, comparing protein abundance between conditions, identifying PTM changes, studying protein complexes, integrating protein and RNA data, discovering protein biomarkers, or conducting quantitative proteomics experiments.

Protein Interaction Network Analysis

from Zaoqu-Liu/ScienceClaw

Analyze protein-protein interaction networks using STRING, BioGRID, and SASBDB databases. Maps protein identifiers, retrieves interaction networks with confidence scores, performs functional enrichment analysis (GO/KEGG/Reactome), and optionally includes structural data. No API key required for core functionality (STRING). Use when analyzing protein networks, discovering interaction partners, identifying functional modules, or studying protein complexes.

tooluniverse-metabolomics-analysis

from Zaoqu-Liu/ScienceClaw

Analyze metabolomics data including metabolite identification, quantification, pathway analysis, and metabolic flux. Processes LC-MS, GC-MS, NMR data from targeted and untargeted experiments. Performs normalization, statistical analysis, pathway enrichment, metabolite-enzyme integration, and biomarker discovery. Use when analyzing metabolomics datasets, identifying differential metabolites, studying metabolic pathways, integrating with transcriptomics/proteomics, discovering metabolic biomarkers, performing flux balance analysis, or characterizing metabolic phenotypes in disease, drug response, or physiological conditions.

tooluniverse-immune-repertoire-analysis

from Zaoqu-Liu/ScienceClaw

Comprehensive immune repertoire analysis for T-cell and B-cell receptor sequencing data. Analyze TCR/BCR repertoires to assess clonality, diversity, V(D)J gene usage, CDR3 characteristics, convergence, and predict epitope specificity. Integrate with single-cell data for clonotype-phenotype associations. Use for adaptive immune response profiling, cancer immunotherapy research, vaccine response assessment, autoimmune disease studies, or repertoire diversity analysis in immunology research.

tooluniverse-image-analysis

from Zaoqu-Liu/ScienceClaw

Production-ready microscopy image analysis and quantitative imaging data skill for colony morphometry, cell counting, fluorescence quantification, and statistical analysis of imaging-derived measurements. Processes ImageJ/CellProfiler output (area, circularity, intensity, cell counts), performs Dunnett's test, Cohen's d effect size, power analysis, Shapiro-Wilk normality tests, two-way ANOVA, polynomial regression, natural spline regression with confidence intervals, and comparative morphometry. Supports CSV/TSV measurement tables, multi-channel fluorescence data, colony swarming assays, and neuron counting datasets. Use when analyzing microscopy measurement data, colony area/circularity, cell count statistics, swarming assays, co-culture ratio optimization, or answering questions about imaging-derived quantitative data.

tooluniverse-expression-data-retrieval

from Zaoqu-Liu/ScienceClaw

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.