genomics-analysis

Orchestrates a genomics analysis workflow from gene query through expression analysis to pathway enrichment. Use when investigating gene function, analyzing expression data, or performing pathway-level interpretation. NOT for pure protein structure modeling or drug-target interaction analysis.

564 stars

by beita6969


Best use case

genomics-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using genomics-analysis can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/genomics-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/genomics-analysis/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/genomics-analysis/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How genomics-analysis Compares

Feature / Agent	genomics-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

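What does this skill do?

It orchestrates a genomics analysis workflow from gene query through expression analysis to pathway enrichment.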

Where can I find the source code?

The source code is in the beita6969/ScienceClaw repository on GitHub, under skills/genomics-analysis.

SKILL.md Source

# Genomics Analysis (Meta Skill)

This meta-skill coordinates a complete genomics analysis pipeline by integrating
gene database queries, sequence analysis, expression profiling, and pathway
enrichment into a unified workflow. It combines three specialized skills to
deliver comprehensive gene-level and systems-level biological insights.

## Workflow

### Step 1: Gene Information Retrieval

Query NCBI Entrez for comprehensive gene details including official nomenclature,
genomic coordinates, transcript variants, and functional annotations. Retrieve
orthologs across model organisms for evolutionary context. Pull known variants
from ClinVar and dbSNP, noting pathogenic or pharmacogenomic associations.
Collect linked references from PubMed for recent literature context.
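
As a rough illustration of the gene lookup above, the sketch below builds NCBI E-utilities request URLs for the `gene` database. The helper names `build_gene_search_url` and `build_gene_summary_url` are hypothetical and not part of the skill. The sketch only constructs URLs and performs no network call, so fetching, JSON parsing, and compliance with NCBI usage guidelines are left to the caller.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_gene_search_url(symbol, organism="Homo sapiens", retmax=5):
    """Return an ESearch URL that looks up a gene symbol in the 'gene' database.

    Hypothetical helper for illustration; the default organism and retmax
    are assumptions, not values taken from the skill.
    """
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_gene_summary_url(gene_ids):
    """Return an ESummary URL for one or more NCBI Gene IDs."""
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gene_search_url("TP53"))
    print(build_gene_summary_url([7157]))  # 7157 is the NCBI Gene ID for human TP53
```

The same pattern would extend to other Entrez databases such as `clinvar`, `snp`, or `pubmed` by changing the `db` parameter and the query term.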

### Step 2: Sequence Analysis

Use BioPython to perform sequence-level analyses on retrieved gene and protein
sequences:
- Multiple sequence alignment of orthologs to identify conserved regions
- Motif discovery in promoter regions or protein domains
- Domain architecture mapping against Pfam/InterPro signatures
- Codon usage analysis for expression optimization studies
- Variant impact prediction based on conservation scores
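
To make the idea of identifying conserved regions concrete, here is a dependency-free sketch that scores each column of an already-aligned set of ortholog sequences. It stands in for the Biopython-based analysis rather than reproducing it: the alignment itself is assumed to come from an external tool, and the function names, the identity-based score, and the default threshold are illustrative assumptions.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Score each alignment column as the fraction of sequences that carry
    the most common residue. Gaps ('-') never count as the common residue."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_regions(scores, threshold=1.0, min_len=3):
    """Return half-open (start, end) index ranges where at least min_len
    consecutive columns score at or above the threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    # Toy aligned peptides, made up for illustration only.
    toy = ["MKTAYIA", "MKTAYLA", "MKT-YIA"]
    scores = column_conservation(toy)
    print(conserved_regions(scores))
```

A simple identity fraction ignores substitution similarity; a production analysis would more likely use a substitution-matrix or entropy-based score.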

### Step 3: Expression Analysis

Apply scanpy for expression data analysis, supporting both single-cell and
bulk RNA-seq workflows:
- For single-cell: quality control, normalization, clustering, marker gene
  identification, cell type annotation
- For bulk: differential expression analysis, volcano plots, heatmaps
- Cross-dataset comparison when multiple conditions are available
- Identification of co-expressed gene modules
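
The bulk branch above depends on normalizing counts before comparing conditions. The sketch below shows counts-per-million scaling and a pseudocount-stabilized log2 fold change in plain Python. It is not the scanpy workflow and it is not a statistical test: the function names and the pseudocount of 1.0 are assumptions for illustration, and a real differential expression analysis would model replicates and dispersion.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts (gene -> count) so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has zero total counts")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2 ratio of treated over control expression.

    The pseudocount avoids division by zero for unexpressed genes.
    Only genes present in both inputs are returned.
    """
    shared = treated.keys() & control.keys()
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared
    }


if __name__ == "__main__":
    # Toy counts with placeholder gene names, not real data.
    control = counts_per_million({"GENE_A": 100, "GENE_B": 900})
    treated = counts_per_million({"GENE_A": 400, "GENE_B": 600})
    for gene, lfc in sorted(log2_fold_change(treated, control).items()):
        print(gene, round(lfc, 3))
```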

### Step 4: Pathway Enrichment and Functional Annotation

Map differentially expressed or co-expressed genes to biological pathways:
- KEGG pathway mapping for metabolic and signaling context
- Gene Ontology enrichment (biological process, molecular function, cellular component)
- Reactome pathway analysis for detailed mechanistic understanding
- Network-based enrichment to identify hub genes and regulatory modules
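
One common way to test whether a gene list is over-represented in a pathway is a one-sided hypergeometric test. The skill does not state which enrichment statistic it uses, so the sketch below is an assumption offered for illustration, with hypothetical function and parameter names.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_size, selected_size, overlap):
    """One-sided over-representation p-value: P(X >= overlap).

    background_size: number of genes in the background universe (N)
    pathway_size:    number of background genes annotated to the pathway (K)
    selected_size:   number of genes in the list being tested (n)
    overlap:         number of selected genes that are in the pathway (k)
    """
    if not (0 <= pathway_size <= background_size):
        raise ValueError("pathway_size must be between 0 and background_size")
    if not (0 <= selected_size <= background_size):
        raise ValueError("selected_size must be between 0 and background_size")
    max_overlap = min(pathway_size, selected_size)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap is outside the feasible range")

    denominator = comb(background_size, selected_size)
    numerator = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, selected_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 5 in the pathway, 5 selected, all 5 overlap.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

Because one such test is run per pathway, the resulting p-values need multiple testing correction before pathways are ranked.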

### Step 5: Integrated Report Generation

Compile findings into a structured report with:
- Gene summary card with key identifiers and annotations
- Sequence conservation highlights and domain maps
- Expression analysis results with statistical summaries
- Enriched pathways ranked by significance
- Key findings synthesis connecting sequence, expression, and pathway data
- Publication-ready figures and supplementary tables
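
The report structure above could be represented with a small data model and rendered to Markdown. This is a minimal sketch: the class names, fields, and table layout are hypothetical, and the example values are placeholders rather than real results.

```python
from dataclasses import dataclass, field


@dataclass
class PathwayHit:
    name: str
    gene_count: int
    adjusted_p: float


@dataclass
class GeneReport:
    symbol: str
    gene_id: str
    summary: str
    pathways: list = field(default_factory=list)

    def to_markdown(self):
        """Render the gene card and a pathway table sorted by adjusted p-value."""
        lines = [
            f"# Gene report: {self.symbol}",
            "",
            f"- Gene ID: {self.gene_id}",
            f"- Summary: {self.summary}",
            "",
            "## Enriched pathways",
            "",
            "| Pathway | Genes | Adjusted p |",
            "| --- | --- | --- |",
        ]
        for hit in sorted(self.pathways, key=lambda h: h.adjusted_p):
            lines.append(f"| {hit.name} | {hit.gene_count} | {hit.adjusted_p:.2e} |")
        return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content only; none of these values are real findings.
    report = GeneReport(
        symbol="GENE_A",
        gene_id="0000",
        summary="placeholder summary",
        pathways=[
            PathwayHit("pathway B", 4, 0.03),
            PathwayHit("pathway A", 9, 0.001),
        ],
    )
    print(report.to_markdown())
```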

## Integration Points

- **ncbi-entrez** -- Gene records, variant data, orthologs, literature links
- **biopython-bio** -- Sequence alignment, motif search, domain analysis, format conversion
- **scanpy-singlecell** -- Expression quantification, clustering, differential expression, visualization

## Output Formats

- **Gene card**: Symbol, aliases, genomic location, function summary, disease associations
- **Alignment view**: Conserved regions highlighted across orthologs
- **Expression summary**: DE gene lists with fold change, p-values, FDR
- **Pathway table**: Enriched pathways with gene counts, p-values, leading-edge genes
- **Figures**: Heatmaps, volcano plots, UMAP embeddings, pathway diagrams

## Best Practices

1. Start with gene identifiers from a reliable source (NCBI Gene ID or HGNC symbol)
2. Verify gene nomenclature across databases to avoid confusion from aliases
3. Use appropriate normalization for the expression data type (TPM, CPM, SCTransform)
4. Apply multiple testing correction (Benjamini-Hochberg) for all enrichment analyses
5. Set biologically meaningful fold-change thresholds alongside statistical cutoffs
6. Include both up- and down-regulated gene sets in pathway analysis
7. Cross-reference pathway results with known biology to filter spurious enrichments
8. Report effect sizes and confidence intervals, not just p-values
9. Note species differences when translating findings from model organisms
10. Archive intermediate results for reproducibility and downstream re-analysis
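
Practices 4 and 5 can be combined in a few lines. The sketch below implements the Benjamini-Hochberg adjustment in plain Python and then filters on both the adjusted p-value and the absolute log2 fold change. The function names, the 0.05 cutoff, and the fold-change threshold of 1.0 are illustrative assumptions, not values prescribed by the skill.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_significant(results, alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes that pass both the FDR cutoff and the effect-size threshold.

    results: list of (gene, log2_fold_change, raw_p_value) tuples.
    Returns a list of (gene, log2_fold_change, adjusted_p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(results, adjusted)
        if q < alpha and abs(lfc) >= min_abs_log2fc
    ]


if __name__ == "__main__":
    # Toy values with placeholder gene names.
    toy = [("g1", 2.0, 0.001), ("g2", 0.2, 0.002), ("g3", -1.5, 0.9)]
    print(select_significant(toy))
```

Filtering on effect size after the adjustment, rather than before, keeps the number of tests used for the correction equal to the number of genes actually tested.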

Related Skills

statistical-analysis

564

from beita6969/ScienceClaw

Guided statistical analysis with test selection and reporting. Use when you need help choosing appropriate tests for your data, assumption checking, power analysis, and APA-formatted results. Best for academic research reporting, test selection guidance. For implementing specific models programmatically use statsmodels.

social-science-analysis

564

from beita6969/ScienceClaw

Social science research methods including survey design, qualitative analysis, content analysis, network analysis, psychometrics, and mixed methods. Covers sociology, psychology, political science, education, and communication studies. Use when user designs surveys, analyzes qualitative data, does content analysis, builds scales, or uses mixed methods. Triggers on "survey design", "qualitative analysis", "content analysis", "Likert scale", "thematic analysis", "grounded theory", "factor analysis", "SEM", "structural equation", "psychometrics", "interview coding".

scipy-analysis

564

from beita6969/ScienceClaw

Scientific computing and statistical analysis with SciPy, NumPy, and pandas. Use when: (1) statistical hypothesis testing, (2) optimization problems, (3) signal processing, (4) numerical integration, (5) data manipulation and analysis. NOT for: symbolic math (use sympy-math), machine learning (use sklearn directly), or visualization (use matplotlib-viz).

patent-analysis

564

from beita6969/ScienceClaw

Conducts patent landscape analysis including prior art searches, patent claim interpretation, freedom-to-operate assessment, and intellectual property strategy for scientific inventions; trigger when users discuss patents, prior art, IP protection, or technology licensing.

paper-analysis

564

from beita6969/ScienceClaw

Read, summarize, and critically analyze scientific papers. Extract key findings, methodology, limitations, and contributions. Use when user shares a paper (PDF/URL/DOI), asks to summarize a paper, critique methodology, extract data from a paper, compare papers, or do a critical review. Triggers on "summarize this paper", "analyze this study", "what does this paper say", "critique this methodology", "extract findings from".

nlp-analysis

564

from beita6969/ScienceClaw

Natural language processing for research including text mining, sentiment analysis, topic modeling, named entity recognition, text classification, and corpus analysis. Use when user needs to analyze text data, extract information from documents, do sentiment analysis, topic modeling, or text classification for research purposes. Triggers on "text mining", "sentiment analysis", "topic modeling", "NER", "named entity", "text classification", "word embeddings", "LDA", "corpus analysis", "word frequency", "TF-IDF".

meta-analysis

564

from beita6969/ScienceClaw

Perform quantitative meta-analysis with effect size calculation, forest plots, funnel plots, and heterogeneity assessment. Use when: user asks to combine results from multiple studies, calculate pooled effect sizes, assess publication bias, or create forest/funnel plots. NOT for: systematic review protocol (use systematic-review) or single-study statistics (use statsmodels-stats).

linguistics-analysis

564

from beita6969/ScienceClaw

Analyze language structures, typological features, and semantic change across languages

legal-analysis

564

from beita6969/ScienceClaw

Analyze legal contracts, extract clauses, and perform legal research with structured frameworks

geospatial-analysis

564

from beita6969/ScienceClaw

Performs geospatial data analysis including GIS operations, spatial statistics, remote sensing image processing, geocoding, and cartographic visualization; trigger when users discuss maps, coordinates, satellite imagery, spatial patterns, or geographic data.

genome-analysis

564

from beita6969/ScienceClaw

Performs genomics analyses including gene expression profiling, BLAST sequence alignment, GWAS interpretation, variant calling, and genome assembly tasks; trigger when the user mentions DNA/RNA sequences, SNPs, gene panels, or comparative genomics.

exploratory-data-analysis

564

from beita6969/ScienceClaw

Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientific data formats.