Skill: Gene Regulatory Network Analysis

**GRN inference starts with: which TF regulates which gene?** Direct evidence (ChIP-seq binding) is stronger than indirect (co-expression correlation). A TF binding near a gene doesn't prove regulation — check if expression changes when the TF is perturbed. JASPAR provides binding motifs but motif presence in a promoter is only computational evidence (T3); ENCODE ChIP-seq data that places the TF at the locus in the relevant cell type is stronger (T1). eQTLs from GTEx show which variants affect expression but don't identify the upstream regulator — combine with TF motif disruption analysis for mechanistic insight.

Best use case

Skill: Gene Regulatory Network Analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Skill: Gene Regulatory Network Analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-gene-regulatory-networks/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-gene-regulatory-networks/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/tooluniverse-gene-regulatory-networks/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Skill: Gene Regulatory Network Analysis Compares

| Feature / Agent | Skill: Gene Regulatory Network Analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source is on GitHub in the mims-harvard/ToolUniverse repository, under skills/tooluniverse-gene-regulatory-networks/SKILL.md.

SKILL.md Source

# Skill: Gene Regulatory Network Analysis

**GRN inference starts with: which TF regulates which gene?** Direct evidence (ChIP-seq binding) is stronger than indirect (co-expression correlation). A TF binding near a gene doesn't prove regulation — check if expression changes when the TF is perturbed. JASPAR provides binding motifs but motif presence in a promoter is only computational evidence (T3); ENCODE ChIP-seq data that places the TF at the locus in the relevant cell type is stronger (T1). eQTLs from GTEx show which variants affect expression but don't identify the upstream regulator — combine with TF motif disruption analysis for mechanistic insight.

**LOOK UP DON'T GUESS**: never assume JASPAR matrix IDs, Enrichr library names, or GTEx tissue identifiers — always search JASPAR by TF name and verify library names before calling enrichr.

## When to Use

Activate this skill when the user asks about:
- Transcription factor (TF) binding sites, motifs, or target genes
- Gene regulatory networks or transcriptional regulation
- Chromatin state and histone modifications in regulatory context
- TF-target relationships and co-regulation
- eQTL effects on gene regulation
- Protein-protein interactions among regulatory factors

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
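
As one illustration of computing rather than describing, the sketch below tests whether a query gene set overlaps a TF's target set more than chance would predict, using an exact hypergeometric tail. The function name and the numbers in the usage line are hypothetical; only the standard library is used so that it runs without extra dependencies.

```python
from math import comb


def hypergeom_overlap_p(universe: int, targets: int, query: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    universe: number of genes considered in total
    targets:  number of genes in the universe annotated as targets of the TF
    query:    size of the user's gene set
    overlap:  number of query genes that are also targets
    """
    max_k = min(targets, query)
    total = comb(universe, query)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(targets, k) * comb(universe - targets, query - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Illustrative numbers only: 4 of 5 query genes fall among 300 targets.
p_value = hypergeom_overlap_p(universe=20000, targets=300, query=5, overlap=4)
print(f"overlap p-value: {p_value:.3g}")
```

The same calculation is available as `scipy.stats.hypergeom.sf(overlap - 1, universe, targets, query)` when SciPy is installed.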

## Workflow

### Phase 0: Input Disambiguation

Determine:
- Is the query about a specific TF (e.g., "TP53 regulatory network") or a target gene (e.g., "what regulates CDKN1A")?
- Is a specific tissue/cell type relevant?
- Should the analysis focus on direct binding (motifs) or functional targets (ChIP-seq, enrichment)?

### Phase 1: TF Motif Lookup (JASPAR)

Search JASPAR for the TF's position weight matrix (PWM) and binding motif profile.

**Tool: `jaspar_search_matrices`**
```
Parameters:
  search   string   TF name to search (e.g., "TP53")
  limit    integer  Max results (default 10)
  collection string JASPAR collection filter (e.g., "CORE")
  species  string   Taxonomy ID filter (e.g., "9606" for human)
```

Example:
```json
{"search": "TP53", "limit": 5}
```

Returns `{status, data: {count, results: [{matrix_id, name, collection, base_id, version, sequence_logo}]}}`.
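
A search can return several versions of the same matrix. The sketch below keeps only the highest version per `base_id`, assuming the response shape shown above. The helper name and the sample response are hypothetical, and `version` is passed through `int()` because the source does not say whether it arrives as a string or a number.

```python
def latest_matrices(response: dict, collection: str = "CORE") -> list[dict]:
    """Keep the newest version of each matrix in the requested collection."""
    results = response.get("data", {}).get("results", [])
    best: dict[str, dict] = {}
    for record in results:
        if collection and record.get("collection") != collection:
            continue
        base = record["base_id"]
        # Replace the stored record only when this one has a higher version.
        if base not in best or int(record["version"]) > int(best[base]["version"]):
            best[base] = record
    return list(best.values())


# Hypothetical response, shaped like the documented return value.
example_response = {
    "status": "success",
    "data": {
        "count": 2,
        "results": [
            {"matrix_id": "MA0106.2", "name": "TP53", "collection": "CORE",
             "base_id": "MA0106", "version": "2"},
            {"matrix_id": "MA0106.3", "name": "TP53", "collection": "CORE",
             "base_id": "MA0106", "version": "3"},
        ],
    },
}

for matrix in latest_matrices(example_response):
    print(matrix["matrix_id"], matrix["name"])
```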

**Tool: `jaspar_get_matrix`** (for detailed motif info)
```
Parameters:
  matrix_id  string  JASPAR matrix ID (e.g., "MA0106.3")
```

Returns PFM (position frequency matrix), species, TF class, UniProt IDs.
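
To score candidate sites, a PFM is usually converted to a log-odds PWM. The sketch below assumes the PFM arrives as a dict keyed by `A`, `C`, `G`, `T`, each holding per-position counts; that layout, the pseudocount value, and the uniform background are all assumptions to verify against the actual tool output.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm: dict[str, list[float]], pseudocount: float = 0.8,
               background: float = 0.25) -> dict[str, list[float]]:
    """Convert per-position counts to log2 odds against a uniform background."""
    length = len(pfm["A"])
    pwm: dict[str, list[float]] = {base: [] for base in BASES}
    for i in range(length):
        column_total = sum(pfm[base][i] for base in BASES)
        for base in BASES:
            # The pseudocount keeps zero counts from producing log(0).
            freq = (pfm[base][i] + pseudocount * background) / (column_total + pseudocount)
            pwm[base].append(math.log2(freq / background))
    return pwm


def score_site(pwm: dict[str, list[float]], site: str) -> float:
    """Sum the log-odds of each base in a site of the same length as the motif."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


# Toy two-position PFM (not a real JASPAR profile).
toy_pfm = {"A": [10, 5], "C": [0, 5], "G": [0, 5], "T": [0, 5]}
toy_pwm = pfm_to_pwm(toy_pfm)
print(score_site(toy_pwm, "AA"), score_site(toy_pwm, "CA"))
```

A high score shows only that the sequence resembles the motif. It remains computational evidence until binding data supports it.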

### Phase 2: TF Target Genes (Enrichr)

Identify target genes from ChIP-seq experiments via Enrichr.

**Tool: `enrichr_gene_enrichment_analysis`**
```
Parameters:
  gene_list  array   List of gene symbols (REQUIRED)
  library    string  Enrichr library name (default "GO_Biological_Process_2023")
  top_n      integer Top enriched terms to return (default 10)
```

Key libraries for regulatory network analysis:
- `"ENCODE_TF_ChIP-seq_2015"` -- TF binding from ENCODE ChIP-seq
- `"ChEA_2022"` -- ChIP-seq enrichment analysis (broader coverage)
- `"TRRUST_Transcription_Factors_2019"` -- Literature-curated TF-target relationships
- `"ARCHS4_TFs_Coexp"` -- TF co-expression from RNA-seq

Example (find which TFs bind your gene set):
```json
{
  "gene_list": ["CDKN1A", "BAX", "MDM2", "GADD45A", "BBC3"],
  "library": "ENCODE_TF_ChIP-seq_2015",
  "top_n": 10
}
```

Returns `{status, data: {library, gene_count, enriched_terms: [{rank, term, p_value, combined_score, overlapping_genes, adjusted_p_value}]}}`.

**IMPORTANT**: Enrichr takes a gene list and tells you what TFs are enriched. To find targets OF a TF, use the TRRUST library or look up TF ChIP-seq targets directly.
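
The enrichment result can be reduced to a list of candidate TFs by filtering on `adjusted_p_value`. The sketch below assumes the documented response shape. It also assumes that terms in the ENCODE library begin with the TF symbol followed by cell line and genome build; check that against real output before relying on it. The sample response is hypothetical.

```python
def enriched_tfs(response: dict, alpha: float = 0.05) -> list[str]:
    """Return TF symbols from terms that pass the adjusted p-value cutoff."""
    terms = response.get("data", {}).get("enriched_terms", [])
    hits = [t for t in terms if t["adjusted_p_value"] < alpha]
    hits.sort(key=lambda t: t["adjusted_p_value"])
    tfs: list[str] = []
    for term in hits:
        # Assumption: the first whitespace-separated token is the TF symbol.
        symbol = term["term"].split()[0]
        if symbol not in tfs:
            tfs.append(symbol)
    return tfs


# Hypothetical response, shaped like the documented return value.
example_enrichment = {
    "status": "success",
    "data": {
        "library": "ENCODE_TF_ChIP-seq_2015",
        "gene_count": 5,
        "enriched_terms": [
            {"rank": 1, "term": "TP53 HepG2 hg19", "p_value": 1e-6,
             "adjusted_p_value": 1e-4},
            {"rank": 2, "term": "MYC K562 hg19", "p_value": 0.1,
             "adjusted_p_value": 0.3},
        ],
    },
}

print(enriched_tfs(example_enrichment))
```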

### Phase 3: Regulatory Element Context

#### 3a: Histone Modifications (ENCODE)

**Tool: `ENCODE_search_histone_experiments`**
```
Parameters:
  target   string   Histone mark (e.g., "H3K27ac", "H3K4me3", "H3K27me3")
  tissue   string   Tissue/cell type (e.g., "liver", "brain")
  limit    integer  Max results (default 10)
```

Common histone marks and their meaning:
- `H3K27ac` -- Active enhancers and promoters
- `H3K4me3` -- Active promoters
- `H3K4me1` -- Poised/active enhancers
- `H3K27me3` -- Polycomb-repressed regions
- `H3K9me3` -- Heterochromatin

Example:
```json
{"target": "H3K27ac", "tissue": "liver", "limit": 5}
```

Returns `{status, data: {total, experiments: [{accession, histone_mark, biosample_summary, status, lab}]}}`.

#### 3b: Expression QTLs (GTEx)

**Tool: `GTEx_query_eqtl`**
```
Parameters:
  gene_symbol  string  Gene symbol (e.g., "TP53"). REQUIRED.
```

Returns eQTL SNPs across tissues, showing genetic variants that affect gene expression.

Example:
```json
{"gene_symbol": "TP53"}
```

Returns `{status, data: {singleTissueEqtl: [{snpId, variantId, geneSymbol, pValue, tissueSiteDetailId, nes}]}}`. `nes` = normalized effect size; negative = lower expression with alt allele.
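
One way to summarise the eQTL rows is to keep the most significant SNP per tissue and translate the sign of `nes` into a direction. The sketch below assumes the response shape shown above; the helper name and the sample rows are hypothetical.

```python
def strongest_eqtl_per_tissue(response: dict) -> dict[str, dict]:
    """Map each tissue to its lowest-p-value eQTL and the direction of effect."""
    best: dict[str, dict] = {}
    for row in response.get("data", {}).get("singleTissueEqtl", []):
        tissue = row["tissueSiteDetailId"]
        if tissue not in best or row["pValue"] < best[tissue]["pValue"]:
            best[tissue] = row
    return {
        tissue: {
            "snpId": row["snpId"],
            "pValue": row["pValue"],
            # Negative nes means lower expression with the alt allele.
            "alt_allele_effect": "lower" if row["nes"] < 0 else "higher",
        }
        for tissue, row in best.items()
    }


# Hypothetical rows, shaped like the documented return value.
example_eqtls = {
    "status": "success",
    "data": {
        "singleTissueEqtl": [
            {"snpId": "rs0000001", "geneSymbol": "GENE1", "pValue": 1e-5,
             "tissueSiteDetailId": "Liver", "nes": -0.4},
            {"snpId": "rs0000002", "geneSymbol": "GENE1", "pValue": 1e-9,
             "tissueSiteDetailId": "Liver", "nes": 0.6},
            {"snpId": "rs0000003", "geneSymbol": "GENE1", "pValue": 1e-4,
             "tissueSiteDetailId": "Whole_Blood", "nes": -0.2},
        ]
    },
}

for tissue, summary in strongest_eqtl_per_tissue(example_eqtls).items():
    print(tissue, summary)
```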

#### 3c: Regulatory Variant Annotation (RegulomeDB)

**Tool: `RegulomeDB_query_variant`**
```
Parameters:
  rsid  string  dbSNP rsID (e.g., "rs7412")
```

Returns regulatory score (1a-7), tissue-specific scores, and overlapping regulatory features.
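
RegulomeDB scores mix digits and letters, so plain string sorting is fragile. The sketch below builds a sort key so that variants can be ordered from strongest to weakest regulatory evidence, with lower scores first. The helper name is hypothetical, and the pattern covers only the 1a-7 range named above.

```python
import re

_SCORE_PATTERN = re.compile(r"([1-7])([a-f]?)")


def regulome_rank_key(score: str) -> tuple[int, str]:
    """Split a score such as '2b' into (2, 'b') for sorting."""
    match = _SCORE_PATTERN.fullmatch(score.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised RegulomeDB score: {score!r}")
    return int(match.group(1)), match.group(2)


scores = ["4", "1f", "2b", "1a", "7"]
print(sorted(scores, key=regulome_rank_key))
```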

### Phase 4: Protein Interaction Network

#### 4a: STRING Database

**Tool: `STRING_get_interaction_partners`**
```
Parameters:
  identifiers     string   Protein/gene name (REQUIRED, e.g., "TP53")
  species         integer  NCBI taxonomy ID (default 9606 for human)
  limit           integer  Max partners to return
  required_score  integer  Min combined score 0-1000 (400=medium, 700=high, 900=highest)
```

Example:
```json
{"identifiers": "TP53", "species": 9606, "limit": 10}
```

Returns array of `{preferredName_A, preferredName_B, score, escore, dscore, tscore, ascore}`. Score components: `escore` (experimental), `dscore` (database), `tscore` (text-mining), `ascore` (coexpression).
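
Because the combined score can be driven by text-mining alone, it can help to keep only partners with experimental support. The sketch below assumes the row shape shown above. The caller supplies the `escore` cutoff because the source does not state the scale of the returned scores. The helper name and sample rows are hypothetical.

```python
def experimentally_supported_partners(rows: list[dict], query: str,
                                      min_escore: float) -> list[tuple[str, float]]:
    """Return (partner, escore) pairs at or above the cutoff, strongest first."""
    partners = []
    for row in rows:
        if row["escore"] < min_escore:
            continue
        # The query protein may appear on either side of the pair.
        if row["preferredName_A"] == query:
            partner = row["preferredName_B"]
        else:
            partner = row["preferredName_A"]
        partners.append((partner, row["escore"]))
    return sorted(partners, key=lambda item: item[1], reverse=True)


# Hypothetical rows with made-up scores, shaped like the documented return value.
example_rows = [
    {"preferredName_A": "TF1", "preferredName_B": "P1", "score": 0.99,
     "escore": 0.9, "dscore": 0.9, "tscore": 0.9, "ascore": 0.1},
    {"preferredName_A": "TF1", "preferredName_B": "P2", "score": 0.80,
     "escore": 0.0, "dscore": 0.0, "tscore": 0.8, "ascore": 0.0},
    {"preferredName_A": "P3", "preferredName_B": "TF1", "score": 0.95,
     "escore": 0.95, "dscore": 0.5, "tscore": 0.5, "ascore": 0.0},
]

print(experimentally_supported_partners(example_rows, "TF1", min_escore=0.4))
```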

#### 4b: IntAct Interactions

**Tool: `intact_get_interaction_network`**
```
Parameters:
  gene_symbol  string   Gene symbol (REQUIRED)
  limit        integer  Max results
```

Returns experimentally validated molecular interactions from IntAct.

#### 4c: BioGRID Interactions

**Tool: `BioGRID_get_interactions`**
```
Parameters:
  gene_symbol  string   Gene symbol (REQUIRED)
  limit        integer  Max results
```

Returns physical and genetic interactions with experimental system details.

### Phase 5: Literature Context

**Tool: `EuropePMC_search_articles`**
```
Parameters:
  query  string   Search query (REQUIRED)
  limit  integer  Max results (default 10)
```

Example:
```json
{"query": "TP53 transcription factor regulatory network", "limit": 5}
```

**Tool: `PubMed_search_articles`**
```
Parameters:
  query  string   Search query (REQUIRED)
  limit  integer  Max results (default 10)
```

### Phase 6: Ontology Annotation (Optional)

**Tool: `ols_search_terms`**
```
Parameters:
  query     string  Search term (REQUIRED)
  ontology  string  Ontology ID (e.g., "so" for Sequence Ontology, "go" for Gene Ontology)
  limit     integer Max results
```

Example for regulatory element types:
```json
{"query": "transcription factor binding site", "ontology": "so", "limit": 5}
```

### Phase 7: Functional Enrichment of Network

**Tool: `STRING_functional_enrichment`**
```
Parameters:
  identifiers  string  Comma-separated gene names (REQUIRED)
  species      integer NCBI taxonomy ID (default 9606)
```

Performs GO, KEGG, Reactome enrichment on a gene set from the network.

## Common Mistakes

1. **JASPAR tool name**: To search by TF name, use `jaspar_search_matrices` (lowercase, plural). `jaspar_get_matrix` accepts only a matrix ID returned by that search, not a TF name.

2. **JASPAR search param**: The parameter is `search` (NOT `query` or `name`).

3. **STRING identifiers param**: Use `identifiers` as a **string** (NOT an array). For multiple proteins, use `STRING_get_network` with array `identifiers`.

4. **Enrichr direction**: `enrichr_gene_enrichment_analysis` takes a gene SET and finds enriched TFs/pathways. To find targets of a TF, use `"TRRUST_Transcription_Factors_2019"` library with known target genes, or consult ENCODE ChIP-seq data directly.

5. **Enrichr `gene_list` is required**: Must be a JSON array of strings, not a single string.

6. **GTEx uses `gene_symbol`**: NOT Ensembl ID. The tool resolves it internally.

7. **ENCODE tissue names**: Use lowercase tissue names like `"liver"`, `"brain"`, `"heart"`. Complex queries may fail -- keep tissue names simple.

8. **BioGRID returns interactions as dict**: Keys are interaction IDs, values contain `OFFICIAL_SYMBOL_A` and `OFFICIAL_SYMBOL_B`.

9. **RegulomeDB rsID format**: Must include the "rs" prefix (e.g., `"rs7412"` not `"7412"`).

10. **No TRRUST direct tool**: TRRUST data is accessed via Enrichr library `"TRRUST_Transcription_Factors_2019"`, not a standalone tool.
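
Several of these mistakes can be caught before a tool call is made. The sketch below shows three small helpers: an rsID format check, a coercion of a gene list into an array of strings, and a partner extraction from the BioGRID dict described in item 8. The helper names and sample data are hypothetical; none of them belong to ToolUniverse.

```python
import re


def is_valid_rsid(rsid: str) -> bool:
    """True when the ID has the 'rs' prefix followed by digits."""
    return re.fullmatch(r"rs\d+", rsid) is not None


def as_gene_list(genes) -> list[str]:
    """Coerce a comma/space-separated string or an iterable into a list of symbols."""
    if isinstance(genes, str):
        genes = re.split(r"[,\s]+", genes.strip())
    return [g for g in genes if g]


def biogrid_partners(interactions: dict, gene: str) -> set[str]:
    """Collect partners of `gene` from a dict keyed by interaction ID."""
    gene = gene.upper()
    partners: set[str] = set()
    for record in interactions.values():
        a, b = record["OFFICIAL_SYMBOL_A"], record["OFFICIAL_SYMBOL_B"]
        if a.upper() == gene and b.upper() != gene:
            partners.add(b)
        elif b.upper() == gene and a.upper() != gene:
            partners.add(a)
    return partners


# Hypothetical BioGRID-style payload with made-up interaction IDs.
example_biogrid = {
    "1001": {"OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "MDM2"},
    "1002": {"OFFICIAL_SYMBOL_A": "EP300", "OFFICIAL_SYMBOL_B": "TP53"},
    "1003": {"OFFICIAL_SYMBOL_A": "TP53", "OFFICIAL_SYMBOL_B": "TP53"},
}

print(is_valid_rsid("rs7412"), as_gene_list("CDKN1A, BAX MDM2"))
print(sorted(biogrid_partners(example_biogrid, "TP53")))
```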

## Common Use Patterns

### Pattern 1: "What does TF X regulate?"
1. `jaspar_search_matrices` -- Get motif info for TF X
2. `enrichr_gene_enrichment_analysis` with `TRRUST_Transcription_Factors_2019` library -- Use known targets
3. `STRING_get_interaction_partners` -- Find interacting proteins
4. `EuropePMC_search_articles` -- Literature on TF X targets

### Pattern 2: "What regulates gene Y?"
1. `enrichr_gene_enrichment_analysis` with gene Y's co-regulated genes + `ENCODE_TF_ChIP-seq_2015` library
2. `GTEx_query_eqtl` -- Find eQTLs affecting gene Y expression
3. `ENCODE_search_histone_experiments` -- Chromatin context at gene Y locus
4. `RegulomeDB_query_variant` -- Annotate regulatory variants near gene Y

### Pattern 3: "Build a regulatory network around gene set Z"
1. `enrichr_gene_enrichment_analysis` with gene set Z + multiple TF libraries
2. `STRING_get_interaction_partners` for hub genes
3. `STRING_functional_enrichment` -- Pathway context
4. `BioGRID_get_interactions` -- Experimental validation
5. `EuropePMC_search_articles` -- Supporting literature

### Pattern 4: "Tissue-specific regulation of gene X"
1. `GTEx_query_eqtl` -- Tissue-specific eQTLs for gene X
2. `ENCODE_search_histone_experiments` with specific tissue -- Active regulatory marks
3. `RegulomeDB_query_variant` -- Tissue-specific regulatory scores for eQTL SNPs
4. `enrichr_gene_enrichment_analysis` -- Identify TFs active in that tissue

### Pattern 5: "Is variant rs##### regulatory?"
1. `RegulomeDB_query_variant` -- Regulatory score and overlapping features
2. `GTEx_query_eqtl` -- Is this variant an eQTL?
3. `ENCODE_search_histone_experiments` -- Chromatin context at variant locus
4. `EuropePMC_search_articles` -- Literature on the variant

## Evidence Grading

- **T1**: ENCODE ChIP-seq, JASPAR validated motifs, GTEx significant eQTLs
- **T2**: BioGRID/IntAct interactions, TRRUST curated relationships
- **T3**: STRING predicted interactions, Enrichr statistical enrichment, motif presence in a promoter without binding data
- **T4**: Sequence Ontology terms, literature mentions

Related Skills

tooluniverse-variant-analysis

1202
from mims-harvard/ToolUniverse

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-structural-variant-analysis

1202
from mims-harvard/ToolUniverse

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-spatial-omics-analysis

1202
from mims-harvard/ToolUniverse

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-sequence-analysis

1202
from mims-harvard/ToolUniverse

Retrieve and analyze biological sequences -- gene/protein sequences from NCBI, Ensembl, and UniProt. Search nucleotide databases, fetch by accession, find orthologs, get gene summaries. Use when users ask about DNA/RNA/protein sequences, gene lookups, ortholog searches, or sequence retrieval.

tooluniverse-regulatory-variant-analysis

1202
from mims-harvard/ToolUniverse

Regulatory variant interpretation -- GWAS association lookup, eQTL analysis, chromatin state annotation, regulatory element overlap, and trait ontology resolution. Connects GWAS Catalog, GTEx, ENCODE, RegulomeDB, OpenTargets, OLS ontology, and Ensembl regulatory features. Use when users ask about non-coding variants, GWAS hits, eQTLs, regulatory elements, enhancer/promoter variants, or trait-associated SNPs.

tooluniverse-regulatory-genomics

1202
from mims-harvard/ToolUniverse

Investigate transcription factor binding, cis-regulatory elements, chromatin accessibility, and regulatory variant annotation. Use when asked about TF binding sites, enhancers, promoters, ChIP-seq data, ATAC-seq signals, candidate cis-regulatory elements (cCREs), or the regulatory impact of genomic variants.

tooluniverse-proteomics-analysis

1202
from mims-harvard/ToolUniverse

Analyze mass spectrometry proteomics data including protein quantification, differential expression, post-translational modifications (PTMs), and protein-protein interactions. Processes MaxQuant, Spectronaut, DIA-NN, and other MS platform outputs. Performs normalization, statistical analysis, pathway enrichment, and integration with transcriptomics. Use when analyzing proteomics data, comparing protein abundance between conditions, identifying PTM changes, studying protein complexes, integrating protein and RNA data, discovering protein biomarkers, or conducting quantitative proteomics experiments.

tooluniverse-protein-modification-analysis

1202
from mims-harvard/ToolUniverse

Analyze post-translational modifications (PTMs) of proteins — modification sites, types, proteoforms, functional effects at PTM sites, and PTM-dependent protein interactions. Integrates iPTMnet, ProtVar, UniProt, and STRING databases. Use when asked about protein phosphorylation, ubiquitination, acetylation, glycosylation, methylation, SUMOylation, or other PTMs; proteoform diversity; PTM-regulated interactions; or functional impact of PTM sites.

Protein Interaction Network Analysis

1202
from mims-harvard/ToolUniverse

Analyze protein-protein interaction networks using STRING, BioGRID, and SASBDB databases. Maps protein identifiers, retrieves interaction networks with confidence scores, performs functional enrichment analysis (GO/KEGG/Reactome), and optionally includes structural data. No API key required for core functionality (STRING). Use when analyzing protein networks, discovering interaction partners, identifying functional modules, or studying protein complexes.

Skill: Population Genetics Analysis

1202
from mims-harvard/ToolUniverse

**MC Strategy**: Population genetics MC questions often test whether you know a specific theorem or result. COMPUTE the answer first (use popgen_calculator.py or write Python), then match to options. Don't try to reason about which option "sounds right."

tooluniverse-population-genetics-1000genomes

1202
from mims-harvard/ToolUniverse

Population genetics research using the 1000 Genomes Project (IGSR) -- search populations by superpopulation ancestry (AFR, AMR, EAS, EUR, SAS), retrieve samples by population code, list available data collections, and integrate with GWAS tools for population stratification analysis. Use when users ask about 1000 Genomes populations, sample ancestry, allele frequency variation across continental groups, population-specific GWAS interpretation, or IGSR data collections like the 30x high-coverage resequencing or HGSVC.

tooluniverse-phylogenetics

1202
from mims-harvard/ToolUniverse

Production-ready phylogenetics and sequence analysis skill for alignment processing, tree analysis, and evolutionary metrics. Computes treeness, RCV, treeness/RCV, parsimony informative sites, evolutionary rate, DVMC, tree length, alignment gap statistics, GC content, and bootstrap support using PhyKIT, Biopython, and DendroPy. Performs NJ/UPGMA/parsimony tree construction, Robinson-Foulds distance, Mann-Whitney U tests, and batch analysis across gene families. Integrates with ToolUniverse for sequence retrieval (NCBI, UniProt, Ensembl) and tree annotation. Use when processing FASTA/PHYLIP/Nexus/Newick files, computing phylogenetic metrics, comparing taxa groups, or answering questions about alignments, trees, parsimony, or molecular evolution.