tooluniverse-phylogenetics
Production-ready phylogenetics and sequence analysis skill for alignment processing, tree analysis, and evolutionary metrics. Computes treeness, RCV, treeness/RCV, parsimony informative sites, evolutionary rate, DVMC, tree length, alignment gap statistics, GC content, and bootstrap support using PhyKIT, Biopython, and DendroPy. Performs NJ/UPGMA/parsimony tree construction, Robinson-Foulds distance, Mann-Whitney U tests, and batch analysis across gene families. Integrates with ToolUniverse for sequence retrieval (NCBI, UniProt, Ensembl) and tree annotation. Use when processing FASTA/PHYLIP/Nexus/Newick files, computing phylogenetic metrics, comparing taxa groups, or answering questions about alignments, trees, parsimony, or molecular evolution.
Best use case
tooluniverse-phylogenetics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using tooluniverse-phylogenetics should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/tooluniverse-phylogenetics/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How tooluniverse-phylogenetics Compares
| Feature / Agent | tooluniverse-phylogenetics | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Phylogenetics and Sequence Analysis
Comprehensive phylogenetics and sequence analysis using PhyKIT, Biopython, and DendroPy. Designed for bioinformatics questions about multiple sequence alignments, phylogenetic trees, parsimony, molecular evolution, and comparative genomics.
**IMPORTANT**: This skill handles complex phylogenetic workflows. Most implementation details have been moved to `references/` for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.
---
## When to Use This Skill
Apply when users:
- Have FASTA alignment files and ask about parsimony informative sites, gaps, or alignment quality
- Have Newick tree files and ask about treeness, tree length, evolutionary rate, or DVMC
- Ask about treeness/RCV, RCV, or relative composition variability
- Need to compare phylogenetic metrics between groups (fungi vs animals, etc.)
- Ask about PhyKIT functions (treeness, rcv, dvmc, evo_rate, parsimony_informative, tree_length)
- Have gene family data with paired alignments and trees
- Need Mann-Whitney U tests or other statistical comparisons of phylogenetic metrics
- Ask about bootstrap support, branch lengths, or tree topology
- Need to build trees (NJ, UPGMA, parsimony) from alignments
- Ask about Robinson-Foulds distance or tree comparison
**BixBench Coverage**: 33 questions across 8 projects (bix-4, bix-11, bix-12, bix-25, bix-35, bix-38, bix-45, bix-60)
**NOT for** (use other skills instead):
- Multiple sequence alignment generation → Use external tools (MUSCLE, MAFFT, ClustalW)
- Maximum Likelihood tree construction → Use IQ-TREE, RAxML, or PhyML
- Bayesian phylogenetics → Use MrBayes or BEAST
- Ancestral state reconstruction → Use separate tools
---
## Core Principles
1. **Data-first approach** - Discover and validate all input files (alignments, trees) before any analysis
2. **PhyKIT-compatible** - Use PhyKIT functions for treeness, RCV, DVMC, parsimony, evolutionary rate (matches BixBench expected outputs)
3. **Format-flexible** - Support FASTA, PHYLIP, Nexus, Newick, and auto-detect formats
4. **Batch processing** - Process hundreds of gene alignments/trees in a single analysis
5. **Statistical rigor** - Mann-Whitney U, medians, percentiles, standard deviations with scipy.stats
6. **Precision awareness** - Match rounding to 4 decimal places (PhyKIT default) or as requested
7. **Group comparison** - Compare metrics between taxa groups (e.g., fungi vs animals)
8. **Question-driven** - Parse exactly what is asked and return the specific number/statistic
---
## Required Python Packages
```python
# Core (MUST be installed)
import numpy as np
import pandas as pd
from scipy import stats
from Bio import AlignIO, Phylo, SeqIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
# PhyKIT (primary computation engine)
from phykit.services.tree.treeness import Treeness
from phykit.services.tree.total_tree_length import TotalTreeLength
from phykit.services.tree.evolutionary_rate import EvolutionaryRate
from phykit.services.tree.dvmc import DVMC
from phykit.services.tree.treeness_over_rcv import TreenessOverRCV
from phykit.services.alignment.parsimony_informative_sites import ParsimonyInformative
from phykit.services.alignment.rcv import RelativeCompositionVariability
# DendroPy (for advanced tree operations)
import dendropy
# ToolUniverse (for sequence retrieval)
from tooluniverse import ToolUniverse
```
**Installation**:
```bash
pip install phykit dendropy biopython pandas numpy scipy
```
---
## High-Level Workflow Decision Tree
```
START: User question about phylogenetic data
│
├─ Q1: What type of analysis is needed?
│  │
│  ├─ ALIGNMENT ANALYSIS (FASTA/PHYLIP files)
│  │  ├─ Parsimony informative sites → phykit_parsimony_informative()
│  │  ├─ RCV score → phykit_rcv()
│  │  ├─ Gap percentage → alignment_gap_percentage()
│  │  ├─ GC content → alignment_statistics()
│  │  └─ See: references/sequence_alignment.md
│  │
│  ├─ TREE ANALYSIS (Newick files)
│  │  ├─ Treeness → phykit_treeness()
│  │  ├─ Tree length → phykit_tree_length()
│  │  ├─ Evolutionary rate → phykit_evolutionary_rate()
│  │  ├─ DVMC → phykit_dvmc()
│  │  ├─ Bootstrap support → extract_bootstrap_support()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ COMBINED ANALYSIS (alignment + tree)
│  │  └─ Treeness/RCV → phykit_treeness_over_rcv()
│  │
│  ├─ TREE CONSTRUCTION (build from alignment)
│  │  ├─ Neighbor-Joining → build_nj_tree()
│  │  ├─ UPGMA → build_upgma_tree()
│  │  ├─ Parsimony → build_parsimony_tree()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ GROUP COMPARISON (fungi vs animals, etc.)
│  │  ├─ Batch compute metrics per group
│  │  ├─ Mann-Whitney U test
│  │  ├─ Summary statistics (median, mean, percentiles)
│  │  └─ See: references/parsimony_analysis.md
│  │
│  └─ TREE COMPARISON
│     ├─ Robinson-Foulds distance → robinson_foulds_distance()
│     └─ Bootstrap consensus → bootstrap_analysis()
│
├─ Q2: What data format is available?
│  ├─ FASTA (.fa, .fasta, .faa, .fna)
│  ├─ PHYLIP (.phy, .phylip) - Use phylip-relaxed for long names
│  ├─ Nexus (.nex, .nexus)
│  ├─ Newick (.nwk, .newick, .tre, .tree)
│  └─ Auto-detect with load_alignment() or load_tree()
│
└─ Q3: Is this a batch analysis?
   ├─ Single gene → Run metric function once
   ├─ Multiple genes → Use batch_compute_metric()
   └─ Group comparison → Use discover_gene_files() + compare_groups()
```
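The Q2 branch can be pictured as a lookup from file extension to a Biopython format name. The sketch below is only an illustration: `guess_format` and both tables are hypothetical names, and the real `load_alignment()` / `load_tree()` helpers may detect formats differently (for example by inspecting file contents).

```python
from pathlib import Path

# Extension -> Biopython format string. The extensions mirror the decision
# tree above; the tables themselves are an assumption, not the skill's own.
ALIGNMENT_FORMATS = {
    ".fa": "fasta", ".fasta": "fasta", ".faa": "fasta", ".fna": "fasta",
    ".phy": "phylip-relaxed", ".phylip": "phylip-relaxed",
    ".nex": "nexus", ".nexus": "nexus",
}
TREE_FORMATS = {
    ".nwk": "newick", ".newick": "newick", ".tre": "newick", ".tree": "newick",
    ".nex": "nexus", ".nexus": "nexus",
}


def guess_format(path, kind="alignment"):
    """Return a format string for `path`, or None if the extension is unknown."""
    table = ALIGNMENT_FORMATS if kind == "alignment" else TREE_FORMATS
    return table.get(Path(path).suffix.lower())


print(guess_format("gene1.FA"))              # fasta
print(guess_format("gene1.tre", "tree"))     # newick
print(guess_format("notes.txt"))             # None
```

Returning `None` for an unknown extension lets the caller fall back to trying each parser in turn instead of failing outright.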
---
## Quick Reference: Common Metrics
| Metric | Function | Input | Description |
|--------|----------|-------|-------------|
| **Treeness** | `phykit_treeness(tree_file)` | Newick | Internal branch length / Total branch length |
| **RCV** | `phykit_rcv(aln_file)` | FASTA/PHYLIP | Relative Composition Variability |
| **Treeness/RCV** | `phykit_treeness_over_rcv(tree, aln)` | Both | Treeness divided by RCV |
| **Tree Length** | `phykit_tree_length(tree_file)` | Newick | Sum of all branch lengths |
| **Evolutionary Rate** | `phykit_evolutionary_rate(tree_file)` | Newick | Total branch length / num terminals |
| **DVMC** | `phykit_dvmc(tree_file)` | Newick | Degree of Violation of Molecular Clock |
| **Parsimony Sites** | `phykit_parsimony_informative(aln_file)` | FASTA/PHYLIP | Sites with ≥2 distinct characters, each appearing ≥2 times |
| **Gap Percentage** | `alignment_gap_percentage(aln_file)` | FASTA/PHYLIP | Percentage of gap characters |
See `scripts/tree_statistics.py` for implementation.
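To make the two alignment definitions in the table concrete, here is a minimal pure-Python sketch that works on a list of equal-length sequence strings. It is not the PhyKIT implementation: the function names are hypothetical, and treating only `-` and `?` as gap symbols is an assumption that may differ from what PhyKIT skips.

```python
from collections import Counter

GAP_CHARS = set("-?")  # assumption: symbols treated as gaps or missing data


def parsimony_informative_count(seqs):
    """Count columns with >= 2 distinct characters that each occur >= 2 times."""
    count = 0
    for column in zip(*seqs):
        tallies = Counter(c.upper() for c in column if c not in GAP_CHARS)
        if sum(1 for n in tallies.values() if n >= 2) >= 2:
            count += 1
    return count


def gap_percentage(seqs):
    """Percentage of all alignment cells that hold a gap symbol."""
    total = sum(len(s) for s in seqs)
    gaps = sum(1 for s in seqs for c in s if c in GAP_CHARS)
    return 100.0 * gaps / total if total else 0.0


# Made-up toy alignment: columns 2 and 4 (1-based) are parsimony informative.
alignment = [
    "ATGCA-",
    "ATGTA-",
    "ACGCAT",
    "ACGTAT",
]
print(parsimony_informative_count(alignment))  # 2
print(f"{gap_percentage(alignment):.4f}")      # 8.3333
```

The last column shows why a site with gaps plus a single repeated character does not count: only one character state reaches the two-occurrence threshold.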
---
## Common Analysis Patterns (BixBench)
### Pattern 1: Single Metric Across Groups
**Question**: "What is the median DVMC for fungi vs animals?"
**Workflow**:
```python
# 1. Discover files
fungi_genes = discover_gene_files("data/fungi")
animal_genes = discover_gene_files("data/animals")
# 2. Compute metric
fungi_dvmc = batch_dvmc(fungi_genes)
animal_dvmc = batch_dvmc(animal_genes)
# 3. Compare
fungi_values = list(fungi_dvmc.values())
animal_values = list(animal_dvmc.values())
print(f"Fungi median DVMC: {np.median(fungi_values):.4f}")
print(f"Animal median DVMC: {np.median(animal_values):.4f}")
```
**See**: `references/parsimony_analysis.md` for full implementation
### Pattern 2: Statistical Comparison
**Question**: "What is the Mann-Whitney U statistic comparing treeness between groups?"
**Workflow**:
```python
from scipy import stats
# Compute treeness for both groups
group1_treeness = batch_treeness(group1_genes)
group2_treeness = batch_treeness(group2_genes)
# Mann-Whitney U test (two-sided)
u_stat, p_value = stats.mannwhitneyu(
    list(group1_treeness.values()),
    list(group2_treeness.values()),
    alternative='two-sided'
)
print(f"U statistic: {u_stat:.0f}")
print(f"P-value: {p_value:.4e}")
```
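It can help to see what the U statistic counts, because the reported value depends on which group is passed first. The sketch below computes U directly by pairwise comparison; `mann_whitney_u` is a hypothetical helper and the treeness values are made up. SciPy 1.7 and later report the statistic for the first sample, which corresponds to `u_x` here.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): wins of x over y across all pairs, ties count 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two statistics always sum to n_x * n_y
    return u_x, u_y


group1 = [0.61, 0.55, 0.70, 0.48]  # made-up treeness values
group2 = [0.40, 0.52, 0.45]
u1, u2 = mann_whitney_u(group1, group2)
print(u1, u2)  # 11.0 1.0
```

Swapping the argument order turns 11 into 1, so when a question asks for "the U statistic" it is worth checking which group the expected answer treats as the first sample.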
### Pattern 3: Filtering + Metric
**Question**: "What is the treeness/RCV for alignments with <5% gaps?"
**Workflow**:
```python
# 1. Filter by gap percentage
valid_genes = []
for entry in gene_files:
    if 'aln_file' in entry:
        gap_pct = alignment_gap_percentage(entry['aln_file'])
        if gap_pct < 5.0:
            valid_genes.append(entry)
# 2. Compute metric on filtered set
results = batch_treeness_over_rcv(valid_genes)
# 3. Report
values = [r[0] for r in results.values()] # treeness/rcv ratio
print(f"Median treeness/RCV: {np.median(values):.4f}")
```
### Pattern 4: Specific Gene Lookup
**Question**: "What is the evolutionary rate for gene X?"
**Workflow**:
```python
# Find gene file
gene_files = discover_gene_files("data/")
gene_entry = [g for g in gene_files if g['gene_id'] == 'X'][0]
# Compute metric
evo_rate = phykit_evolutionary_rate(gene_entry['tree_file'])
print(f"Evolutionary rate for gene X: {evo_rate:.4f}")
```
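For intuition about what the tree metrics measure, the following sketch derives tree length, treeness, and evolutionary rate straight from a Newick string using the definitions in the Quick Reference table. It is a simplified illustration, not PhyKIT's code: `tree_metrics` is a hypothetical name, and the regex parsing assumes no quoted labels, no bracketed comments, and no commas inside names.

```python
import re

BRANCH = r":([0-9.eE+-]+)"  # a branch length following a colon


def tree_metrics(newick):
    """Return (tree_length, treeness, evolutionary_rate) for a Newick string."""
    all_lengths = [float(v) for v in re.findall(BRANCH, newick)]
    # Internal branch lengths follow a closing parenthesis, optionally with a
    # support value or node label in between.
    internal = [float(v) for v in re.findall(r"\)[^:,;()]*" + BRANCH, newick)]
    # In a Newick tree every comma separates siblings, so tips = commas + 1.
    n_tips = newick.count(",") + 1
    tree_length = sum(all_lengths)
    treeness = sum(internal) / tree_length
    evolutionary_rate = tree_length / n_tips
    return tree_length, treeness, evolutionary_rate


# Made-up four-taxon tree.
newick = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.15);"
length, treeness, rate = tree_metrics(newick)
print(f"{length:.4f} {treeness:.4f} {rate:.4f}")  # 1.2000 0.1667 0.3000
```

For real analyses the PhyKIT functions remain the reference, since they handle edge cases (root branches, missing lengths) that this sketch ignores.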
---
## Choosing Methods: When to Use What
### Alignment Methods
**When building alignments** (use external tools, not this skill):
| Method | Speed | Accuracy | Use Case |
|--------|-------|----------|----------|
| **ClustalW** | Slow | Medium | Small datasets (<100 sequences), educational |
| **MUSCLE** | Fast | High | Medium datasets (100-1000 sequences) |
| **MAFFT** | Very Fast | Very High | **Recommended** - Large datasets (>1000 sequences) |
**For this skill**: Work with pre-aligned sequences. Use `load_alignment()` to read any format.
### Tree Building Methods
**When to use which tree method:**
| Method | Speed | Accuracy | Use Case |
|--------|-------|----------|----------|
| **Neighbor-Joining** | Fast | Medium | Quick trees, large datasets, exploratory |
| **UPGMA** | Fast | Low | Assumes molecular clock, special cases only |
| **Maximum Parsimony** | Medium | Medium | Small datasets, discrete characters |
| **Maximum Likelihood** | Slow | High | **Use external tools** (IQ-TREE, RAxML) for production |
**Implementation in this skill**:
```python
# Fast distance-based trees
tree = build_nj_tree("alignment.fa") # Neighbor-Joining
tree = build_upgma_tree("alignment.fa") # UPGMA
# Parsimony (for small alignments)
tree = build_parsimony_tree("alignment.fa")
```
**For production ML trees**: Use IQ-TREE or RAxML externally, then analyze with this skill.
See `references/tree_building.md` for detailed implementations.
---
## Batch Processing
### Discovering Gene Files
```python
# Auto-discover paired alignment + tree files
gene_files = discover_gene_files("data/")
# Result: list of dicts with 'gene_id', 'aln_file', 'tree_file'
# [
# {'gene_id': 'gene1', 'aln_file': 'gene1.fa', 'tree_file': 'gene1.nwk'},
# {'gene_id': 'gene2', 'aln_file': 'gene2.fa', 'tree_file': 'gene2.nwk'},
# ...
# ]
```
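One plausible way to produce that list is to pair files by filename stem. The sketch below is an assumption about how such discovery could work, not the skill's actual `discover_gene_files()`; the function name, the extension sets, and the demo file names are all hypothetical.

```python
import tempfile
from pathlib import Path

ALIGNMENT_EXTS = {".fa", ".fasta", ".phy"}          # assumed alignment extensions
TREE_EXTS = {".nwk", ".newick", ".tre", ".tree"}    # assumed tree extensions


def discover_gene_files_sketch(directory):
    """Pair alignment and tree files that share a filename stem."""
    genes = {}
    for path in sorted(Path(directory).iterdir()):
        suffix = path.suffix.lower()
        if suffix in ALIGNMENT_EXTS:
            key = "aln_file"
        elif suffix in TREE_EXTS:
            key = "tree_file"
        else:
            continue  # ignore unrelated files
        entry = genes.setdefault(path.stem, {"gene_id": path.stem})
        entry[key] = str(path)
    return [genes[gene_id] for gene_id in sorted(genes)]


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gene1.fa", "gene1.nwk", "gene2.fa", "notes.txt"):
        (Path(tmp) / name).touch()
    found = discover_gene_files_sketch(tmp)

print([sorted(entry) for entry in found])
```

In this sketch an entry may lack `aln_file` or `tree_file` when only one of the pair exists, which is why downstream code should check for the key before using it.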
### Computing Metrics in Batch
```python
# Tree metrics
treeness_results = batch_treeness(gene_files)
tree_length_results = batch_tree_length(gene_files)
dvmc_results = batch_dvmc(gene_files)
evo_rate_results = batch_evolutionary_rate(gene_files)
# Alignment metrics
rcv_results = batch_rcv(gene_files)
pi_results = batch_parsimony_informative(gene_files)
gap_results = batch_gap_percentage(gene_files)
# Combined metrics
treeness_rcv_results = batch_treeness_over_rcv(gene_files)
# All return dict: {gene_id: value}
```
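Each `batch_*` helper can be thought of as one generic loop applied to a different metric function. The following sketch shows that pattern under stated assumptions: `batch_compute_metric_sketch` and `fake_metric` are hypothetical, and unlike the helpers listed above, this version also returns a dict of failures so that one unparseable file does not silently disappear.

```python
def batch_compute_metric_sketch(gene_files, metric_fn, file_key="tree_file"):
    """Apply metric_fn to each entry's file; return (results, failures)."""
    results, failures = {}, {}
    for entry in gene_files:
        path = entry.get(file_key)
        if path is None:
            failures[entry["gene_id"]] = f"missing {file_key}"
            continue
        try:
            results[entry["gene_id"]] = metric_fn(path)
        except Exception as exc:  # keep going; record the problem instead
            failures[entry["gene_id"]] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fake_metric(path):
    """Stand-in for a real metric such as a treeness computation."""
    if "bad" in path:
        raise ValueError("unparseable tree")
    return float(len(path))


gene_files = [
    {"gene_id": "gene1", "tree_file": "gene1.nwk"},
    {"gene_id": "gene2", "tree_file": "bad.nwk"},
    {"gene_id": "gene3"},
]
results, failures = batch_compute_metric_sketch(gene_files, fake_metric)
print(results)
print(failures)
```

Reporting failures separately matters for the checklist item about processing all genes: a median over a silently truncated set can differ from the expected answer.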
### Statistical Analysis
```python
# Summary statistics
treeness_summary = summary_stats(list(treeness_results.values()))  # named to avoid shadowing scipy's `stats` module
# Returns: {'mean': ..., 'median': ..., 'std': ..., 'min': ..., 'max': ...}
# Group comparison
comparison = compare_groups(
    list(fungi_treeness.values()),
    list(animal_treeness.values()),
    group1_name="Fungi",
    group2_name="Animals"
)
# Returns: {'u_statistic': ..., 'p_value': ..., 'Fungi': {...}, 'Animals': {...}}
```
See `references/parsimony_analysis.md` for full workflow.
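A summary helper with the keys shown above could look roughly like this standard-library sketch. The name `summary_stats_sketch` is hypothetical, and which standard deviation the real `summary_stats()` reports is not stated here, so the sketch returns the sample version and shows the population version alongside for comparison.

```python
import statistics


def summary_stats_sketch(values):
    """Return mean, median, sample std, min and max for a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


values = [0.2, 0.4, 0.6, 0.8]  # made-up metric values
summary = summary_stats_sketch(values)
population_std = statistics.pstdev(values)  # n denominator, like np.std's default

print(f"{summary['median']:.4f}")   # 0.5000
print(f"{summary['std']:.4f}")      # 0.2582
print(f"{population_std:.4f}")      # 0.2236
```

The gap between 0.2582 and 0.2236 shows why a question asking for a standard deviation or variance needs the denominator convention confirmed before rounding.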
---
## Answer Extraction for BixBench
| Question Pattern | Extraction Method |
|-----------------|-------------------|
| "What is the median X?" | `np.median(values)` |
| "What is the maximum X?" | `np.max(values)` |
| "What is the difference between median X for A vs B?" | `abs(np.median(a) - np.median(b))` |
| "What percentage of X have Y above Z?" | `sum(v > Z for v in values) / len(values) * 100` |
| "What is the Mann-Whitney U statistic?" | `stats.mannwhitneyu(a, b)[0]` |
| "What is the p-value?" | `stats.mannwhitneyu(a, b)[1]` |
| "What is the X value for gene Y?" | `results[gene_id]` |
| "What is the fold-change in median X?" | `np.median(a) / np.median(b)` |
| "multiplied by 1000" | `round(value * 1000)` |
### Rounding Rules
- **PhyKIT default**: 4 decimal places
- **Percentages**: Match question format (e.g., "35%" → integer, "3.5%" → 1 decimal)
- **P-values**: Scientific notation for very small values
- **U statistics**: Integer (no decimals), unless tied values produce a half-integer U, in which case report the value as computed
- **Always check question wording**: "rounded to 3 decimal places" overrides defaults
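The rounding rules can be collected into one small formatter. This is a sketch under assumptions: `format_answer` and its `kind` labels are hypothetical, and the defaults simply mirror the rules listed above.

```python
def format_answer(value, kind="metric", decimals=4):
    """Format a number according to the rounding rules for its kind."""
    if kind == "u_statistic":
        return f"{value:.0f}"          # integer, no decimals
    if kind == "p_value":
        return f"{value:.4e}"          # scientific notation
    return f"{value:.{decimals}f}"     # PhyKIT-style fixed decimals


print(format_answer(0.123456))                    # 0.1235
print(format_answer(0.123456, decimals=3))        # 0.123
print(format_answer(11.0, kind="u_statistic"))    # 11
print(format_answer(3.2e-7, kind="p_value"))      # 3.2000e-07

# Python rounds exact halves to the nearest even number, in round() and in
# format strings alike, so a tied U of 10.5 would be shown as "10".
print(round(2.5), round(3.5))                     # 2 4
print(format_answer(10.5, kind="u_statistic"))    # 10
```

Because of the half-to-even behavior, values that sit exactly on a rounding boundary are worth checking by hand against the expected answer format.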
---
## BixBench Question Coverage
| Project | Questions | Metrics |
|---------|-----------|---------|
| **bix-4** | 7 | DVMC analysis (fungi vs animals) |
| **bix-11** | 6 | Treeness analysis (median, percentages, Mann-Whitney U) |
| **bix-12** | 5 | Parsimony informative sites (counts, percentages, ratios) |
| **bix-25** | 2 | Treeness/RCV with gap filtering |
| **bix-35** | 4 | Evolutionary rate (specific genes, comparisons) |
| **bix-38** | 5 | Tree length (fold-change, variance, paired ratios) |
| **bix-45** | 4 | RCV (Mann-Whitney U, medians, paired differences) |
| **bix-60** | 1 | Average treeness across multiple trees |
---
## ToolUniverse Integration
### Sequence Retrieval
```python
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# Get sequences from NCBI
result = tu.tools.NCBI_get_sequence(accession="NP_000546")
# Get gene tree from Ensembl
tree_result = tu.tools.EnsemblCompara_get_gene_tree(gene="ENSG00000141510")
# Get species tree from OpenTree
tree_result = tu.tools.OpenTree_get_induced_subtree(ott_ids="770315,770319")
```
---
## File Structure
```
tooluniverse-phylogenetics/
├── SKILL.md                     # This file (workflow orchestration)
├── QUICK_START.md               # Quick reference
├── test_phylogenetics.py        # Comprehensive test suite
├── references/
│   ├── sequence_alignment.md    # Alignment analysis details
│   ├── tree_building.md         # Tree construction methods
│   ├── parsimony_analysis.md    # Statistical comparison workflows
│   └── troubleshooting.md       # Common issues and solutions
└── scripts/
    ├── format_alignment.py      # Alignment format conversion
    └── tree_statistics.py       # Core metric implementations
```
---
## Completeness Checklist
Before returning your answer, verify:
- [ ] Identified all input files (alignments and/or trees)
- [ ] Detected group structure (fungi/animals/etc.) if applicable
- [ ] Used correct PhyKIT function for the requested metric
- [ ] Processed ALL genes in each group (not just a sample)
- [ ] Applied correct statistical test if comparison requested
- [ ] Used correct rounding (4 decimals default, or as specified)
- [ ] Returned the specific statistic asked for (median, max, U stat, p-value, etc.)
- [ ] For percentage questions, confirmed whether answer is integer or decimal
- [ ] For "difference" questions, confirmed direction (A - B vs abs difference)
- [ ] For Mann-Whitney U, used `alternative='two-sided'` (the default in SciPy 1.7 and later; pass it explicitly on older versions)
---
## Next Steps
- For detailed alignment analysis workflows → See `references/sequence_alignment.md`
- For tree construction methods → See `references/tree_building.md`
- For statistical comparison examples → See `references/parsimony_analysis.md`
- For common errors and solutions → See `references/troubleshooting.md`
- For script implementations → See `scripts/tree_statistics.py`
---
## Support
For issues with:
- **PhyKIT functions**: Check PhyKIT documentation at https://jlsteenwyk.com/PhyKIT/
- **Biopython tree/alignment parsing**: See https://biopython.org/wiki/Phylo
- **DendroPy operations**: See https://dendropy.org/
- **ToolUniverse integration**: Check ToolUniverse documentation
## License
Same as ToolUniverse framework license.
Related Skills
tooluniverse-variant-interpretation
Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
tooluniverse-variant-analysis
Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-systems-biology
Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.
tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
tooluniverse-statistical-modeling
Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
tooluniverse-spatial-omics-analysis
Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
tooluniverse-single-cell
Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.
tooluniverse-sequence-retrieval
Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.
tooluniverse-rnaseq-deseq2
Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.
tooluniverse-rare-disease-diagnosis
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.