tooluniverse

Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.

7 stars

bysanand0

View on GitHub Installation ↓

Best use case

tooluniverse is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tooluniverse/SKILL.md --create-dirs "https://raw.githubusercontent.com/sanand0/scientific-research/main/.claude/skills/tooluniverse/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse Compares

Feature / Agent	tooluniverse	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ToolUniverse

## Overview

ToolUniverse is a unified ecosystem that enables AI agents to function as research scientists by providing standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across multiple research domains including bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery.

**Key Capabilities:**
- Access 600+ scientific tools, models, datasets, and APIs
- Discover tools using natural language, semantic search, or keywords
- Execute tools through standardized AI-Tool Interaction Protocol
- Compose multi-step workflows for complex research problems
- Integration with Claude Desktop/Code via Model Context Protocol (MCP)

## When to Use This Skill

Use this skill when:
- Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools")
- Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis)
- Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.)
- Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening)
- Working with bioinformatics, cheminformatics, or structural biology tasks
- Analyzing gene expression, protein sequences, molecular structures, or clinical data
- Performing literature searches, pathway enrichment, or variant annotation
- Building automated scientific research workflows

## Quick Start

### Basic Setup
```python
from tooluniverse import ToolUniverse

# Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 600+ scientific tools

# Discover tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})

# Execute a tool
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # Hypertension
})
```

### Model Context Protocol (MCP)
For Claude Desktop/Code integration:
```bash
tooluniverse-smcp
```

## Core Workflows

### 1. Tool Discovery

Find relevant tools for your research task:

**Three discovery methods:**
- `Tool_Finder` - Embedding-based semantic search (requires GPU)
- `Tool_Finder_LLM` - LLM-based semantic search (no GPU required)
- `Tool_Finder_Keyword` - Fast keyword search

**Example:**
```python
# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing differential expression analysis",
        "limit": 10
    }
})

# Review available tools
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```

**See `references/tool-discovery.md` for:**
- Detailed discovery methods and search strategies
- Domain-specific keyword suggestions
- Best practices for finding tools

### 2. Tool Execution

Execute individual tools through the standardized interface:

**Example:**
```python
# Execute disease-target lookup
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Breast cancer
})

# Get protein structure
structure = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {"uniprot_id": "P12345"}
})

# Calculate molecular properties
properties = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {"smiles": "CCO"}  # Ethanol
})
```

**See `references/tool-execution.md` for:**
- Real-world execution examples across domains
- Tool parameter handling and validation
- Result processing and error handling
- Best practices for production use

### 3. Tool Composition and Workflows

Compose multiple tools for complex research workflows:

**Drug Discovery Example:**
```python
# 1. Find disease targets
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}
})

# 2. Get protein structures
structures = []
for target in targets[:5]:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)

# 3. Screen compounds
hits = []
for structure in structures:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "structure": structure,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 4. Evaluate drug-likeness
drug_candidates = []
for compound in hits:
    props = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if props['lipinski_pass']:
        drug_candidates.append(compound)
```

**See `references/tool-composition.md` for:**
- Complete workflow examples (drug discovery, genomics, clinical)
- Sequential and parallel tool composition patterns
- Output processing hooks
- Workflow best practices

## Scientific Domains

ToolUniverse supports 600+ tools across major scientific domains:

**Bioinformatics:**
- Sequence analysis, alignment, BLAST
- Gene expression (RNA-seq, DESeq2)
- Pathway enrichment (KEGG, Reactome, GO)
- Variant annotation (VEP, ClinVar)

**Cheminformatics:**
- Molecular descriptors and fingerprints
- Drug discovery and virtual screening
- ADMET prediction and drug-likeness
- Chemical databases (PubChem, ChEMBL, ZINC)

**Structural Biology:**
- Protein structure prediction (AlphaFold)
- Structure retrieval (PDB)
- Binding site detection
- Protein-protein interactions

**Proteomics:**
- Mass spectrometry analysis
- Protein databases (UniProt, STRING)
- Post-translational modifications

**Genomics:**
- Genome assembly and annotation
- Copy number variation
- Clinical genomics workflows

**Medical/Clinical:**
- Disease databases (OpenTargets, OMIM)
- Clinical trials and FDA data
- Variant classification

**See `references/domains.md` for:**
- Complete domain categorization
- Tool examples by discipline
- Cross-domain applications
- Search strategies by domain

## Reference Documentation

This skill includes comprehensive reference files that provide detailed information for specific aspects:

- **`references/installation.md`** - Installation, setup, MCP configuration, platform integration
- **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools
- **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling
- **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution
- **`references/domains.md`** - Tool categorization by domain, use case examples
- **`references/api_reference.md`** - Python API documentation, hooks, protocols

**Workflow:** When helping with specific tasks, reference the appropriate file for detailed instructions. For example, if searching for tools, consult `references/tool-discovery.md` for search strategies.

## Example Scripts

Two executable example scripts demonstrate common use cases:

**`scripts/example_tool_search.py`** - Demonstrates all three discovery methods:
- Keyword-based search
- LLM-based search
- Domain-specific searches
- Getting detailed tool information

**`scripts/example_workflow.py`** - Complete workflow examples:
- Drug discovery pipeline (disease → targets → structures → screening → candidates)
- Genomics analysis (expression data → differential analysis → pathways)

Run examples to understand typical usage patterns and workflow composition.

## Best Practices

1. **Tool Discovery:**
   - Start with broad searches, then refine based on results
   - Use `Tool_Finder_Keyword` for fast searches with known terms
   - Use `Tool_Finder_LLM` for complex semantic queries
   - Set appropriate `limit` parameter (default: 10)

2. **Tool Execution:**
   - Always verify tool parameters before execution
   - Implement error handling for production workflows
   - Validate input data formats (SMILES, UniProt IDs, gene symbols)
   - Check result types and structures

3. **Workflow Composition:**
   - Test each step individually before composing full workflows
   - Implement checkpointing for long workflows
   - Consider rate limits for remote APIs
   - Use parallel execution when tools are independent

4. **Integration:**
   - Initialize ToolUniverse once and reuse the instance
   - Call `load_tools()` once at startup
   - Cache frequently used tool information
   - Enable logging for debugging

## Key Terminology

- **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse
- **Tool Discovery**: Finding relevant tools using search methods (Finder, LLM, Keyword)
- **Tool Execution**: Running a tool with specific arguments via `tu.run()`
- **Tool Composition**: Chaining multiple tools for multi-step workflows
- **MCP**: Model Context Protocol for integration with Claude Desktop/Code
- **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication

## Resources

- **Official Website**: https://aiscientist.tools
- **GitHub**: https://github.com/mims-harvard/ToolUniverse
- **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/
- **Installation**: `uv uv pip install tooluniverse`
- **MCP Server**: `tooluniverse-smcp`

Related Skills

zinc-database

from sanand0/scientific-research

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

from sanand0/scientific-research

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

vaex

from sanand0/scientific-research

Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that don't fit in memory.

uspto-database

from sanand0/scientific-research

Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.

uniprot-database

from sanand0/scientific-research

Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.

umap-learn

from sanand0/scientific-research

UMAP dimensionality reduction. Fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing (HDBSCAN), supervised/parametric UMAP, for high-dimensional data.

transformers

from sanand0/scientific-research

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

torchdrug

from sanand0/scientific-research

Graph-based drug discovery toolkit. Molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), 40+ datasets, for PyTorch-based ML on molecules, proteins, and biomedical graphs.

torch-geometric

from sanand0/scientific-research

Graph Neural Networks (PyG). Node/graph classification, link prediction, GCN, GAT, GraphSAGE, heterogeneous graphs, molecular property prediction, for geometric deep learning.

sympy

from sanand0/scientific-research

Use this skill when working with symbolic mathematics in Python. This skill should be used for symbolic computation tasks including solving equations algebraically, performing calculus operations (derivatives, integrals, limits), manipulating algebraic expressions, working with matrices symbolically, physics calculations, number theory problems, geometry computations, and generating executable code from mathematical expressions. Apply this skill when the user needs exact symbolic results rather than numerical approximations, or when working with mathematical formulas that contain variables and parameters.

string-database

from sanand0/scientific-research

Query STRING API for protein-protein interactions (59M proteins, 20B interactions). Network analysis, GO/KEGG enrichment, interaction discovery, 5000+ species, for systems biology.

statsmodels

from sanand0/scientific-research

Statistical modeling toolkit. OLS, GLM, logistic, ARIMA, time series, hypothesis tests, diagnostics, AIC/BIC, for rigorous statistical inference and econometric analysis.