tooluniverse-chemical-safety
Comprehensive chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, FDA label safety data, DrugBank safety profiles, and STITCH chemical-protein interactions. Performs predictive toxicology (AMES, DILI, LD50, carcinogenicity), organ/system toxicity profiling, chemical-gene-disease relationship mapping, regulatory safety extraction, and environmental hazard assessment. Use when asked about chemical toxicity, drug safety profiling, ADMET properties, environmental health risks, chemical hazard assessment, or toxicogenomic analysis.
Best use case
tooluniverse-chemical-safety is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Comprehensive chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, FDA label safety data, DrugBank safety profiles, and STITCH chemical-protein interactions. Performs predictive toxicology (AMES, DILI, LD50, carcinogenicity), organ/system toxicity profiling, chemical-gene-disease relationship mapping, regulatory safety extraction, and environmental hazard assessment. Use when asked about chemical toxicity, drug safety profiling, ADMET properties, environmental health risks, chemical hazard assessment, or toxicogenomic analysis.
Teams using tooluniverse-chemical-safety should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/tooluniverse-chemical-safety/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How tooluniverse-chemical-safety Compares
| Feature / Agent | tooluniverse-chemical-safety | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Comprehensive chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, FDA label safety data, DrugBank safety profiles, and STITCH chemical-protein interactions. Performs predictive toxicology (AMES, DILI, LD50, carcinogenicity), organ/system toxicity profiling, chemical-gene-disease relationship mapping, regulatory safety extraction, and environmental hazard assessment. Use when asked about chemical toxicity, drug safety profiling, ADMET properties, environmental health risks, chemical hazard assessment, or toxicogenomic analysis.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Chemical Safety & Toxicology Assessment
Comprehensive chemical safety and toxicology analysis integrating predictive AI models, curated toxicogenomics databases, regulatory safety data, and chemical-biological interaction networks. Generates structured risk assessment reports with evidence grading.
## When to Use This Skill
**Triggers**:
- "Is this chemical toxic?" / "What are the toxicity endpoints for [compound]?"
- "Assess the safety profile of [drug/chemical]"
- "What are the ADMET properties of [SMILES]?"
- "What genes does [chemical] interact with?"
- "What diseases are linked to [chemical] exposure?"
- "Predict toxicity for these molecules"
- "Drug safety assessment for [drug name]"
- "Environmental health risk of [chemical]"
- "Chemical hazard profiling"
- "Toxicogenomic analysis of [compound]"
**Use Cases**:
1. **Predictive Toxicology**: AI-predicted toxicity endpoints (AMES mutagenicity, DILI, LD50, carcinogenicity, skin reactions) for novel compounds via SMILES
2. **ADMET Profiling**: Full absorption, distribution, metabolism, excretion, toxicity characterization
3. **Toxicogenomics**: Chemical-gene interaction mapping, gene-disease associations from CTD
4. **Regulatory Safety**: FDA label warnings, boxed warnings, contraindications, adverse reactions
5. **Drug Safety Assessment**: Combined DrugBank safety + FDA labels + adverse event data
6. **Chemical-Protein Interactions**: STITCH-based chemical-protein binding and interaction networks
7. **Environmental Toxicology**: Chemical-disease associations for environmental contaminants
---
## KEY PRINCIPLES
1. **Report-first approach** - Create report file FIRST, then populate progressively
2. **Tool parameter verification** - Verify params via `get_tool_info` before calling unfamiliar tools
3. **Evidence grading** - Grade all safety claims by evidence strength (T1-T4)
4. **Citation requirements** - Every toxicity finding must have inline source attribution
5. **Mandatory completeness** - All sections must exist with data minimums or explicit "No data" notes
6. **Disambiguation first** - Resolve compound identity (name -> SMILES, CID, ChEMBL ID) before analysis
7. **Negative results documented** - "No toxicity signals found" is data; empty sections are failures
8. **Conservative risk assessment** - When evidence is ambiguous, flag as "requires further investigation"
9. **English-first queries** - Always use English chemical/drug names in tool calls
---
## Evidence Grading System (MANDATORY)
Grade every toxicity claim by evidence strength:
| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| **T1** | [T1] | Direct human evidence, regulatory finding | FDA boxed warning, clinical trial toxicity, human case reports |
| **T2** | [T2] | Animal studies, validated in vitro | Nonclinical toxicology, AMES positive, animal LD50 |
| **T3** | [T3] | Computational prediction, association data | ADMET-AI prediction, CTD association, QSAR model |
| **T4** | [T4] | Database annotation, text-mined | Literature mention, database entry without validation |
### Required Evidence Grading Locations
Evidence grades MUST appear in:
1. **Executive Summary** - Key toxicity findings graded
2. **Toxicity Predictions** - Every ADMET-AI endpoint with confidence note
3. **Regulatory Safety** - FDA findings marked [T1]
4. **Chemical-Gene Interactions** - CTD data marked by curation status
5. **Risk Assessment** - Final risk classification with supporting evidence tiers
---
## Core Strategy: 8 Research Dimensions
```
Chemical/Drug Query
|
+-- PHASE 0: Compound Disambiguation (ALWAYS FIRST)
| +-- Resolve name -> SMILES, PubChem CID, ChEMBL ID
| +-- Get molecular formula, weight, canonical structure
|
+-- PHASE 1: Predictive Toxicology (ADMET-AI)
| +-- Mutagenicity (AMES)
| +-- Hepatotoxicity (DILI, ClinTox)
| +-- Carcinogenicity
| +-- Acute toxicity (LD50)
| +-- Skin reactions
| +-- Stress response pathways
| +-- Nuclear receptor activity
|
+-- PHASE 2: ADMET Properties
| +-- Absorption: BBB penetrance, bioavailability
| +-- Distribution: clearance, volume of distribution
| +-- Metabolism: CYP interactions (1A2, 2C9, 2C19, 2D6, 3A4)
| +-- Physicochemical: solubility, lipophilicity, pKa
|
+-- PHASE 3: Toxicogenomics (CTD)
| +-- Chemical-gene interactions
| +-- Chemical-disease associations
| +-- Affected biological pathways
|
+-- PHASE 4: Regulatory Safety (FDA Labels)
| +-- Boxed warnings (Black Box)
| +-- Contraindications
| +-- Adverse reactions
| +-- Warnings and precautions
| +-- Nonclinical toxicology
|
+-- PHASE 5: Drug Safety Profile (DrugBank)
| +-- Toxicity data
| +-- Contraindications
| +-- Drug interactions affecting safety
|
+-- PHASE 6: Chemical-Protein Interactions (STITCH)
| +-- Direct chemical-protein binding
| +-- Interaction confidence scores
| +-- Off-target effects
|
+-- PHASE 7: Structural Alerts (ChEMBL)
| +-- Known toxic substructures (PAINS, Brenk)
| +-- Structural alert flags
|
+-- SYNTHESIS: Integrated Risk Assessment
+-- Aggregate all evidence tiers
+-- Risk classification (Low/Medium/High/Critical)
+-- Data gaps and recommendations
```
---
## Phase 0: Compound Disambiguation (ALWAYS FIRST)
**CRITICAL**: Resolve compound identity before any analysis.
### Input Types Handled
| Input Format | Resolution Strategy |
|-------------|---------------------|
| Drug name (e.g., "Aspirin") | PubChem_get_CID_by_compound_name -> get SMILES from properties |
| SMILES string | Use directly for ADMET-AI; resolve to CID for other tools |
| PubChem CID | PubChem_get_compound_properties_by_CID -> get SMILES + name |
| ChEMBL ID | ChEMBL_get_molecule -> get SMILES + properties |
### Resolution Steps
1. **Input detection**: Determine if input is name, SMILES, CID, or ChEMBL ID
- SMILES: contains typical SMILES characters (=, #, [, ], (, ), c, n, o and no spaces in middle)
- CID: numeric only
- ChEMBL: starts with "CHEMBL"
- Otherwise: treat as compound name
2. **Name to CID**: `PubChem_get_CID_by_compound_name(name=<compound_name>)`
3. **CID to properties**: `PubChem_get_compound_properties_by_CID(cid=<cid>)`
4. **Extract SMILES**: Get SMILES from PubChem properties (field: `ConnectivitySMILES`, `CanonicalSMILES`, or `IsomericSMILES` depending on response format)
5. **Store resolved IDs**: Maintain dict with `name`, `smiles`, `cid`, `formula`, `weight`, `inchi`
### Disambiguation Output
```markdown
## Compound Identity
| Property | Value |
|----------|-------|
| **Name** | Acetaminophen |
| **PubChem CID** | 1983 |
| **SMILES** | CC(=O)Nc1ccc(O)cc1 |
| **Formula** | C8H9NO2 |
| **Molecular Weight** | 151.16 |
| **InChI** | InChI=1S/C8H9NO2/... |
```
---
## Phase 1: Predictive Toxicology (ADMET-AI)
**When**: SMILES is available (from Phase 0 or provided directly)
**Objective**: Run comprehensive AI-predicted toxicity endpoints
### Tools Used
All ADMET-AI tools take the same parameter format:
| Tool | Predicted Endpoints | Parameter |
|------|---------------------|-----------|
| `ADMETAI_predict_toxicity` | AMES, Carcinogens_Lagunin, ClinTox, DILI, LD50_Zhu, Skin_Reaction, hERG | `smiles`: list[str] |
| `ADMETAI_predict_stress_response` | Stress response pathway activation (ARE, ATAD5, HSE, MMP, p53) | `smiles`: list[str] |
| `ADMETAI_predict_nuclear_receptor_activity` | AhR, AR, ER, PPARg, Aromatase nuclear receptor activity | `smiles`: list[str] |
### Workflow
1. Call `ADMETAI_predict_toxicity(smiles=[resolved_smiles])`
2. Call `ADMETAI_predict_stress_response(smiles=[resolved_smiles])`
3. Call `ADMETAI_predict_nuclear_receptor_activity(smiles=[resolved_smiles])`
4. For each endpoint, interpret prediction:
- Classification endpoints: Active (1) = toxic signal, Inactive (0) = no signal
- Regression endpoints (LD50): Report numerical value with context
- All predictions graded [T3] (computational prediction)
### Decision Logic
- **Multiple SMILES**: Can batch up to ~10 SMILES in single call
- **Failed prediction**: If ADMET-AI fails, note "prediction unavailable" (don't fail entire report)
- **Confidence**: Note that AI predictions are [T3] evidence, not definitive
- **hERG flag**: If hERG = Active, flag prominently (cardiac safety risk)
- **AMES flag**: If AMES = Active, flag prominently (mutagenicity concern)
- **DILI flag**: If DILI = Active, flag prominently (liver toxicity concern)
### Output Table
```markdown
### Toxicity Predictions [T3]
| Endpoint | Prediction | Interpretation | Concern Level |
|----------|-----------|---------------|---------------|
| AMES Mutagenicity | Inactive | No mutagenic signal | Low |
| Carcinogenicity | Inactive | No carcinogenic signal | Low |
| ClinTox | Active | Clinical toxicity signal | HIGH |
| DILI | Active | Drug-induced liver injury risk | HIGH |
| LD50 (Zhu) | 2.45 log(mg/kg) | ~282 mg/kg (moderate) | Medium |
| Skin Reaction | Inactive | No skin sensitization signal | Low |
| hERG Inhibition | Active | Cardiac arrhythmia risk | HIGH |
*All predictions from ADMET-AI. Evidence tier: [T3] (computational prediction)*
```
---
## Phase 2: ADMET Properties
**When**: SMILES is available
**Objective**: Full ADMET characterization beyond toxicity
### Tools Used
| Tool | Properties Predicted | Parameter |
|------|---------------------|-----------|
| `ADMETAI_predict_BBB_penetrance` | Blood-brain barrier crossing probability | `smiles`: list[str] |
| `ADMETAI_predict_bioavailability` | Oral bioavailability (F20%, F30%) | `smiles`: list[str] |
| `ADMETAI_predict_clearance_distribution` | Clearance, VDss, half-life, PPB | `smiles`: list[str] |
| `ADMETAI_predict_CYP_interactions` | CYP1A2, 2C9, 2C19, 2D6, 3A4 inhibition/substrate | `smiles`: list[str] |
| `ADMETAI_predict_physicochemical_properties` | LogP, LogD, LogS, MW, pKa | `smiles`: list[str] |
| `ADMETAI_predict_solubility_lipophilicity_hydration` | Aqueous solubility, lipophilicity, hydration free energy | `smiles`: list[str] |
### Workflow
1. Call all 6 ADMET tools in parallel (independent calls)
2. Compile results into Absorption / Distribution / Metabolism / Excretion sections
3. Assess Lipinski Rule of 5 compliance from physicochemical properties
4. Flag drug-drug interaction risks from CYP inhibition profiles
### Decision Logic
- **BBB penetrant + toxicity**: If BBB = Yes and any CNS toxicity endpoint active, flag as neurotoxicity risk
- **Low bioavailability**: If F20% = Low, note absorption concerns
- **CYP inhibitor**: If CYP3A4 inhibitor = Yes, flag high DDI risk
- **Lipinski violations**: Count violations and report drug-likeness assessment
### Output Format
```markdown
### ADMET Profile [T3]
#### Absorption
| Property | Value | Interpretation |
|----------|-------|----------------|
| BBB Penetrance | Yes | Crosses blood-brain barrier |
| Bioavailability (F20%) | 85% | Good oral absorption |
#### Distribution
| Property | Value | Interpretation |
|----------|-------|----------------|
| VDss | 1.2 L/kg | Moderate tissue distribution |
| PPB | 92% | Highly protein bound |
#### Metabolism
| CYP Enzyme | Substrate | Inhibitor |
|------------|-----------|-----------|
| CYP1A2 | No | No |
| CYP2C9 | Yes | No |
| CYP2C19 | No | No |
| CYP2D6 | No | No |
| CYP3A4 | Yes | Yes (DDI risk) |
#### Excretion
| Property | Value | Interpretation |
|----------|-------|----------------|
| Clearance | 8.5 mL/min/kg | Moderate clearance |
| Half-life | 6.2 h | Moderate half-life |
```
---
## Phase 3: Toxicogenomics (CTD)
**When**: Compound name is resolved
**Objective**: Map chemical-gene-disease relationships from curated CTD data
### Tools Used
| Tool | Function | Parameter |
|------|----------|-----------|
| `CTD_get_chemical_gene_interactions` | Genes affected by chemical | `input_terms`: str (chemical name) |
| `CTD_get_chemical_diseases` | Diseases linked to chemical exposure | `input_terms`: str (chemical name) |
### Workflow
1. Call `CTD_get_chemical_gene_interactions(input_terms=compound_name)`
2. Call `CTD_get_chemical_diseases(input_terms=compound_name)`
3. Parse gene interactions: extract gene symbols, interaction types (increases/decreases expression, binding, etc.)
4. Parse disease associations: extract disease names, evidence types (marker/mechanism/therapeutic)
5. Identify most affected biological processes from gene list
### Decision Logic
- **Direct evidence vs inferred**: CTD separates curated direct evidence from inferred associations
- **Therapeutic vs toxic**: Disease associations can be therapeutic (drug treats disease) or adverse (chemical causes disease)
- **Gene interaction types**: Distinguish between expression changes, binding, and activity modulation
- **Prioritize marker/mechanism**: These indicate stronger causal evidence than simple associations
- **Grade curated as [T2]**: Direct curated CTD evidence from literature
- **Grade inferred as [T3]**: Computationally inferred associations
### Output Format
```markdown
### Toxicogenomics (CTD) [T2/T3]
#### Chemical-Gene Interactions (Top 20)
| Gene | Interaction | Type | Evidence |
|------|------------|------|----------|
| CYP1A2 | increases expression | mRNA | [T2] curated |
| TP53 | affects activity | protein | [T2] curated |
| ... | ... | ... | ... |
**Total interactions found**: 156
**Top affected pathways**: Xenobiotic metabolism, Apoptosis, DNA damage response
#### Chemical-Disease Associations (Top 10)
| Disease | Association Type | Evidence |
|---------|-----------------|----------|
| Liver Neoplasms | marker/mechanism | [T2] curated |
| Contact Dermatitis | therapeutic | [T2] curated |
| ... | ... | ... |
```
---
## Phase 4: Regulatory Safety (FDA Labels)
**When**: Compound has an approved drug name
**Objective**: Extract regulatory safety information from FDA drug labels
### Tools Used
| Tool | Information Retrieved | Parameter |
|------|---------------------|-----------|
| `FDA_get_boxed_warning_info_by_drug_name` | Black box warnings (most serious) | `drug_name`: str |
| `FDA_get_contraindications_by_drug_name` | Absolute contraindications | `drug_name`: str |
| `FDA_get_adverse_reactions_by_drug_name` | Known adverse reactions | `drug_name`: str |
| `FDA_get_warnings_by_drug_name` | Warnings and precautions | `drug_name`: str |
| `FDA_get_nonclinical_toxicology_info_by_drug_name` | Animal toxicology data | `drug_name`: str |
| `FDA_get_carcinogenic_mutagenic_fertility_by_drug_name` | Carcinogenicity/mutagenicity/fertility data | `drug_name`: str |
### Workflow
1. Call all 6 FDA tools in parallel (independent queries by drug name)
2. Parse and structure each response
3. Prioritize: Boxed Warnings > Contraindications > Warnings > Adverse Reactions
4. All FDA label data is [T1] evidence (regulatory finding based on human/animal data)
### Decision Logic
- **Boxed warning present**: Flag as CRITICAL safety concern in executive summary
- **No FDA data**: Chemical may not be an approved drug; note "Not an FDA-approved drug" and continue with other phases
- **Multiple warnings**: Categorize by organ system (hepatic, cardiac, renal, CNS, etc.)
- **Nonclinical toxicology**: Grade as [T2] (animal data supporting human risk)
### Output Format
```markdown
### Regulatory Safety (FDA) [T1]
#### Boxed Warning
**PRESENT** - Hepatotoxicity risk with doses >4g/day. Liver failure reported. [T1]
#### Contraindications
- Severe hepatic impairment [T1]
- Known hypersensitivity [T1]
#### Adverse Reactions (by frequency)
| Reaction | Frequency | Severity |
|----------|-----------|----------|
| Nausea | Common (>1%) | Mild |
| Hepatotoxicity | Rare (<0.1%) | Severe |
| ... | ... | ... |
#### Nonclinical Toxicology [T2]
- **Carcinogenicity**: No carcinogenic potential in 2-year rat/mouse studies
- **Mutagenicity**: Negative in Ames assay and in vivo micronucleus test
- **Fertility**: No effects on fertility at doses up to 10x human dose
```
---
## Phase 5: Drug Safety Profile (DrugBank)
**When**: Compound is a known drug
**Objective**: Retrieve curated drug safety data from DrugBank
### Tools Used
| Tool | Information | Parameters |
|------|------------|------------|
| `drugbank_get_safety_by_drug_name_or_drugbank_id` | Toxicity, contraindications | `query`: str, `case_sensitive`: bool, `exact_match`: bool, `limit`: int |
### Workflow
1. Call `drugbank_get_safety_by_drug_name_or_drugbank_id(query=drug_name, case_sensitive=False, exact_match=False, limit=5)`
2. Parse toxicity information, overdose data, contraindications
3. Cross-reference with FDA data from Phase 4
### Decision Logic
- **Toxicity field**: Contains LD50 values, overdose symptoms, organ toxicity data
- **DrugBank ID**: Note if found for cross-referencing
- **Conflict with FDA**: If DrugBank and FDA disagree, note discrepancy and defer to FDA [T1]
- **Not found**: Chemical may not be in DrugBank; continue with other phases
---
## Phase 6: Chemical-Protein Interactions (STITCH)
**When**: Compound can be identified by name or SMILES
**Objective**: Map chemical-protein interaction network for off-target assessment
### Tools Used
| Tool | Function | Parameters |
|------|----------|------------|
| `STITCH_resolve_identifier` | Resolve chemical name to STITCH ID | `identifier`: str, `species`: int (9606=human) |
| `STITCH_get_chemical_protein_interactions` | Get chemical-protein interactions | `identifiers`: list[str], `species`: int, `required_score`: int |
| `STITCH_get_interaction_partners` | Get interaction network | `identifiers`: list[str], `species`: int, `limit`: int |
### Workflow
1. Resolve compound: `STITCH_resolve_identifier(identifier=compound_name, species=9606)`
2. Get interactions: `STITCH_get_chemical_protein_interactions(identifiers=[stitch_id], species=9606, required_score=700)`
3. Identify off-target proteins (not the intended drug target)
4. Flag safety-relevant targets: hERG (cardiac), CYP enzymes (metabolism), nuclear receptors (endocrine)
### Decision Logic
- **High confidence (>900)**: Well-established interaction [T2]
- **Medium confidence (700-900)**: Probable interaction [T3]
- **Low confidence (400-700)**: Possible interaction, needs validation [T4]
- **Safety-relevant targets**: Flag interactions with known safety targets
- **No STITCH data**: Chemical may be too novel; note and continue
---
## Phase 7: Structural Alerts (ChEMBL)
**When**: ChEMBL molecule ID is available (from Phase 0)
**Objective**: Check for known toxic substructures
### Tools Used
| Tool | Function | Parameters |
|------|----------|------------|
| `ChEMBL_search_compound_structural_alerts` | Find structural alert matches | `molecule_chembl_id`: str, `limit`: int |
### Workflow
1. If ChEMBL ID available: `ChEMBL_search_compound_structural_alerts(molecule_chembl_id=chembl_id, limit=20)`
2. Parse alert types: PAINS (pan-assay interference), Brenk (medicinal chemistry), Glaxo (GSK structural alerts)
3. Categorize severity: Some alerts are informational, others indicate likely toxicity
### Decision Logic
- **PAINS alerts**: May cause false positives in screening; note for medicinal chemistry
- **Brenk alerts**: Known problematic substructures; flag if present
- **No alerts**: Good sign but not definitive proof of safety
- **No ChEMBL ID**: Skip this phase gracefully; note "structural alert analysis not available"
---
## Synthesis: Integrated Risk Assessment (MANDATORY)
**Always the final section**. Integrates all evidence into actionable risk classification.
### Risk Classification Matrix
| Risk Level | Criteria |
|-----------|----------|
| **CRITICAL** | FDA boxed warning present OR multiple [T1] toxicity findings OR active DILI + active hERG |
| **HIGH** | FDA warnings present OR [T2] animal toxicity OR multiple active ADMET endpoints |
| **MEDIUM** | Some [T3] predictions positive OR CTD disease associations OR structural alerts |
| **LOW** | All ADMET endpoints negative AND no FDA/DrugBank safety flags AND no CTD concerns |
| **INSUFFICIENT DATA** | Fewer than 3 phases returned data; cannot make confident assessment |
### Synthesis Template
```markdown
## Integrated Risk Assessment
### Overall Risk Classification: [HIGH]
### Evidence Summary
| Dimension | Finding | Evidence Tier | Concern |
|-----------|---------|--------------|---------|
| ADMET Toxicity | DILI active, hERG active | [T3] | HIGH |
| FDA Label | Boxed warning for hepatotoxicity | [T1] | CRITICAL |
| CTD Toxicogenomics | 156 gene interactions, liver neoplasms | [T2] | HIGH |
| DrugBank | Known hepatotoxicity at high doses | [T2] | HIGH |
| STITCH | Binds CYP3A4, hERG | [T3] | MEDIUM |
| Structural Alerts | 2 Brenk alerts | [T3] | MEDIUM |
### Key Safety Concerns
1. **Hepatotoxicity** [T1]: FDA boxed warning + ADMET-AI DILI prediction + CTD liver disease associations
2. **Cardiac Risk** [T3]: ADMET-AI hERG prediction + STITCH hERG interaction
3. **Drug Interactions** [T3]: CYP3A4 substrate/inhibitor, potential DDI risk
### Data Gaps
- [ ] No in vivo genotoxicity data available
- [ ] STITCH interaction scores moderate (700-900)
- [ ] No environmental exposure data
### Recommendations
1. Avoid doses >4g/day (hepatotoxicity threshold) [T1]
2. Monitor liver function in chronic use [T1]
3. Screen for CYP3A4 interactions before co-administration [T3]
4. Consider cardiac monitoring for at-risk patients [T3]
```
---
## Mandatory Completeness Checklist
Before finalizing any report, verify:
- [ ] **Phase 0**: Compound fully disambiguated (SMILES + CID at minimum)
- [ ] **Phase 1**: At least 5 toxicity endpoints reported or "prediction unavailable" noted
- [ ] **Phase 2**: ADMET profile with A/D/M/E sections or "not available" noted
- [ ] **Phase 3**: CTD queried; gene interactions and disease associations reported or "no data in CTD"
- [ ] **Phase 4**: FDA labels queried; results or "not an FDA-approved drug" noted
- [ ] **Phase 5**: DrugBank queried; results or "not found in DrugBank" noted
- [ ] **Phase 6**: STITCH queried; results or "no STITCH data available" noted
- [ ] **Phase 7**: Structural alerts checked or "ChEMBL ID not available" noted
- [ ] **Synthesis**: Risk classification provided with evidence summary
- [ ] **Evidence Grading**: All findings have [T1]-[T4] annotations
- [ ] **Data Gaps**: Explicitly listed in synthesis section
---
## Tool Parameter Reference
**Critical Parameter Notes** (verified from source code):
| Tool | Parameter Name | Type | Notes |
|------|---------------|------|-------|
| All ADMETAI tools | `smiles` | `list[str]` | Always a list, even for single compound |
| All CTD tools | `input_terms` | `str` | Chemical name, MeSH name, CAS RN, or MeSH ID |
| All FDA tools | `drug_name` | `str` | Brand or generic drug name |
| drugbank_get_safety_* | `query`, `case_sensitive`, `exact_match`, `limit` | str, bool, bool, int | All 4 required |
| STITCH_resolve_identifier | `identifier`, `species` | str, int | species=9606 for human |
| STITCH_get_chemical_protein_interactions | `identifiers`, `species`, `required_score` | list[str], int, int | required_score=400 default |
| PubChem_get_CID_by_compound_name | `name` | `str` | Compound name (not SMILES) |
| PubChem_get_compound_properties_by_CID | `cid` | `int` | Numeric CID |
| ChEMBL_search_compound_structural_alerts | `molecule_chembl_id` | `str` | ChEMBL ID (e.g., "CHEMBL112") |
### Response Format Notes
- **ADMET-AI**: Returns `{status: "success", data: {...}}` with prediction values
- **CTD**: Returns list of interaction/association objects
- **FDA**: Returns `{status, data}` with label text
- **DrugBank**: Returns `{data: [...]}` with drug records
- **STITCH**: Returns list of interaction objects with scores
- **PubChem CID lookup**: Returns `{IdentifierList: {CID: [...]}}` (may or may not have `data` wrapper)
- **PubChem properties**: Returns dict with `CID`, `MolecularWeight`, `ConnectivitySMILES`, `IUPACName`
---
## Fallback Strategies
### Compound Resolution
- **Primary**: PubChem by name -> CID -> properties -> SMILES
- **Fallback 1**: ChEMBL search by name -> molecule -> SMILES
- **Fallback 2**: If SMILES provided directly, skip name resolution
### Toxicity Prediction
- **Primary**: All 9 ADMET-AI endpoints
- **Fallback**: If ADMET-AI fails for a compound, note "prediction failed" and continue with database evidence
- **Note**: ADMET-AI may fail for very large or unusual SMILES
### Regulatory Data
- **Primary**: FDA labels by drug name
- **Fallback**: If FDA returns no data, try alternative drug names (brand vs generic)
- **Note**: Non-drug chemicals (pesticides, industrial) will not have FDA labels
### CTD Data
- **Primary**: Search by common chemical name
- **Fallback**: Try MeSH name if common name fails
- **Note**: Novel compounds may not be in CTD
---
## Common Use Patterns
### Pattern 1: Novel Compound Assessment
```
Input: SMILES string for new molecule
Workflow: Phase 0 (SMILES->CID) -> Phase 1 (toxicity) -> Phase 2 (ADMET) -> Phase 7 (structural alerts) -> Synthesis
Output: Predictive safety profile for novel compound
```
### Pattern 2: Approved Drug Safety Review
```
Input: Drug name (e.g., "Acetaminophen")
Workflow: All phases (0-7 + Synthesis)
Output: Complete safety dossier with regulatory + predictive + database evidence
```
### Pattern 3: Environmental Chemical Risk
```
Input: Chemical name (e.g., "Bisphenol A")
Workflow: Phase 0 -> Phase 1 -> Phase 2 -> Phase 3 (CTD, key for env chemicals) -> Phase 6 -> Synthesis
Output: Environmental health risk assessment focused on gene-disease associations
```
### Pattern 4: Batch Toxicity Screening
```
Input: Multiple SMILES strings
Workflow: Phase 0 -> Phase 1 (batch) -> Phase 2 (batch) -> Comparative table -> Synthesis
Output: Comparative toxicity table ranking compounds by safety
```
### Pattern 5: Toxicogenomic Deep-Dive
```
Input: Chemical name + specific gene or disease interest
Workflow: Phase 0 -> Phase 3 (CTD expanded) -> Literature search -> Synthesis
Output: Detailed chemical-gene-disease mechanistic analysis
```
---
## Output Report Structure
All analyses generate a structured markdown report with progressive sections:
```markdown
# Chemical Safety & Toxicology Report: [Compound Name]
**Generated**: YYYY-MM-DD HH:MM
**Compound**: [Name] | SMILES: [SMILES] | CID: [CID]
## Executive Summary
[2-3 sentence overview with risk classification and key findings, all graded]
## 1. Compound Identity
[Phase 0 results - disambiguation table]
## 2. Predictive Toxicology
[Phase 1 results - ADMET-AI toxicity endpoints]
## 3. ADMET Profile
[Phase 2 results - absorption, distribution, metabolism, excretion]
## 4. Toxicogenomics
[Phase 3 results - CTD chemical-gene-disease relationships]
## 5. Regulatory Safety
[Phase 4 results - FDA label information]
## 6. Drug Safety Profile
[Phase 5 results - DrugBank data]
## 7. Chemical-Protein Interactions
[Phase 6 results - STITCH network]
## 8. Structural Alerts
[Phase 7 results - ChEMBL alerts]
## 9. Integrated Risk Assessment
[Synthesis - risk classification, evidence summary, data gaps, recommendations]
## Appendix: Methods and Data Sources
[Tool versions, databases queried, date of access]
```
---
## Limitations & Known Issues
### Tool-Specific
- **ADMET-AI**: Predictions are computational [T3]; should not replace experimental testing
- **CTD**: Curated but may lag behind latest literature by 6-12 months
- **FDA**: Only covers FDA-approved drugs; not applicable to environmental chemicals or supplements
- **DrugBank**: Primarily drugs; limited coverage of industrial chemicals
- **STITCH**: Score thresholds affect sensitivity; lower scores increase false positives
- **ChEMBL**: Structural alerts require ChEMBL ID; not all compounds have one
### Analysis
- **Novel compounds**: May only have ADMET-AI predictions (no database evidence)
- **Environmental chemicals**: FDA/DrugBank phases will be empty; rely on CTD and ADMET-AI
- **Batch mode**: ADMET-AI can handle batches; other tools require individual queries
- **Species specificity**: Most data is human-centric; animal data noted where applicable
### Technical
- **SMILES validity**: Invalid SMILES will cause ADMET-AI failures
- **Name ambiguity**: Chemical names can be ambiguous; always verify with CID
- **Rate limits**: Some FDA endpoints may rate-limit for rapid queries
---
## Summary
**Chemical Safety & Toxicology Assessment Skill** provides comprehensive safety evaluation by integrating:
1. **Predictive toxicology** (ADMET-AI) - 9 tools covering toxicity, ADMET, physicochemical properties
2. **Toxicogenomics** (CTD) - Chemical-gene-disease relationship mapping
3. **Regulatory safety** (FDA) - 6 tools for label-based safety extraction
4. **Drug safety** (DrugBank) - Curated toxicity and contraindication data
5. **Chemical interactions** (STITCH) - Chemical-protein interaction networks
6. **Structural alerts** (ChEMBL) - Known toxic substructure detection
**Outputs**: Structured markdown report with risk classification, evidence grading, and actionable recommendations
**Best for**: Drug safety assessment, chemical hazard profiling, environmental toxicology, ADMET characterization, toxicogenomic analysis
**Total tools integrated**: 25+ tools across 6 databasesRelated Skills
tooluniverse-variant-interpretation
Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
tooluniverse-variant-analysis
Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-systems-biology
Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.
tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
tooluniverse-statistical-modeling
Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.
tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
tooluniverse-spatial-omics-analysis
Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
tooluniverse-single-cell
Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.
tooluniverse-sequence-retrieval
Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.
tooluniverse-rnaseq-deseq2
Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.
tooluniverse-rare-disease-diagnosis
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.