drug-discovery-pipeline

Orchestrates a full drug discovery workflow from target identification through lead optimization. Use when searching for drug candidates against a biological target, evaluating compound libraries, or optimizing hits for drug-likeness. NOT for pure protein structure analysis or single-compound lookups.

564 stars

by beita6969

Best use case

drug-discovery-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using drug-discovery-pipeline should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/drug-discovery-pipeline/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/drug-discovery-pipeline/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/drug-discovery-pipeline/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How drug-discovery-pipeline Compares

Feature / Agent	drug-discovery-pipeline	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

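What does this skill do?

It orchestrates a multi-stage drug discovery workflow, from target validation through compound search, property filtering, and candidate ranking, by coordinating the uniprot-protein, chembl-drug, pubchem-compound, and rdkit-chemistry skills.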

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Drug Discovery Pipeline (Meta Skill)

This meta-skill orchestrates a multi-stage drug discovery workflow by combining
target validation, compound searching, property filtering, and lead optimization
into a single coherent pipeline. It coordinates four specialized skills to move
from a biological target to a ranked list of drug candidates.

## Workflow

### Step 1: Target Validation

Query UniProt for the target protein to gather functional annotations, known
domains, post-translational modifications, and disease associations. Assess
druggability by checking for known binding pockets, ligand-binding domains,
and membership in established druggable protein families (kinases, GPCRs,
ion channels, nuclear receptors).
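
As a rough illustration of this step, the sketch below builds a UniProtKB search URL and flags druggable families from annotation text. It assumes the public `rest.uniprot.org` search endpoint and the `gene_exact`, `organism_id`, and `reviewed` query fields; the helper names and the keyword map are hypothetical and are not taken from the uniprot-protein skill.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public UniProtKB REST search service.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Hypothetical keyword map for the druggable families named in Step 1.
DRUGGABLE_FAMILY_KEYWORDS = {
    "kinase": ("kinase",),
    "GPCR": ("g-protein coupled receptor", "gpcr"),
    "ion channel": ("ion channel",),
    "nuclear receptor": ("nuclear receptor",),
}


def build_uniprot_url(gene, organism_id=9606, size=5):
    """Build a search URL for reviewed UniProtKB entries of one gene."""
    query = f"gene_exact:{gene} AND organism_id:{organism_id} AND reviewed:true"
    params = {"query": query, "format": "json", "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def druggable_families(annotation_text):
    """Return the families whose keywords appear in free-text annotation."""
    text = annotation_text.lower()
    return [
        family
        for family, keywords in DRUGGABLE_FAMILY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    print(build_uniprot_url("EGFR"))
    print(druggable_families("Receptor tyrosine-protein kinase"))
```

Keyword matching is only a first-pass heuristic; a real druggability assessment would also look at binding-site and structural evidence.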

### Step 2: Known Drug and Compound Survey

Query ChEMBL for existing drugs, clinical candidates, and bioactive compounds
reported against the target. Collect activity data (IC50, Ki, EC50) and note
selectivity profiles. Identify chemical series and mechanism of action classes
already explored in the literature.
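
To compare IC50, Ki, and EC50 values on one scale, they are commonly converted to a negative log of the molar concentration (pIC50, pKi, pEC50). The sketch below assumes activity records shaped like ChEMBL activity JSON (`standard_type`, `standard_value`, `standard_units`); the function names and the default cutoff of 6.0 (1 uM) are illustrative choices, not part of the chembl-drug skill.

```python
import math

# -log10 of each unit's molar factor (nM = 1e-9 M, so the offset is 9).
UNIT_OFFSETS = {"M": 0, "mM": 3, "uM": 6, "nM": 9, "pM": 12}


def to_pactivity(value, units):
    """Convert a concentration to -log10(molar); None if not convertible."""
    if value is None or value <= 0 or units not in UNIT_OFFSETS:
        return None
    return UNIT_OFFSETS[units] - math.log10(value)


def potent_records(records, min_pactivity=6.0, types=("IC50", "Ki", "EC50")):
    """Keep records of the given types at or above the potency cutoff."""
    kept = []
    for rec in records:
        if rec.get("standard_type") not in types:
            continue
        try:
            value = float(rec.get("standard_value"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric measurement
        p = to_pactivity(value, rec.get("standard_units"))
        if p is not None and p >= min_pactivity:
            kept.append({**rec, "pactivity": round(p, 2)})
    # Most potent first.
    return sorted(kept, key=lambda r: r["pactivity"], reverse=True)
```

For example, an IC50 of 10 nM corresponds to a pIC50 of 8, while 5000 nM corresponds to roughly 5.3 and would fall below the default cutoff.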

### Step 3: Lead Expansion via Similarity Search

Use PubChem similarity and substructure searches to find structural analogs
of the most promising hits from Step 2. Expand the candidate pool by exploring
nearby chemical space using Tanimoto similarity with ECFP4 fingerprints.
Retrieve vendor availability and patent status where possible.
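
The Tanimoto coefficient used here is the number of fingerprint bits two molecules share divided by the number of bits set in either. The sketch below is a minimal pure-Python version that treats each fingerprint as a set of on-bit indices. In practice those indices would come from a cheminformatics toolkit (for example a radius-2 Morgan fingerprint, commonly treated as an ECFP4 equivalent); the function names and the 0.7 threshold are assumptions for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit index sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_analogs(query_bits, library, threshold=0.7):
    """Return (name, similarity) pairs at or above threshold, best first.

    `library` maps a compound name to its set of on-bit indices.
    """
    scored = ((name, tanimoto(query_bits, bits)) for name, bits in library.items())
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints, not real molecules.
    library = {"a": {1, 2, 3, 4, 5}, "b": {1, 9}, "c": {1, 2, 3, 4}}
    print(nearest_analogs({1, 2, 3, 4}, library))
```

Lowering `threshold` widens the search into more distant chemical space at the cost of more false analogs.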

### Step 4: Property Filtering and ADMET Prediction

Apply RDKit to compute molecular descriptors and filter candidates through
established drug-likeness rules:
- Lipinski Rule of Five (MW, LogP, HBD, HBA)
- Veber rules (rotatable bonds, TPSA)
- PAINS filter to remove frequent hitters
- ADMET property estimation (solubility, permeability, CYP inhibition flags)

Remove compounds that violate multiple criteria or show structural alerts.
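
The rule-based part of this step can be sketched as simple threshold checks. The example below assumes the descriptors have already been computed (for instance with RDKit) and stored in a dict; the dict keys, the `pains_alert` flag, and the choice to tolerate one Lipinski violation but no Veber violations are illustrative assumptions. The cutoffs are the commonly cited ones: MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10, rotatable bonds ≤ 10, TPSA ≤ 140 Å².

```python
# Commonly cited upper limits; the keys are hypothetical descriptor names.
LIPINSKI_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}
VEBER_LIMITS = {"rot_bonds": 10, "tpsa": 140.0}


def violations(desc, limits):
    """List the descriptor names whose values exceed their limit."""
    return [name for name, limit in limits.items() if desc[name] > limit]


def passes_filters(desc, max_lipinski_violations=1):
    """Apply PAINS flag, Lipinski, and Veber checks to one compound."""
    if desc.get("pains_alert", False):
        return False  # structural alert: drop regardless of other properties
    if len(violations(desc, LIPINSKI_LIMITS)) > max_lipinski_violations:
        return False
    return not violations(desc, VEBER_LIMITS)


if __name__ == "__main__":
    # Illustrative descriptor values, not measurements of real compounds.
    small = {"mw": 180.2, "logp": 1.3, "hbd": 1, "hba": 3, "rot_bonds": 3, "tpsa": 63.6}
    large = {"mw": 720.0, "logp": 6.2, "hbd": 6, "hba": 12, "rot_bonds": 14, "tpsa": 180.0}
    print(passes_filters(small), passes_filters(large))
```

Running the PAINS check first is cheap and avoids computing further properties for compounds that will be discarded anyway.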

### Step 5: Compound Ranking and Prioritization

Score remaining candidates using a weighted multi-parameter optimization:
- Potency (pIC50 or pKi against target)
- Selectivity (activity ratio vs. off-targets)
- Drug-likeness (QED score)
- Synthetic accessibility (SA score)
- Novelty (Tanimoto distance from known drugs)

Output a ranked table of top candidates with reasoning for each score.
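
One way to implement such a weighted score is to map each criterion onto a 0 to 1 scale and take a weighted average. The sketch below does that; the weights, the normalization ranges (pIC50 5 to 9, selectivity up to 1000-fold), and the candidate dict keys are all hypothetical and should be tuned to the therapeutic area. It relies on QED lying in [0, 1] and the SA score running from 1 (easy) to 10 (hard).

```python
import math

# Hypothetical default weights; they are normalized by their sum below.
DEFAULT_WEIGHTS = {
    "potency": 0.35, "selectivity": 0.20, "qed": 0.20, "sa": 0.15, "novelty": 0.10,
}


def clamp01(x):
    """Clip a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def component_scores(c):
    """Map each raw property of candidate dict `c` onto a 0-1 scale."""
    ratio = c["selectivity_ratio"]
    return {
        "potency": clamp01((c["pactivity"] - 5.0) / 4.0),       # pIC50 5 -> 0, 9 -> 1
        "selectivity": clamp01(math.log10(ratio) / 3.0) if ratio > 0 else 0.0,
        "qed": clamp01(c["qed"]),
        "sa": clamp01((10.0 - c["sa_score"]) / 9.0),            # SA 1 -> 1, 10 -> 0
        "novelty": clamp01(1.0 - c["max_tanimoto_to_known"]),   # Tanimoto distance
    }


def composite_score(c, weights=DEFAULT_WEIGHTS):
    """Weighted average of the component scores."""
    parts = component_scores(c)
    return sum(w * parts[k] for k, w in weights.items()) / sum(weights.values())


def rank(candidates, weights=DEFAULT_WEIGHTS, top_n=20):
    """Return the top_n candidates, highest composite score first."""
    ordered = sorted(candidates, key=lambda c: composite_score(c, weights), reverse=True)
    return ordered[:top_n]
```

Returning the per-component scores alongside the composite makes it straightforward to report the reasoning behind each candidate's rank.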

## Integration Points

- **uniprot-protein** -- Target protein annotation, domain architecture, druggability assessment
- **chembl-drug** -- Bioactivity data, existing drugs, SAR context for the target
- **pubchem-compound** -- Similarity searching, analog identification, vendor availability
- **rdkit-chemistry** -- Descriptor calculation, filtering rules, ADMET prediction, scoring

## Output Formats

- **Target summary**: Protein name, function, druggability assessment, known ligands
- **Compound table**: SMILES, name, source, activity, drug-likeness scores
- **Ranked list**: Top 10-20 candidates with composite scores and rationale
- **SAR notes**: Observed structure-activity trends across chemical series
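
As one possible serialization of the compound table, the sketch below writes rows to CSV with the standard library. The column names mirror the fields listed above, but their exact spelling and the use of QED as the drug-likeness column are assumptions; the example row contains placeholder values only.

```python
import csv
import io

# Hypothetical column names for the compound table.
COLUMNS = ["smiles", "name", "source", "activity", "qed"]


def compound_table_csv(rows):
    """Render candidate dicts as CSV text; unknown keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing columns are written as empty cells
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder row for illustration, not real screening data.
    rows = [{"smiles": "CCO", "name": "example-1", "source": "PubChem", "qed": 0.41}]
    print(compound_table_csv(rows))
```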

## Best Practices

1. Always validate the target before searching for compounds to avoid wasted effort
2. Set activity thresholds early (e.g., IC50 < 1 uM) to keep the candidate pool manageable
3. Use multiple fingerprint types for similarity search to capture diverse analogs
4. Apply PAINS filters before investing effort in detailed ADMET analysis
5. Document the rationale for each filtering step to maintain reproducibility
6. Consider the therapeutic area when weighting ranking criteria
7. Flag compounds with known IP restrictions or limited synthetic routes
8. Cross-check top candidates against ChEMBL for any reported toxicity signals
9. Present results with confidence levels reflecting data quality and coverage
10. If initial results are sparse, rerun the pipeline with relaxed similarity thresholds

Related Skills

scienceclaw-discovery

564

from beita6969/ScienceClaw

Identify research gaps, synthesize cross-disciplinary insights, and generate novel hypotheses. Use when: user asks about unexplored areas, cross-field connections, or new research directions. NOT for: routine literature review or data analysis.

ml-pipeline

564

from beita6969/ScienceClaw

Machine learning pipeline for scientific research including data preprocessing, feature engineering, model selection, training, evaluation, and interpretation. Covers supervised/unsupervised learning, deep learning, cross-validation, hyperparameter tuning, and model explainability. Use when user asks to build a predictive model, classify data, cluster samples, do feature selection, or apply ML to research data. Triggers on "machine learning", "classification", "clustering", "random forest", "neural network", "deep learning", "predict", "feature selection", "cross-validation", "train model".

knowledge-discovery

564

from beita6969/ScienceClaw

Discover patterns, build knowledge graphs, and extract insights from linguistic and historical data

drug-discovery

564

from beita6969/ScienceClaw

Supports drug discovery workflows including target identification, virtual screening, ADMET prediction, lead optimization, pharmacokinetics modeling, and drug repurposing analyses; trigger when users discuss drug targets, compound libraries, medicinal chemistry, or pharmaceutical development.

drug-discovery-search

564

from beita6969/ScienceClaw

End-to-end drug discovery platform combining ChEMBL compounds, DrugBank, targets, and FDA labels. Natural language powered by Valyu.

chembl-drug

564

from beita6969/ScienceClaw

Query the ChEMBL REST API for drug-target interactions, bioactivity data, ADMET properties, and approved drug information. Use when the user needs drug mechanism of action, binding affinity data, target information, or pharmacokinetic properties. NOT for basic compound lookup (use pubchem-compound), NOT for gene-disease associations (use open-targets), NOT for protein 3D structures (use pdb-structure).
