drug-discovery-pipeline
Orchestrates a full drug discovery workflow from target identification through lead optimization. Use when searching for drug candidates against a biological target, evaluating compound libraries, or optimizing hits for drug-likeness. NOT for pure protein structure analysis or single-compound lookups.
Best use case
drug-discovery-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using drug-discovery-pipeline should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/drug-discovery-pipeline/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How drug-discovery-pipeline Compares
| Feature / Agent | drug-discovery-pipeline | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Orchestrates a full drug discovery workflow from target identification through lead optimization. Use when searching for drug candidates against a biological target, evaluating compound libraries, or optimizing hits for drug-likeness. NOT for pure protein structure analysis or single-compound lookups.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Drug Discovery Pipeline (Meta Skill)

This meta-skill orchestrates a multi-stage drug discovery workflow by combining target validation, compound searching, property filtering, and lead optimization into a single coherent pipeline. It coordinates four specialized skills to move from a biological target to a ranked list of drug candidates.

## Workflow

### Step 1: Target Validation

Query UniProt for the target protein to gather functional annotations, known domains, post-translational modifications, and disease associations. Assess druggability by checking for known binding pockets, ligand-binding domains, and membership in established druggable protein families (kinases, GPCRs, ion channels, nuclear receptors).

### Step 2: Known Drug and Compound Survey

Query ChEMBL for existing drugs, clinical candidates, and bioactive compounds reported against the target. Collect activity data (IC50, Ki, EC50) and note selectivity profiles. Identify chemical series and mechanism of action classes already explored in the literature.

### Step 3: Lead Expansion via Similarity Search

Use PubChem similarity and substructure searches to find structural analogs of the most promising hits from Step 2. Expand the candidate pool by exploring nearby chemical space using Tanimoto similarity with ECFP4 fingerprints. Retrieve vendor availability and patent status where possible.

### Step 4: Property Filtering and ADMET Prediction

Apply RDKit to compute molecular descriptors and filter candidates through established drug-likeness rules:

- Lipinski Rule of Five (MW, LogP, HBD, HBA)
- Veber rules (rotatable bonds, TPSA)
- PAINS filter to remove frequent hitters
- ADMET property estimation (solubility, permeability, CYP inhibition flags)

Remove compounds that violate multiple criteria or show structural alerts.
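The rule-based part of Step 4 can be sketched in plain Python, assuming the descriptors have already been computed (for example with RDKit). The `Descriptors` container, the function names, and the `max_violations` cutoff are hypothetical and are not taken from the skill itself. The thresholds are the commonly cited Lipinski and Veber limits. PAINS and ADMET checks are left out of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    logp: float            # calculated octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int   # number of rotatable bonds
    tpsa: float            # topological polar surface area in square angstroms


def rule_violations(d: Descriptors) -> List[str]:
    """Return the Lipinski and Veber criteria that the compound violates."""
    checks = {
        "MW > 500": d.mol_weight > 500,                  # Lipinski
        "LogP > 5": d.logp > 5,                          # Lipinski
        "HBD > 5": d.h_donors > 5,                       # Lipinski
        "HBA > 10": d.h_acceptors > 10,                  # Lipinski
        "rotatable bonds > 10": d.rotatable_bonds > 10,  # Veber
        "TPSA > 140": d.tpsa > 140,                      # Veber
    }
    return [name for name, violated in checks.items() if violated]


def passes_filter(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep compounds with at most max_violations failures (assumed cutoff)."""
    return len(rule_violations(d)) <= max_violations


# Illustrative descriptor values only, not measured data.
small = Descriptors(180.2, 1.3, 1, 3, 3, 63.6)
large = Descriptors(720.0, 6.1, 6, 12, 14, 180.0)
print(rule_violations(small), passes_filter(small))
print(rule_violations(large), passes_filter(large))
```

Returning the list of violated criteria, rather than a bare boolean, makes it easier to document why each compound was removed.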
### Step 5: Compound Ranking and Prioritization

Score remaining candidates using a weighted multi-parameter optimization:

- Potency (pIC50 or pKi against target)
- Selectivity (activity ratio vs. off-targets)
- Drug-likeness (QED score)
- Synthetic accessibility (SA score)
- Novelty (Tanimoto distance from known drugs)

Output a ranked table of top candidates with reasoning for each score.

## Integration Points

- **uniprot-protein** -- Target protein annotation, domain architecture, druggability assessment
- **chembl-drug** -- Bioactivity data, existing drugs, SAR context for the target
- **pubchem-compound** -- Similarity searching, analog identification, vendor availability
- **rdkit-chemistry** -- Descriptor calculation, filtering rules, ADMET prediction, scoring

## Output Formats

- **Target summary**: Protein name, function, druggability assessment, known ligands
- **Compound table**: SMILES, name, source, activity, drug-likeness scores
- **Ranked list**: Top 10-20 candidates with composite scores and rationale
- **SAR notes**: Observed structure-activity trends across chemical series

## Best Practices

1. Always validate the target before searching for compounds to avoid wasted effort
2. Set activity thresholds early (e.g., IC50 < 1 uM) to keep the candidate pool manageable
3. Use multiple fingerprint types for similarity search to capture diverse analogs
4. Apply PAINS filters before investing effort in detailed ADMET analysis
5. Document the rationale for each filtering step to maintain reproducibility
6. Consider the therapeutic area when weighting ranking criteria
7. Flag compounds with known IP restrictions or limited synthetic routes
8. Cross-check top candidates against ChEMBL for any reported toxicity signals
9. Present results with confidence levels reflecting data quality and coverage
10. Iterate the pipeline if initial results are sparse by relaxing similarity thresholds
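As a rough illustration of the Step 5 ranking, the sketch below combines the five criteria into one weighted score. The weights, the scaling ranges, and all function names are assumptions made for this example, since the skill does not specify them. Fingerprints are modeled as plain sets of on-bit indices, so no cheminformatics library is needed. Selectivity is assumed to arrive already scaled to the range 0 to 1.

```python
import math
from typing import Dict, FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # on-bit indices, e.g. taken from an ECFP4 fingerprint


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - math.log10(ic50_nm)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def novelty(candidate: Fingerprint, known_drugs: Iterable[Fingerprint]) -> float:
    """Tanimoto distance to the nearest known drug (1.0 if none are given)."""
    return 1.0 - max((tanimoto(candidate, k) for k in known_drugs), default=0.0)


def clamp01(x: float) -> float:
    """Clip a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


# Hypothetical weights; adjust them to the therapeutic area.
WEIGHTS: Dict[str, float] = {
    "potency": 0.4, "selectivity": 0.2, "qed": 0.2, "sa": 0.1, "novelty": 0.1,
}


def composite_score(pic50: float, selectivity: float, qed: float,
                    sa_score: float, novelty_score: float,
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of the five criteria, each scaled to [0, 1]."""
    components = {
        "potency": clamp01((pic50 - 5.0) / 4.0),  # assumed scale: pIC50 5 -> 0, 9 -> 1
        "selectivity": clamp01(selectivity),      # assumed pre-scaled to [0, 1]
        "qed": clamp01(qed),                      # QED is already in [0, 1]
        "sa": clamp01((10.0 - sa_score) / 9.0),   # SA score: 1 (easy) to 10 (hard)
        "novelty": clamp01(novelty_score),
    }
    total = sum(weights.values())
    return sum(weights[k] * components[k] for k in weights) / total


# Illustrative inputs only, not real assay data.
fp = frozenset({1, 2, 3})
known = [frozenset({2, 3, 4})]
score = composite_score(pic50_from_nm(50.0), 0.7, 0.65, 3.0, novelty(fp, known))
print(round(score, 3))
```

Under this conversion, an IC50 of 1 uM (1000 nM) corresponds to a pIC50 of 6, so the activity threshold in the best practices maps onto the potency term directly. Keeping the per-criterion components separate from the weights also makes it straightforward to report the reasoning behind each candidate's score.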
Related Skills
scienceclaw-discovery
Identify research gaps, synthesize cross-disciplinary insights, and generate novel hypotheses. Use when: user asks about unexplored areas, cross-field connections, or new research directions. NOT for: routine literature review or data analysis.
ml-pipeline
Machine learning pipeline for scientific research including data preprocessing, feature engineering, model selection, training, evaluation, and interpretation. Covers supervised/unsupervised learning, deep learning, cross-validation, hyperparameter tuning, and model explainability. Use when user asks to build a predictive model, classify data, cluster samples, do feature selection, or apply ML to research data. Triggers on "machine learning", "classification", "clustering", "random forest", "neural network", "deep learning", "predict", "feature selection", "cross-validation", "train model".
knowledge-discovery
Discover patterns, build knowledge graphs, and extract insights from linguistic and historical data
drug-discovery
Supports drug discovery workflows including target identification, virtual screening, ADMET prediction, lead optimization, pharmacokinetics modeling, and drug repurposing analyses; trigger when users discuss drug targets, compound libraries, medicinal chemistry, or pharmaceutical development.
drug-discovery-search
End-to-end drug discovery platform combining ChEMBL compounds, DrugBank, targets, and FDA labels. Natural language powered by Valyu.
chembl-drug
Query the ChEMBL REST API for drug-target interactions, bioactivity data, ADMET properties, and approved drug information. Use when the user needs drug mechanism of action, binding affinity data, target information, or pharmacokinetic properties. NOT for basic compound lookup (use pubchem-compound), NOT for gene-disease associations (use open-targets), NOT for protein 3D structures (use pdb-structure).
xurl
A CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
writing
No description provided.
world-bank-data
World Bank Open Data API for development indicators. Use when: user asks about GDP, population, poverty, health, or education statistics by country. NOT for: real-time financial data or stock prices.
wikipedia-search
Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information
wikidata-knowledge
Query Wikidata for structured knowledge using SPARQL and entity search. Use when: (1) finding structured facts about entities (people, places, organizations), (2) querying relationships between entities, (3) cross-referencing external identifiers (Wikipedia, VIAF, GND, ORCID), (4) building knowledge graphs from linked data. NOT for: full-text article content (use Wikipedia API), scientific literature (use semantic-scholar), geospatial data (use OpenStreetMap).