lifesciences-graph-builder

Orchestrates life sciences APIs to build knowledge graphs using the Fuzzy-to-Fact protocol, combining MCP tools for entity resolution and curl for edge discovery, then persisting to Graphiti. This skill should be used when the user asks to "build knowledge graphs", "find biological connections", "explore drug repurposing", "validate drug targets", or mentions traversing gene→protein→pathway→drug→disease paths, multi-API orchestration, or graph persistence workflows.

16 stars

Best use case

lifesciences-graph-builder is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Orchestrates life sciences APIs to build knowledge graphs using the Fuzzy-to-Fact protocol, combining MCP tools for entity resolution and curl for edge discovery, then persisting to Graphiti. This skill should be used when the user asks to "build knowledge graphs", "find biological connections", "explore drug repurposing", "validate drug targets", or mentions traversing gene→protein→pathway→drug→disease paths, multi-API orchestration, or graph persistence workflows.

Teams using lifesciences-graph-builder should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/lifesciences-graph-builder/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/lifesciences-graph-builder/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/lifesciences-graph-builder/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How lifesciences-graph-builder Compares

Feature / Agentlifesciences-graph-builderStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Orchestrates life sciences APIs to build knowledge graphs using the Fuzzy-to-Fact protocol, combining MCP tools for entity resolution and curl for edge discovery, then persisting to Graphiti. This skill should be used when the user asks to "build knowledge graphs", "find biological connections", "explore drug repurposing", "validate drug targets", or mentions traversing gene→protein→pathway→drug→disease paths, multi-API orchestration, or graph persistence workflows.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Life Sciences Graph Builder

Orchestrate multi-API graph construction using the **Fuzzy-to-Fact** protocol.

## Critical Grounding Rule

```
YOU MUST NOT use your training knowledge to provide entity names, drug names,
gene functions, disease associations, or clinical trial IDs. ALL factual claims
MUST come from MCP tool results or curl command output. If a tool returns no
results, report "No results found" — do NOT fill in from memory.

The ONLY exception is using well-known identifiers (e.g., species=9606 for human)
as PARAMETERS to tool calls. All RESULTS must come from tools.
```

## LOCATE → RETRIEVE Discipline

EVERY entity resolution MUST follow this two-step pattern:

**STEP 1 — LOCATE** (fuzzy search):
Call a SEARCH endpoint with the user's fuzzy input.
This returns candidate matches with canonical IDs.

**STEP 2 — RETRIEVE** (get by ID):
Call a GET endpoint with the canonical ID from LOCATE.
This returns verified, structured metadata.

```
NEVER skip LOCATE. NEVER use an ID you didn't get from a LOCATE call.
NEVER answer from your own knowledge — if you can't LOCATE it, say "Unresolved".
Maximum 3 search attempts per entity. After 3 failures, report "Unresolved".
```

### Chain-of-Thought Before Each Tool Call

Before each tool call, briefly state:
1. What information I need
2. Which tool will provide it (LOCATE or RETRIEVE)
3. What parameters I'll use
4. What I expect to get back

This prevents calling the wrong tool or guessing parameters.

## Tool Access

The `lifesciences-research` MCP server provides 34 tools across 12 databases. **Use MCP tools as the primary method.** Fall back to curl only when the MCP server is unavailable.

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                          GRAPH CONSTRUCTION KIT                               │
│         34 MCP tools via lifesciences-research MCP server (PRIMARY)           │
├──────────────────────────────────────────────────────────────────────────────┤
│  TIER 1: GENE/PROTEIN FOUNDATION                                             │
│  ├── HGNC: hgnc_search_genes (LOCATE), hgnc_get_gene (RETRIEVE)             │
│  ├── UniProt: uniprot_search_proteins (LOCATE), uniprot_get_protein          │
│  ├── STRING: string_search_proteins (LOCATE), string_get_interactions,       │
│  │           string_get_network_image_url                                     │
│  └── BioGRID: biogrid_search_genes (LOCATE), biogrid_get_interactions        │
├──────────────────────────────────────────────────────────────────────────────┤
│  TIER 2: DRUG DISCOVERY                                                       │
│  ├── ChEMBL: chembl_search_compounds (LOCATE), chembl_get_compound,          │
│  │           chembl_get_compounds_batch                                       │
│  ├── Open Targets: opentargets_search_targets (LOCATE),                      │
│  │           opentargets_get_target, opentargets_get_associations             │
│  ├── PubChem: pubchem_search_compounds (LOCATE), pubchem_get_compound        │
│  └── IUPHAR: iuphar_search_ligands, iuphar_get_ligand,                      │
│              iuphar_search_targets, iuphar_get_target                         │
├──────────────────────────────────────────────────────────────────────────────┤
│  TIER 3: PATHWAYS & CLINICAL                                                  │
│  ├── WikiPathways: wikipathways_search_pathways,                             │
│  │           wikipathways_get_pathway, wikipathways_get_pathways_for_gene,    │
│  │           wikipathways_get_pathway_components                              │
│  └── ClinicalTrials: clinicaltrials_search_trials (LOCATE),                  │
│              clinicaltrials_get_trial, clinicaltrials_get_trial_locations     │
├──────────────────────────────────────────────────────────────────────────────┤
│  TIER 4: GENOMICS & IDENTIFIERS                                               │
│  ├── Ensembl: ensembl_search_genes (LOCATE), ensembl_get_gene,               │
│  │            ensembl_get_transcript                                          │
│  └── Entrez: entrez_search_genes (LOCATE), entrez_get_gene,                  │
│              entrez_get_pubmed_links                                          │
├──────────────────────────────────────────────────────────────────────────────┤
│  DIRECT API (curl — for GraphQL & edge discovery when MCP unavailable)        │
│  ├── Open Targets GraphQL: knownDrugs, associatedDiseases, tractability      │
│  ├── ChEMBL /mechanism: Drug → Target edges                                  │
│  └── STRING /enrichment: Protein Set → GO/KEGG terms                         │
├──────────────────────────────────────────────────────────────────────────────┤
│  GRAPHITI (Persistence)                                                       │
│  └── persist_to_graphiti: Save validated subgraph as JSON episode             │
└──────────────────────────────────────────────────────────────────────────────┘
```

## MCP Token Budgeting (`slim` Parameter)

ALL 34 MCP tools support a `slim` parameter for phase-specific token budgeting:

**Phase-Specific Usage:**
- **LOCATE phases (1 ANCHOR, 3 EXPAND)**: Use `slim=true, page_size=3-5` for fast candidate lists
- **RETRIEVE phases (2 ENRICH, 4 TRAVERSE, 5 VALIDATE)**: Use `slim=false` (default) for full metadata

**Token Savings:**
- `slim=true`: ~20 tokens/entity (ID, symbol, name only)
- `slim=false`: ~115-300 tokens/entity (full metadata + cross-references)

**Example (Phase 1 ANCHOR):**
```
# LOCATE: Find gene candidates with slim=true
Call `hgnc_search_genes` with: {"query": "ACVR", "slim": true, "page_size": 5}
→ Returns 5 candidates at ~20 tokens each = 100 tokens total

# RETRIEVE: Get full record for selected candidate
Call `hgnc_get_gene` with: {"hgnc_id": "HGNC:171"}
→ Returns complete metadata (~115 tokens)
```

**Impact:** Using `slim=true` during LOCATE phases enables batch resolution of 10-50 entities per LLM turn, critical for network expansion (Phase 3) and drug discovery (Phase 4a).

**Reference:** Token budgeting pattern documented in `reference/prior-art-api-patterns.md` (Section 7.1).

## CURIE Format Conventions

Two contexts require different ID formats:

### API Arguments (bare IDs)
Use bare IDs matching what the MCP server or API accepts:
- `"Q04771"` for UniProt
- `"CHEMBL25"` for ChEMBL compounds
- `"9606.ENSP00000269305"` for STRING proteins
- `"ENSG00000141510"` for Ensembl genes
- `"HGNC:11998"` for HGNC (includes prefix — this is how HGNC API works)

### Graph Node IDs (full CURIEs for Graphiti)
Use `PREFIX:LOCAL_ID` format for persistence:
- `"HGNC:11998"` for genes
- `"UniProtKB:P04637"` for proteins
- `"CHEMBL:3137309"` for compounds
- `"STRING:9606.ENSP00000269305"` for STRING proteins
- `"NCT03312634"` for trials (no prefix needed — NCT is the standard)
- `"EFO:0000574"` or `"MONDO:0018875"` for diseases
- `"WP:WP1742"` for pathways

## Fuzzy-to-Fact Execution Checklist

- [ ] Phase 1 ANCHOR: LOCATE gene/drug/disease → RETRIEVE canonical CURIEs
- [ ] Phase 2 ENRICH: RETRIEVE metadata for each CURIE (UniProt function, cross-refs)
- [ ] Phase 3 EXPAND: LOCATE interaction partners → RETRIEVE pathway membership
- [ ] Phase 4a TRAVERSE_DRUGS: LOCATE drugs targeting identified proteins → RETRIEVE mechanisms
- [ ] Phase 4b TRAVERSE_TRIALS: LOCATE trials for identified drugs → RETRIEVE trial details
- [ ] Phase 5 VALIDATE: RETRIEVE verification for every NCT ID, drug mechanism, gene-disease link
- [ ] Phase 6a PERSIST: Format validated graph as JSON → persist to Graphiti
- [ ] Phase 6b REPORT: Select template → grade evidence → format final report (lifesciences-reporting skill)

## Phase 1: ANCHOR — Entity Resolution

**Goal**: Resolve every gene, drug, and disease in the user's query to canonical CURIEs.

**LOCATE** (always start with HGNC for genes — fastest, most reliable):

PRIMARY (MCP tool):
```
Call `hgnc_search_genes` with: {"query": "TP53"}
→ Returns candidates with HGNC IDs → select best match: HGNC:11998
```

FALLBACK (curl):
```bash
curl -s "https://rest.genenames.org/search/symbol/TP53" \
  -H "Accept: application/json" | jq '.response.docs[0] | {hgnc_id, symbol, name}'
```

**RETRIEVE** (get full record with cross-references):

PRIMARY (MCP tool):
```
Call `hgnc_get_gene` with: {"hgnc_id": "HGNC:11998"}
→ Returns: symbol, name, UniProt cross-ref, Ensembl cross-ref, Entrez cross-ref
```

FALLBACK (curl):
```bash
curl -s "https://rest.genenames.org/fetch/hgnc_id/11998" \
  -H "Accept: application/json" | jq '.response.docs[0]'
```

**For drugs** (try ChEMBL search, note failure for Phase 4a fallback):

PRIMARY (MCP tool):
```
Call `chembl_search_compounds` with: {"query": "Venetoclax"}
→ If 500 error: note it, move on — drugs resolved in Phase 4a via Open Targets
```

**For diseases** (use ClinicalTrials.gov or Open Targets search):

PRIMARY (MCP tool):
```
Call `clinicaltrials_search_trials` with: {"query": "fibrodysplasia ossificans progressiva"}
```

**Output**: JSON with resolved entities:
```json
{
  "entities": [
    {"mention": "TP53", "type": "Gene", "curie": "HGNC:11998", "symbol": "TP53", "status": "resolved"},
    {"mention": "FOP", "type": "Disease", "curie": "MONDO:0018875", "status": "resolved"}
  ],
  "unresolved": []
}
```

## Phase 2: ENRICH — Metadata Enrichment

**Goal**: Decorate each resolved entity with metadata and cross-references. UniProt function text is the most valuable output.

**RETRIEVE** gene metadata (captures cross-references for downstream phases):

PRIMARY (MCP tool):
```
Call `hgnc_get_gene` with: {"hgnc_id": "HGNC:11998"}
→ Extract: uniprot_id="P04637", ensembl_id="ENSG00000141510", entrez_id="7157"
```

**RETRIEVE** protein function (reveals interactors, pathways, disease connections):

PRIMARY (MCP tool):
```
Call `uniprot_get_protein` with: {"uniprot_id": "P04637"}
→ Parse function text for interactor mentions: BAX, BCL2, FAS, MDM2
```

**RETRIEVE** drug metadata (if drug was resolved in Phase 1):

PRIMARY (MCP tool):
```
Call `chembl_get_compound` with: {"chembl_id": "CHEMBL3137309"}
→ May return 500 — note failure, continue
```

**LOCATE** disease CURIE (use Ensembl ID from HGNC cross-references above):

PRIMARY (MCP tool):
```
Call `opentargets_get_associations` with: {"ensembl_id": "ENSG00000115170"}
→ Returns diseases with MONDO/EFO/Orphanet IDs + association scores
→ Pick the highest-scoring disease matching the user's query
→ Record disease CURIE (e.g., MONDO:0007606 for FOP) for Phase 4a/4b
```

**Disease CURIE Optionality**:
- REQUIRED if drug discovery (Phase 4a) or clinical trial search (Phase 4b) is in scope
- OPTIONAL for gene network questions (Template 2) without therapeutic focus
- If query is about biological mechanisms (not therapeutics), disease CURIE may be omitted

FALLBACK (curl):
```bash
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ target(ensemblId: \"ENSG00000115170\") { associatedDiseases(page: {index: 0, size: 5}) { rows { disease { name id } score } } } }"}'
```

**Critical outputs for downstream phases**:
- Ensembl ID (ENSG...) → needed by Phase 4a for Open Targets GraphQL
- UniProt ID → needed by Phase 3 for STRING
- Interactor mentions from function text → guides Phase 3 expansion
- **Disease CURIE** (MONDO/EFO ID) → needed by Phase 4a/4b for drug/trial filtering (OPTIONAL if Phases 4a/4b not in scope)

## Phase 3: EXPAND — Network Expansion

**Goal**: Build adjacency list from interaction databases.

**LOCATE** STRING protein ID:

PRIMARY (MCP tool):
```
Call `string_search_proteins` with: {"query": "TP53", "species": 9606}
→ Returns: 9606.ENSP00000269305
```

**RETRIEVE** protein interactions:

PRIMARY (MCP tool):
```
Call `string_get_interactions` with: {"string_id": "9606.ENSP00000269305", "species": 9606, "required_score": 700}
→ Returns: MDM2 (0.999), SIRT1 (0.999), ATM (0.995), BCL2
```

**RETRIEVE** pathway membership:

PRIMARY (MCP tool):
```
Call `wikipathways_get_pathways_for_gene` with: {"gene_id": "TP53"}
```

**RETRIEVE** gene-disease associations (Open Targets):

PRIMARY (MCP tool):
```
Call `opentargets_get_associations` with: {"ensembl_id": "ENSG00000141510"}
→ Returns associated diseases with scores
```

FALLBACK (curl — for custom GraphQL queries):
```bash
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ target(ensemblId: \"ENSG00000141510\") { associatedDiseases(page: {index: 0, size: 5}) { rows { disease { name id } score } } } }"}'
```

**Pitfalls**:
- STRING batch queries (multiple proteins) return protein names; single queries may NOT. Prefer batch.
- STRING rate limit is 1 req/s.
- Always use `species=9606` (human) unless explicitly doing comparative genomics.
- BioGRID requires `BIOGRID_API_KEY` — check with `grep BIOGRID_API_KEY .env`.

## Phase 4a: TRAVERSE_DRUGS — Drug Discovery

**Goal**: Find drugs targeting identified proteins. Open Targets is the PRIMARY source (more reliable than ChEMBL).

**LOCATE** drugs via Open Targets (preferred — returns drugs + mechanisms + phases in one call):

PRIMARY (MCP tool):
```
Call `opentargets_get_target` with: {"ensembl_id": "ENSG00000171791"}
→ Returns knownDrugs with drug name, mechanismOfAction, phase
```

FALLBACK (curl — for full GraphQL control):
```bash
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ target(ensemblId: \"ENSG00000171791\") { knownDrugs(size: 25) { rows { drug { name id } mechanismOfAction phase } } } }"}'
```

**Open Targets `knownDrugs` Pagination**:
- Use `size` parameter only (e.g., `size: 25`) — this is the reliable pattern
- Do NOT use `page` or `index` — these cause intermittent failures
- For paginated results, use `cursor` (returned in the response) as the continuation token
- If first query fails, retry with `size` only (no other pagination params)

**LOCATE** drugs via ChEMBL (secondary fallback — frequently 500s on detail endpoints):

PRIMARY (MCP tool):
```
Call `chembl_search_compounds` with: {"query": "BCL2 inhibitor"}
→ Search endpoint is generally reliable
```

**Gain-of-Function Disease Filter** (CRITICAL for diseases like FOP):
```
For GAIN-OF-FUNCTION diseases (e.g., FOP caused by constitutive ACVR1 activation),
you need INHIBITORS or ANTAGONISTS. Do NOT return agonists — they worsen the disease.

Check the "mechanismOfAction" field from Open Targets:
- INCLUDE: "inhibitor", "antagonist", "negative modulator", "antibody (blocking)"
- EXCLUDE: "agonist", "positive modulator", "activator"
```

**Pitfalls**:
- Open Targets GraphQL `knownDrugs`: use `size` only (e.g., `size: 25`). Do NOT use `page`/`index`. Use `cursor` for pagination.
- The Ensembl ID (ENSG...) is required for Open Targets target queries — get from Phase 2.
- ChEMBL detail endpoints (`chembl_get_compound`) often return 500 errors; search endpoints (`chembl_search_compounds`) are generally reliable.
- Do NOT retry ChEMBL more than once — switch to Open Targets.

## Phase 4b: TRAVERSE_TRIALS — Clinical Trial Discovery

**Goal**: Find clinical trials for identified drugs. Can run in parallel with Phase 4a.

**LOCATE** trials by drug + disease:

PRIMARY (MCP tool):
```
Call `clinicaltrials_search_trials` with: {"query": "venetoclax AND leukemia"}
→ Returns trials with NCT IDs, phases, statuses
```

FALLBACK (curl):
```bash
curl -s "https://clinicaltrials.gov/api/v2/studies?query.cond=cancer&query.intr=venetoclax&pageSize=5&format=json" \
  | jq '.studies[] | {nct: .protocolSection.identificationModule.nctId, title: .protocolSection.identificationModule.briefTitle, phase: .protocolSection.designModule.phases, status: .protocolSection.statusModule.overallStatus}'
```

**Fallback** — disease-only search (when drug-specific search returns zero):
```bash
curl -s "https://clinicaltrials.gov/api/v2/studies?query.cond=fibrodysplasia+ossificans+progressiva&pageSize=10&format=json"
```

**ClinicalTrials.gov v2 Valid Parameters**:
- `query.cond` — condition/disease
- `query.intr` — intervention/drug
- `query.term` — general search (supports `AREA[StudyType]INTERVENTIONAL`)
- `filter.overallStatus` — e.g., `RECRUITING`, `COMPLETED`
- `pageSize` — results per page
- `format` — `json`

**Invalid parameter**: `filter.studyType` does NOT exist in v2 API. Use `query.term=AREA[StudyType]INTERVENTIONAL` instead.

**Pitfalls**:
- Use SPECIFIC drug names from Phase 4a output, NOT broad terms like "inhibitor".
- Search for each drug separately to avoid missing trials.
- If drug-specific search returns zero, try disease-only search as fallback.

## Phase 5: VALIDATE — Fact Verification

**Goal**: Verify every NCT ID, drug mechanism, and gene-disease claim. Prevents hallucinations.

**RETRIEVE** trial verification:

PRIMARY (MCP tool):
```
Call `clinicaltrials_get_trial` with: {"nct_id": "NCT03312634"}
→ If "Entity Not Found" → mark as INVALID
```

**RETRIEVE** cross-database ID verification:

FALLBACK (curl — for Ensembl xrefs):
```bash
curl -s "https://rest.ensembl.org/xrefs/id/ENSG00000141510?content-type=application/json" \
  | jq '.[] | select(.dbname | test("HGNC|UniProt|OMIM|RefSeq")) | {db: .dbname, id: .primary_id}'
```

**Validation checklist**:
- [ ] Every NCT ID verified via `clinicaltrials_get_trial`
- [ ] Drug mechanisms match what Open Targets/ChEMBL reported
- [ ] Gene-protein ID mappings consistent across HGNC, UniProt, Ensembl
- [ ] No entity was introduced from parametric knowledge (everything traces to a tool call)

**Verdicts**: Mark each fact as `VALIDATED`, `INVALID` (with reason), or `UNVERIFIABLE`.

## Phase 6a: PERSIST — Graph Persistence

**Goal**: Format validated graph as JSON and persist to Graphiti.

**Structure** (only include VALIDATED entities and relationships):
```python
graph_data = {
    "nodes": [
        {"id": "HGNC:11998", "type": "Gene", "label": "TP53", "properties": {"ensembl": "ENSG00000141510"}},
        {"id": "HGNC:990", "type": "Gene", "label": "BCL2", "properties": {"ensembl": "ENSG00000171791"}},
        {"id": "CHEMBL:3137309", "type": "Compound", "label": "Venetoclax", "properties": {"phase": 4}}
    ],
    "edges": [
        {"source": "HGNC:11998", "target": "HGNC:990", "type": "REGULATES", "properties": {}},
        {"source": "CHEMBL:3137309", "target": "HGNC:990", "type": "INHIBITOR", "properties": {"mechanism": "BCL2 inhibitor"}}
    ]
}
```

**Persist** (if Graphiti is available):
```python
persist_to_graphiti(
    name="TP53-BCL2-Venetoclax pathway",
    episode_body=json.dumps(graph_data),
    source="json",
    group_id="drug-repurposing"
)
```

## Phase 6b: REPORT — Formatted Report with Evidence Grading

**Goal**: Produce a professional report using the **lifesciences-reporting** skill.

Use the `lifesciences-reporting` skill to format the Phases 1-5 output. The skill will:
1. **Template selection**: Route query through its Template Decision Tree (7 templates)
2. **Evidence grading**: Apply L1-L4 levels + modifiers to all claims
3. **Confidence calculation**: Compute median of all claim scores (resistant to outliers)
4. **Source attribution**: Include `[Source: tool(param)]` on every factual claim

The reporting skill consumes Phases 1-5 output and the graph structure from Phase 6a. It does NOT make new API calls.

**Fallback** (if reporting skill unavailable):
```
## Summary
[Direct answer with source citations on every claim]

## Confidence
[State that full evidence grading was not performed and why]
```

## Edge Discovery Commands

These curl commands are for edge types not covered by MCP tools:

| Edge Type | Command |
|-----------|---------|
| Drug → Target | `curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?molecule_chembl_id={ID}&format=json"` |
| Target → Drugs | Open Targets `knownDrugs` GraphQL (preferred) or `curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?target_chembl_id={ID}&format=json"` |
| Drug → Disease | `curl -s "https://www.ebi.ac.uk/chembl/api/data/drug_indication?molecule_chembl_id={ID}&format=json"` |
| Gene → Disease | Open Targets `associatedDiseases` GraphQL |
| Gene → Orthologs | `curl -s "https://rest.ensembl.org/homology/id/human/{ENSG}?type=orthologues&content-type=application/json"` |
| Protein Set → GO | `curl -s "https://string-db.org/api/json/enrichment?identifiers={IDs}&species=9606"` |
| Gene → PubMed | `curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=pubmed&id={ID}&retmode=json"` |

## Node Types

| Type | CURIE Pattern | API Argument Format |
|------|---------------|---------------------|
| Gene | `HGNC:\d+` | `"HGNC:11998"` (HGNC includes prefix) |
| Protein | `UniProtKB:[A-Z0-9]+` | `"P04637"` (bare accession) |
| Compound | `CHEMBL:\d+` | `"CHEMBL3137309"` (bare, no colon) |
| Target | `CHEMBL:\d+` | `"CHEMBL4860"` (bare) |
| Disease | `EFO:\d+` or `MONDO:\d+` | Varies by API |
| Pathway | `WP:WP\d+` | `"WP1742"` (bare) |
| Trial | `NCT\d+` | `"NCT03312634"` (bare) |
| STRING Protein | `STRING:9606.ENSP\d+` | `"9606.ENSP00000269305"` (bare) |

## Edge Types

| Edge | Source → Target | Key Properties |
|------|----------------|----------------|
| ENCODES | Gene → Protein | — |
| REGULATES | Gene → Gene | direction: activation/repression |
| INTERACTS | Protein → Protein | score, evidence_type |
| INHIBITOR | Compound → Target | Ki, IC50 |
| AGONIST | Compound → Target | EC50 |
| TREATS | Compound → Disease | max_phase |
| ASSOCIATED_WITH | Gene → Disease | score, evidence_sources |
| MEMBER_OF | Gene → Pathway | — |

## API Reliability & Fallback Patterns

| Primary Source | Fallback | When to Switch |
|---------------|----------|----------------|
| ChEMBL `chembl_get_compound` | Open Targets `opentargets_get_target` | On 500 error (common for detail endpoints) |
| ChEMBL `chembl_search_compounds` | (generally reliable) | Retry once, then report failure |
| STRING `string_get_interactions` | BioGRID `biogrid_get_interactions` | On <3 interactions returned |
| WikiPathways | STRING `/enrichment` endpoint | On no pathways found |
| ClinicalTrials.gov drug search | Disease-only search | On zero results for drug+disease |

## Query Best Practices

### Human-Centric Defaults
- Default to `species=9606` (human) for gene/protein searches
- Use `page_size=10` for exploration, `page_size=50` for batch operations
- Only omit organism filter for comparative genomics across species

### Drug Discovery vs Repurposing
- **Drug repurposing**: Use `phase >= 2` (clinical validation, shorter approval path)
- **General discovery**: No phase filter (include preclinical tools, mechanism probes)
- **Always check mechanisms** before bioactivity data

### Clinical Trial Defaults
- Default `filter.overallStatus=RECRUITING` for active research
- No phase filter for full landscape view
- Use phase filter only for specific analysis (PHASE3+ for commercialization)

## See Also

- **lifesciences-genomics**: HGNC, Ensembl, NCBI gene resolution endpoints (Phases 1-2)
- **lifesciences-proteomics**: UniProt, STRING, BioGRID interaction endpoints (Phases 2-3)
- **lifesciences-pharmacology**: ChEMBL, PubChem, IUPHAR, Open Targets drug endpoints (Phase 4a)
- **lifesciences-clinical**: Open Targets associations, ClinicalTrials.gov trial endpoints (Phases 4b, 5)
- **lifesciences-crispr**: BioGRID ORCS essentiality validation (extends Phase 3 with CRISPR screen data)
- **lifesciences-reporting**: Domain-specific report templates and evidence grading (Phase 6b)

### MCP Server Reference

All 34 tools are available via the `lifesciences-research` MCP server.
Endpoint: `https://lifesciences-research.fastmcp.app/mcp`

Graphiti persistence is optional/best-effort. Core workflow completion is producing validated graph and report artifacts.

Related Skills

xrd-crystallography-analyzer

16
from diegosouzapw/awesome-omni-skill

X-ray Diffraction skill for crystal structure, phase identification, and crystallite size analysis of nanomaterials

viral-generator-builder

16
from diegosouzapw/awesome-omni-skill

Expert in building shareable generator tools that go viral - name generators, quiz makers, avatar creators, personality tests, and calculator tools. Covers the psychology of sharing, viral mechanic...

sourcegraph-automation

16
from diegosouzapw/awesome-omni-skill

Automate Sourcegraph tasks via Rube MCP (Composio). Always search tools first for current schemas.

slash-command-builder

16
from diegosouzapw/awesome-omni-skill

Use when creating, improving, or troubleshooting Claude Code slash commands. Expert guidance on command structure, arguments, frontmatter, tool permissions, and best practices for building effective custom commands.

skill-builder

16
from diegosouzapw/awesome-omni-skill

Create new Claude Code skills with proper SKILL.md format, frontmatter, and best practices. Use when building, scaffolding, or generating a new skill for Claude Code.

seo-authority-builder

16
from diegosouzapw/awesome-omni-skill

Analyzes content for E-E-A-T signals and suggests improvements to build authority and trust. Identifies missing credibility elements. Use PROACTIVELY for YMYL topics.

security-bluebook-builder

16
from diegosouzapw/awesome-omni-skill

Build security Blue Books for sensitive apps

rollback-workflow-builder

16
from diegosouzapw/awesome-omni-skill

Creates safe rollback procedures for deployments with automated workflows, rollback runbooks, version management, and incident response. Use for "rollback automation", "deployment recovery", "incident response", or "production rollback".

reference-builder

16
from diegosouzapw/awesome-omni-skill

Creates exhaustive technical references and API documentation. Generates comprehensive parameter listings, configuration guides, and searchable reference materials.

ralph-prompt-builder

16
from diegosouzapw/awesome-omni-skill

Master orchestrator for generating Ralph Wiggum-compatible prompts. Analyzes task requirements and routes to appropriate generator (single-task, multi-task, project, or research). Use when you need to create any Ralph loop prompt and want automatic selection of the right generator.

promptly-agent-builder

16
from diegosouzapw/awesome-omni-skill

Create, update, or fix standalone Swift command line agents built with Swift Package Manager that use Promptly packages (PromptlyKit, PromptlyConsole, PromptlyKitTooling, PromptlySubAgents). Use when asked to scaffold a new Promptly-based agent, change its prompts or command line interface, add or adjust tools or sub agents, or diagnose build and run errors in such agents.

prompt-template-builder

16
from diegosouzapw/awesome-omni-skill

Creates reusable prompt templates with strict output contracts, style rules, few-shot examples, and do/don't guidelines. Provides system/user prompt files, variable placeholders, output formatting instructions, and quality criteria. Use when building "prompt templates", "LLM prompts", "AI system prompts", or "prompt engineering".