tooluniverse-noncoding-rna

Analyze non-coding RNAs (miRNAs, lncRNAs, circRNAs) using miRBase, LNCipedia, RNAcentral, Rfam, and target prediction databases. Covers ncRNA identification, target prediction, disease associations, expression profiling, and functional annotation. Use when asked about microRNAs, long non-coding RNAs, RNA interference, miRNA targets, lncRNA function, or ncRNA-disease associations.

1,202 stars

by mims-harvard

Best use case

tooluniverse-noncoding-rna is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-noncoding-rna should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-noncoding-rna/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-noncoding-rna/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-noncoding-rna/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-noncoding-rna Compares

Feature / Agent	tooluniverse-noncoding-rna	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

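What does this skill do?

It analyzes non-coding RNAs (miRNAs, lncRNAs, circRNAs) using miRBase, LNCipedia, RNAcentral, Rfam, and target prediction databases, covering ncRNA identification, target prediction, disease associations, expression profiling, and functional annotation.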

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Non-Coding RNA Analysis

Pipeline for identifying, annotating, and interpreting non-coding RNAs and their biological roles. Covers microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and other ncRNA classes.

**Key principles**:
1. **Class determines function** — miRNAs repress mRNA translation; lncRNAs have diverse mechanisms (scaffolds, guides, decoys, enhancers); rRNAs/tRNAs are structural
2. **Targets matter more than the ncRNA itself** — for miRNAs, the regulated mRNA targets determine the phenotype
3. **Expression context is critical** — ncRNAs are highly tissue/cell-type specific
4. **Conservation indicates function** — deeply conserved ncRNAs (let-7, MALAT1) have well-established roles
5. **Evidence grading** — T1: validated targets (reporter assay, CLIP-seq), T2: high-confidence computational prediction, T3: expression correlation, T4: sequence-based prediction only

**Type-based reasoning — look up, don't guess**:
Non-coding RNA function depends on type: miRNA silences target mRNAs (look up targets in miRTarBase/TargetScan), lncRNA has diverse functions (scaffolding, guiding, decoying — check literature for the specific lncRNA), circRNA may sponge miRNAs.

For any ncRNA query: first identify the class from the name/sequence, then select the appropriate evidence source. Do not assume function based on name alone — a gene named "LINC" may have a characterized mechanism, or none at all. Always search PubMed for the specific ncRNA before interpreting. For miRNAs, validated targets (T1) from miRTarBase outweigh any computational prediction — a predicted target with no experimental support is a hypothesis, not a finding. For lncRNAs, mechanism is almost always determined by experimental studies; use `PubMed_search_articles` with the lncRNA name + "mechanism" or "function" to find relevant evidence. For circRNAs, miRNA sponging is the most common proposed mechanism but is frequently over-claimed — look for CLIP-seq or reporter assay evidence before asserting it.

---

## When to Use

- "What are the targets of miR-21?"
- "Find lncRNAs associated with breast cancer"
- "Is this lncRNA conserved across species?"
- "What miRNAs regulate TP53?"
- "Annotate these non-coding RNA IDs"
- "Which miRNAs are biomarkers for [disease]?"

**Not this skill**: For mRNA expression analysis, use `tooluniverse-rnaseq-deseq2`. For CRISPR screens, use `tooluniverse-crispr-screen-analysis`.

---

## Core Tools

| Tool | Use For |
|------|---------|
| `miRBase_search_mirna` | Search miRNAs by name, accession, or sequence |
| `miRBase_get_mirna` | Detailed miRNA info (sequence, genomic location, family) |
| `miRBase_get_mature_mirna` | Mature miRNA sequences and annotations |
| `PubMed_search_articles` | Search for validated miRNA targets in literature (e.g., "miR-21 target validation") |
| `LNCipedia_search_lncrna` | Search lncRNAs by name, gene symbol, or transcript ID |
| `LNCipedia_get_lncrna` | Detailed lncRNA transcript info (sequence, structure, conservation) |
| `LNCipedia_get_lncrna_xrefs` | Cross-references for a lncRNA to external databases |
| `LNCipedia_search_ncrna_by_type` | Search ncRNA entries by RNA type |
| `LNCipedia_get_lncrna_publications` | Publications associated with a lncRNA |
| `RNAcentral_search` | Search all ncRNA types across databases |
| `RNAcentral_get_rna` | Detailed ncRNA annotations from 40+ databases |
| `Rfam_get_family` | RNA family details (structure, alignment, species distribution) |
| `Rfam_search` | Search RNA families by keyword |
| `DisGeNET_search_gene` | ncRNA-disease associations |
| `PubMed_search_articles` | ncRNA literature |
| `GTEx_get_median_gene_expression` | Tissue expression of ncRNA genes |

---

## Workflow

```
Phase 0: ncRNA Identity & Classification
  Name/ID → miRBase/LNCipedia/RNAcentral → class, sequence, genomic location
    |
Phase 1: Target & Interaction Analysis
  miRNA → target mRNAs; lncRNA → interacting proteins/RNAs/chromatin
    |
Phase 2: Expression & Tissue Specificity
  GTEx/GEO → where is it expressed? Tissue-specific or ubiquitous?
    |
Phase 3: Disease Associations
  DisGeNET/PubMed/CTD → ncRNA-disease links with evidence
    |
Phase 4: Functional Interpretation
  Pathway enrichment of targets → biological role → clinical significance
```

### Phase 0: ncRNA Identity & Classification

ncRNA classes by size and database:
- **miRNA** (~22 nt, miRBase): Post-transcriptional silencing via 3'UTR binding
- **lncRNA** (>200 nt, LNCipedia): Diverse — chromatin remodeling, transcription regulation, miRNA sponges
- **rRNA** (120-5000 nt, RNAcentral/Rfam): Ribosome components
- **tRNA** (~76 nt, RNAcentral): Amino acid delivery
- **snoRNA** (60-300 nt, Rfam): rRNA modification (methylation, pseudouridylation)
- **snRNA** (~150 nt, Rfam): Spliceosome components
- **piRNA** (26-31 nt, RNAcentral): Transposon silencing in germline
- **circRNA** (variable, RNAcentral): miRNA sponges, protein scaffolds (experimental evidence required)

**Identification workflow**:
- Name starts with `miR-` or `hsa-mir-` → search miRBase
- Name starts with `LINC`, `MALAT`, `HOTAIR`, `XIST`, or ends in `-AS1` → search LNCipedia
- Any ncRNA type → search RNAcentral (aggregates all databases)
- RNA family question → search Rfam
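
The routing rules above can be sketched as a small lookup function. This is an illustration only: `suggest_database` and both regular expressions are hypothetical names, and accepting a `let-` prefix as a miRNA name is an assumption added here (the list above names only `miR-` and `hsa-mir-`).

```python
import re

# Patterns follow the identification rules above.
# Assumption: "let-" names (e.g. let-7) are also routed to miRBase.
MIRNA_PATTERN = re.compile(r"^(?:hsa-)?(?:mir|let)-", re.IGNORECASE)
LNCRNA_PATTERN = re.compile(r"^(?:LINC|MALAT|HOTAIR|XIST)|-AS1$", re.IGNORECASE)


def suggest_database(name: str) -> str:
    """Return the database to query first for a given ncRNA name."""
    name = name.strip()
    if MIRNA_PATTERN.search(name):
        return "miRBase"
    if LNCRNA_PATTERN.search(name):
        return "LNCipedia"
    # RNAcentral aggregates all ncRNA types, so it is the fallback.
    return "RNAcentral"


for example in ["hsa-mir-21", "miR-155", "HOTAIR", "ZEB1-AS1", "SNORD115"]:
    print(example, "->", suggest_database(example))
```

A name-based guess only decides where to look first; the class should still be confirmed from the database record.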

### Phase 1: Target & Interaction Analysis

**For miRNAs** — the targets determine the biology:

**NOTE**: There is no dedicated miRNA target lookup tool in ToolUniverse. To find miRNA targets:

1. **Literature search** (most reliable): `PubMed_search_articles(query="miR-21 target validation luciferase")`
2. **Cross-references**: `miRBase_get_mirna_xrefs(accession="MIMAT0000076")` — may link to external target databases
3. **Known targets for well-studied miRNAs**: Use the reference table below, then validate via STRING/Reactome
4. **For novel miRNAs**: Search PubMed for "[miRNA] target" and extract validated targets from papers

Well-studied miRNA targets (for common oncomiRs/tumor suppressors):
- **miR-21**: PTEN, PDCD4, TPM1, RECK, SPRY1, SPRY2, BTG2
- **miR-155**: SOCS1, SHIP1, AID, TP53INP1
- **miR-122**: SLC7A1, ADAM17 (also HCV IRES cofactor)
- **let-7**: RAS, HMGA2, MYC, LIN28

**Target interpretation framework**:
- **Validated** (T1): Luciferase reporter, CLIP-seq, degradome-seq — base conclusions on these
- **High-confidence prediction** (T2): TargetScan conserved sites, DIANA-microT score > 0.9 — use to support validated findings
- **Expression correlation** (T3) and **sequence-based prediction only** (T4: miRanda, PicTar, RNA22) — hypothesis generation only; do not report as findings
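
A minimal sketch of how evidence labels could be mapped to the T1 to T4 tiers defined under the key principles. The function name `grade_evidence` and the keyword lists are hypothetical. The matching is plain substring search, so it cannot check the extra T2 conditions (conserved sites, DIANA-microT score > 0.9); those still need to be verified by hand.

```python
# Keywords per tier, taken from the tier definitions in this document.
T1_ASSAYS = ("luciferase", "reporter", "clip-seq", "degradome")
T2_PREDICTORS = ("targetscan", "diana-microt")
T3_SIGNALS = ("correlation",)
T4_PREDICTORS = ("miranda", "pictar", "rna22")


def grade_evidence(sources):
    """Return the strongest tier (T1 is best) supported by a list of evidence labels."""
    labels = [s.lower() for s in sources]

    def has(keywords):
        return any(k in label for label in labels for k in keywords)

    if has(T1_ASSAYS):
        return "T1"
    if has(T2_PREDICTORS):
        return "T2"  # assumes the prediction meets the confidence thresholds
    if has(T3_SIGNALS):
        return "T3"
    if has(T4_PREDICTORS):
        return "T4"
    return "ungraded"


print(grade_evidence(["Luciferase reporter assay", "TargetScan"]))
print(grade_evidence(["miRanda"]))
```

Taking the strongest tier reflects the rule that a validated target outweighs any computational prediction for the same miRNA-gene pair.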

**For lncRNAs** — the mechanism varies:

| lncRNA Mechanism | Example | How to Investigate |
|---|---|---|
| **Chromatin modifier** | HOTAIR, XIST | Check interacting proteins (PRC2, LSD1) via PubMed |
| **Transcription regulator** | NEAT1, MEG3 | Check nearby genes (cis-regulation) via genomic location |
| **miRNA sponge** | MALAT1, circRNAs | Search for miRNA binding sites |
| **Scaffold** | NKILA, BCAR4 | Check protein interactions |
| **Enhancer RNA** | eRNAs | Check ENCODE enhancer annotations |

### Phase 2: Expression & Tissue Specificity

```python
GTEx_get_median_gene_expression(gene_symbol="MIR21")  # miRNA host gene expression
# Note: GTEx measures RNA-seq; miRNA expression may need miRNA-seq data from GEO
```

**Interpretation**: Tissue-restricted ncRNAs are often functionally important in that tissue. Ubiquitous ncRNAs (like MALAT1) tend to have housekeeping roles.
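
One common way to put a number on "tissue-restricted versus ubiquitous" is the tau specificity index. It is not part of this skill or of the GTEx tool; the sketch below is an illustration, `tau_index` is a hypothetical helper, and the expression values are made up.

```python
def tau_index(expression):
    """Tau specificity index: 0 = evenly expressed, 1 = restricted to one tissue."""
    values = list(expression.values())
    n = len(values)
    peak = max(values) if values else 0.0
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and a non-zero maximum")
    # Sum of each tissue's shortfall relative to the peak tissue, normalized by n - 1.
    return sum(1 - v / peak for v in values) / (n - 1)


# Made-up median expression values (for example TPM) for illustration only.
restricted = {"liver": 120.0, "lung": 0.0, "brain": 0.0, "heart": 0.0}
ubiquitous = {"liver": 50.0, "lung": 50.0, "brain": 50.0, "heart": 50.0}

print(tau_index(restricted))  # 1.0
print(tau_index(ubiquitous))  # 0.0
```

In practice tau is often computed on log-transformed expression, and any cutoff for calling a gene tissue-specific is a judgment call rather than a fixed rule.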

### Phase 3: Disease Associations

```python
DisGeNET_search_gene(query="MIR21")  # miR-21 disease associations
PubMed_search_articles(query="miR-21 biomarker cancer")
```

**Key ncRNA-disease associations** (well-established T1 examples — always verify via DisGeNET or PubMed for the specific ncRNA):
- miR-21: OncomiR in multiple cancers; targets PTEN, PDCD4, TPM1 (hundreds of T1 studies)
- miR-155: B-cell lymphoma, inflammation — immune regulation
- miR-122: Hepatitis C liver disease — HCV replication cofactor; therapeutic target (miravirsen)
- let-7 family: Lung cancer, stem cell differentiation — tumor suppressor targeting RAS, HMGA2
- HOTAIR: Breast/colorectal cancer — recruits PRC2, promotes metastasis
- MALAT1: Lung cancer/metastasis — splicing regulation
- XIST: X-inactivation, cancer — chromatin silencing
- H19: Beckwith-Wiedemann syndrome, cancer — imprinted lncRNA, miR-675 host
- ANRIL: CVD, diabetes, cancer — CDKN2A/B locus regulation (GWAS-validated)

### Phase 4: Functional Interpretation

After identifying miRNA targets (Phase 1), run pathway enrichment:

```python
# Collect validated target gene symbols
targets = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"]  # miR-21 targets

# Pathway enrichment: both calls build their identifier string from the same list
ReactomeAnalysis_pathway_enrichment(identifiers=" ".join(targets))  # space-separated
STRING_get_network(identifiers="\r".join(targets), species=9606)  # carriage-return-separated, human
```

**Interpretation**: If miR-21 targets are enriched in apoptosis and PI3K-AKT signaling → miR-21 is an oncomiR that promotes survival by simultaneously suppressing multiple tumor suppressors.
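
To make "enriched" concrete, here is a sketch of the over-representation test that pathway enrichment is commonly based on: a one-sided hypergeometric tail probability. It is not a description of how `ReactomeAnalysis_pathway_enrichment` works internally; `enrichment_pvalue` is a hypothetical helper and the gene counts are made up.

```python
from math import comb


def enrichment_pvalue(n_background, n_pathway, n_targets, n_overlap):
    """P(overlap >= n_overlap) when n_targets genes are drawn at random
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_targets)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: 20,000 background genes, a 300-gene pathway,
# 5 validated targets, 2 of which fall in the pathway.
p = enrichment_pvalue(20000, 300, 5, 2)
print(f"p = {p:.4g}")
```

With only a handful of targets, a single extra overlapping gene changes the p-value substantially, so small target lists should be interpreted cautiously and corrected for the number of pathways tested.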

**Report structure**:
1. **ncRNA Identity** — class, sequence, genomic location, conservation
2. **Targets/Interactions** — validated targets with evidence grades
3. **Expression Profile** — tissue specificity, disease-specific expression changes
4. **Disease Associations** — evidence-graded disease links
5. **Pathway Analysis** — enriched pathways among targets
6. **Mechanistic Model** — how this ncRNA contributes to disease biology
7. **Clinical Potential** — biomarker utility, therapeutic target potential (antagomirs, ASOs)
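
The report structure can be kept consistent across runs with a small template. This is a sketch under assumptions: `render_report`, `REPORT_SECTIONS`, and the placeholder text are hypothetical and not part of the skill.

```python
# Section titles copied from the report structure above.
REPORT_SECTIONS = [
    "ncRNA Identity",
    "Targets/Interactions",
    "Expression Profile",
    "Disease Associations",
    "Pathway Analysis",
    "Mechanistic Model",
    "Clinical Potential",
]


def render_report(title, findings):
    """Render a Markdown report with every section present, in a fixed order."""
    lines = [f"# {title}"]
    for number, section in enumerate(REPORT_SECTIONS, start=1):
        lines.append("")
        lines.append(f"## {number}. {section}")
        # An explicit placeholder makes missing evidence visible instead of silent.
        lines.append(findings.get(section, "No evidence retrieved."))
    return "\n".join(lines)


report = render_report(
    "miR-21",
    {"Targets/Interactions": "PTEN (T1), PDCD4 (T1), TPM1 (T1)"},
)
print(report)
```

Emitting empty sections explicitly makes it obvious which phases of the workflow returned nothing.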

---

## Computational Procedures

### Computational Procedure: TargetScan Predicted Targets (Download-and-Process)

TargetScan is a widely used source of computational miRNA target predictions, but it has no REST API. Download the summary file and process it locally:

```python
# Step 1: Download TargetScan predicted targets (one-time, ~10MB zipped)
import io
import zipfile

import pandas as pd
import requests

url = "https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # fail early on HTTP errors
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    fname = z.namelist()[0]
    df = pd.read_csv(z.open(fname), sep='\t')

# Column names are assumed from the TargetScan 8.0 summary file; check df.columns if they differ.
# The file may hold rows for several species; filter on the species column (human = 9606) if present.
FAMILY_COL = 'miRNA family'
GENE_COL = 'Gene Symbol'
SCORE_COL = 'Cumulative weighted context++ score'
SITES_COL = 'Total num conserved sites'

# Step 2: Query for a specific miRNA family
mirna = "miR-21-5p"  # or "miR-21/590-5p" (TargetScan uses family names)
# (?![0-9]) keeps miR-210, miR-214, etc. out of the match
targets = df[df[FAMILY_COL].str.contains(r"miR-21(?![0-9])", case=False, na=False)].copy()

# Step 3: Rank by cumulative weighted context++ score (most negative first)
targets[SCORE_COL] = pd.to_numeric(targets[SCORE_COL], errors='coerce')
targets_ranked = targets.dropna(subset=[SCORE_COL]).sort_values(SCORE_COL, ascending=True)
print(f"Top 20 predicted targets of {mirna}:")
for _, row in targets_ranked.head(20).iterrows():
    print(f"  {row[GENE_COL]:10s} score={row[SCORE_COL]:.3f}  "
          f"sites={row[SITES_COL]}")
```

**Interpretation**: A more negative context++ score means stronger predicted repression. Targets with more than one conserved site are higher confidence.

### Computational Procedure: miRTarBase Validated Targets (Download-and-Process)

miRTarBase has Cloudflare protection blocking programmatic access. Use the R/Bioconductor data package or bulk download:

```python
# Option 1: Download from miRTarBase bulk export (requires browser download first)
# Go to: https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/
# Download: hsa_MTI.xlsx (human miRNA-target interactions)

# Option 2: Use the GitHub data dump
# https://github.com/jorainer/mirtarbase (R package with cached data)

# Once you have the file:
import pandas as pd
mti = pd.read_excel("hsa_MTI.xlsx")  # needs openpyxl; use read_csv for a TSV/CSV export

# Filter for your miRNA; (?![0-9]) excludes hsa-miR-210, hsa-miR-214, etc.
mir21_targets = mti[mti['miRNA'].str.contains(r'hsa-miR-21(?![0-9])', case=False, na=False)]
print(f"miR-21 validated interactions: {len(mir21_targets)} "
      f"({mir21_targets['Target Gene'].nunique()} unique genes)")

# Filter by evidence strength. Assay names are assumed to be in the 'Experiments' column;
# 'Support Type' holds the summary label (e.g. "Functional MTI").
strong = mir21_targets[mir21_targets['Experiments'].str.contains(
    'Luciferase|Reporter|Western|CLIP', case=False, na=False
)]
print(f"  Strong evidence (reporter/Western/CLIP): {strong['Target Gene'].nunique()} genes")
for _, row in strong.drop_duplicates('Target Gene').head(10).iterrows():
    print(f"    {row['Target Gene']:10s} {row['Experiments']} [{row['Support Type']}]")
```

**When download is not available**: Use the built-in reference table in Phase 1 for well-studied miRNAs, or search PubMed for validated targets.

---

## Limitations

- **miRNA target prediction is noisy** — even the best algorithms have >50% false positive rates; always prioritize experimentally validated targets
- **lncRNA function is poorly characterized** — only ~5% of annotated lncRNAs have known functions
- **Expression measurement varies** — miRNA-seq, RNA-seq, and microarray capture different ncRNA classes; check the assay type
- **Species differences** — miRNAs are often conserved but lncRNAs are frequently species-specific; cross-species lncRNA comparisons are unreliable
