bio-copy-number-cnv-annotation
Annotate CNVs with genes, pathways, and clinical significance. Use when interpreting CNV calls or identifying affected genes from copy number analysis.
Best use case
bio-copy-number-cnv-annotation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Annotate CNVs with genes, pathways, and clinical significance. Use when interpreting CNV calls or identifying affected genes from copy number analysis.
Teams using bio-copy-number-cnv-annotation should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/bio-copy-number-cnv-annotation/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How bio-copy-number-cnv-annotation Compares
| Feature / Agent | bio-copy-number-cnv-annotation | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Annotate CNVs with genes, pathways, and clinical significance. Use when interpreting CNV calls or identifying affected genes from copy number analysis.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
Best AI Agents for Marketing
A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.
AI Agent for YouTube Script Writing
Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.
SKILL.md Source
## Version Compatibility
Reference examples tested with: bedtools 2.31+, pandas 2.2+, pybedtools 0.9+, pysam 0.22+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# CNV Annotation
**"Annotate my CNV calls with gene names"** → Overlap CNV segments with gene annotations, clinical databases, and pathway information to identify affected genes and assess clinical significance.
- CLI: `bedtools intersect -a cnvs.bed -b genes.bed`
- Python: `pybedtools.BedTool().intersect()`
## Annotate with Gene Names (bedtools)
```bash
# Convert CNV segments to BED
awk 'NR>1 {print $1"\t"$2"\t"$3"\t"$5"\t"$6}' sample.cns > sample.cnv.bed
# Intersect with gene annotations
bedtools intersect -a sample.cnv.bed -b genes.bed -wa -wb > cnv_genes.txt
# Get genes overlapping CNVs
bedtools intersect -a genes.bed -b sample.cnv.bed -u > affected_genes.bed
```
## CNVkit Gene Annotation
```bash
# Annotate during analysis
cnvkit.py batch tumor.bam --normal normal.bam \
--targets targets.bed \
--annotate refFlat.txt \
--fasta reference.fa \
-o results/
# Genes are included in output CNS file
```
## Python: Comprehensive Annotation
**Goal:** Annotate CNV segments with all overlapping genes using interval intersection.
**Approach:** Convert CNV segments and gene annotations to BedTool objects, intersect to find overlapping genes, and aggregate gene names per CNV segment.
```python
import pandas as pd
import pybedtools as pbt
def annotate_cnvs(cns_file, gene_bed, output=None):
'''Annotate CNV segments with overlapping genes.'''
cns = pd.read_csv(cns_file, sep='\t')
# Create BED from segments
cns_bed = pbt.BedTool.from_dataframe(
cns[['chromosome', 'start', 'end', 'log2']].rename(
columns={'chromosome': 'chrom'}))
genes = pbt.BedTool(gene_bed)
# Intersect
intersect = cns_bed.intersect(genes, wa=True, wb=True)
# Parse results
results = []
for interval in intersect:
results.append({
'chrom': interval[0],
'start': int(interval[1]),
'end': int(interval[2]),
'log2': float(interval[3]),
'gene_chrom': interval[4],
'gene_start': int(interval[5]),
'gene_end': int(interval[6]),
'gene_name': interval[7] if len(interval) > 7 else 'NA'
})
df = pd.DataFrame(results)
# Aggregate genes per CNV
cnv_genes = df.groupby(['chrom', 'start', 'end', 'log2'])['gene_name'].apply(
lambda x: ','.join(sorted(set(x)))).reset_index()
if output:
cnv_genes.to_csv(output, sep='\t', index=False)
return cnv_genes
```
## Annotate with Cancer Gene Census
**Goal:** Flag known cancer-associated genes within CNV regions.
**Approach:** Load the COSMIC Cancer Gene Census, cross-reference with genes overlapping CNVs, and tag matching genes.
```python
import pandas as pd
def annotate_cancer_genes(cnv_genes, cgc_file):
'''Flag cancer-associated genes in CNVs.'''
cgc = pd.read_csv(cgc_file, sep='\t')
cancer_genes = set(cgc['Gene Symbol'].tolist())
def check_cancer_genes(genes):
if pd.isna(genes):
return ''
gene_list = genes.split(',')
cancer = [g for g in gene_list if g in cancer_genes]
return ','.join(cancer)
cnv_genes['cancer_genes'] = cnv_genes['gene_name'].apply(check_cancer_genes)
cnv_genes['n_cancer_genes'] = cnv_genes['cancer_genes'].apply(
lambda x: len(x.split(',')) if x else 0)
return cnv_genes
```
## Annotate with ACMG/ClinVar
**Goal:** Identify pathogenic ClinVar variants within CNV regions for clinical interpretation.
**Approach:** Query the ClinVar VCF for each CNV region using pysam, collect pathogenic variants and their associated genes.
```python
def annotate_clinvar_cnvs(cnv_bed, clinvar_vcf):
'''Annotate CNVs with ClinVar variants.'''
import pysam
cnv = pd.read_csv(cnv_bed, sep='\t', header=None,
names=['chrom', 'start', 'end', 'log2'])
vcf = pysam.VariantFile(clinvar_vcf)
results = []
for _, row in cnv.iterrows():
chrom = row['chrom'].replace('chr', '')
pathogenic = []
for rec in vcf.fetch(chrom, row['start'], row['end']):
clnsig = rec.info.get('CLNSIG', [''])[0]
if 'pathogenic' in clnsig.lower():
gene = rec.info.get('GENEINFO', 'NA').split(':')[0]
pathogenic.append(gene)
results.append({
'chrom': row['chrom'],
'start': row['start'],
'end': row['end'],
'log2': row['log2'],
'clinvar_pathogenic': ','.join(set(pathogenic))
})
return pd.DataFrame(results)
```
## GISTIC2 for Recurrent CNVs
```bash
# Export segments for GISTIC
cnvkit.py export seg *.cns -o cohort.seg
# Run GISTIC2
gistic2 \
-b results/ \
-seg cohort.seg \
-refgene hg38.refgene.mat \
-genegistic 1 \
-smallmem 1 \
-broad 1 \
-brlen 0.5 \
-conf 0.90 \
-armpeel 1 \
-savegene 1
# Output: significant regions with genes
```
## AnnotSV for Comprehensive Annotation
```bash
# Export CNVs to VCF
cnvkit.py export vcf sample.cns -o sample.cnv.vcf
# Run AnnotSV
AnnotSV \
-SVinputFile sample.cnv.vcf \
-genomeBuild GRCh38 \
-outputFile sample_annotated \
-SVminSize 1000
# Output includes: genes, DGV, gnomAD-SV, ClinVar, OMIM
```
## R: Gene Enrichment of CNV Regions
**Goal:** Determine whether amplified or deleted genes are enriched for specific biological pathways.
**Approach:** Extract genes from amplified CNV regions, convert to Entrez IDs, and run GO and KEGG enrichment with clusterProfiler.
```r
library(clusterProfiler)
library(org.Hs.eg.db)
# Get genes in amplified regions
amp_genes <- unique(cnv_annotation$gene_name[cnv_annotation$log2 > 0.5])
# Convert to Entrez IDs
entrez_ids <- mapIds(org.Hs.eg.db, keys=amp_genes, keytype='SYMBOL', column='ENTREZID')
entrez_ids <- na.omit(entrez_ids)
# GO enrichment
go_results <- enrichGO(
gene=entrez_ids,
OrgDb=org.Hs.eg.db,
ont='BP',
pAdjustMethod='BH',
qvalueCutoff=0.05
)
# KEGG enrichment
kegg_results <- enrichKEGG(
gene=entrez_ids,
organism='hsa',
pAdjustMethod='BH'
)
```
## Interpret CNV States
```python
def interpret_cnv(log2, purity=1.0, ploidy=2):
'''Interpret CNV log2 ratio as copy number state.'''
# Adjusted for purity
cn = 2 * (2 ** log2)
if cn < 0.5:
return 'homozygous_deletion'
elif cn < 1.5:
return 'heterozygous_deletion'
elif cn < 2.5:
return 'diploid'
elif cn < 3.5:
return 'single_copy_gain'
else:
return 'amplification'
def summarize_cnvs(cns_annotated):
'''Summarize CNV calls.'''
cns_annotated['cnv_type'] = cns_annotated['log2'].apply(interpret_cnv)
summary = {
'total_cnvs': len(cns_annotated),
'amplifications': (cns_annotated['cnv_type'] == 'amplification').sum(),
'deletions': (cns_annotated['cnv_type'].str.contains('deletion')).sum(),
'total_genes': cns_annotated['gene_name'].str.split(',').explode().nunique(),
'cancer_genes': cns_annotated['cancer_genes'].str.split(',').explode().nunique()
}
return summary
```
## Output Report
```python
def generate_cnv_report(cns_annotated, output_prefix):
'''Generate CNV annotation report.'''
# Full annotation table
cns_annotated.to_csv(f'{output_prefix}_full.tsv', sep='\t', index=False)
# High-impact CNVs
high_impact = cns_annotated[
(cns_annotated['n_cancer_genes'] > 0) |
(abs(cns_annotated['log2']) > 1)
]
high_impact.to_csv(f'{output_prefix}_high_impact.tsv', sep='\t', index=False)
# Gene-level summary
genes = cns_annotated.explode('gene_name')
gene_summary = genes.groupby('gene_name').agg({
'log2': 'mean',
'chrom': 'first',
'start': 'min',
'end': 'max'
}).reset_index()
gene_summary.to_csv(f'{output_prefix}_genes.tsv', sep='\t', index=False)
return high_impact
```
## Related Skills
- copy-number/cnvkit-analysis - Generate CNV calls
- copy-number/cnv-visualization - Visualize annotated CNVs
- pathway-analysis/go-enrichment - Enrichment of CNV genes
- genome-intervals/bed-file-basics - BED file operationsRelated Skills
single-cell-annotation-skills-with-omicverse
Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.
bio-variant-annotation
Comprehensive variant annotation using bcftools annotate/csq, VEP, SnpEff, and ANNOVAR. Add database annotations, predict functional consequences, and assess clinical significance. Use when annotating variants with functional and clinical information.
bio-single-cell-markers-annotation
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
bio-single-cell-cell-annotation
Automated cell type annotation using reference-based methods including CellTypist, scPred, SingleR, and Azimuth for consistent, reproducible cell labeling. Use when automatically annotating cell types using reference datasets.
bio-metabolomics-metabolite-annotation
Metabolite identification from m/z and retention time. Covers database matching, MS/MS spectral matching, and confidence level assignment. Use when assigning compound identities to detected features in untargeted metabolomics.
bio-imaging-mass-cytometry-interactive-annotation
Interactive cell type annotation for IMC data. Covers napari-based annotation, marker-guided labeling, training data generation, and annotation validation. Use when manually annotating cell types for training classifiers or validating automated phenotyping results.
bio-copy-number-gatk-cnv
Call copy number variants using GATK best practices workflow. Supports both somatic (tumor-normal) and germline CNV detection from WGS or WES data. Use when following GATK best practices or integrating CNV calling with other GATK variant pipelines.
bio-copy-number-cnvkit-analysis
Detect copy number variants from targeted/exome sequencing using CNVkit. Supports tumor-normal pairs, tumor-only, and germline CNV calling. Use when detecting CNVs from WES or targeted panel sequencing data.
bio-copy-number-cnv-visualization
Visualize copy number profiles, segments, and compare across samples. Create publication-quality plots of CNV data from CNVkit, GATK, or other callers. Use when creating genome-wide CNV plots, sample heatmaps, or chromosome-level visualizations.
bio-chipseq-peak-annotation
Annotate ChIP-seq peaks to genomic features and genes using ChIPseeker. Assign peaks to promoters, exons, introns, and intergenic regions. Find nearest genes and calculate distance to TSS. Generate annotation plots and statistics. Use when annotating ChIP-seq peaks to genomic features.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.