bio-proteomics-spectral-libraries

Build, manage, and search spectral libraries for proteomics. Use when creating or working with spectral libraries for DIA analysis. Covers DDA-based library generation, predicted libraries (Prosit, DeepLC), and library formats.

1,802 stars

by FreedomIntelligence


Best use case

bio-proteomics-spectral-libraries is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-proteomics-spectral-libraries should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-proteomics-spectral-libraries/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-proteomics-spectral-libraries/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-proteomics-spectral-libraries/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-proteomics-spectral-libraries Compares

Feature / Agent	bio-proteomics-spectral-libraries	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

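What does this skill do?

It helps build, manage, and search spectral libraries for proteomics, covering DDA-based library generation, predicted libraries (Prosit, DeepLC), and library formats for DIA analysis.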

Where can I find the source code?

The source code is on GitHub in the FreedomIntelligence/OpenClaw-Medical-Skills repository, under skills/bio-proteomics-spectral-libraries.

SKILL.md Source

## Version Compatibility

Reference examples tested with: matplotlib 3.8+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
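
As a minimal sketch of the version check described above, the standard library's `importlib.metadata` can report what is installed before any example is run. The helper name `installed_versions` is hypothetical and introduced only for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages named in the compatibility note above
    for name, version in installed_versions(["pandas", "matplotlib"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the reported versions against the tested ones is left to the reader, since the acceptable range depends on which examples are used.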

# Spectral Library Management

**"Build a spectral library for DIA analysis"** → Create, filter, and manage spectral libraries from DDA experiments or predicted spectra for use in DIA quantification workflows.
- CLI: `spectrast` (TPP) for consensus library building from search results
- Python / web service: Prosit and MS2PIP (fragment intensities) and DeepLC (retention times) for deep learning-predicted libraries
- Python: `pandas` for library format conversion and quality filtering

## Build Library from DDA Data

### SpectraST (TPP)

```bash
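# Import identified spectra from search results into a raw library
# (-cN sets the output base name; SpectraST adds the extension.
#  -cP sets the minimum identification probability, here 0.9)
spectrast -cNraw_library -cP0.9 search_results.pep.xml

# Build a consensus library from the raw library
spectrast -cNconsensus_library -cAC raw_library.splib

# Apply quality filters to the consensus library
spectrast -cNfiltered_library -cAQ consensus_library.splib

# Export transition tables (confirm flags with the SpectraST help output)
spectrast -cNlibrary_transitions -cM filtered_library.splib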
```

### EasyPQP (Skyline/OpenMS)

```bash
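# Step 1: convert search results and spectra into EasyPQP's
# intermediate PSM and peak files
easypqp convert \
    --pepxml interact.pep.xml \
    --spectra sample.mzML \
    --psms sample.psmpkl \
    --peaks sample.peakpkl

# Step 2: build the library as a TSV, aligning RTs to the iRT reference
easypqp library \
    --out library.tsv \
    --rt_reference irt.tsv \
    sample.psmpkl sample.peakpkl

# Optional: convert the TSV to PQP with OpenMS
# (option names vary by release; confirm with `easypqp <command> --help`)
TargetedFileConverter -in library.tsv -out library.pqp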
```

### EncyclopeDIA (Walnut)

```bash
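# Chromatogram libraries are built from narrow-window (gas-phase
# fractionated) DIA runs searched against a predicted or DDA-derived
# library (.dlib). Shown for one file; repeat per run, then export the
# combined results as an .elib (see the tool's help for the export step).
EncyclopeDIA \
    -i narrow_window_gpf.mzML \
    -l predicted_library.dlib \
    -f uniprot.fasta \
    -o narrow_results

# Search wide-window DIA samples against the chromatogram library
EncyclopeDIA \
    -i wide_window_sample.mzML \
    -l chromatogram_library.elib \
    -f uniprot.fasta \
    -o search_results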
```

## Predicted Libraries

### Prosit (Deep Learning)

```python
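# Generate predictions from a Prosit prediction service
import requests
import pandas as pd

# Use standard amino acids only; letters such as O, U, B, J, X, Z are
# generally rejected by fragment-intensity predictors
peptides = pd.DataFrame({
    'modified_sequence': ['PEPTIDEK', 'LGGNEQVTR'],
    'collision_energy': [30, 30],
    'precursor_charge': [2, 2]
})

# Endpoint and payload schema are unverified here; check the current
# Prosit / ProteomicsDB documentation before relying on them
PROSIT_URL = 'https://www.proteomicsdb.org/prosit/api/predict'

# Submit to the server and fail loudly on HTTP errors
response = requests.post(
    PROSIT_URL,
    json=peptides.to_dict(orient='records'),
    timeout=60
)
response.raise_for_status()

# Parse response to library format
predictions = response.json()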
```

### DeepLC Retention Time Prediction

```python
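import pandas as pd
from deeplc import DeepLC

# Initialize predictor
dlc = DeepLC()

# Peptides to predict; 'modifications' is empty for unmodified peptides
peptides = pd.DataFrame({
    'seq': ['PEPTIDEK', 'LGGNEQVTR'],
    'modifications': ['', '']
})

# Calibration peptides with observed retention times in the 'tr' column
# (two are shown for brevity; use many more in practice)
calibration = pd.DataFrame({
    'seq': ['GAGSSEPVTGLDAK', 'VEATFGVDESNAK'],
    'modifications': ['', ''],
    'tr': [22.4, 33.1]
})

# Calibrate and predict (column names follow the DeepLC README;
# confirm against the installed version)
dlc.calibrate_preds(seq_df=calibration)
predicted_rts = dlc.make_preds(seq_df=peptides)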
```

### MS2PIP Fragmentation Prediction

```python
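# ms2pip 4.x exposes module-level functions; older releases use a
# different API, so introspect the installed package if this import fails
from ms2pip import predict_single

# Peptidoforms as "sequence/charge", standard amino acids only
peptidoforms = ['PEPTIDEK/2', 'LGGNEQVTR/2']

# Predict fragmentation with the HCD2021 model, one peptidoform at a time
predictions = [predict_single(p, model='HCD2021') for p in peptidoforms]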
```

## Library Formats

### DIA-NN TSV Format

```
# Required columns
PrecursorMz    ProductMz    Annotation    ProteinId    GeneName
PeptideSequence    ModifiedSequence    PrecursorCharge
FragmentCharge    FragmentType    FragmentSeriesNumber
NormalizedRetentionTime    LibraryIntensity
```
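
Before handing a TSV to a search tool, it can help to confirm that the expected columns are present. The sketch below checks a DataFrame against the column names listed above. The helper `missing_columns` and the constant `DIANN_COLUMNS` are hypothetical names, and the example row holds placeholder values rather than real data.

```python
import pandas as pd

# Column names copied from the listing above
DIANN_COLUMNS = [
    "PrecursorMz", "ProductMz", "Annotation", "ProteinId", "GeneName",
    "PeptideSequence", "ModifiedSequence", "PrecursorCharge",
    "FragmentCharge", "FragmentType", "FragmentSeriesNumber",
    "NormalizedRetentionTime", "LibraryIntensity",
]


def missing_columns(library, required=DIANN_COLUMNS):
    """Return the required column names absent from the DataFrame, in order."""
    return [col for col in required if col not in library.columns]


# Tiny in-memory table with placeholder values
example = pd.DataFrame({"PrecursorMz": [500.0], "ProductMz": [300.0]})
print(missing_columns(example))
```

An empty list means every listed column is present; whether a given tool version actually requires all of them should be checked against that tool's documentation.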

### OpenSWATH TSV Format

```python
import pandas as pd

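# Assemble an OpenSWATH-style table. precursor_mz, product_mz, intensity,
# and the other inputs are equal-length sequences prepared upstream,
# with one element per transition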
library = pd.DataFrame({
    'PrecursorMz': precursor_mz,
    'ProductMz': product_mz,
    'LibraryIntensity': intensity,
    'NormalizedRetentionTime': rt,
    'PrecursorCharge': charge,
    'ProductCharge': 1,
    'FragmentType': ion_type,  # 'b' or 'y'
    'FragmentSeriesNumber': ion_num,
    'ModifiedPeptideSequence': mod_seq,
    'PeptideSequence': sequence,
    'ProteinId': protein,
    'GeneName': gene,
    'Decoy': 0
})

library.to_csv('library_openswath.tsv', sep='\t', index=False)
```
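
When a library already exists as a table with the DIA-NN-style column names shown earlier, renaming columns may be enough to reach the OpenSWATH-style layout. The sketch below covers only the names that differ between the two listings in this document. The function `diann_to_openswath` and the mapping are hypothetical, and a real converter may need further columns.

```python
import pandas as pd

# Only the names that differ between the two column listings in this document
DIANN_TO_OPENSWATH = {
    "ModifiedSequence": "ModifiedPeptideSequence",
    "FragmentCharge": "ProductCharge",
}


def diann_to_openswath(library):
    """Return a copy with OpenSWATH-style column names and a Decoy column."""
    converted = library.rename(columns=DIANN_TO_OPENSWATH)
    if "Decoy" not in converted.columns:
        converted["Decoy"] = 0  # assumption: every entry is a target
    return converted


# Placeholder row for illustration only
example = pd.DataFrame({
    "ModifiedSequence": ["PEPTIDEK"],
    "FragmentCharge": [1],
    "LibraryIntensity": [1.0],
})
print(list(diann_to_openswath(example).columns))
```

The input DataFrame is left unchanged because `rename` returns a new object.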

### Spectronaut Library Format

```
# Key columns for Spectronaut
ModifiedPeptide    StrippedPeptide    PrecursorCharge
PrecursorMz    iRT    FragmentLossType
FragmentCharge    FragmentType    FragmentNumber
RelativeIntensity    FragmentMz    ProteinGroups
Genes    ProteinIds
```

## Library QC

```python
import pandas as pd
import matplotlib.pyplot as plt

library = pd.read_csv('library.tsv', sep='\t')

# A precursor is a modified sequence at a given charge state
precursors = library.drop_duplicates(subset=['ModifiedSequence', 'PrecursorCharge'])

# Basic statistics
print(f"Precursors: {len(precursors)}")
print(f"Proteins: {library['ProteinId'].nunique()}")
print(f"Transitions per precursor: {len(library) / len(precursors):.1f}")

# RT distribution (one value per precursor)
plt.hist(precursors['NormalizedRetentionTime'], bins=50)
plt.xlabel('Normalized RT')
plt.ylabel('Precursors')
plt.savefig('rt_distribution.png')

# Charge state distribution
print(precursors['PrecursorCharge'].value_counts())
```
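
The statistics above often lead to a filtering step. The sketch below keeps the most intense fragments per precursor and drops precursors left with too few. The function name `filter_fragments` and the defaults `top_n=6` and `min_fragments=3` are assumptions chosen for illustration, and the example table holds placeholder values.

```python
import pandas as pd

# A precursor is a modified sequence at a given charge state
KEY = ["ModifiedSequence", "PrecursorCharge"]


def filter_fragments(library, top_n=6, min_fragments=3):
    """Keep the top_n most intense fragments per precursor, then drop
    precursors left with fewer than min_fragments fragments."""
    ranked = library.sort_values("LibraryIntensity", ascending=False)
    top = ranked.groupby(KEY, sort=False).head(top_n)
    # Number of retained fragments (with an intensity) per precursor
    counts = top.groupby(KEY)["LibraryIntensity"].transform("count")
    return top[counts >= min_fragments].sort_index()


# Placeholder data: one precursor with four fragments, one with two
example = pd.DataFrame({
    "ModifiedSequence": ["AAAK"] * 4 + ["CCCR"] * 2,
    "PrecursorCharge": [2] * 6,
    "LibraryIntensity": [100.0, 50.0, 20.0, 5.0, 80.0, 40.0],
})
filtered = filter_fragments(example, top_n=3, min_fragments=3)
print(filtered)
```

Suitable thresholds depend on the instrument and the downstream tool, so the defaults here are a starting point rather than a recommendation.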

## Merge Libraries

**Goal:** Combine multiple spectral libraries into a single non-redundant library, keeping the highest-quality spectra for each precursor.

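**Approach:** Concatenate the library tables, compute the total fragment intensity of each precursor within each source library, and keep all fragments of a precursor from the single source with the highest total. Taking every fragment from one source keeps relative intensities consistent; this assumes the libraries report intensities on comparable scales.

```python
import pandas as pd

# A precursor is a modified sequence at a given charge state
key = ['ModifiedSequence', 'PrecursorCharge']

# Load libraries and tag each row with its source
lib1 = pd.read_csv('library1.tsv', sep='\t').assign(source='lib1')
lib2 = pd.read_csv('library2.tsv', sep='\t').assign(source='lib2')
combined = pd.concat([lib1, lib2], ignore_index=True)

# Total fragment intensity per precursor within each source library
totals = (combined.groupby(key + ['source'])['LibraryIntensity']
          .sum()
          .reset_index(name='total_int'))

# For each precursor, pick the source with the highest total intensity
best = (totals.sort_values('total_int', ascending=False)
        .drop_duplicates(subset=key)[key + ['source']])

# Keep all fragments of each precursor from its best source only
merged = combined.merge(best, on=key + ['source']).drop(columns='source')

# Remove repeated fragments within the chosen source
merged = merged.drop_duplicates(
    subset=key + ['FragmentType', 'FragmentSeriesNumber', 'FragmentCharge'])

merged.to_csv('merged_library.tsv', sep='\t', index=False)
```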

## iRT Calibration

```python
# Biognosys iRT peptides for retention time calibration
IRT_PEPTIDES = {
    'LGGNEQVTR': -24.92,
    'GAGSSEPVTGLDAK': 0.00,  # Reference
    'VEATFGVDESNAK': 12.39,
    'YILAGVENSK': 19.79,
    'TPVISGGPYEYR': 28.71,
    'TPVITGAPYEYR': 33.38,
    'DGLDAASYYAPVR': 42.26,
    'ADVTPADFSEWSK': 54.62,
    'GTFIIDPGGVIR': 70.52,
    'GTFIIDPAAVIR': 87.23,
    'LFLQFGAQGSPFLK': 100.00
}

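# Convert iRT to normalized RT
IRT_MIN = min(IRT_PEPTIDES.values())  # -24.92
IRT_MAX = max(IRT_PEPTIDES.values())  # 100.00

def irt_to_nrt(irt):
    '''Rescale iRT linearly so the iRT peptide range maps onto 0-1.
    Peptides eluting outside that range fall outside 0-1.'''
    return (irt - IRT_MIN) / (IRT_MAX - IRT_MIN)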
```
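
To place measured retention times on the iRT scale, a common approach is a linear fit between the observed RTs of the iRT peptides and their reference values. The sketch below uses `numpy.polyfit` for that fit. The function names are hypothetical, and the observed RTs are synthetic placeholders generated from a straight line, not measurements.

```python
import numpy as np


def fit_irt_calibration(observed_rt, irt_values):
    """Least-squares line mapping observed RT to iRT.
    Returns (slope, intercept)."""
    slope, intercept = np.polyfit(observed_rt, irt_values, deg=1)
    return float(slope), float(intercept)


def rt_to_irt(rt, slope, intercept):
    """Apply the fitted line to one or more observed retention times."""
    return slope * np.asarray(rt, dtype=float) + intercept


# Reference values for three peptides from the iRT table above
irt_values = [-24.92, 0.00, 100.00]
# Synthetic observed RTs in minutes (placeholders, not measurements)
observed_rt = [10.016, 15.0, 35.0]

slope, intercept = fit_irt_calibration(observed_rt, irt_values)
print(rt_to_irt([20.0, 25.0], slope, intercept))
```

In practice the fit should use as many of the iRT peptides as were confidently identified, and a non-linear model may be needed for gradients that are not linear.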

## Related Skills

- dia-analysis - Use libraries in DIA workflows
- peptide-identification - Generate search results for library building
- data-import - Load MS data for library generation

Related Skills

All skills below are from FreedomIntelligence/OpenClaw-Medical-Skills (1802 stars).

tooluniverse-proteomics-analysis

Analyze mass spectrometry proteomics data including protein quantification, differential expression, post-translational modifications (PTMs), and protein-protein interactions. Processes MaxQuant, Spectronaut, DIA-NN, and other MS platform outputs. Performs normalization, statistical analysis, pathway enrichment, and integration with transcriptomics. Use when analyzing proteomics data, comparing protein abundance between conditions, identifying PTM changes, studying protein complexes, integrating protein and RNA data, discovering protein biomarkers, or conducting quantitative proteomics experiments.

bio-spatial-transcriptomics-spatial-proteomics

Analyzes spatial proteomics data from CODEX, IMC, and MIBI platforms including cell segmentation and protein colocalization. Use when working with multiplexed imaging data, analyzing protein spatial patterns, or integrating spatial proteomics with transcriptomics.

bio-proteomics-quantification

Protein quantification from mass spectrometry data including label-free (LFQ, intensity-based), isobaric labeling (TMT, iTRAQ), and metabolic labeling (SILAC) approaches. Use when extracting protein abundances from MS data for differential analysis.

bio-proteomics-ptm-analysis

Post-translational modification analysis including phosphorylation, acetylation, and ubiquitination. Covers site localization, motif analysis, and quantitative PTM analysis. Use when analyzing phosphoproteomic data or other modification-enriched samples.

bio-proteomics-proteomics-qc

Quality control and assessment for proteomics data. Use when evaluating proteomics data quality before downstream analysis. Covers sample metrics, missing value patterns, replicate correlation, batch effects, and intensity distributions.

bio-proteomics-protein-inference

Protein grouping and inference from peptide identifications. Use when resolving protein ambiguity from shared peptides. Handles protein groups and protein-level FDR control using parsimony and probabilistic approaches.

bio-proteomics-peptide-identification

Peptide-spectrum matching and protein identification from MS/MS data. Use when identifying peptides from tandem mass spectra. Covers database searching, spectral library matching, and FDR estimation using target-decoy approaches.

bio-proteomics-differential-abundance

Statistical testing for differentially abundant proteins between conditions. Covers limma and MSstats workflows with multiple testing correction. Use when identifying proteins with significant abundance changes between experimental groups.

bio-proteomics-dia-analysis

Data-independent acquisition (DIA) proteomics analysis with DIA-NN and other tools. Use when analyzing DIA mass spectrometry data with library-free or library-based workflows for deep proteome profiling.

bio-proteomics-data-import

Load and parse mass spectrometry data formats including mzML, mzXML, and quantification tool outputs like MaxQuant proteinGroups.txt. Use when starting a proteomics analysis with raw or processed MS data. Handles contaminant filtering and missing value assessment.

zinc-database

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.