bio-imaging-mass-cytometry-data-preprocessing
Load and preprocess imaging mass cytometry (IMC) and MIBI data. Covers MCD/TIFF handling, hot pixel removal, and image normalization. Use when starting IMC analysis from raw MCD files or preparing images for segmentation.
Best use case
bio-imaging-mass-cytometry-data-preprocessing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-imaging-mass-cytometry-data-preprocessing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/bio-imaging-mass-cytometry-data-preprocessing/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
## Version Compatibility
Reference examples tested with: anndata 0.10+, numpy 1.26+, pandas 2.2+, scanpy 1.10+, scipy 1.12+, steinbock 0.16+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
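The version check described above can be scripted. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the one named in this section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    '''Map each distribution name to its installed version, or None if missing.'''
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages listed in the compatibility note above
report = installed_versions(['anndata', 'numpy', 'pandas', 'scanpy', 'scipy', 'steinbock'])
for name, ver in report.items():
    print(f'{name}: {ver or "not installed"}')
```

A missing package is reported as `None` rather than raising, so the check can run before any of the heavier imports are attempted.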
# IMC Data Preprocessing
**"Preprocess my imaging mass cytometry data"** → Load MCD files, apply hot pixel removal, channel cropping, and signal normalization to prepare multiplexed images for segmentation and analysis.
- CLI: `steinbock preprocess` for automated IMC preprocessing pipeline
## Load MCD Files with steinbock
```bash
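# steinbock CLI workflow (Docker-based)
# Convert MCD to TIFF; --mcd is the directory holding the .mcd files
# Confirm flag names with `steinbock preprocess imc images --help`
steinbock preprocess imc images \
  --mcd raw \
  --panel panel.csv \
  --imgout img
# Output: img/*.tiff (one per acquisition)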
```
## Panel File Format
```csv
# panel.csv
channel,name,keep,ilastik
1,DNA1,1,1
2,CD45,1,1
3,CD3,1,0
4,CD8,1,0
5,CD4,1,0
```
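The panel can also be read in Python to pick the channels to carry forward. This is a minimal sketch that parses the example rows above with pandas. It assumes `keep` marks channels retained in the output images and `ilastik` marks channels passed to pixel classification; the variable names are illustrative.

```python
import io

import pandas as pd

# Same rows as the example panel above (without the leading comment line)
panel_text = """channel,name,keep,ilastik
1,DNA1,1,1
2,CD45,1,1
3,CD3,1,0
4,CD8,1,0
5,CD4,1,0
"""

panel = pd.read_csv(io.StringIO(panel_text))

# Channels retained in the preprocessed images (assumed meaning of `keep`)
kept_names = panel.loc[panel['keep'] == 1, 'name'].tolist()

# Channels flagged for Ilastik pixel classification (assumed meaning of `ilastik`)
ilastik_names = panel.loc[panel['ilastik'] == 1, 'name'].tolist()

print(kept_names)
print(ilastik_names)
```

For a real project, replace the `io.StringIO` wrapper with the path to `panel.csv`.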
## Python-Based Loading
```python
import readimc
import numpy as np
from pathlib import Path

# Read MCD file
mcd_file = Path('acquisition.mcd')
with readimc.MCDFile(mcd_file) as mcd:
    # List acquisitions (readimc groups them by slide)
    acquisitions = [acq for slide in mcd.slides for acq in slide.acquisitions]
    for acquisition in acquisitions:
        print(f'Acquisition: {acquisition.id}')
        print(f'  Channels: {len(acquisition.channel_metals)}')
        print(f'  Size: {acquisition.width_um} x {acquisition.height_um} um')

    # Load specific acquisition (must happen while the file is still open)
    acq = acquisitions[0]
    img = mcd.read_acquisition(acq)  # Returns (C, H, W) array

# Channel names
channel_names = acq.channel_names
```
## Hot Pixel Removal
```python
from scipy import ndimage
import numpy as np

def remove_hot_pixels(img, threshold=50):
    '''Remove hot pixels using median filtering comparison'''
    # Work in float so the difference cannot wrap around for unsigned integer images
    img = np.asarray(img, dtype=np.float32)
    filtered = ndimage.median_filter(img, size=3)
    diff = np.abs(img - filtered)
    hot_pixels = diff > threshold
    # Replace hot pixels with median
    result = img.copy()
    result[hot_pixels] = filtered[hot_pixels]
    return result

# Apply to each channel
img_clean = np.stack([remove_hot_pixels(img[c]) for c in range(img.shape[0])])
```
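A variant of the same idea compares each pixel with the brightest of its 8 neighbours instead of the local median, so only isolated bright spikes are replaced. The following is a sketch under that assumption, not a reproduction of any specific tool's filter; the function name `remove_hot_pixels_neighbor_max` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def remove_hot_pixels_neighbor_max(img, threshold=50.0):
    '''Replace pixels exceeding their brightest 8-neighbour by more than `threshold`.

    img: (C, H, W) array; each channel is filtered independently.
    '''
    img = np.asarray(img, dtype=np.float32)
    # 3x3 in-plane footprint that excludes the centre pixel and other channels
    footprint = np.ones((1, 3, 3), dtype=bool)
    footprint[0, 1, 1] = False
    neighbor_max = ndimage.maximum_filter(img, footprint=footprint, mode='mirror')
    return np.where(img - neighbor_max > threshold, neighbor_max, img)


# Synthetic check: a flat two-channel image with one isolated spike
demo = np.ones((2, 5, 5), dtype=np.float32)
demo[0, 2, 2] = 1000.0
cleaned = remove_hot_pixels_neighbor_max(demo)
print(cleaned[0, 2, 2])
```

Because the comparison is one-sided, dark pixels and genuine bright regions spanning several pixels are left untouched, which is the main difference from the absolute-difference median approach.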
## Spillover Correction
**Goal:** Remove channel crosstalk caused by isotope impurities in IMC data so that each channel reflects only its intended metal target.
**Approach:** Invert the measured spillover matrix (channels x channels) and multiply each pixel's channel vector by the inverse, clipping negative values to zero.
```python
import numpy as np
import pandas as pd

def apply_spillover_correction(img, spillover_matrix):
    '''Apply spillover correction to IMC image

    spillover_matrix: (n_channels, n_channels) DataFrame or array
    rows = measured, cols = emitting
    '''
    n_channels, height, width = img.shape
    # Reshape to (pixels, channels)
    pixels = img.reshape(n_channels, -1).T
    # Invert spillover matrix
    sm = np.array(spillover_matrix)
    sm_inv = np.linalg.inv(sm)
    # Apply correction
    corrected = pixels @ sm_inv.T
    corrected = np.clip(corrected, 0, None)  # No negative values
    # Reshape back to image
    return corrected.T.reshape(n_channels, height, width)

# Load spillover matrix (from CATALYST or manual measurement)
# Check the orientation first; transpose if the file stores rows = emitting
spillover = pd.read_csv('spillover_matrix.csv', index_col=0)
img_corrected = apply_spillover_correction(img_clean, spillover)
```
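A quick way to check the matrix orientation is a round trip on synthetic data: mix a known image with a known spillover matrix, then undo the mixing. This sketch assumes the convention stated above (rows = measured, cols = emitting); the 3-channel matrix and its values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_img = rng.uniform(0, 100, size=(3, 4, 4))  # (C, H, W), strictly non-negative

# Illustrative spillover matrix: rows = measured, cols = emitting
sm = np.array([
    [1.00, 0.02, 0.00],
    [0.03, 1.00, 0.01],
    [0.00, 0.04, 1.00],
])

# Forward model: each measured channel is a mix of the emitting channels
measured = (sm @ true_img.reshape(3, -1)).reshape(true_img.shape)

# Correction: same steps as the function above, applied to (pixels, channels)
pixels = measured.reshape(3, -1).T
recovered = np.clip(pixels @ np.linalg.inv(sm).T, 0, None).T.reshape(true_img.shape)

max_error = float(np.abs(recovered - true_img).max())
print(f'max abs error: {max_error:.2e}')
```

If the matrix were supplied in the transposed orientation, the round trip would not recover the original image, which makes this a cheap sanity test before correcting real acquisitions.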
## Estimate Spillover from Single-Stain Controls
```python
import numpy as np
import pandas as pd

def estimate_spillover(single_stains, channel_names):
    '''Estimate spillover matrix from single-stain controls

    single_stains: dict mapping primary channel name -> (C, H, W) control image
    '''
    channel_names = list(channel_names)
    n_channels = len(channel_names)
    spillover = np.eye(n_channels)
    for primary_channel, control_img in single_stains.items():
        primary_idx = channel_names.index(primary_channel)
        primary_signal = control_img[primary_idx].flatten()
        # Use only the brightest pixels of the primary channel
        mask = primary_signal > np.percentile(primary_signal, 95)
        for j in range(n_channels):
            # Compare against the primary channel's index, not the loop position
            if j != primary_idx:
                secondary_signal = control_img[j].flatten()[mask]
                spillover[j, primary_idx] = np.median(secondary_signal / primary_signal[mask])
    return pd.DataFrame(spillover, index=channel_names, columns=channel_names)
```
## Image Normalization
```python
import numpy as np

def percentile_normalize(img, low=1, high=99):
    '''Normalize to percentiles (per channel)'''
    normalized = np.zeros_like(img, dtype=np.float32)
    for c in range(img.shape[0]):
        channel = img[c].astype(np.float32)
        p_low = np.percentile(channel, low)
        p_high = np.percentile(channel, high)
        # Constant channels stay at zero (avoids division by zero)
        if p_high > p_low:
            normalized[c] = np.clip((channel - p_low) / (p_high - p_low), 0, 1)
    return normalized

def arcsinh_transform(img, cofactor=5):
    '''Arcsinh transformation (similar to flow cytometry)'''
    return np.arcsinh(img / cofactor)

# Apply transformations
img_norm = percentile_normalize(img_clean)
img_asinh = arcsinh_transform(img_clean)
```
## steinbock Preprocessing Pipeline
```bash
# Complete preprocessing with steinbock
# Subcommands and flags vary between releases; confirm each with --help
# 1. Extract images from MCD and apply hot pixel removal (--hpf threshold)
steinbock preprocess imc images --hpf 50
# 2. Prepare Ilastik inputs from the extracted images
steinbock classify ilastik prepare --cropsize 50 --seed 123
# 3. Generate probability maps (for segmentation)
# Requires trained Ilastik classifier (train it between steps 2 and 3)
steinbock classify ilastik run
```
## Visualize with napari
```python
import napari
import tifffile

# Load image
img = tifffile.imread('acquisition.tiff')
channel_names = ['DNA1', 'CD45', 'CD3', 'CD8', 'CD4']

# Create viewer
viewer = napari.Viewer()

# Add channels
for i, name in enumerate(channel_names):
    viewer.add_image(img[i], name=name, colormap='gray', blending='additive')

napari.run()
```
## Create AnnData Object
```python
import anndata as ad
import pandas as pd

# After segmentation, create AnnData from single-cell data
def create_anndata(intensities, cell_info, channel_names):
    '''Create AnnData from segmented single-cell data'''
    # Intensities: cells x channels
    adata = ad.AnnData(X=intensities)
    # Channel names
    adata.var_names = channel_names
    # Cell metadata
    adata.obs = cell_info  # DataFrame with area, centroid_x, centroid_y, etc.
    return adata

# Example usage
adata = create_anndata(
    intensities=cell_intensities,  # (n_cells, n_channels)
    cell_info=cell_metadata,       # DataFrame
    channel_names=channel_names
)
adata.write('imc_data.h5ad')
```
## Batch Processing
```python
from pathlib import Path
import numpy as np
import tifffile

# Uses remove_hot_pixels and percentile_normalize defined in the sections above
def process_batch(input_dir, output_dir):
    '''Process all images in directory'''
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(input_dir.glob('*.tiff')):
        img = tifffile.imread(img_path)
        # Preprocessing
        img = np.stack([remove_hot_pixels(img[c]) for c in range(img.shape[0])])
        img = percentile_normalize(img)
        # Save
        output_path = output_dir / img_path.name
        tifffile.imwrite(output_path, img.astype(np.float32))
        print(f'Processed: {img_path.name}')

process_batch('raw_images', 'processed_images')
```
## Related Skills
- cell-segmentation - Segment preprocessed images
- spatial-transcriptomics/spatial-data-io - Similar data loading concepts