lobster-bioinformatics

Run bioinformatics analyses using Lobster AI - single-cell RNA-seq, bulk RNA-seq, literature mining, dataset discovery, quality control, and visualization. Use when analyzing genomics data, searching for papers/datasets, or working with H5AD, CSV, GEO/SRA accessions, or biological data. Requires lobster-ai package installed.

1,802 stars

by FreedomIntelligence


Best use case

lobster-bioinformatics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using lobster-bioinformatics should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/lobster-bioinformatics/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/lobster-bioinformatics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/lobster-bioinformatics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How lobster-bioinformatics Compares

Feature / Agent	lobster-bioinformatics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

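What does this skill do?

It runs bioinformatics analyses through Lobster AI: single-cell and bulk RNA-seq, literature mining, dataset discovery, quality control, and visualization.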

Where can I find the source code?

The source code is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, under skills/lobster-bioinformatics/.

SKILL.md Source

# Lobster Bioinformatics Agent

Lobster AI is a bioinformatics platform that combines specialized AI agents with open-source tools to analyze multi-omics data through natural language.

## When to use this Skill

Use Lobster when the user asks to:
- Analyze single-cell RNA-seq data (QC, clustering, annotation, markers)
- Perform bulk RNA-seq analysis (differential expression, complex designs)
- Search scientific literature (PubMed, PMC, full-text retrieval)
- Discover datasets (free: GEO, SRA, ENA; cloud: PRIDE, MassIVE)
- Run quality control on biological data
- Generate bioinformatics visualizations (UMAP, volcano plots, heatmaps)
- Download and process biological datasets
- Work with H5AD, CSV, Excel, and 10x Genomics formats
- Extract methods or metadata from papers

## Requirements

Lobster must be installed and configured:

```bash
# Check if Lobster is installed
which lobster

# If not installed:
uv pip install lobster-ai

# Configure (--help lists the options, including non-interactive setup)
lobster init --help
```

Lobster requires an LLM provider (Ollama, Anthropic, or AWS Bedrock).
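
The install check above can be wrapped in a small guard so scripts stop early with a clear message. This is a minimal sketch: `require_lobster` is a hypothetical helper name, and it only confirms that a `lobster` command is on `PATH`, not that a provider is configured.

```bash
# Hypothetical helper: succeed only if a `lobster` command is available.
require_lobster() {
  if command -v lobster >/dev/null 2>&1; then
    return 0
  fi
  echo "lobster not found; install with: uv pip install lobster-ai" >&2
  return 1
}

# Usage:
#   require_lobster || exit 1
```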

## Pre-flight check (IMPORTANT)

**Before running any analysis, always verify Lobster is ready:**
```bash
lobster config-test --json
```

Returns structured JSON:
```json
{
  "valid": true,
  "env_file": "/path/to/.env",
  "checks": {
    "llm_provider": {"status": "pass", "provider": "bedrock", "message": "Connected"},
    "ncbi_api": {"status": "pass", "has_key": true, "message": "Connected"},
    "workspace": {"status": "pass", "path": "/path/to/workspace", "message": "Writable"}
  }
}
```

This command validates:
- **LLM provider** - Ollama server running + models installed, or Anthropic/Bedrock API keys valid
- **NCBI API** - PubMed/GEO access (optional but recommended)
- **Workspace** - Directory writable for output files

**Expected human-readable output (without `--json`) for a working setup:**
```
✅ LLM Provider: bedrock (connected)
✅ NCBI API: Connected (with API key)
✅ Workspace: Writable
✅ Configuration Valid
```
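
To gate a script on the pre-flight check, the JSON can be tested for the top-level `"valid": true` field shown above. This is a minimal sketch: `config_is_valid` is a hypothetical name, and the grep match is deliberately crude (it matches the first `"valid": true` anywhere in the text); prefer `jq '.valid'` when jq is installed.

```bash
# Read `lobster config-test --json` output on stdin; succeed only if it
# contains "valid": true (crude text match, not a real JSON parser).
config_is_valid() {
  grep -Eq '"valid"[[:space:]]*:[[:space:]]*true'
}

# Usage (requires a configured Lobster install):
#   if lobster config-test --json | config_is_valid; then
#     lobster query "Assess quality metrics for the loaded dataset"
#   else
#     echo "Run 'lobster init' first" >&2
#   fi
```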

**If config-test fails:**

| Error | Solution |
|-------|----------|
| No LLM provider configured | Run `lobster init` |
| Ollama server not accessible | Start Ollama: `ollama serve` |
| Ollama: No models installed | Ask the user first, then install a model: `ollama pull gpt-oss:20b` |
| Anthropic/Bedrock API error | Check API key validity in `.env` |
| NCBI API not configured | Add `NCBI_API_KEY` to `.env` (optional) |
| Workspace not writable | Check directory permissions |
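
The error table can be encoded as a lookup so an agent prints the matching fix automatically. This is a sketch under assumptions: `suggest_fix` is a hypothetical helper, and it matches substrings of the error wording in the table, which may not be exactly what your Lobster version prints.

```bash
# Map a config-test error message to the fix from the table (substring match).
suggest_fix() {
  case "$1" in
    *"No LLM provider configured"*)   echo "lobster init" ;;
    *"Ollama server not accessible"*) echo "ollama serve" ;;
    *"No models installed"*)          echo "ask the user, then: ollama pull gpt-oss:20b" ;;
    *"API error"*)                    echo "check API key validity in .env" ;;
    *"NCBI API not configured"*)      echo "add NCBI_API_KEY to .env (optional)" ;;
    *"not writable"*)                 echo "check directory permissions" ;;
    *)                                echo "unrecognized error; inspect lobster config-show"
                                      return 1 ;;
  esac
}
```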

**Quick status checks:**
```bash
# Show configuration values (masked)
lobster config-show

# Show subscription tier and available agents
lobster status
```

## Usage

### Basic syntax

```bash
# Single query (non-interactive)
lobster query "<natural language request>"

# With custom workspace
lobster query --workspace /path/to/workspace "<request>"

# With reasoning mode (for complex tasks)
lobster query --reasoning "<request>"
```
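
When an agent issues many queries for one project, a thin wrapper keeps the workspace and session consistent. This is a minimal sketch using only the `--workspace` and `--session-id` flags documented here; `lq`, `LOBSTER_WS`, `LOBSTER_SESSION`, and the default session name `main` are hypothetical choices for this example, not variables Lobster reads.

```bash
# Hypothetical wrapper: pin every query to one workspace and one session ID.
lq() {
  local ws="${LOBSTER_WS:-.lobster_workspace}"
  local sid="${LOBSTER_SESSION:-main}"
  lobster query --workspace "$ws" --session-id "$sid" "$@"
}

# Usage:
#   export LOBSTER_WS=~/cancer-project LOBSTER_SESSION=breast_cancer
#   lq "Search for breast cancer datasets"
#   lq "Download the best one"
```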

### Session continuity (multi-turn conversations)

Lobster supports conversation continuity via `--session-id`, which lets follow-up questions reference earlier context. Set the session ID to `latest` or to a string of your choice:

```bash
# Default session: Lobster prints the session ID it created
lobster query "Search PubMed for CRISPR papers"
# Output: Session: session_20241208_150000 (use --session-id latest for follow-ups)
# Follow up in the same session
lobster query --session-id latest "Download the first dataset from that search"

# Or use a custom session ID
lobster query --session-id "crispr_search_1" "Search PubMed for CRISPR papers"
# Follow up with the same ID
lobster query --session-id "crispr_search_1" "show me metadata from the first paper"
```

**Best practices:**
- Use `--session-id latest` (or the custom ID you chose) for follow-up queries
- Session files are saved in workspace as `session_*.json`
- Use same `--workspace` for related queries to maintain context
- Session contains conversation history, not tool execution state
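
To see which session `latest` is likely to resolve to, you can look for the newest session file in the workspace. This is a sketch that relies only on the `session_*.json` naming noted above; `latest_session_file` is a hypothetical helper, and it does not read or assume anything about the file contents.

```bash
# Print the most recently modified session_*.json in a workspace.
latest_session_file() {
  local ws="${1:-.lobster_workspace}" newest="" f
  for f in "$ws"/session_*.json; do
    [ -e "$f" ] || continue              # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```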

**Workspace-based sessions:**
```bash
# Project 1: Cancer research
lobster query --workspace ~/cancer-project "Search for breast cancer datasets"
lobster query --workspace ~/cancer-project --session-id latest "Download the best one"

# Project 2: Immunology (separate session)
lobster query --workspace ~/immuno-project "Search for T cell datasets"
lobster query --workspace ~/immuno-project --session-id latest "Analyze that"
```

### Common patterns

**Single-cell analysis:**
```bash
lobster query "Download GSE109564 and perform quality control"
lobster query "Cluster the dataset and find marker genes"
lobster query "Create UMAP visualization colored by cell type"
```

**Literature mining:**
```bash
lobster query "Search PubMed for CRISPR screens in cancer"
lobster query "Find papers about CAR-T therapy and extract their GEO datasets"
lobster query "Get the full text and methods section for PMID:12345678"
```

**Dataset discovery:**
```bash
lobster query "Search GEO for single-cell pancreatic beta cell datasets"
lobster query "Validate GSE200997 metadata for required fields: cell_type, tissue"
lobster query "Download SRA dataset SRP123456"
```

**Data analysis:**
```bash
lobster query "Load counts.csv and run differential expression analysis"
lobster query "Perform batch correction on the loaded dataset"
lobster query "Generate volcano plot for DE results"
```

**Quality control:**
```bash
lobster query "Assess quality metrics for the loaded dataset"
lobster query "Filter cells with <200 genes or >8000 genes"
lobster query "Identify doublets using scrublet"
```
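
Patterns like the single-cell sequence above can be run as one scripted pipeline inside a single session, stopping at the first failed step. This is a minimal sketch: `run_steps` is a hypothetical helper, and it assumes `lobster query` exits with a non-zero status when a request fails, which you should confirm for your version.

```bash
# Run each request in order within one session; stop at the first failure.
run_steps() {
  local sid="$1"; shift
  local step
  for step in "$@"; do
    echo ">> $step" >&2
    lobster query --session-id "$sid" "$step" || {
      echo "step failed: $step" >&2
      return 1
    }
  done
}

# Usage:
#   run_steps "gse109564" \
#     "Download GSE109564 and perform quality control" \
#     "Cluster the dataset and find marker genes" \
#     "Create UMAP visualization colored by cell type"
```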

## Output handling

Lobster outputs are saved in the workspace directory (default: `.lobster_workspace/`):

**Key files to check:**
- `*.h5ad` - Processed datasets (AnnData format)
- `*.html` - Interactive visualizations
- `*.png` - Static plots for publications
- `*.csv` - Exported data tables
- `*.json` - Metadata and provenance

**To read results:**
```bash
# List workspace files
ls -lh .lobster_workspace/

# Read specific outputs
cat .lobster_workspace/analysis_summary.json
```
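
A quick way to confirm what a run produced is to count workspace files by the extensions listed above. This is a minimal sketch; `summarize_workspace` is a hypothetical helper and only scans the top level of the directory.

```bash
# Count top-level files in a workspace for each output type listed above.
summarize_workspace() {
  local ws="${1:-.lobster_workspace}" ext n
  for ext in h5ad html png csv json; do
    n=$(find "$ws" -maxdepth 1 -type f -name "*.$ext" | wc -l | tr -d ' ')
    printf '%-5s %s\n' "$ext" "$n"
  done
}

# Usage:
#   summarize_workspace .lobster_workspace
```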

## Integration workflow

**Example 1: Analyze dataset and extract results**

```bash
# Step 1: Run analysis
lobster query --session-id "gse109564" "Download GSE109564, run QC, and cluster cells"

# Step 2: Check outputs
ls .lobster_workspace/*.h5ad
ls .lobster_workspace/*.html

# Step 3: Extract specific data
lobster query --session-id "gse109564" "Export cluster markers to CSV"

# Step 4: Use results in your code
# Results are now in .lobster_workspace/markers.csv
```
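
Before Step 4 hands the table to downstream code, it helps to fail loudly if the export is missing or empty. This is a sketch under assumptions: `require_table` is a hypothetical helper, `markers.csv` is the name from the example above (the real file name and columns depend on what Lobster writes), and the row count assumes a single header line.

```bash
# Check that an exported table exists and is non-empty, then report its size.
require_table() {
  local f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  # Data rows, assuming exactly one header line.
  echo "$(( $(wc -l < "$f") - 1 )) rows in $f"
}

# Usage:
#   require_table .lobster_workspace/markers.csv || exit 1
```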

**Example 2: Literature mining workflow**

```bash
# Step 1: Find papers
lobster query "Search for papers about immune checkpoint inhibitors in melanoma"

# Step 2: Extract datasets
lobster query "Extract all GEO dataset IDs from the cached papers"

# Step 3: Validate datasets
lobster query "Check which datasets have cell_type and treatment metadata"

# Step 4: Download best match
lobster query "Download the dataset with most samples"
```
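
To cross-check the GEO IDs reported in Step 2, you can scan any exported text for GEO series accessions (`GSE` followed by digits). This is a plain regex scan, not a Lobster feature; `extract_gse_ids` is a hypothetical helper that reads files given as arguments, or stdin when none are given.

```bash
# Print unique GEO series accessions found in the given files (or stdin).
extract_gse_ids() {
  grep -ohE 'GSE[0-9]+' "$@" | sort -u
}

# Usage:
#   extract_gse_ids .lobster_workspace/*.csv .lobster_workspace/*.json
```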

## Advanced features

**Export reproducible notebooks:**
```bash
lobster query "Export the analysis pipeline as a Jupyter notebook"
# Creates a Papermill-compatible notebook in workspace
```
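
Because the exported notebook is Papermill-compatible, it can be re-executed outside Lobster with the `papermill INPUT OUTPUT` command. This is a sketch under assumptions: the notebook's file name and parameters are not documented here, so the hypothetical helpers below just pick the newest `*.ipynb` in the workspace and write an executed copy next to it.

```bash
# Print the newest *.ipynb in a workspace.
newest_notebook() {
  local ws="${1:-.lobster_workspace}" newest="" f
  for f in "$ws"/*.ipynb; do
    [ -e "$f" ] || continue
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then newest="$f"; fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Execute that notebook with papermill into <name>.executed.ipynb.
rerun_notebook() {
  local nb
  nb="$(newest_notebook "$1")" || { echo "no notebook found" >&2; return 1; }
  papermill "$nb" "${nb%.ipynb}.executed.ipynb"
}
```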

**Workspace management:**
```bash
# Use custom workspace per project
lobster query --workspace ./project1-data "Analyze counts.csv"
lobster query --workspace ./project2-data "Analyze other-counts.csv"
```

**Provider switching (if multiple LLM providers configured):**
```bash
# Use specific provider
lobster query --provider ollama "Run expensive analysis"  # Free local
lobster query --provider anthropic "Quick task"  # Fast cloud
```

## Troubleshooting

**Command not found:**
- Verify installation: `which lobster`
- Install: `uv pip install lobster-ai`
- Configure: `lobster init`

**Rate limit errors:**
- Using Anthropic? Switch to Ollama (free) or AWS Bedrock (enterprise)
- Wait 60 seconds and retry
- Configure Ollama: `ollama pull llama3:8b-instruct && export LOBSTER_LLM_PROVIDER=ollama`
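
The rate-limit advice can be scripted as "retry once after a pause, then fall back to Ollama" using the documented `--provider` flag. This is a minimal sketch: `query_with_fallback` and `RETRY_DELAY` (default 60 seconds) are hypothetical, and it assumes `lobster query` exits non-zero on a rate-limit error and that Ollama is already set up.

```bash
# Retry a query once after a pause, then fall back to the local Ollama provider.
query_with_fallback() {
  lobster query "$@" && return 0
  echo "query failed; retrying in ${RETRY_DELAY:-60}s" >&2
  sleep "${RETRY_DELAY:-60}"
  lobster query "$@" && return 0
  echo "still failing; switching to ollama" >&2
  lobster query --provider ollama "$@"
}

# Usage:
#   query_with_fallback "Search PubMed for CRISPR screens in cancer"
```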

**Analysis errors:**
- Check workspace: `ls .lobster_workspace/`
- View session log: `cat ~/.lobster/.session.json`
- Try with reasoning: `lobster query --reasoning "<request>"`

**No output files:**
- Verify workspace location: `lobster query "show workspace info"`
- Check for errors in command output
- Make sure the request asked for an analysis, not just information retrieval
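
To catch the "no output files" case immediately, you can compare the workspace listing before and after a query. This is a minimal sketch: `query_and_list_new` is a hypothetical helper that compares top-level file names only and passes through the query's exit status.

```bash
# Run a query in a workspace and print the file names it added.
query_and_list_new() {
  local ws="$1"; shift
  local before after rc
  before="$(mktemp)"; after="$(mktemp)"
  ls -1 "$ws" 2>/dev/null | sort > "$before"
  lobster query --workspace "$ws" "$@"
  rc=$?
  ls -1 "$ws" 2>/dev/null | sort > "$after"
  comm -13 "$before" "$after"          # names present only after the query
  rm -f "$before" "$after"
  return "$rc"
}

# Usage:
#   query_and_list_new ./project1-data "Analyze counts.csv"
```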

## Tips for effective use

1. **Be specific:** Instead of "analyze data", say "perform single-cell clustering with resolution 0.5"
2. **Chain operations:** "Download GSE12345, run QC, cluster, and export markers to CSV"
3. **Check outputs:** Always verify generated files in `.lobster_workspace/`
4. **Use reasoning mode:** For complex multi-step tasks, add `--reasoning` flag
5. **Provide context:** Reference specific files, datasets, or previous results

## Limitations

- Lobster requires an active LLM provider (Ollama/Anthropic/Bedrock)
- Large datasets (>100K cells) may be slow depending on system resources
- Some features require a premium subscription (proteomics, metadata assistant)
- Full-text paper access limited by journal availability
- Rate limits apply when using cloud LLM providers

## Documentation

- Wiki: https://github.com/the-omics-os/lobster-local/wiki
- Examples: https://github.com/the-omics-os/lobster-local/wiki/27-examples-cookbook
- Installation: https://github.com/the-omics-os/lobster-local/wiki/02-installation
- Configuration: https://github.com/the-omics-os/lobster-local/wiki/03-configuration

## Version

This Skill is compatible with:
- Lobster AI v0.3.1.4+
- Claude Code v1.0+

For issues or questions: https://github.com/the-omics-os/lobster-local/issues

Related Skills

mcpmed-bioinformatics-server

from FreedomIntelligence/OpenClaw-Medical-Skills

Model Context Protocol (MCP) server for bioinformatics web services like GEO, STRING, and UCSC Cell Browser.

zinc-database

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code

wikipedia-search

from FreedomIntelligence/OpenClaw-Medical-Skills

Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information

wellally-tech

from FreedomIntelligence/OpenClaw-Medical-Skills

Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.

weightloss-analyzer

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze weight-loss data, calculate metabolic rate, track energy deficit, and manage weight-loss phases.

verification-before-completion

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

vcf-annotator

from FreedomIntelligence/OpenClaw-Medical-Skills

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.