ice-crawler-harvester

Run ICE-Crawler’s Frost→Glacier→Crystal pipeline to ingest repositories safely, emit bounded artifact bundles, and hand off sealed fossils for downstream agents.

25 stars

Best use case

ice-crawler-harvester is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ice-crawler-harvester should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/ice-crawler-harvester/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jacksonjp0311-gif/Clawbot-skills/ice-crawler-harvester/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ice-crawler-harvester/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ice-crawler-harvester Compares

| Feature / Agent | ice-crawler-harvester | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Run ICE-Crawler’s Frost→Glacier→Crystal pipeline to ingest repositories safely, emit bounded artifact bundles, and hand off sealed fossils for downstream agents.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ICE-Crawler Harvester

Use this skill to run the ICE-Crawler pipeline wherever you have the project cloned. Set an environment variable such as `ICE_CRAWLER_ROOT` that points to your local clone of the [Ice-Crawler repo](https://github.com/jacksonjp0311-gif/Ice-Crawler) and run the commands from there. The instructions below assume PowerShell, but any shell works.
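
When scripting against the pipeline rather than typing commands by hand, it helps to resolve `ICE_CRAWLER_ROOT` once and fail fast if it is unset or points somewhere wrong. A minimal Python sketch (the helper name and the `icecrawler.py` sanity check are our conventions, not part of ICE-Crawler):

```python
import os
from pathlib import Path

def ice_crawler_root() -> Path:
    """Resolve ICE_CRAWLER_ROOT and verify it looks like a checkout."""
    raw = os.environ.get("ICE_CRAWLER_ROOT")
    if not raw:
        raise RuntimeError("Set ICE_CRAWLER_ROOT to your Ice-Crawler clone")
    root = Path(raw).expanduser().resolve()
    # icecrawler.py is the UI entry point mentioned below; its presence
    # is a cheap sanity check that the path is a real checkout.
    if not (root / "icecrawler.py").exists():
        raise RuntimeError(f"{root} does not look like an Ice-Crawler checkout")
    return root
```

Later snippets in this skill can call this once and pass the resulting `Path` around instead of re-reading the environment.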

Reference: [`references/ice-crawler-workflow.md`](references/ice-crawler-workflow.md)

## Prerequisites
- Python 3.10+ with Tkinter (for the UI) and Git on PATH.
- ICE-Crawler repository checked out locally (path referenced via `ICE_CRAWLER_ROOT`).
- Optional: `agentics/` hooks if you need partitioned follow-up tasks.

## Workflow 1 — Full UI Run (interactive)
1. `cd $env:ICE_CRAWLER_ROOT`
2. Launch the UI: `python icecrawler.py`
3. Paste any cloneable Git URL (browse/blob URLs are normalized automatically) and press the glowing **PRESS TO SUBMIT TO ICE CRAWLER** button.
4. Watch the phase ladder (Frost → Glacier → Crystal → Residue) and log panels update in real time.
5. When the run completes, open `state/runs/run_<timestamp>/` to inspect the fossilized artifact bundle.

### UI Controls
- **Ctrl+B** toggles the left ladder panel; **Ctrl+Shift+B** toggles the right log panel; **Ctrl+J** toggles the terminal.
- Drag the PanedWindow sashes to resize panels.
- The UI never touches git; it only mirrors `ui_events.jsonl`, which is written by the orchestrator.

## Workflow 2 — Headless CLI Run
```powershell
cd $env:ICE_CRAWLER_ROOT
$run = "state/runs/run_$(Get-Date -Format 'yyyyMMdd_HHmmss')"
$temp = "state/_temp_repo"
New-Item -ItemType Directory -Force -Path $run | Out-Null
python -m engine.orchestrator "https://github.com/openclaw/openclaw.git" $run 80 256 $temp
```
Arguments: `<repo_url> <state_run_dir> <max_files> <max_kb> <temp_dir>`
- `max_files` controls the Glacier selection cap.
- `max_kb` enforces a per-file size ceiling when copying into `artifact/`.
- `temp_dir` is purged automatically; failure to delete triggers a residue violation.
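
The same run can be driven from Python instead of PowerShell. The sketch below only assembles the argv for `python -m engine.orchestrator` following the argument contract above; the wrapper function is our own convenience, and the resulting list would be passed to `subprocess.run(..., cwd=ice_crawler_root)`:

```python
import sys
from datetime import datetime

def orchestrator_cmd(repo_url, max_files=80, max_kb=256,
                     runs_root="state/runs", temp_dir="state/_temp_repo"):
    """Build the argv for a headless ICE-Crawler run.

    Argument order mirrors the CLI contract:
    <repo_url> <state_run_dir> <max_files> <max_kb> <temp_dir>.
    """
    # Timestamped run directory, matching the PowerShell example above.
    run_dir = f"{runs_root}/run_{datetime.now():%Y%m%d_%H%M%S}"
    return [sys.executable, "-m", "engine.orchestrator",
            repo_url, run_dir, str(max_files), str(max_kb), temp_dir]
```

This keeps the size caps explicit at every call site, which matters because `max_files` and `max_kb` are the main safety dials.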

## Outputs & Follow-up
- `state/runs/<run>/artifact/` — crystallized file tree (repo-relative paths preserved).
- `artifact_manifest.json` + `artifact_hashes.json` — integrity anchors for downstream tools.
- `ai_handoff/manifest_compact.json` + `root_seal.txt` — sealed bundle for agent prompts.
- `ui_events.jsonl`, `run_cmds.jsonl` — truth logs for UI or automation.
- `residue_truth.json` — teardown attestation; treat violations as failures.
- **Extraction registry** — append a row to `skills/ice-crawler-harvester/extractions/index.jsonl` and drop notes under `extractions/<repo-slug>/` so future skills can mine algorithms/tools (see `extractions/README.md`).
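
Appending to the extraction registry can be scripted so no run is forgotten. The field names below are illustrative only; follow `extractions/README.md` for the real schema:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def record_extraction(index_path, repo_slug, run_dir, notes=""):
    """Append one JSONL row to the extraction registry.

    The keys here are placeholders; consult extractions/README.md
    for the schema the skill actually expects.
    """
    row = {
        "repo": repo_slug,
        "run_dir": str(run_dir),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "notes": notes,
    }
    path = Path(index_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")  # JSONL: one object per line
    return row
```

Append-only JSONL keeps the registry mergeable across branches and trivially greppable by later skills.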

## Extending / Integrating
- Call the orchestrator from scripts or scheduled jobs to keep repo fossils fresh.
- Parse `artifact_manifest.json` to feed other skills (e.g., code summarizers, diff analyzers).
- Hook `agentics/` when you need automatic partitioning of Frost metadata or Crystal artifacts into bounded tasks.
- Adjust `max_files` / `max_kb` per run to dial ingest size; keep limits conservative for safety.
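
If a downstream tool consumes a run, recomputing digests against `artifact_hashes.json` catches tampering or truncation before anything else runs. This sketch assumes the hashes file is a flat map of repo-relative path to SHA-256 hex digest; inspect a real run's output to confirm the schema before relying on it:

```python
import hashlib
import json
from pathlib import Path

def verify_artifact(run_dir):
    """Recompute SHA-256 digests under artifact/ and compare them to
    artifact_hashes.json; return the list of mismatched paths.

    Assumes a flat {relative_path: sha256_hex} map, which is an
    assumption about the schema, not a documented guarantee.
    """
    run = Path(run_dir)
    expected = json.loads((run / "artifact_hashes.json").read_text())
    mismatches = []
    for rel_path, want in expected.items():
        data = (run / "artifact" / rel_path).read_bytes()
        if hashlib.sha256(data).hexdigest() != want:
            mismatches.append(rel_path)
    return mismatches
```

An empty return value means every file matched its recorded digest; treat anything else like a residue violation and discard the bundle.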

## Troubleshooting
- Missing Tkinter on Linux → `sudo apt install python3-tk` (or distro equivalent).
- Git credential prompts bubble through the orchestrator; ensure SSH keys or tokens are configured.
- Residue violation (`state/_temp_repo` not deleted) aborts the run; rerun after manual cleanup if needed.

Follow this skill to get deterministic repo fossils with ICE-Crawler’s provenance guarantees.

Related Skills

twitter-crawler

25
from ComeOnOliver/skillshub

Twitter tweet crawler: crawls tweets from a specified username and saves them in Markdown format, with configurable tweet count and fields.

session-intelligence-harvester

25
from ComeOnOliver/skillshub

This skill should be used after productive sessions to extract learnings and route them to appropriate Reusable Intelligence Infrastructure (RII) components. Use when corrections were made, format drift was fixed, new patterns emerged, or the user explicitly asks to "harvest learnings" or "capture session intelligence". Transforms one-time fixes into permanent organizational knowledge by implementing updates across multiple files.

Crawl4AI — LLM-Friendly Web Crawler

25
from ComeOnOliver/skillshub

You are an expert in Crawl4AI, the open-source web crawler built for AI applications. You help developers extract clean, structured data from websites for LLM training, RAG pipelines, and content analysis — with automatic markdown conversion, JavaScript rendering, CSS-based extraction, LLM-powered structured extraction, and session management for multi-page crawling.

Daily Logs

25
from ComeOnOliver/skillshub

Record the user's daily activities, progress, decisions, and learnings in a structured, chronological format.

Socratic Method: The Dialectic Engine

25
from ComeOnOliver/skillshub

This skill transforms Claude into a Socratic agent: a cognitive partner who guides users to discover knowledge through systematic questioning rather than direct instruction.

Sokratische Methode: Die Dialektik-Maschine

25
from ComeOnOliver/skillshub

This skill transforms Claude into a Socratic agent: a cognitive partner who guides users to knowledge discovery through systematic questioning instead of instructing them directly.

College Football Data (CFB)

25
from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

College Basketball Data (CBB)

25
from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

Betting Analysis

25
from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for odds formats, command parameters, and key concepts.

Research Proposal Generator

25
from ComeOnOliver/skillshub

Generate high-quality academic research proposals for PhD applications following Nature Reviews-style academic writing conventions.

Paper Slide Deck Generator

25
from ComeOnOliver/skillshub

Transform academic papers and content into professional slide deck images with automatic figure extraction.

Medical Imaging AI Literature Review Skill

25
from ComeOnOliver/skillshub

Write comprehensive literature reviews following a systematic 7-phase workflow.