gasp-diagnostics

System diagnostics using GASP (General AI Specialized Process monitor). Use when user asks about Linux system performance, requests system checks, mentions GASP, asks to diagnose hosts, or says things like "check my system" or "what's wrong with [hostname]". Can actively fetch GASP metrics from hosts via HTTP or interpret provided JSON output.

242 stars

by aiskillstore


Best use case

gasp-diagnostics is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi…

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "gasp-diagnostics" skill to help with this workflow task. Context: System diagnostics using GASP (General AI Specialized Process monitor). Use when user asks about Linux system performance, requests system checks, mentions GASP, asks to diagnose hosts, or says things like "check my system" or "what's wrong with [hostname]". Can actively fetch GASP metrics from hosts via HTTP or interpret provided JSON output.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/gasp-diagnostics/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/acceleratedindustries/gasp-diagnostics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gasp-diagnostics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gasp-diagnostics Compares

Feature / Agent	gasp-diagnostics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

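What does this skill do?

It runs Linux system diagnostics with GASP: it fetches GASP metrics from hosts over HTTP (port 8080 by default) or interprets GASP JSON output you provide, then reports findings with severity and specific recommendations.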

Where can I find the source code?

The source is in the aiskillstore/marketplace repository on GitHub, under skills/acceleratedindustries/gasp-diagnostics/.

SKILL.md Source

# GASP Diagnostics

Enables comprehensive Linux system diagnostics using GASP's AI-optimized monitoring output. Actively fetches metrics from hosts and provides intelligent analysis with context-aware interpretation.

## Fetching GASP Metrics

When user mentions a host or requests a system check:

1. **Fetch the metrics endpoint**
```
web_fetch("http://{hostname}:8080/metrics")
```

2. **Hostname formats supported**
- mDNS/local: `accelerated.local`, `hyperion.local`
- DNS names: `proxmox1`, `dev-server`, `workstation`
- IP addresses: `192.168.1.100`

3. **Default port**: 8080 (unless user specifies otherwise)

4. **Error handling**
- Host unreachable: Tell the user, and suggest checking whether GASP is running
- Port closed/connection refused: Suggest running `systemctl status gasp` on the host
- JSON parse error: GASP may not be installed, or the wrong endpoint was queried
- Timeout: Network issue or host down

5. **Multi-host queries**: If user mentions multiple hosts, fetch each in sequence and compare
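Outside an agent that provides `web_fetch`, the fetch-and-handle-errors steps above can be approximated with the Python standard library. This is a minimal sketch, assuming GASP serves plain JSON at `/metrics` with no authentication; the function names and the 5-second timeout are illustrative choices, not part of GASP.

```python
import json
import socket
import urllib.error
import urllib.request

DEFAULT_PORT = 8080  # default GASP port used by this skill


def metrics_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the metrics URL for a hostname, mDNS name, or IPv4 address."""
    return f"http://{host}:{port}/metrics"


def explain_fetch_error(exc: BaseException) -> str:
    """Map a fetch failure onto the error-handling hints listed above."""
    if isinstance(exc, ValueError):  # JSONDecodeError and UnicodeDecodeError
        return "JSON parse error: GASP may not be installed, or this is the wrong endpoint."
    if isinstance(exc, urllib.error.HTTPError):  # server answered, but not with metrics
        return f"HTTP {exc.code}: GASP may not be installed, or this is the wrong endpoint."
    # URLError wraps the underlying socket error in .reason
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "Timeout: network issue or host down."
    if isinstance(reason, ConnectionRefusedError):
        return "Connection refused: suggest running `systemctl status gasp` on the host."
    return f"Host unreachable ({reason}): check whether GASP is running."


def fetch_metrics(host: str, port: int = DEFAULT_PORT, timeout: float = 5.0):
    """Return (metrics, None) on success, or (None, hint) on failure."""
    try:
        with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")), None
    except (OSError, ValueError) as exc:  # URLError, HTTPError and timeouts are OSErrors
        return None, explain_fetch_error(exc)
```

For a multi-host query, call it once per host in sequence, e.g. `for host in ["hyperion", "proxmox1"]: metrics, hint = fetch_metrics(host)`, and report each `hint` that is not `None`.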

## Quick Diagnosis Workflow

For any system check request:

1. **Fetch** metrics from specified host(s)
2. **Check summary first**: Look at `summary.health` and `summary.concerns[]`
3. **Identify issues** using metric correlations below
4. **Report** findings with severity and specific recommendations

## Trigger Examples

These user messages should trigger this skill and active fetching:

- "Check hyperion for me"
- "What's going on with accelerated.local?"
- "Is proxmox1 having issues?"
- "Compare hyperion and proxmox1"
- "Why is my system slow?" (fetch localhost)
- "Diagnose 192.168.1.50"
- "Check all my proxmox nodes"

## Metric Interpretation

### Health Summary
- `summary.health`: Quick assessment
  - "healthy": No action needed
  - "degraded": Issues present but not critical
  - "critical": Immediate attention required
- `summary.concerns[]`: Pre-analyzed issues to investigate first
- `summary.recent_changes[]`: Context for current state
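Reading the summary first can be sketched as below. It assumes `concerns` and `recent_changes` hold printable items (for example strings); their exact shape is not specified here.

```python
HEALTH_ACTIONS = {
    "healthy": "No action needed",
    "degraded": "Issues present but not critical",
    "critical": "Immediate attention required",
}


def triage_summary(metrics: dict) -> list[str]:
    """List the health verdict, then pre-analyzed concerns, then recent changes."""
    summary = metrics.get("summary", {})
    health = summary.get("health", "unknown")
    lines = [f"Health: {health} ({HEALTH_ACTIONS.get(health, 'unrecognized value')})"]
    # Concerns are investigated first; recent changes give context for them.
    lines += [f"Concern: {c}" for c in summary.get("concerns", [])]
    lines += [f"Recent change: {c}" for c in summary.get("recent_changes", [])]
    return lines
```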

### CPU Analysis

**Load ratio** = `load_avg_1m / cores`:
- < 0.7: Normal usage
- 0.7-1.0: Busy but healthy
- 1.0-2.0: Saturated (may cause slowness)
- \> 2.0: Severe overload
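The load-ratio bands translate directly into code. The bands above share their endpoints (0.7 and 1.0 each appear twice), so this sketch places a value exactly on a boundary in the lower band; that tie-breaking is an assumption.

```python
def load_ratio(load_avg_1m: float, cores: int) -> float:
    """Load ratio = 1-minute load average divided by core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg_1m / cores


def classify_load(ratio: float) -> str:
    """Map a load ratio onto the bands listed above."""
    if ratio < 0.7:
        return "normal"
    if ratio <= 1.0:
        return "busy but healthy"
    if ratio <= 2.0:
        return "saturated"
    return "severe overload"
```

For example, a load of 2.1 on 8 cores gives a ratio of about 0.26 ("normal"), while 7.8 on 8 cores gives 0.975 ("busy but healthy").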

**Key indicators**:
- `trend`: "increasing" is concerning even if current load is acceptable
- `baseline_load`: Delta from baseline is more important than absolute value
- `top_processes[]`: Check for unexpected CPU hogs

### Memory Analysis

**Red flags** (priority order):
1. `oom_kills_recent > 0`: CRITICAL - system killed processes, find memory hog immediately
2. `swap_used_mb > 0`: Performance degradation in progress
3. `pressure_pct > 5%`: System struggling with memory contention
4. `usage_percent > 90%`: Getting close to limits

**Important**: Linux uses memory for cache, so high `usage_percent` alone is normal. Focus on pressure and swap.
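The red flags can be checked in the same priority order. This sketch assumes the four fields sit together in one memory object passed in as a dict; where that object lives in the GASP JSON is not stated here.

```python
def memory_red_flags(mem: dict) -> list[str]:
    """Return red flags in the priority order listed above (most severe first)."""
    flags = []
    if mem.get("oom_kills_recent", 0) > 0:
        flags.append("CRITICAL: recent OOM kills, find the memory hog immediately")
    if mem.get("swap_used_mb", 0) > 0:
        flags.append("Swap in use: performance degradation in progress")
    if mem.get("pressure_pct", 0) > 5:
        flags.append("Memory pressure above 5%: contention")
    # Lowest priority: high usage alone is often just page cache.
    if mem.get("usage_percent", 0) > 90:
        flags.append("Usage above 90%: approaching limits")
    return flags
```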

### Disk I/O

**Saturation indicators**:
- `io_wait_ms > 10`: Significant disk bottleneck
- `queue_depth` consistently high: Disk can't keep up
- High `read_iops` or `write_iops` with slow response: Disk performance issue

**Storage capacity**:
- `usage_percent > 90%`: Running out of space
- `usage_percent > 95%`: Critical - will cause failures soon
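The single-snapshot disk thresholds can be sketched as follows. "Consistently high" `queue_depth` needs several snapshots to judge, so this sketch leaves it out; passing the two values as plain arguments is an assumption about how they are extracted.

```python
def disk_findings(io_wait_ms: float, usage_percent: float) -> list[str]:
    """Apply the I/O-wait and capacity thresholds from this section."""
    findings = []
    if io_wait_ms > 10:
        findings.append(f"I/O wait {io_wait_ms} ms: significant disk bottleneck")
    # Check the critical threshold first so only one capacity finding is reported.
    if usage_percent > 95:
        findings.append(f"Disk {usage_percent}% full: critical, failures likely soon")
    elif usage_percent > 90:
        findings.append(f"Disk {usage_percent}% full: running out of space")
    return findings
```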

### Network

- `rx_bytes_per_sec` / `tx_bytes_per_sec`: Check for unexpected traffic spikes
- `errors > 0` or `drops > 0`: Network hardware/configuration issue
- Large number of `time_wait` connections: May indicate connection leak
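A sketch of the network checks follows. It assumes per-interface stats arrive as a dict keyed by interface name, and it uses a hypothetical `time_wait_limit` of 1000 because this skill does not define what counts as a "large number" of TIME_WAIT connections.

```python
def network_findings(interfaces: dict, time_wait: int, time_wait_limit: int = 1000) -> list[str]:
    """Flag interfaces with errors/drops and an unusually high TIME_WAIT count."""
    findings = []
    for name, stats in interfaces.items():
        errors, drops = stats.get("errors", 0), stats.get("drops", 0)
        if errors > 0 or drops > 0:
            findings.append(f"{name}: {errors} errors, {drops} drops")
    # time_wait_limit is a hypothetical cut-off; tune it per host.
    if time_wait > time_wait_limit:
        findings.append(f"{time_wait} TIME_WAIT connections: possible connection leak")
    return findings
```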

### Process Intelligence

- `zombie > 0`: Process management bug (usually benign, but it means a parent process is not reaping its exited children)
- Processes in `D state`: Stuck in uninterruptible sleep (disk or kernel issue)
- `new_since_last[]`: Check for unexpected process spawning
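The process checks might look like this. It assumes each process entry is a dict with `name` and `state` fields (with `"D"` for uninterruptible sleep) and that `new_since_last` is a list of process names; those shapes are hypothetical.

```python
def process_findings(zombie: int, processes: list[dict], new_since_last: list[str]) -> list[str]:
    """Flag zombies, D-state processes, and newly spawned processes."""
    findings = []
    if zombie > 0:
        findings.append(f"{zombie} zombie process(es): a parent is not reaping its children")
    # Hypothetical shape: each process dict carries a one-letter "state" field.
    stuck = [p.get("name", "?") for p in processes if p.get("state") == "D"]
    if stuck:
        findings.append("Uninterruptible sleep (D state): " + ", ".join(stuck))
    if new_since_last:
        findings.append("New since last snapshot: " + ", ".join(new_since_last))
    return findings
```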

### Systemd Services

- `units_failed > 0`: Check `failed_units[]` array
- `recent_restarts[]`: May indicate instability
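The systemd checks reduce to two lookups. This sketch assumes `failed_units` and `recent_restarts` are lists of unit names.

```python
def systemd_findings(systemd: dict) -> list[str]:
    """Report failed units and recently restarted units."""
    findings = []
    if systemd.get("units_failed", 0) > 0:
        findings.append("Failed units: " + ", ".join(systemd.get("failed_units", [])))
    restarts = systemd.get("recent_restarts", [])
    if restarts:
        findings.append("Recently restarted (possible instability): " + ", ".join(restarts))
    return findings
```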

### Log Summary

- `errors_last_interval`: Elevated error rate indicates problems
- `message_rate_per_min`: Spikes suggest logging storm or serious issue
- Review `recent_errors[]` for specific problems

### Desktop Metrics (when present)

- `gpu.utilization_pct` vs CPU: Identify GPU-bound vs CPU-bound workloads
- `gpu.temperature_c > 85`: Thermal throttling likely
- `active_window`: Provides context for resource usage
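Comparing GPU utilization against CPU load can be sketched as below. Only the 85 °C threshold comes from this skill; the 80% / 50% GPU cut-offs and the use of the CPU load ratio are hypothetical choices for illustration.

```python
def gpu_findings(gpu: dict, cpu_load_ratio: float) -> list[str]:
    """Guess GPU-bound vs CPU-bound and flag likely thermal throttling."""
    findings = []
    util = gpu.get("utilization_pct", 0)
    # Hypothetical cut-offs: the skill only says to compare GPU use against CPU.
    if util >= 80 and cpu_load_ratio < 0.7:
        findings.append("GPU-bound workload")
    elif cpu_load_ratio >= 1.0 and util < 50:
        findings.append("CPU-bound workload")
    if gpu.get("temperature_c", 0) > 85:
        findings.append("GPU above 85 C: thermal throttling likely")
    return findings
```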

## Common System Patterns

### Development Workstation (Expected)
- High memory usage from IDEs, browsers
- Firefox/Chrome often in top memory consumers
- Docker daemon using CPU/memory
- VSCode, JetBrains IDEs in top processes
- Baseline load: 10-30% of cores

### Container Host (Expected)
- Elevated baseline load (many processes)
- dockerd/containerd in top processes
- 50-70% memory usage normal
- Many processes in top list

### Proxmox/Virtualization Host (Expected)
- Baseline load proportional to VM count
- Consistent low-level resource usage
- ~2GB overhead for Proxmox itself
- Multiple QEMU/KVM processes

### GPU Workload (Expected)
- High GPU utilization with lower CPU
- Significant GPU memory usage
- Common for: rendering, ML inference, gaming

## Multi-Host Analysis

When checking multiple hosts:

1. **Fetch all hosts first** (in parallel where possible), before analyzing any single host
2. **Compare baselines**: Identify outliers
3. **Look for correlations**: Network event vs individual host issue
4. **Check recent_changes**: Migrations, deployments, package updates
5. **Identify the odd one out**: Which host differs from the pattern?

Example analysis pattern:
```
Host 1: Load 2.1/8 cores (26%), normal
Host 2: Load 7.8/8 cores (97%), ATTENTION NEEDED ← outlier
Host 3: Load 1.9/8 cores (24%), normal

Focus on Host 2 - investigate top_processes
```
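Spotting the odd one out can be sketched by comparing each host's load ratio against the median across hosts. The "more than twice the median" rule is a hypothetical heuristic, and it only flags hosts that are busier than the group.

```python
from statistics import median


def load_outliers(hosts: dict[str, tuple[float, int]], factor: float = 2.0) -> list[str]:
    """Return hosts whose load ratio exceeds `factor` times the median ratio."""
    ratios = {name: load / cores for name, (load, cores) in hosts.items()}
    typical = median(ratios.values())
    # If every host is idle (median 0), there is no meaningful baseline to compare to.
    return [name for name, r in ratios.items() if typical > 0 and r > factor * typical]
```

With the example figures (2.1, 7.8 and 1.9 on 8 cores each), the median ratio is about 0.26, so only Host 2 (0.975) is returned.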

## Diagnosis Strategies

### "System is slow"

1. Check load ratio (CPU saturation?)
2. Check io_wait (disk bottleneck?)
3. Check memory pressure (swapping?)
4. Identify top consumer in relevant category
5. Assess if consumption is expected for that process

### "High memory usage"

1. First: Check pressure_pct (real issue or just cache?)
2. Check swap_used_mb (actual problem?)
3. Find top memory consumers
4. Check process uptime (leak or normal?)
5. Compare to baseline (delta more important than absolute)
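Steps 1, 2 and 5 can be sketched as a short verdict: without pressure or swap, high usage is treated as cache; otherwise growth relative to baseline is reported. The `used_mb` field and the `baseline_used_mb` argument are hypothetical names, since this skill does not say how a memory baseline is exposed.

```python
def high_memory_verdict(mem: dict, baseline_used_mb: float) -> str:
    """Separate cache-driven usage from real pressure, then compare to baseline."""
    # Steps 1-2: with no pressure and no swap, high usage is most likely page cache.
    if mem.get("pressure_pct", 0) <= 5 and mem.get("swap_used_mb", 0) == 0:
        return "Likely cache: no pressure or swapping"
    used = mem["used_mb"]  # hypothetical field name
    growth = used / baseline_used_mb if baseline_used_mb > 0 else float("inf")
    # Step 5: the delta from baseline matters more than the absolute value.
    return f"Real pressure: {used:.0f} MB used, {growth:.1f}x baseline"
```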

### "Unexpected behavior"

1. Check recent_changes for clues
2. Review systemd failed units
3. Check recent_errors in logs
4. Look for new processes since last snapshot
5. Compare current metrics to baseline

## Reporting Guidelines

When reporting findings:

1. **Start with verdict**: "Healthy", "Issue found", "Critical problem"
2. **Be specific**: Name the process/service causing issues
3. **Provide context**: Is this expected for this host type?
4. **Give actionable recommendations**: What should user do?
5. **Include relevant metrics**: Back up findings with data

Good example:
> "Issue found on accelerated.local: Memory pressure at 8.2%. The postgres container started swapping 2 hours ago and is now using 12GB RAM (up from 4GB baseline). This likely indicates a query leak. Recommend checking recent queries and restarting the container."

Bad example:
> "Memory usage is high. You might want to look into it."

## Advanced Diagnostics

For complex issues or when initial analysis is unclear, consult:
- [references/diagnostic-workflows.md](references/diagnostic-workflows.md) - Detailed diagnostic procedures
- [references/common-patterns.md](references/common-patterns.md) - Infrastructure-specific patterns

## Using with Provided JSON

If user pastes GASP JSON instead of requesting a fetch:
1. Analyze the provided JSON using all guidance above
2. Don't attempt to fetch (data already provided)
3. Apply same interpretation and reporting guidelines
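Loading pasted output can be sketched as below. Checking for a top-level `summary` key is an assumption based on the `summary.health` paths used throughout this skill; invalid JSON raises `json.JSONDecodeError`, a subclass of `ValueError`, so callers can catch both cases the same way.

```python
import json


def load_pasted_gasp(text: str) -> dict:
    """Parse pasted GASP JSON and sanity-check its shape (no fetch needed)."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(data, dict) or "summary" not in data:
        raise ValueError("does not look like GASP output: no top-level 'summary'")
    return data
```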

Related Skills

All of the following skills are from aiskillstore/marketplace.

error-diagnostics-smart-debug

Use when working with error diagnostics smart debug

error-diagnostics-error-trace

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging,

error-diagnostics-error-analysis

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

azure-diagnostics

Debug and troubleshoot production issues on Azure. Covers Container Apps diagnostics, log analysis with KQL, health checks, and common issue resolution for image pulls, cold starts, and health probes. USE FOR: debug production issues, troubleshoot container apps, analyze logs with KQL, fix image pull failures, resolve cold start issues, investigate health probe failures, check resource health, view application logs, find root cause of errors DO NOT USE FOR: deploying applications (use azure-deploy), creating new resources (use azure-prepare), setting up monitoring (use azure-observability), cost optimization (use azure-cost-optimization)

build-diagnostics

When given a blocker:

azure-quotas

Check/manage Azure quotas and usage across providers. For deployment planning, capacity validation, region selection. WHEN: "check quotas", "service limits", "current usage", "request quota increase", "quota exceeded", "validate capacity", "regional availability", "provisioning limits", "vCPU limit", "how many vCPUs available in my subscription".

DevOps & Infrastructure

raindrop-io

Manage Raindrop.io bookmarks with AI assistance. Save and organize bookmarks, search your collection, manage reading lists, and organize research materials. Use when working with bookmarks, web research, reading lists, or when user mentions Raindrop.io.

Data & Research

zlibrary-to-notebooklm

Automatically downloads books from Z-Library and uploads them to Google NotebookLM. Supports PDF and EPUB formats, with automatic conversion and one-click knowledge base creation.

discover-skills

Use when none of the currently available skills is a good fit (or the user explicitly asks you to find skills). Based on the task goal and constraints, this skill produces a concise shortlist of candidate skills to help you choose the one best suited to the current task.

web-performance-seo

Fix PageSpeed Insights/Lighthouse accessibility "!" errors caused by contrast audit failures (CSS filters, OKLCH/OKLAB, low opacity, gradient text, image backgrounds). Use for accessibility-driven SEO/performance debugging and remediation.

project-to-obsidian

Converts a code project into an Obsidian knowledge base. Activates when the user mentions obsidian, project documentation, knowledge base, analyzing a project, or converting a project. [Required after activation]: 1. First read this SKILL.md file in full 2. Understand the AI write rules (default to 00_Inbox/AI/, append-only, unified schema) 3. Run STEP 0: use AskUserQuestion to ask the user for confirmation 4. Start STEP 1 (project scan) only after the user confirms 5. Execute strictly in the order STEP 0 → 1 → 2 → 3 → 4 [Prohibited]: - Do not start analyzing the project without reading SKILL.md - Do not skip the STEP 0 user confirmation - Do not create files directly in 30_Resources (go to 00_Inbox/AI/ first) - Do not decide the output location on your own

obsidian-helper

Obsidian smart note-taking assistant. Activates when the user mentions obsidian, diary, notes, knowledge base, capture, or review. [Required after activation]: 1. First read this SKILL.md file in full 2. Understand the three hard rules for AI writes (00_Inbox/AI/, append-only, whitelisted fields) 3. Execute in the order STEP 0 → STEP 1 → ... 4. Do not skip any step and do not act on your own initiative [Prohibited]: - Do not start work without reading SKILL.md - Do not skip the user confirmation step - Do not create new notes outside 00_Inbox/AI/ (unless the user explicitly specifies otherwise)