linux-troubleshooting

Linux system troubleshooting workflow for diagnosing and resolving system issues, performance problems, and service failures.

31,392 stars
Complexity: medium

About this skill

This skill gives an AI agent a structured, phase-by-phase method for identifying, analyzing, and resolving problems on Linux systems. Using common Linux commands and diagnostic techniques (via the `bash-linux` skill), it guides the agent from an initial assessment through detailed investigation to resolution steps for problems such as sluggish performance, unresponsive services, network connectivity failures, and resource constraints (e.g., disk space, memory). It is intended for agents acting as first-line support or as autonomous system administrators, and it produces either actionable findings or direct fixes that help maintain system health and uptime.

Best use case

An AI agent acting as a system administrator assistant needs to diagnose why a server is slow, a web service is down, or a specific application is failing on a Linux host. It is useful for automated system health checks, incident response, performance tuning, and pre-emptive problem identification.

The AI agent will provide a detailed diagnosis of the Linux system issue, identify potential root causes, suggest specific commands or configuration changes for resolution, and potentially execute those changes (if permitted) to restore normal operation. It will offer a clear understanding of the problem's nature and a path forward, often with concrete steps.

Practical example

Example input

My web server running Nginx on Ubuntu is showing 502 errors and high load averages. Can you troubleshoot the issue and suggest a fix?

Example output

Beginning Linux system troubleshooting for your Nginx web server on Ubuntu.

**Phase 1: Initial Assessment**
1.  **Checking System Load & Processes:** Invoking `bash-linux` to run `uptime` and `top -bn1`.
    *   `uptime` output: `10:30:00 up 2 days, 3:15, 2 users, load average: 5.20, 4.80, 4.50` (High load detected; how high depends on the core count reported by `nproc`)
    *   `top` output snippet: Identified several `php-fpm` processes consuming high CPU.
2.  **Checking Disk Space:** Invoking `bash-linux` to run `df -h`.
    *   `df -h` output: `/dev/sda1 95% Used` (Critically high disk usage)

**Phase 2: Deep Dive Investigation**
1.  **Service Status:** Invoking `bash-linux` to run `systemctl status nginx` and `systemctl status php7.4-fpm`.
    *   Nginx: `active (running)`
    *   PHP-FPM: `active (running)`, but its logs show frequent restarts and errors related to `pm.max_children`.
2.  **Log Analysis:** Invoking `bash-linux` to run `tail -n 50 /var/log/nginx/error.log` and `tail -n 50 /var/log/php7.4-fpm.log`.
    *   Nginx error log: Many "upstream timed out" errors.
    *   PHP-FPM error log: "WARNING: [pool www] server reached pm.max_children setting, consider raising it"

**Diagnosis:**
The system is experiencing high load primarily due to `php-fpm` processes hitting their `pm.max_children` limit, leading to Nginx upstream timeouts. Additionally, disk space is critically low, which could exacerbate performance issues or prevent logging/temporary file creation.

**Proposed Resolution:**
1.  **Increase `pm.max_children`:** Edit `/etc/php/7.4/fpm/pool.d/www.conf` (or similar path) to raise `pm.max_children` to a value the host's memory can sustain (e.g., 100, provided 100 workers at their average resident size fit in available RAM). This will allow PHP-FPM to handle more concurrent requests.
2.  **Clear Disk Space:** Identify and remove unnecessary files in `/var/log`, `/tmp`, or other large directories to free up critical space.
3.  **Restart PHP-FPM:** After making changes, run `sudo systemctl restart php7.4-fpm` to apply the new configuration.

Monitoring will continue after applying these changes. Please let me know if you would like me to proceed with these steps, or if you prefer to review them first.

When to use this skill

  • Diagnosing sudden or gradual system performance degradation (e.g., high CPU, memory, or I/O).
  • Troubleshooting unexpected service failures or restarts (e.g., Nginx, Apache, MySQL, Docker containers).
  • Investigating network connectivity problems to, from, or within a Linux system.
  • Resolving critical disk space warnings or full filesystem issues.

When not to use this skill

  • On non-Linux operating systems (e.g., Windows, macOS).
  • For hardware-related issues that require physical inspection, replacement, or BIOS/firmware updates.
  • When the problem is deeply embedded in custom application code logic rather than system resources or configuration.
  • For tasks requiring physical access to the machine or complex, interactive debugging sessions beyond command-line analysis.

Installation

Claude Code / Cursor / Codex

curl -fsSL --create-dirs -o ~/.claude/skills/linux-troubleshooting/SKILL.md "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/linux-troubleshooting/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/linux-troubleshooting/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How linux-troubleshooting Compares

| Feature / Agent | linux-troubleshooting | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |

Frequently Asked Questions

What does this skill do?

Linux system troubleshooting workflow for diagnosing and resolving system issues, performance problems, and service failures.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Linux Troubleshooting Workflow

## Overview

Specialized workflow for diagnosing and resolving Linux system issues including performance problems, service failures, network issues, and resource constraints.

## When to Use This Workflow

Use this workflow when:
- Diagnosing system performance issues
- Troubleshooting service failures
- Investigating network problems
- Resolving disk space issues
- Debugging application errors

## Workflow Phases

### Phase 1: Initial Assessment

#### Skills to Invoke
- `bash-linux` - Linux commands
- `devops-troubleshooter` - Troubleshooting

#### Actions
1. Check system uptime
2. Review recent changes
3. Identify symptoms
4. Gather error messages
5. Document findings

#### Commands
```bash
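uptime                  # time since boot and 1/5/15-minute load averages
hostnamectl             # hostname, OS, kernel, and virtualization details
cat /etc/os-release     # distribution name and version
dmesg | tail -n 50      # most recent kernel messages (may require root)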
```
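
The "Review recent changes" action has no matching command above. The following is a minimal sketch, assuming GNU `find` and treating recently modified files under a directory such as `/etc` as a rough proxy for configuration changes. The helper name `recently_modified` is hypothetical, and the package-history commands in the comments depend on the distribution family.

```bash
# List regular files under a directory that were modified within the last N days.
# Usage: recently_modified DIR [DAYS]   (DAYS defaults to 1)
recently_modified() {
    local dir=$1
    local days=${2:-1}
    # -xdev keeps the search on one filesystem; permission errors are discarded.
    find "$dir" -xdev -type f -mtime "-$days" 2>/dev/null | sort
}

# Example calls (left commented out so that sourcing this file has no side effects):
# recently_modified /etc 2
# grep -E ' (install|upgrade|remove) ' /var/log/dpkg.log | tail -n 20   # Debian/Ubuntu
# rpm -qa --last | head -n 20                                           # RHEL/Fedora
# last -x reboot shutdown | head -n 5                                   # recent reboots
```

A file that changed shortly before the symptoms began is a reasonable first suspect. The two-day window in the example call is an arbitrary starting point, not a recommendation from the source.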

#### Copy-Paste Prompts
```
Use @bash-linux to gather system information
```

### Phase 2: Resource Analysis

#### Skills to Invoke
- `bash-linux` - Resource commands
- `performance-engineer` - Performance analysis

#### Actions
1. Check CPU usage
2. Analyze memory
3. Review disk space
4. Monitor I/O
5. Check network

#### Commands
```bash
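top -bn1 | head -n 20   # one batch-mode snapshot of load and the busiest processes
free -h                 # memory and swap usage
df -h                   # disk space per filesystem
df -i                   # inode usage (exhausted inodes also block writes)
iostat -x 1 5           # extended device I/O stats, five 1-second samples (sysstat package)
ss -s                   # socket summary for the "Check network" action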
```
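
Raw output from these commands still has to be judged against a threshold. The sketch below shows one way to do that, under stated assumptions: `flag_full_filesystems` and `load_per_core` are hypothetical helper names, the 90% threshold is only an example, and the parser expects the POSIX layout printed by `df -P`.

```bash
# Print mount points at or above a usage threshold (default 90%).
# Reads `df -P` (or `df -Pi` for inodes) on stdin. Mount points that
# contain spaces are not handled by this simple field split.
flag_full_filesystems() {
    local threshold=${1:-90}
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= t + 0) print $6, use "%"
        }'
}

# Divide a load average by a core count. A result well above 1.00 means
# more runnable (or I/O-blocked) tasks than cores.
load_per_core() {
    awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# Example calls (commented out so that sourcing this file prints nothing):
# df -P  | flag_full_filesystems 90
# df -Pi | flag_full_filesystems 90
# load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

For instance, a load average of 5.20 on a hypothetical 4-core host gives `load_per_core 5.20 4` = `1.30`, which is sustained oversubscription, while the same load on 16 cores would be unremarkable.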

#### Copy-Paste Prompts
```
Use @performance-engineer to analyze system resources
```

### Phase 3: Process Investigation

#### Skills to Invoke
- `bash-linux` - Process commands
- `server-management` - Process management

#### Actions
1. List running processes
2. Identify resource hogs
3. Check process status
4. Review process trees
5. Analyze strace output

#### Commands
```bash
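ps aux --sort=-%cpu | head -n 10    # top CPU consumers
ps aux --sort=-%mem | head -n 10    # top memory consumers
pstree -p                           # process tree with PIDs
lsof -p "$PID"                      # open files and sockets of one process (set PID first)
timeout 10 strace -f -c -p "$PID"   # 10-second syscall summary; needs root or ptrace permission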
```
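
For the "Check process status" action, CPU and memory rankings can miss processes that are stuck rather than busy. A minimal sketch, assuming the procps `ps` state codes (`D` for uninterruptible sleep, usually waiting on I/O, and `Z` for zombie); the helper name `stuck_processes` is hypothetical:

```bash
# Print PID, state, and command name for processes in D or Z state.
# Reads `ps -eo pid=,stat=,comm=` output on stdin.
stuck_processes() {
    awk '$2 ~ /^[DZ]/ { print $1, $2, $3 }'
}

# Example call (commented out so that sourcing this file prints nothing):
# ps -eo pid=,stat=,comm= | stuck_processes
```

Many `D`-state processes alongside a high load average and low CPU usage usually point at storage or network filesystems rather than at the processes themselves.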

#### Copy-Paste Prompts
```
Use @server-management to investigate processes
```

### Phase 4: Log Analysis

#### Skills to Invoke
- `bash-linux` - Log commands
- `error-detective` - Error detection

#### Actions
1. Check system logs
2. Review application logs
3. Search for errors
4. Analyze log patterns
5. Correlate events

#### Commands
```bash
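journalctl -xe --no-pager | tail -n 50      # latest journal entries with explanatory text
journalctl -p err -b --no-pager             # priority "err" and worse since the current boot
tail -n 100 /var/log/syslog                 # Debian/Ubuntu; use /var/log/messages on RHEL-family hosts
grep -rIi --include='*.log' 'error' /var/log 2>/dev/null | tail -n 50   # recursive, skips binary files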
```
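
The "Analyze log patterns" action can start with a simple count of error-like lines per program. This is a sketch only: it assumes the traditional syslog line layout (`Mon DD HH:MM:SS host program[pid]: message`), which `journalctl` prints by default but which some hosts replace with ISO timestamps, and the helper name `errors_by_program` is hypothetical.

```bash
# Count lines containing "error", "fail", or "critical" per program name.
# Reads syslog-style lines on stdin; field 5 is expected to be "program[pid]:".
errors_by_program() {
    grep -Ei 'error|fail|critical' |
        awk '{
                 prog = $5
                 sub(/\[[0-9]+\]:?$/, "", prog)   # drop a trailing "[pid]:"
                 sub(/:$/, "", prog)              # drop a bare trailing ":"
                 count[prog]++
             }
             END { for (p in count) print count[p], p }' |
        sort -k1,1nr -k2,2
}

# Example calls (commented out so that sourcing this file prints nothing):
# journalctl -b --no-pager | errors_by_program
# errors_by_program < /var/log/syslog
```

The program at the top of the list is where the "Correlate events" action is most likely to pay off first.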

#### Copy-Paste Prompts
```
Use @error-detective to analyze log files
```

### Phase 5: Network Diagnostics

#### Skills to Invoke
- `bash-linux` - Network commands
- `network-engineer` - Network troubleshooting

#### Actions
1. Check network interfaces
2. Test connectivity
3. Analyze connections
4. Review firewall rules
5. Check DNS resolution

#### Commands
```bash
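ip addr show             # interfaces and their addresses
ip route show            # routing table and default gateway
ss -tulpn                # listening TCP/UDP sockets (process names need root)
curl -v http://target    # replace "target" with the host under test
dig domain               # replace "domain" with the name to resolve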
```
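
Two of the actions above benefit from a little more than the listed commands: confirming that the expected port is actually bound, and reviewing firewall rules. A sketch, assuming the iproute2 `ss` column layout (local address and port in the fourth column when `-H` suppresses the header); `is_listening` is a hypothetical helper, and which firewall command applies depends on the frontend installed on the host.

```bash
# Succeed (exit 0) if any socket in `ss -H -tln` output listens on the given port.
is_listening() {
    local port=$1
    awk -v p="$port" '
        {
            n = split($4, parts, ":")      # handles 0.0.0.0:80, *:22 and [::]:443
            if (parts[n] == p) found = 1
        }
        END { exit !found }'
}

# Example calls (commented out so that sourcing this file has no side effects):
# ss -H -tln | is_listening 443 && echo "port 443 is bound"
# nft list ruleset          # nftables (requires root)
# iptables -S               # legacy iptables rules (requires root)
# ufw status verbose        # Ubuntu's ufw frontend
# firewall-cmd --list-all   # firewalld on RHEL-family hosts
```

If the port is bound locally but unreachable from another host, the firewall rules and routing table are the next things to compare.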

#### Copy-Paste Prompts
```
Use @network-engineer to diagnose network issues
```

### Phase 6: Service Troubleshooting

#### Skills to Invoke
- `server-management` - Service management
- `systematic-debugging` - Debugging

#### Actions
1. Check service status
2. Review service logs
3. Test service restart
4. Verify dependencies
5. Check configuration

#### Commands
```bash
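systemctl status service --no-pager        # replace "service" with the unit name
journalctl -u service -n 100 --no-pager    # last 100 log lines for the unit
systemctl list-dependencies service        # units this one requires or wants
systemctl restart service                  # restart only after the configuration validates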
```
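
The "Check configuration" action is safest when it runs before the restart rather than after it. A minimal sketch, assuming the service ships its own syntax check (for example `nginx -t` or `sshd -t`); `restart_if_valid` and the `RESTART_CMD` override are hypothetical names introduced here so the logic can be exercised without systemd.

```bash
# Run a validation command and restart the service only if it succeeds.
# Usage: restart_if_valid SERVICE VALIDATION_COMMAND [ARGS...]
restart_if_valid() {
    local service=$1
    shift
    if "$@"; then
        # Intentionally unquoted so "systemctl restart" splits into two words.
        ${RESTART_CMD:-systemctl restart} "$service"
    else
        echo "validation failed; not restarting $service" >&2
        return 1
    fi
}

# Example calls (commented out so that sourcing this file has no side effects):
# restart_if_valid nginx nginx -t
# restart_if_valid ssh sshd -t
```

Restarting a running service with a broken configuration turns a degraded service into an outage, which is why the validation gates the restart.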

#### Copy-Paste Prompts
```
Use @systematic-debugging to troubleshoot service issues
```

### Phase 7: Resolution

#### Skills to Invoke
- `incident-responder` - Incident response
- `bash-pro` - Fix implementation

#### Actions
1. Implement fix
2. Verify resolution
3. Monitor stability
4. Document solution
5. Create prevention plan
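
This phase lists no commands. The "Verify resolution" and "Monitor stability" actions could be sketched as a bounded retry loop around whatever health check suits the service. `wait_until_healthy` is a hypothetical helper, and the attempt counts and delays in the example calls are arbitrary.

```bash
# Re-run a health check until it passes or the attempts run out.
# Usage: wait_until_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_healthy() {
    local attempts=$1
    local delay=$2
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "still unhealthy after $attempts attempt(s)" >&2
    return 1
}

# Example calls (commented out so that sourcing this file has no side effects):
# wait_until_healthy 5 2 systemctl is-active --quiet nginx
# wait_until_healthy 5 2 curl -fsS -o /dev/null http://localhost/
```

A single passing check right after a restart proves little, so repeating the same check over a longer window is a reasonable minimum before marking the issue resolved.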

#### Copy-Paste Prompts
```
Use @incident-responder to implement resolution
```

## Troubleshooting Checklist

- [ ] System information gathered
- [ ] Resources analyzed
- [ ] Logs reviewed
- [ ] Network tested
- [ ] Services verified
- [ ] Issue resolved
- [ ] Documentation created

## Quality Gates

- [ ] Root cause identified
- [ ] Fix verified
- [ ] Monitoring in place
- [ ] Documentation complete

## Related Workflow Bundles

- `os-scripting` - OS scripting
- `bash-scripting` - Bash scripting
- `cloud-devops` - DevOps
