server-management
Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.
Best use case
server-management is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using server-management should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/server-management/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How server-management Compares
| Feature / Agent | server-management | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Server Management

> Server management principles for production operations.
> **Learn to THINK, not memorize commands.**

---

## 1. Process Management Principles

### Tool Selection

| Scenario | Tool |
|----------|------|
| **Node.js app** | PM2 (clustering, reload) |
| **Any app** | systemd (Linux native) |
| **Containers** | Docker/Podman |
| **Orchestration** | Kubernetes, Docker Swarm |

### Process Management Goals

| Goal | What It Means |
|------|---------------|
| **Restart on crash** | Auto-recovery |
| **Zero-downtime reload** | No service interruption |
| **Clustering** | Use all CPU cores |
| **Persistence** | Survive server reboot |

---

## 2. Monitoring Principles

### What to Monitor

| Category | Key Metrics |
|----------|-------------|
| **Availability** | Uptime, health checks |
| **Performance** | Response time, throughput |
| **Errors** | Error rate, types |
| **Resources** | CPU, memory, disk |

### Alert Severity Strategy

| Level | Response |
|-------|----------|
| **Critical** | Immediate action |
| **Warning** | Investigate soon |
| **Info** | Review daily |

### Monitoring Tool Selection

| Need | Options |
|------|---------|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |

---

## 3. Log Management Principles

### Log Strategy

| Log Type | Purpose |
|----------|---------|
| **Application logs** | Debug, audit |
| **Access logs** | Traffic analysis |
| **Error logs** | Issue detection |

### Log Principles

1. **Rotate logs** to prevent disk fill
2. **Structured logging** (JSON) for parsing
3. **Appropriate levels** (error/warn/info/debug)
4. **No sensitive data** in logs

---

## 4. Scaling Decisions

### When to Scale

| Symptom | Solution |
|---------|----------|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |

### Scaling Strategy

| Type | When to Use |
|------|-------------|
| **Vertical** | Quick fix, single instance |
| **Horizontal** | Sustainable, distributed |
| **Auto** | Variable traffic |

---

## 5. Health Check Principles

### What Constitutes Healthy

| Check | Meaning |
|-------|---------|
| **HTTP 200** | Service responding |
| **Database connected** | Data accessible |
| **Dependencies OK** | External services reachable |
| **Resources OK** | CPU/memory not exhausted |

### Health Check Implementation

- Simple: just return 200
- Deep: check all dependencies
- Choose based on load balancer needs

---

## 6. Security Principles

| Area | Principle |
|------|-----------|
| **Access** | SSH keys only, no passwords |
| **Firewall** | Only needed ports open |
| **Updates** | Regular security patches |
| **Secrets** | Environment vars, not files |
| **Audit** | Log access and changes |

---

## 7. Troubleshooting Priority

When something's wrong:

1. **Check if running** (process status)
2. **Check logs** (error messages)
3. **Check resources** (disk, memory, CPU)
4. **Check network** (ports, DNS)
5. **Check dependencies** (database, APIs)

---

## 8. Anti-Patterns

| ❌ Don't | ✅ Do |
|----------|-------|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |

---

> **Remember:** A well-managed server is boring. That's the goal.

## When to Use

This skill applies when you need to carry out the workflow or actions described in the overview.

## Limitations

- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
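Section 1 lists **Restart on crash** as a process management goal. PM2 and systemd (through a unit's `Restart=` setting) already provide this, so in practice you configure it rather than write it. The sketch below only illustrates the underlying logic: restart on a non-zero exit code, back off exponentially, and give up after a bound. The `run` callable and the delay values are assumptions for illustration, not part of the original skill.

```python
import time
from typing import Callable


def supervise(
    run: Callable[[], int],
    max_restarts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `run` until it exits cleanly, restarting it after each crash.

    `run` is a hypothetical callable that executes the service and returns its
    exit code. Returns how many restarts were needed; raises once max_restarts
    is exceeded so a persistent failure surfaces instead of looping forever.
    """
    restarts = 0
    while True:
        exit_code = run()
        if exit_code == 0:
            return restarts  # clean exit: nothing to recover from
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (last exit code {exit_code})"
            )
        # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay avoids a tight crash loop.
        sleep(min(base_delay * 2 ** restarts, max_delay))
        restarts += 1


if __name__ == "__main__":
    # Simulated service that crashes twice, then exits cleanly.
    exit_codes = iter([1, 1, 0])
    waits = []
    print(supervise(lambda: next(exit_codes), sleep=waits.append))  # 2
    print(waits)  # [1.0, 2.0]
```

Injecting `sleep` keeps the backoff testable without real waiting; a production supervisor would also reset the restart counter after a period of stable uptime.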
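The three alert levels in section 2 only help if each metric maps to a level deterministically, so nobody debates at 3 a.m. whether a reading deserves a page. A minimal sketch, assuming hypothetical metric names and thresholds (the original gives none):

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # immediate action
    WARNING = "warning"    # investigate soon
    INFO = "info"          # review daily


# Hypothetical (critical_at, warning_at) thresholds; tune them per service.
THRESHOLDS = {
    "error_rate": (0.05, 0.01),   # fraction of failed requests
    "disk_used": (0.95, 0.80),    # fraction of disk capacity in use
    "p95_latency_s": (2.0, 0.5),  # 95th-percentile response time, seconds
}


def classify(metric: str, value: float) -> Severity:
    """Map one metric reading to an alert level; higher values are worse."""
    critical_at, warning_at = THRESHOLDS[metric]
    if value >= critical_at:
        return Severity.CRITICAL
    if value >= warning_at:
        return Severity.WARNING
    return Severity.INFO


if __name__ == "__main__":
    print(classify("disk_used", 0.97))     # Severity.CRITICAL
    print(classify("error_rate", 0.02))    # Severity.WARNING
    print(classify("p95_latency_s", 0.2))  # Severity.INFO
```

Keeping thresholds in data rather than scattered through code makes it easy to review exactly which conditions wake someone up.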
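The four log principles in section 3 (rotation, JSON structure, levels, no sensitive data) map directly onto Python's standard `logging` module. A minimal sketch; the logger name `app`, the size limits, and the `SENSITIVE_KEYS` set are assumptions, and redacting by key name is only one way to keep secrets out of logs.

```python
import json
import logging
import logging.handlers

# Hypothetical set of context keys whose values must never reach the log file.
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so log tooling can parse fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Context passed as logger.info("...", extra={"ctx": {...}}), with secrets masked.
        ctx = getattr(record, "ctx", {})
        if ctx:
            entry["ctx"] = {
                k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
                for k, v in ctx.items()
            }
        return json.dumps(entry, default=str)


def build_logger(path: str) -> logging.Logger:
    """Call once at startup; calling again would attach a second handler."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)  # debug records are dropped at this level
    # Rotate at ~10 MB and keep 5 old files so logs cannot fill the disk.
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=10_000_000, backupCount=5
    )
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("app.log")
    log.info("user login", extra={"ctx": {"user": "alice", "password": "hunter2"}})
    log.debug("not written: below INFO")
```

Many deployments instead write JSON to stdout and let the platform (journald, a container runtime) handle rotation; the formatter works unchanged with a `StreamHandler`.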
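Section 4 recommends auto-scaling for variable traffic. One widely used rule is the proportional formula from the Kubernetes Horizontal Pod Autoscaler documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function below is a minimal sketch of that rule; the replica bounds and the 10% tolerance band (which mirrors the HPA default) are illustrative. It assumes the metric is one that more instances actually relieve, such as CPU; as the "When to Scale" table says, high memory or slow responses call for fixing or profiling first.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Scale the replica count by the ratio of observed load to target load."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp so a metric spike cannot scale past budget or down to zero.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, current_metric=90, target_metric=60))   # 6: 90% CPU vs 60% target
    print(desired_replicas(4, current_metric=63, target_metric=60))   # 4: within tolerance
    print(desired_replicas(8, current_metric=300, target_metric=60))  # 10: capped at max
```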
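Section 5 contrasts a simple check (just return 200) with a deep one (check all dependencies). A common way to offer both is two separate endpoints. The sketch below uses only Python's standard library; the `/healthz` and `/readyz` paths, the port, and the `database` probe are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# A dependency probe returns True when the dependency is reachable.
Check = Callable[[], bool]


def liveness() -> Tuple[int, str]:
    """Simple check: if this code runs, the process is up and answering."""
    return 200, json.dumps({"status": "ok"})


def readiness(checks: Dict[str, Check]) -> Tuple[int, str]:
    """Deep check: probe every dependency; answer 503 if any of them fails."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:  # a probe that raises counts as a failure, not a crash
            results[name] = "fail"
    healthy = all(r == "ok" for r in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), json.dumps(body)


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical probes; replace with real connection checks (e.g. a DB ping).
    checks: Dict[str, Check] = {"database": lambda: True}

    def do_GET(self) -> None:
        if self.path == "/healthz":
            status, body = liveness()
        elif self.path == "/readyz":
            status, body = readiness(self.checks)
        else:
            status, body = 404, json.dumps({"error": "not found"})
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocks forever; the port is illustrative."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Start it with `serve()`. The "load balancer needs" trade-off is real: if the balancer polls the deep endpoint and a shared dependency such as the database goes down, every instance can fail at once and be pulled from rotation together, so the balancer is often pointed at the simple check while monitoring watches the deep one.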
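The order in section 7 runs from the cheapest, most common causes to more distant ones, and it pays to stop at the first real finding: a later symptom (failing API calls) is often a consequence of an earlier one (the process is down). A minimal sketch of that loop; only `disk_probe` uses a real API (`shutil.disk_usage`), and the other probes and the 10% free-space threshold are hypothetical stand-ins.

```python
import shutil
from typing import Callable, List, Optional, Tuple

# A probe returns None when everything looks fine, otherwise a short description.
Probe = Callable[[], Optional[str]]


def triage(steps: List[Tuple[str, Probe]]) -> Optional[str]:
    """Run the checks in priority order and stop at the first problem found."""
    for name, probe in steps:
        problem = probe()
        if problem is not None:
            return f"{name}: {problem}"
    return None  # nothing found at any step


def disk_probe(path: str = "/", min_free_ratio: float = 0.10) -> Optional[str]:
    """Resource check: flag the disk when less than min_free_ratio is free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    free_ratio = usage.free / usage.total
    if free_ratio < min_free_ratio:
        return f"only {free_ratio:.0%} free on {path}"
    return None


if __name__ == "__main__":
    steps = [
        ("process", lambda: None),       # hypothetical: query systemd or PM2
        ("logs", lambda: None),          # hypothetical: scan recent error lines
        ("resources", disk_probe),
        ("network", lambda: None),       # hypothetical: check ports and DNS
        ("dependencies", lambda: None),  # hypothetical: ping database and APIs
    ]
    print(triage(steps) or "no problem found")
```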
Related Skills
btcpay-server-automation
Automate Btcpay Server tasks via Rube MCP (Composio). Always search tools first for current schemas.
track-management
Use this skill when creating, managing, or working with Conductor tracks - the logical work units for features, bugs, and refactors. Applies to spec.md, plan.md, and track lifecycle operations.
secrets-management
Secure secrets management practices for CI/CD pipelines using Vault, AWS Secrets Manager, and other tools.
robius-state-management
CRITICAL: Use for Robius state management patterns. Triggers on: AppState, persistence, theme switch, state management, Scope::with_data, save state, load state, serde, state persistence
react-state-management
Master modern React state management with Redux Toolkit, Zustand, Jotai, and React Query. Use when setting up global state, managing server state, or choosing between state management solutions.
monorepo-management
Build efficient, scalable monorepos that enable code sharing, consistent tooling, and atomic changes across multiple packages and applications.
logistics-exception-management
Codified expertise for handling freight exceptions, shipment delays, damages, losses, and carrier disputes. Informed by logistics professionals with 15+ years operational experience.
istio-traffic-management
Comprehensive guide to Istio traffic management for production service mesh deployments.
dependency-management-deps-audit
You are a dependency security expert specializing in vulnerability scanning, license compliance, and supply chain security. Analyze project dependencies for known vulnerabilities, licensing issues, outdated packages, and provide actionable remediation strategies.
context-window-management
Strategies for managing LLM context windows including summarization, trimming, routing, and avoiding context rot
context-management-context-save
Use when working with context management context save
context-management-context-restore
Use when working with context management context restore