Node Tuning Helper Scripts

Generate tuned manifests and evaluate node tuning snapshots

Best use case

Node Tuning Helper Scripts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Node Tuning Helper Scripts should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/node-tuning-helper-scripts/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/node-tuning-helper-scripts/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/node-tuning-helper-scripts/SKILL.md inside your project
  3. Restart your AI agent so that it auto-discovers the skill.

Frequently Asked Questions

What does this skill do?

Generate tuned manifests and evaluate node tuning snapshots

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, under skills/development/node-tuning-helper-scripts/ (the same path used by the curl command above).

SKILL.md Source

# Node Tuning Helper Scripts

Detailed instructions for invoking the helper utilities that back `/node-tuning` commands:
- `generate_tuned_profile.py` renders Tuned manifests (`tuned.openshift.io/v1`).
- `analyze_node_tuning.py` inspects live nodes or sosreports for tuning gaps.

## When to Use These Scripts
- Translate structured command inputs into Tuned manifests for the Node Tuning Operator.
- Iterate on generated YAML outside the assistant or integrate the generator into automation.
- Analyze CPU isolation, IRQ affinity, huge pages, sysctl values, and networking counters from live clusters or archived sosreports.

## Prerequisites
- Python 3.8 or newer (`python3 --version`).
- Repository checkout so the scripts under `plugins/node-tuning/skills/scripts/` are accessible.
- Optional: `oc` CLI when validating or applying manifests.
- Optional: Extracted sosreport directory when running the analysis script offline.
- Optional (remote analysis): `oc` CLI access plus a valid `KUBECONFIG` when capturing `/proc` and `/sys` or a sosreport via `oc debug node/<name>`. The sosreport workflow pulls the `registry.redhat.io/rhel9/support-tools` image (override with `--toolbox-image` or `TOOLBOX_IMAGE`) and therefore needs registry access. HTTP(S) proxy environment variables set on the host are forwarded automatically when present; a proxy is not required.
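
A quick preflight check for these prerequisites can be scripted. The sketch below is a hypothetical helper (`check_prerequisites` is not part of the repository). It only checks the Python version, whether an `oc` binary is on `PATH`, and whether `KUBECONFIG` is set. It assumes an unset `KUBECONFIG` simply means `oc` falls back to `~/.kube/config`.

```python
import os
import shutil
import sys
from typing import Dict, Mapping, Optional


def check_prerequisites(oc_binary: str = "oc",
                        env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Hypothetical preflight check mirroring the Prerequisites list.

    Only the Python version is a hard requirement; the other entries report
    optional tooling used for validation and remote analysis.
    """
    env = os.environ if env is None else env
    return {
        # Required: Python 3.8 or newer.
        "python>=3.8": sys.version_info >= (3, 8),
        # Optional: oc CLI for `oc apply` and `oc debug node/<name>`.
        "oc on PATH": shutil.which(oc_binary) is not None,
        # Optional: explicit kubeconfig (oc otherwise falls back to ~/.kube/config).
        "KUBECONFIG set": bool(env.get("KUBECONFIG")),
    }


for name, ok in check_prerequisites().items():
    print(("ok   " if ok else "miss ") + name)
```

Only a failed `python>=3.8` entry should block the generator. The other two entries matter only for `oc apply` and remote analysis.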

---

## Script: `generate_tuned_profile.py`

### Implementation Steps
1. **Collect Inputs**
   - `--profile-name`: Tuned resource name.
   - `--summary`: `[main]` section summary.
   - Repeatable options: `--include`, `--main-option`, `--variable`, `--sysctl`, `--section` (`SECTION:KEY=VALUE`).
   - Target selectors: `--machine-config-label key=value`, `--match-label key[=value]`.
   - Optional: `--priority` (default 20), `--namespace`, `--output`, `--dry-run`.
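   - Node helpers: `--list-nodes` (optionally filtered with `--node-selector`) to inspect nodes, and `--label-node NODE:KEY[=VALUE]` (plus `--overwrite-labels`) to label them. Add `--skip-manifest` when you are only listing or labeling nodes.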

2. **Inspect or Label Nodes (optional)**
   ```bash
   # List all worker nodes
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --list-nodes \
     --node-selector "node-role.kubernetes.io/worker" \
     --skip-manifest

   # Label a specific node for the worker-hp pool
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --label-node ip-10-0-1-23.ec2.internal:node-role.kubernetes.io/worker-hp= \
     --overwrite-labels \
     --skip-manifest
   ```

3. **Render the Manifest**
   ```bash
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --profile-name "$PROFILE" \
     --summary "$SUMMARY" \
     --sysctl net.core.netdev_max_backlog=16384 \
     --match-label tuned.openshift.io/custom-net \
     --output ".work/node-tuning/$PROFILE/tuned.yaml"
   ```
   - Omit `--output` to write `<profile-name>.yaml` in the current directory.
   - Add `--dry-run` to print the manifest to stdout.

4. **Review Output**
   - Inspect the generated YAML for accuracy.
   - Optionally format with `yq` or open in an editor for readability.

5. **Validate and Apply**
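   - Dry-run: `oc apply --dry-run=server -f <manifest>` validates against the API server, including the `Tuned` schema. Use `--dry-run=client` for a local-only check.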
   - Apply: `oc apply -f <manifest>`.
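
The sketch below shows how the inputs from step 1 could map onto a `Tuned` object. Profile text goes under `spec.profile[].data`, and node selection goes under `spec.recommend[]`. This is an illustration, not the generator's code, and it rests on three assumptions:

- The function names are hypothetical.
- It emits JSON, which `oc apply -f` also accepts, to avoid a YAML dependency.
- The default namespace `openshift-cluster-node-tuning-operator` is assumed to be where the Node Tuning Operator reads custom `Tuned` objects.

```python
import json
from typing import Dict, List, Optional, Tuple

# Assumed default: the namespace the Node Tuning Operator watches for custom Tuned objects.
DEFAULT_NAMESPACE = "openshift-cluster-node-tuning-operator"


def render_profile_data(summary: str,
                        includes: List[str],
                        sysctls: List[Tuple[str, str]],
                        sections: List[Tuple[str, str, str]]) -> str:
    """Render the INI-style profile text stored in spec.profile[].data."""
    lines = ["[main]", f"summary={summary}"]
    if includes:
        # Parent profiles are listed comma-separated on one include= line.
        lines.append("include=" + ",".join(includes))
    if sysctls:
        lines.append("[sysctl]")
        lines.extend(f"{k}={v}" for k, v in sysctls)
    # Group SECTION:KEY=VALUE entries by section, preserving input order.
    grouped: Dict[str, List[str]] = {}
    for section, key, value in sections:
        grouped.setdefault(section, []).append(f"{key}={value}")
    for section, entries in grouped.items():
        lines.append(f"[{section}]")
        lines.extend(entries)
    return "\n".join(lines) + "\n"


def build_tuned(name: str, data: str, priority: int = 20,
                machine_config_labels: Optional[Dict[str, str]] = None,
                match_labels: Optional[Dict[str, Optional[str]]] = None,
                namespace: str = DEFAULT_NAMESPACE) -> dict:
    """Assemble a Tuned object; at least one selector is required."""
    if not machine_config_labels and not match_labels:
        raise ValueError("supply --machine-config-label or --match-label")
    recommend: dict = {"priority": priority, "profile": name}
    if machine_config_labels:
        recommend["machineConfigLabels"] = machine_config_labels
    if match_labels:
        # A label given without a value matches on key presence only.
        recommend["match"] = [
            {"label": k} if v is None else {"label": k, "value": v}
            for k, v in match_labels.items()
        ]
    return {
        "apiVersion": "tuned.openshift.io/v1",
        "kind": "Tuned",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "profile": [{"name": name, "data": data}],
            "recommend": [recommend],
        },
    }


data = render_profile_data(
    summary="Boot time configuration for hugepages",
    includes=["openshift-node"],
    sysctls=[],
    sections=[("bootloader", "cmdline_openshift_node_hugepages",
               "hugepagesz=2M hugepages=50")],
)
tuned = build_tuned("openshift-node-hugepages", data, priority=30,
                    machine_config_labels={
                        "machineconfiguration.openshift.io/role": "worker-hp"})
print(json.dumps(tuned, indent=2))
```

The script's real output is YAML and may order or name fields differently. Review the rendered file itself before applying it.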

### Error Handling
- Missing required options raise `ValueError` with descriptive messages.
- The script exits non-zero when no target selectors (`--machine-config-label` or `--match-label`) are supplied.
- Invalid key/value or section inputs identify the failing argument explicitly.
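
The sketch below illustrates these error-handling rules with a small parser for the `key=value`, `SECTION:KEY=VALUE`, and `NODE:KEY[=VALUE]` argument forms. The helper names are hypothetical and only approximate how the script may split its inputs. The sketch splits on the first `:` and then the first `=`, which is an assumption. That order keeps values such as `+systemd.cpu_affinity=${not_isolated_cores_expanded}` intact.

```python
from typing import Optional, Tuple


def parse_key_value(arg: str, option: str) -> Tuple[str, str]:
    """Split 'key=value' on the first '=' (e.g. --sysctl, --variable)."""
    key, sep, value = arg.partition("=")
    if not sep or not key.strip():
        # Name the failing option so the user knows which argument to fix.
        raise ValueError(f"{option}: expected KEY=VALUE, got {arg!r}")
    return key.strip(), value.strip()


def parse_section(arg: str) -> Tuple[str, str, str]:
    """Split 'SECTION:KEY=VALUE' into its three parts (--section)."""
    section, sep, rest = arg.partition(":")
    if not sep or not section.strip():
        raise ValueError(f"--section: expected SECTION:KEY=VALUE, got {arg!r}")
    key, value = parse_key_value(rest, "--section")
    return section.strip(), key, value


def parse_label_node(arg: str) -> Tuple[str, str, Optional[str]]:
    """Split 'NODE:KEY[=VALUE]' (--label-node); no '=' means no value."""
    node, sep, label = arg.partition(":")
    if not sep or not node or not label:
        raise ValueError(f"--label-node: expected NODE:KEY[=VALUE], got {arg!r}")
    key, eq, value = label.partition("=")
    return node, key, (value if eq else None)
```

A trailing `=` (as in `node-role.kubernetes.io/worker-hp=`) yields an empty-string value. Omitting the `=` entirely yields `None`.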

### Examples
```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name realtime-worker \
  --summary "Realtime tuned profile" \
  --include openshift-node --include realtime \
  --variable isolated_cores=1 \
  --section 'bootloader:cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}' \
  --machine-config-label machineconfiguration.openshift.io/role=worker-rt \
  --priority 25 \
  --output .work/node-tuning/realtime-worker/tuned.yaml
```
```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name openshift-node-hugepages \
  --summary "Boot time configuration for hugepages" \
  --include openshift-node \
  --section bootloader:cmdline_openshift_node_hugepages="hugepagesz=2M hugepages=50" \
  --machine-config-label machineconfiguration.openshift.io/role=worker-hp \
  --priority 30 \
  --output .work/node-tuning/openshift-node-hugepages/hugepages-tuned-boottime.yaml
```
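
When driving the generator from automation, pass arguments as a list rather than a shell string. This avoids quoting problems with Tuned `${...}` variables such as `${not_isolated_cores_expanded}`, because no shell ever expands them. The sketch below is a hypothetical wrapper (`generator_argv` and `render` are illustrative names). It uses only flags documented above and assumes it runs from the repository root, like the examples.

```python
import subprocess
import sys
from typing import List, Sequence

SCRIPT = "plugins/node-tuning/skills/scripts/generate_tuned_profile.py"


def generator_argv(profile_name: str, summary: str, *,
                   includes: Sequence[str] = (),
                   sections: Sequence[str] = (),
                   machine_config_label: str = "",
                   priority: int = 20,
                   dry_run: bool = True) -> List[str]:
    """Build an argv list for generate_tuned_profile.py (no shell involved)."""
    argv = [sys.executable, SCRIPT,
            "--profile-name", profile_name,
            "--summary", summary]
    for inc in includes:
        argv += ["--include", inc]        # repeatable option
    for sec in sections:
        argv += ["--section", sec]        # SECTION:KEY=VALUE, passed verbatim
    if machine_config_label:
        argv += ["--machine-config-label", machine_config_label]
    argv += ["--priority", str(priority)]
    if dry_run:
        argv.append("--dry-run")          # print the manifest to stdout
    return argv


def render(argv: List[str]) -> str:
    """Run the generator and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, `print(render(generator_argv("realtime-worker", "Realtime tuned profile", includes=["openshift-node", "realtime"], machine_config_label="machineconfiguration.openshift.io/role=worker-rt")))` prints the manifest. Because `check=True` is set, a non-zero exit (for instance, a missing selector) raises `subprocess.CalledProcessError`.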

---

## Script: `analyze_node_tuning.py`

### Purpose
Inspect either a live node (`/proc`, `/sys`) or an extracted sosreport snapshot for tuning signals (CPU isolation, IRQ affinity, huge pages, sysctl state, networking counters) and emit actionable recommendations.

### Usage Patterns
- **Live node analysis**
  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py --format markdown
  ```
- **Remote analysis via oc debug**
  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --kubeconfig ~/.kube/prod \
    --format markdown
  ```
- **Collect sosreport via oc debug and analyze locally**
  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --toolbox-image registry.example.com/support-tools:latest \
    --sosreport-arg "--case-id=01234567" \
    --sosreport-output .work/node-tuning/sosreports \
    --format json
  ```
- **Offline sosreport analysis**
  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport-2025-10-20
  ```
- **Automation-friendly JSON**
  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport \
    --format json --output .work/node-tuning/node-analysis.json
  ```

### Implementation Steps
1. **Select data source**
   - Provide `--node <name>` (with optional `--kubeconfig` / `--oc-binary`). By default the helper runs `sosreport` remotely from inside the RHCOS toolbox container (`registry.redhat.io/rhel9/support-tools`). Override the image with `--toolbox-image`, extend the sosreport command with `--sosreport-arg`, or disable the curated OpenShift flags via `--skip-default-sosreport-flags`. Pass `--no-collect-sosreport` to fall back to the direct `/proc` snapshot mode.
   - Provide `--sosreport <dir>` for archived diagnostics; the script locates the embedded `proc/` and `sys/` directories.
   - Omit both switches to query the live filesystem (defaults to `/proc` and `/sys`).
   - Override paths with `--proc-root` or `--sys-root` when the layout differs.
2. **Run analysis**
   - The script parses `cpuinfo`, kernel cmdline parameters (`isolcpus`, `nohz_full`, `tuned.non_isolcpus`), default IRQ affinities, huge page counters, sysctl values (net, vm, kernel), transparent hugepage settings, `netstat`/`sockstat` counters, and `ps` snapshots (when available in the sosreport).
3. **Review the report**
   - Markdown output groups findings by section (System Overview, CPU & Isolation, Huge Pages, Sysctl Highlights, Network Signals, IRQ Affinity, Process Snapshot) and lists recommendations.
   - JSON output contains the same information in structured form for pipelines or dashboards.
4. **Act on recommendations**
   - Apply Tuned profiles, MachineConfig updates, or manual sysctl/irqbalance adjustments.
   - Feed actionable items back into `/node-tuning:generate-tuned-profile` to codify desired state.
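
The CPU-isolation part of step 2 comes down to reading kernel command-line parameters and expanding Linux CPU-list syntax (for example `0-1,4,6-7`). The sketch below is minimal and makes these assumptions:

- The command line is a single whitespace-separated string without quoted values, as typically found in `/proc/cmdline`.
- Only `isolcpus` and `nohz_full` are handled.
- Non-numeric `isolcpus` flags such as `managed_irq` and `domain` are skipped.
- The stride form (`0-15:2/4`) is not supported.

The function names are hypothetical.

```python
from typing import Dict, List, Set


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Map kernel parameters to values; bare flags such as 'quiet' map to ''."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value  # a repeated parameter keeps its last value
    return params


def expand_cpu_list(spec: str) -> Set[int]:
    """Expand '0-1,4,6-7' into {0, 1, 4, 6, 7}, skipping non-numeric flags."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue  # isolcpus flag such as 'managed_irq' or 'domain'
        lo, _, hi = part.partition("-")
        start, end = int(lo), int(hi or lo)
        if end < start:
            raise ValueError(f"invalid CPU range {part!r}")
        cpus.update(range(start, end + 1))
    return cpus


def isolation_summary(cmdline: str) -> Dict[str, List[int]]:
    """Return sorted CPU lists for isolcpus and nohz_full, when present."""
    params = parse_cmdline(cmdline)
    return {name: sorted(expand_cpu_list(params[name]))
            for name in ("isolcpus", "nohz_full") if name in params}


print(isolation_summary("BOOT_IMAGE=/vmlinuz isolcpus=managed_irq,domain,2-5 nohz_full=2-5"))
```

Comparing the two lists (or comparing either one against the CPUs assigned to housekeeping) is the kind of check the CPU & Isolation section of the report summarizes.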

### Error Handling
- Missing `proc/` or `sys/` directories trigger descriptive errors.
- Unreadable files are skipped gracefully and noted in observations where relevant.
- Non-numeric sysctl values are flagged for manual investigation.

### Example Output (Markdown excerpt)
```
# Node Tuning Analysis

## System Overview
- Hostname: worker-rt-1
- Kernel: 4.18.0-477.el8
- NUMA nodes: 2
- Kernel cmdline: `BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1`

## CPU & Isolation
- Logical CPUs: 32
- Physical cores: 16 across 2 socket(s)
- SMT detected: yes
- Isolated CPUs: 2-15
...

## Recommended Actions
- Configure net.core.netdev_max_backlog (>=32768) to accommodate bursty NIC traffic.
- Transparent Hugepages are not disabled (`[never]` not selected). Consider setting to `never` for latency-sensitive workloads.
- 4 IRQs overlap isolated CPUs. Relocate interrupt affinities using tuned profiles or irqbalance.
```
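
The three recommendations in this excerpt follow from simple comparisons. The sketch below reproduces that kind of logic; it is not the script's actual rule set, and the function names are hypothetical. It assumes:

- The 32768 backlog floor is the value quoted in the sample output.
- The Transparent Hugepages mode is the bracketed entry in `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always madvise [never]`).
- IRQ affinities are already expanded into CPU sets, as would come from `/proc/irq/<n>/smp_affinity_list`.

```python
import re
from typing import Dict, List, Optional, Set

BACKLOG_FLOOR = 32768  # floor quoted in the sample report


def thp_mode(enabled_text: str) -> Optional[str]:
    """Selected THP mode, e.g. 'never' from 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    return match.group(1) if match else None


def recommend(sysctls: Dict[str, str], thp_enabled: str,
              irq_cpus: Dict[int, Set[int]], isolated: Set[int]) -> List[str]:
    """Return human-readable recommendations for one node snapshot."""
    recs: List[str] = []
    backlog = sysctls.get("net.core.netdev_max_backlog", "")
    # Missing or non-numeric values are simply skipped in this sketch.
    if backlog.isdigit() and int(backlog) < BACKLOG_FLOOR:
        recs.append(f"Configure net.core.netdev_max_backlog (>={BACKLOG_FLOOR}).")
    if thp_mode(thp_enabled) != "never":
        recs.append("Transparent Hugepages are not disabled; consider 'never'.")
    # An IRQ overlaps if any CPU in its affinity set is also isolated.
    overlapping = sorted(irq for irq, cpus in irq_cpus.items() if cpus & isolated)
    if overlapping:
        recs.append(f"{len(overlapping)} IRQs overlap isolated CPUs.")
    return recs


print(recommend({"net.core.netdev_max_backlog": "1000"}, "[always] madvise never",
                {24: {0, 1}, 25: {2, 3}}, set(range(2, 16))))
```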

### Follow-up Automation Ideas
- Persist JSON results in `.work/node-tuning/<host>/analysis.json` to track tuning state over time.
- Gate upgrades by comparing recommendations across nodes.
- Integrate with CI jobs that validate cluster tuning post-change.
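
The first two ideas can be combined: store each node's JSON report under `.work/node-tuning/<host>/analysis.json`, then compare recommendation sets across nodes before an upgrade. The sketch assumes the JSON exposes a top-level `recommendations` list of strings. That key is hypothetical, so check it against the script's real `--format json` output before relying on it. The helper names are also illustrative.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List, Set


def persist(report: dict, host: str, base: str = ".work/node-tuning") -> Path:
    """Write one node's report to <base>/<host>/analysis.json."""
    path = Path(base) / host / "analysis.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2, sort_keys=True))
    return path


def load_recommendations(base: str = ".work/node-tuning") -> Dict[str, Set[str]]:
    """Map host -> recommendations; the 'recommendations' key is an assumption."""
    result: Dict[str, Set[str]] = {}
    for path in sorted(Path(base).glob("*/analysis.json")):
        report = json.loads(path.read_text())
        result[path.parent.name] = set(report.get("recommendations", []))
    return result


def divergent_hosts(recs: Dict[str, Set[str]]) -> List[str]:
    """Hosts whose recommendation set differs from the most common one."""
    if not recs:
        return []
    counts = Counter(frozenset(r) for r in recs.values())
    baseline = counts.most_common()[0][0]  # ties go to the first host seen
    return sorted(h for h, r in recs.items() if frozenset(r) != baseline)
```

A CI gate could then fail the job whenever `divergent_hosts(load_recommendations())` returns a non-empty list. Treating the most common set as the baseline is a design choice: it flags outlier nodes without requiring a hand-maintained reference profile.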
