ln-514-test-log-analyzer

Analyzes application logs: classifies errors, checks log quality, maps stack traces to source. Use when logs need review after test runs or during development.

310 stars

by levnikolaevich

Best use case

ln-514-test-log-analyzer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ln-514-test-log-analyzer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/ln-514-test-log-analyzer/SKILL.md --create-dirs "https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/main/skills-catalog/ln-514-test-log-analyzer/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ln-514-test-log-analyzer/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How ln-514-test-log-analyzer Compares

Feature / Agent	ln-514-test-log-analyzer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

Where can I find the source code?

The source lives in the levnikolaevich/claude-code-skills repository on GitHub, under skills-catalog/ln-514-test-log-analyzer/.

SKILL.md Source

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If `shared/` is missing, fetch files via WebFetch from `https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}`.

# Test Log Analyzer

**Type:** L3 Worker
**Category:** 5XX Quality

Two-layer analysis of application logs. Node.js script handles collection and quantitative analysis; AI handles classification, quality assessment, and fix recommendations.

## Inputs

No required inputs. Runs in current project directory, auto-detects log sources.

Optional `args` — caller instructions (natural language): time window, expected errors, test context. Example: `"review logs for last 30min, auth 401 errors expected from negative tests"`.

## Purpose & Scope
- Analyze application logs (after test runs, during development, or on demand)
- Classify errors into 4 categories: Real Bug, Test Artifact, Expected Behavior, Operational Warning
- Assess log quality: noisiness, completeness, level correctness, format, structured logging
- Map stack traces to source files; provide fix recommendations
- Report findings for quality verdict (only Real Bugs block)
- **No status changes or task creation** — report only

## When to Use
- Analyze application logs in any project (default: last 1h)
- After test runs to classify errors and assess log quality
- Can be invoked with context instructions: `Skill(skill: "ln-514-test-log-analyzer", args: "review last 30min, 401 errors expected")`

## Workflow

### Phase 0: Parse Instructions

If `args` provided — extract: time window (default: 1h), expected errors list, test context.
If no `args` — use defaults (last 1h, no expected errors).
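
A minimal sketch of the time-window part of this phase, assuming the window is phrased as `last <number><unit>` as in the example `args` above. `parse_time_window` and the unit table are hypothetical helpers, not part of the skill; extracting expected errors and test context is left to the agent.

```python
import re
from datetime import timedelta
from typing import Optional

# Hypothetical unit table: spellings that may appear in `args`, mapped to timedelta keywords.
UNIT_TO_KWARG = {
    "s": "seconds", "sec": "seconds",
    "m": "minutes", "min": "minutes", "minute": "minutes", "minutes": "minutes",
    "h": "hours", "hr": "hours", "hour": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
}

# Longer spellings come first so "min" is not cut short to "m".
WINDOW_RE = re.compile(
    r"\blast\s+(\d+)\s*(minutes|minute|min|m|hours|hour|hr|h|days|day|d|sec|s)\b",
    re.IGNORECASE,
)

DEFAULT_WINDOW = timedelta(hours=1)  # documented default: last 1h


def parse_time_window(args: Optional[str]) -> timedelta:
    """Return the requested window, or the default when none is given or recognized."""
    if not args:
        return DEFAULT_WINDOW
    match = WINDOW_RE.search(args)
    if match is None:
        return DEFAULT_WINDOW
    amount = int(match.group(1))
    unit = UNIT_TO_KWARG[match.group(2).lower()]
    return timedelta(**{unit: amount})
```

Anything the pattern does not recognize falls back to the default rather than raising, which mirrors the "use defaults" rule for missing `args`.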

### Phase 1: Log Source Detection and Script Execution

Read target project files if they exist: `docs/project/infrastructure.md`, `docs/project/runbook.md`

1) Check if `scripts/analyze_test_logs.mjs` exists in target project. If missing, copy from `references/analyze_test_logs.mjs`.
2) Detect log source mode (auto-detection priority: docker → file → loki):

| Mode | Detection | Source |
|------|-----------|--------|
| `docker` | `docker compose ps` returns running containers | `docker compose logs --since {window}` |
| `file` | `.log` files exist, or `tests/manual/results/` has output | File paths from infrastructure.md or `*.log` glob |
| `loki` | `LOKI_URL` env var or `tools_config.md` observability section | Loki HTTP query_range API |

3) Run script: `node scripts/analyze_test_logs.mjs --mode {detected} [options]`
4) If no log sources found → return `NO_LOG_SOURCES` status, skip to Phase 5.
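
The detection priority above can be sketched as follows. This is an illustration only, not the logic inside `analyze_test_logs.mjs`: the function names are hypothetical, the file probe is reduced to a `*.log` glob, and the Loki probe checks only the `LOKI_URL` env var (the `tools_config.md` and `tests/manual/results/` checks are omitted).

```python
import glob
import os
import subprocess
from typing import Optional, Sequence


def detect_log_mode(docker_running: bool, log_files: Sequence[str], loki_url: Optional[str]) -> str:
    """Apply the documented priority: docker, then file, then loki."""
    if docker_running:
        return "docker"
    if log_files:
        return "file"
    if loki_url:
        return "loki"
    return "NO_LOG_SOURCES"


def docker_compose_running(timeout_s: float = 10.0) -> bool:
    """True when `docker compose ps -q` prints at least one container ID."""
    try:
        result = subprocess.run(
            ["docker", "compose", "ps", "-q"],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # no Docker CLI, or it hung: treat as "not running"
    return result.returncode == 0 and bool(result.stdout.strip())


def probe_and_detect(project_dir: str = ".") -> str:
    """Gather the three probes from the environment and pick a mode."""
    log_files = glob.glob(os.path.join(project_dir, "**", "*.log"), recursive=True)
    return detect_log_mode(docker_compose_running(), log_files, os.environ.get("LOKI_URL"))
```

Keeping the priority rule in a pure function separate from the probes makes the ordering easy to check without Docker or Loki available.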

**Level-based error detection (CRITICAL):**
When constructing Loki queries or grep commands to scan for errors, ALWAYS filter by the **parsed level field**, NOT by text matching the word "error" in the full log line. Logger names like `uvicorn.error` contain "error" as part of the name but log at INFO level — text matching produces false positives.

| Log format | Correct error filter | Wrong filter |
|------------|---------------------|--------------|
| Pipe-delimited (`ts \| LEVEL \| ...`) | `\| ERROR` or `\| CRITICAL` (match level field position) | `grep -i error` (matches logger names) |
| Loki structured | `{service_name="X"} \| level="ERROR"` or `\| pattern` extraction | `{service_name="X"} \|= "error"` |
| Key=value (`level=ERROR msg=...`) | `level=ERROR` or `level=FATAL` | `\|~ "(?i)error"` |
| Docker logs (local) | `grep -E '[\|] (ERROR\|CRITICAL) [\|]'` | `grep -iE 'error\|exception'` |

The `analyze_test_logs.mjs` script handles this correctly via structured regex parsers. These rules apply to **ad-hoc Loki/grep queries** constructed during analysis.
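
To make the difference concrete, here is a small sketch that compares text matching with level-field matching on pipe-delimited lines. The line layout (`ts | LEVEL | logger | message`) and the two sample lines are invented for illustration; the real parsers live in `analyze_test_logs.mjs`.

```python
import re

# Assumed layout: "timestamp | LEVEL | logger | message".
PIPE_LINE = re.compile(r"^(?P<ts>[^|]+)\|\s*(?P<level>[A-Z]+)\s*\|(?P<rest>.*)$")
ERROR_LEVELS = {"ERROR", "CRITICAL"}


def is_error_line(line: str) -> bool:
    """True only when the parsed level field is an error level."""
    match = PIPE_LINE.match(line)
    return match is not None and match.group("level") in ERROR_LEVELS


# Invented sample lines: the first logs at INFO through a logger named "uvicorn.error".
SAMPLE_LINES = [
    "2026-03-13 10:00:01 | INFO | uvicorn.error | Application startup complete.",
    "2026-03-13 10:00:05 | ERROR | app.orders | Failed to persist order",
]

text_matches = [line for line in SAMPLE_LINES if "error" in line.lower()]
level_matches = [line for line in SAMPLE_LINES if is_error_line(line)]
```

Text matching flags both lines, while the level-field filter keeps only the second one.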

### Phase 2: 4-Category Error Classification

Classify each error group from script JSON output:

| Category | Action | Criteria |
|----------|--------|----------|
| **Real Bug** | Fix | Unexpected crash, data loss, broken pipeline |
| **Test Artifact** | Skip | From test scripts, deliberate error-path validation |
| **Expected Behavior** | Skip | Rate limiting, input validation, auth failures from invalid tokens |
| **Operational Warning** | Monitor | Clock drift, resource pressure, temporary unavailability |

**Test artifact detection heuristics:**
- Test name contains: `invalid`, `error`, `fail`, `reject`, `unauthorized`, `forbidden`, `not_found`, `bad_request`, `timeout`
- Test asserts non-2xx status codes (4xx, 5xx)
- Test uses `pytest.raises`, `expect(...).rejects`, `assertThrows`, `should.throw`
- Errors correlate with test execution timestamps from regression test output
- Patterns matching `tests/manual/` scripts
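
The first and third heuristics can be sketched as plain substring checks. `artifact_signals` is a hypothetical helper; it covers only the name markers and negative-assertion markers listed above, not timestamp correlation or `tests/manual/` pattern matching.

```python
from typing import List

# Name fragments from the heuristic list above.
ARTIFACT_NAME_MARKERS = (
    "invalid", "error", "fail", "reject", "unauthorized",
    "forbidden", "not_found", "bad_request", "timeout",
)

# Negative-assertion helpers from the list above; ".rejects" stands in for `expect(...).rejects`.
NEGATIVE_ASSERTION_MARKERS = ("pytest.raises", ".rejects", "assertThrows", "should.throw")


def artifact_signals(test_name: str, test_source: str = "") -> List[str]:
    """Collect reasons to treat errors raised during this test as Test Artifacts."""
    lowered = test_name.lower()
    signals = [f"name:{marker}" for marker in ARTIFACT_NAME_MARKERS if marker in lowered]
    signals += [f"assert:{marker}" for marker in NEGATIVE_ASSERTION_MARKERS if marker in test_source]
    return signals
```

An empty result does not prove an error is a Real Bug; it only means these two heuristics found nothing, and the remaining ones still apply.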

**Error taxonomy per** `references/error_taxonomy.md` **(9 categories: CRASH, TIMEOUT, AUTH, DB, NETWORK, VALIDATION, CONFIG, RESOURCE, DEPRECATION).**

### Phase 3: Log Quality Assessment

**MANDATORY READ:** Load `references/error_taxonomy.md` (per-level criteria table + level correctness reference)

**Step 1: Detect configured log level.** Check in order:
1. `LOG_LEVEL` / `LOGLEVEL` env var (`.env`, `docker-compose.yml`, `infrastructure.md`)
2. Framework config: Python `logging.conf` / Django `LOGGING` / Node `LOG_LEVEL`
3. Default: assume `INFO` if not detected
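
A minimal sketch of this lookup order, assuming only the process environment and the text of a `.env` file are inspected. `detect_log_level` is a hypothetical helper; `docker-compose.yml`, `infrastructure.md`, and framework config files are not parsed here.

```python
import os
import re
from typing import Mapping, Optional

# Matches lines such as `LOG_LEVEL=debug` or `export LOGLEVEL="WARNING"`.
DOTENV_LEVEL_RE = re.compile(
    r"""^\s*(?:export\s+)?(?:LOG_LEVEL|LOGLEVEL)\s*=\s*['"]?(\w+)['"]?\s*$"""
)


def detect_log_level(env: Optional[Mapping[str, str]] = None, dotenv_text: str = "") -> str:
    """Return the configured level in uppercase, or INFO when nothing is found."""
    env = os.environ if env is None else env
    for key in ("LOG_LEVEL", "LOGLEVEL"):
        value = env.get(key)
        if value:
            return value.upper()
    for line in dotenv_text.splitlines():
        match = DOTENV_LEVEL_RE.match(line)
        if match:
            return match.group(1).upper()
    return "INFO"  # documented default
```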

Configured level determines WHICH levels appear in logs, but each level has its own noise threshold regardless.

**Step 2: Assess 6 quality dimensions:**

| Dimension | What to Check | Signal |
|-----------|---------------|--------|
| **Noisiness** | Per-level noise thresholds from `error_taxonomy.md` section 4: TRACE (zero in prod), DEBUG (>50% monopoly), INFO (>30%), WARNING (>1% of total), ERROR (>0.1% of total) | `NOISY: {level} template "{msg}" at {ratio}%` |
| **Completeness & Traceability** | Critical operations missing log entries + traceability gaps (see table below) | `MISSING: No log for {operation}` / `TRACEABILITY_GAP: {type} in {file}:{line}` |
| **Level correctness** | Per-level criteria from `error_taxonomy.md` section 4: content, anti-patterns, library rule | `WRONG_LEVEL: should be {level}` |
| **Structured logging** | Missing trace_id/request_id/user context; unstructured plaintext | `UNSTRUCTURED: lacks {field}` |
| **Sensitivity** | PII/secrets/tokens/passwords in log messages | `SENSITIVE: {type} exposure` |
| **Context richness** | Errors without actionable context (order_id, user_id, operation) | `LOW_CONTEXT: lacks context` |

**Traceability gap detection** — scan source code for operations without INFO-level logging:

| Operation Type | Expected Log | Where to Add |
|---------------|-------------|--------------|
| Incoming request handling | Request received + response status | Entry/exit of route handler |
| External API call | Request sent + response status + duration | Before/after HTTP client call |
| DB write (INSERT/UPDATE/DELETE) | Operation + affected entity + count | Before/after ORM/query call |
| Auth decision | Result (allow/deny) + reason | After auth check |
| State transition | Old state → new state + trigger | At transition point |
| Background job | Start + complete/fail + duration | Entry/exit of job handler |
| File/resource operation | Open/close + path + size | At I/O operation |

**Log Format Quality** (10-criterion checklist per `references/log_analysis_output_format.md`):

| # | Criterion | Check |
|---|-----------|-------|
| 1 | Dual format | JSON in prod, readable in dev |
| 2 | Timestamp | Consistent, timezone-aware |
| 3 | Level field | Present, uppercase |
| 4 | Trace/Correlation ID | Present in every entry, async-safe |
| 5 | Service name | Identifies source service |
| 6 | Source location | module:line + function |
| 7 | Extra context | Structured fields, not string interpolation |
| 8 | PII redaction | Passwords, API keys, emails handled |
| 9 | Noise suppression | Duplicate filters, third-party suppressed |
| 10 | Parseability | Dev: pipe-delimited; prod: valid JSON per line |

Score: passed criteria / 10.

### Phase 4: Stack Trace Mapping + Fix Recommendations

For each Real Bug:
1) Extract stack trace frames; identify origin frame (first frame in project code, not in node_modules/site-packages)
2) Map to source file:line
3) Generate fix recommendation: what to change, where, effort estimate (S/M/L)
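
Steps 1 and 2 can be sketched for Python tracebacks as below. `origin_frame` and the sample traceback are invented for illustration. Python prints the most recent call last, so the sketch walks frames from the end and returns the first one outside third-party directories; standard-library frames are not filtered.

```python
import re
from typing import Optional

FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
THIRD_PARTY_MARKERS = ("site-packages", "node_modules")

# Invented traceback; "somelib" is a placeholder third-party package.
SAMPLE_TRACEBACK = '''Traceback (most recent call last):
  File "/app/api/orders.py", line 42, in create_order
    repo.save(order)
  File "/app/repo/orders.py", line 17, in save
    session.commit()
  File "/venv/lib/python3.12/site-packages/somelib/session.py", line 88, in commit
    raise IntegrityError("duplicate key")
somelib.IntegrityError: duplicate key
'''


def origin_frame(traceback_text: str) -> Optional[str]:
    """Return "file:line" for the project frame closest to the failure, if any."""
    frames = [match.groupdict() for match in FRAME_RE.finditer(traceback_text)]
    for frame in reversed(frames):  # innermost frame first
        if not any(marker in frame["path"] for marker in THIRD_PARTY_MARKERS):
            return f'{frame["path"]}:{frame["line"]}'
    return None
```

For the sample, the failure surfaces inside `somelib`, but the origin frame is the project's own `save` call, which is where a fix recommendation would point.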

**Prioritize using Sentry-inspired dimensions:**
- High-volume (occurrence count)
- Post-test regression (new errors)
- High-impact path (auth/payment/DB)
- Correlated traces (trace_id across services)

### Phase 5: Generate Report

**MANDATORY READ:** Load `references/log_analysis_output_format.md`

Output report to chat with header `## Test Log Analysis`. Include:
- Signals table (Real Bugs count, Test Artifacts filtered, Log Noise status, Log Format score, Log Quality score)
- Real Bugs table (priority, category, error, source, fix recommendation)
- Filtered table (category, count, examples)
- Log Quality Issues table (dimension, service, issue, recommendation)
- Noise Report table (count, ratio, service, level, template, action)
- Machine-readable block `<!-- LOG-ANALYSIS-DATA ... -->` for programmatic consumption

### Phase 6: Meta-Analysis

**MANDATORY READ:** Load `shared/references/meta_analysis_protocol.md`

Skill type: `execution-worker`. Run after all phases complete.

## Verdict Contribution

This worker feeds one component of the quality coordinator's normalization matrix:

| Status | Maps To | Penalty |
|--------|---------|---------|
| CLEAN | -- | 0 |
| WARNINGS_ONLY | -- | 0 |
| REAL_BUGS_FOUND | FAIL | -20 |
| SKIPPED / NO_LOG_SOURCES | ignored | 0 |

Log quality/format issues are INFORMATIONAL — do not affect quality verdict. Only Real Bugs block.
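
The table can be read as a lookup plus one rule for choosing the status. In this sketch the lookup mirrors the table, while `derive_status` and its inputs are assumptions about how a caller might pick the row. In particular, "warnings" is taken to mean Operational Warnings and log quality findings combined.

```python
from typing import Optional, Tuple

# Status -> (maps_to, penalty), copied from the table above; None stands for "--".
VERDICT_TABLE = {
    "CLEAN": (None, 0),
    "WARNINGS_ONLY": (None, 0),
    "REAL_BUGS_FOUND": ("FAIL", -20),
    "SKIPPED": ("ignored", 0),
    "NO_LOG_SOURCES": ("ignored", 0),
}


def derive_status(real_bugs: int, warnings: int, log_sources_found: bool = True) -> str:
    """Hypothetical rule: only Real Bugs produce a blocking status."""
    if not log_sources_found:
        return "NO_LOG_SOURCES"
    if real_bugs > 0:
        return "REAL_BUGS_FOUND"
    if warnings > 0:
        return "WARNINGS_ONLY"
    return "CLEAN"


def verdict_contribution(status: str) -> Tuple[Optional[str], int]:
    """Look up what the status maps to and its penalty."""
    return VERDICT_TABLE[status]
```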

## Critical Rules
- No status changes or task creation; report only.
- Test Artifacts and Expected Behavior are ALWAYS filtered — never count as bugs.
- Log quality issues are advisory — inform, don't block.
- Script must handle gracefully: no Docker, no log files, no Loki → `NO_LOG_SOURCES`.
- Preserve the existing language of comments (EN/RU).

## Runtime Summary Artifact

**MANDATORY READ:** Load `shared/references/quality_summary_contract.md`

Accept optional `summaryArtifactPath`.

Summary kind:
- `quality-worker`

Required payload semantics:
- `worker = "ln-514"`
- `status`
- `verdict`
- `issues`
- `warnings`
- `artifact_path`

Write the summary to the provided artifact path or return the same envelope in structured output.
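
A rough sketch of this step, assuming the summary is a flat JSON object holding the fields listed above. The actual envelope is defined in `shared/references/quality_summary_contract.md`, which is not reproduced here, so `emit_summary`, the JSON layout, and the value types are all assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def emit_summary(
    status: str,
    verdict: Optional[str],
    issues: List[Dict[str, Any]],
    warnings: List[Dict[str, Any]],
    summary_artifact_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the payload; write it to disk only when a path was provided."""
    payload = {
        "worker": "ln-514",
        "status": status,
        "verdict": verdict,
        "issues": issues,
        "warnings": warnings,
        "artifact_path": summary_artifact_path,
    }
    if summary_artifact_path:
        path = Path(summary_artifact_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload  # same content is returned for structured output
```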

## Definition of Done

- [ ] Script deployed to target project `scripts/` (or already exists)
- [ ] Log source detected and script executed (or NO_LOG_SOURCES returned)
- [ ] Errors classified into 4 categories; Real Bugs identified
- [ ] Log quality assessed (6 dimensions + 10-criterion format checklist)
- [ ] Stack traces mapped to source files for Real Bugs
- [ ] Report output to chat with signals table + machine-readable block

## Reference Files
- **Error taxonomy:** `references/error_taxonomy.md`
- **Output format:** `references/log_analysis_output_format.md`
- **Analysis script:** `references/analyze_test_logs.mjs`

---
**Version:** 1.0.0
**Last Updated:** 2026-03-13

Related Skills

All from levnikolaevich/claude-code-skills.

ln-782-test-runner: Executes all test suites and reports results with coverage. Use when verifying that test infrastructure works after bootstrap.

ln-743-test-infrastructure: Sets up test infrastructure with Vitest, xUnit, and pytest. Use when adding testing frameworks and sample tests to a project.

ln-641-pattern-analyzer: Analyzes single pattern implementation, calculates compliance/completeness/quality scores, identifies gaps. Use when auditing a specific pattern.

ln-637-test-structure-auditor: Checks test file organization, directory layout, test-to-source mapping, domain grouping, co-location. Use when auditing test structure.

ln-636-manual-test-auditor: Checks manual test scripts for harness adoption, golden files, fail-fast, config sourcing, idempotency. Use when auditing manual test quality.

ln-635-test-isolation-auditor: Checks test isolation (API/DB/FS/Time/Network), determinism, flaky tests, order-dependency, anti-patterns. Use when auditing test isolation.

ln-634-test-coverage-auditor: Identifies missing tests for critical paths (money, security, data integrity, core flows). Use when auditing test coverage gaps.

ln-633-test-value-auditor: Scores each test by Impact x Probability, returns KEEP/REVIEW/REMOVE decisions. Use when auditing test value and pruning low-value tests.

ln-632-test-e2e-priority-auditor: Validates E2E coverage for critical paths (money, security, data integrity). Risk-based prioritization. Use when auditing E2E test coverage.

ln-631-test-business-logic-auditor: Detects tests validating framework/library behavior instead of project code. Use when auditing test business logic focus.

ln-630-test-auditor: Coordinates test suite audit across business logic, E2E coverage, value, isolation, manual quality, and structure. Use when auditing entire test suite.

ln-523-auto-test-planner: Plans automated tests (E2E/Integration/Unit) using Risk-Based Testing after manual testing. Use when Story needs a test task with prioritized scenarios.