python-logging-best-practices

Python logging with loguru, structlog, and orjson. TRIGGERS - loguru, structlog, structured logging, JSONL logs, log rotation, secret redaction, OTel logging, lightweight logging, print logging, systemd logging.

29 stars

byterrylica

View on GitHub Installation ↓

Best use case

python-logging-best-practices is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using python-logging-best-practices should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/python-logging-best-practices/SKILL.md --create-dirs "https://raw.githubusercontent.com/terrylica/cc-skills/main/plugins/devops-tools/skills/python-logging-best-practices/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/python-logging-best-practices/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How python-logging-best-practices Compares

Feature / Agent	python-logging-best-practices	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Python Logging Best Practices

> **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

## When to Use This Skill

Use this skill when:

- Setting up Python logging for any service or script
- Configuring structured JSONL logging for analysis
- Implementing log rotation
- Choosing between lightweight (zero-dep) and full-featured logging
- Adding logging to containerized, systemd, or local applications

## Overview

Unified reference for Python logging patterns optimized for machine readability (Claude Code analysis) and operational reliability. **Starts with the lightest viable approach and scales up only when needed.**

## Decision Heuristic: Start Light, Scale Up

```
Is it < 5 services on a single machine, < 1 event/sec?
  YES → Lightweight Pattern (print + JSONL telemetry)
  NO  → Is it containerized / serverless?
    YES → stdout JSON (any library), no file rotation
    NO  → Is OTel tracing required?
      YES → structlog + OTel
      NO  → loguru (CLI tools) or stdlib RotatingFileHandler
```

| Approach        | Use Case                                             | Pros                                          | Cons                                         |
| --------------- | ---------------------------------------------------- | --------------------------------------------- | -------------------------------------------- |
| **Lightweight** | Small systemd services, self-hosted, single operator | Zero deps, journald integration, minimal code | No severity filtering, no per-module control |
| `loguru`        | CLI tools, scripts, local services                   | Zero-config, built-in rotation, great DX      | External dep, not truly schema-enforced      |
| `structlog`     | Production services, OTel integration                | ContextVars, processor chains, OTel-native    | Steeper learning curve                       |
| `stdlib`        | LaunchAgent daemons, zero-dep constraint             | No dependencies, Python 3.13 `merge_extra`    | More boilerplate, no structured defaults     |
| `Logfire`       | AI/LLM observability, Pydantic apps                  | Built on OTel, token/cost tracking, SQL       | SaaS dependency, newer ecosystem             |

---

## Preferred: Lightweight Pattern (Zero Dependencies)

**For: < 5 systemd services, single server, single operator. Battle-tested in production by [ccmax-monitor](https://github.com/terrylica/ccmax-monitor).**

This pattern uses a **two-channel architecture**:

- **Channel 1**: `print(flush=True)` → systemd journald (operational logs, human-readable)
- **Channel 2**: Append-only JSONL file (structured telemetry, machine-readable)

This maps to the 12-Factor App's "treat logs as event streams" principle. journald handles ops (rotation, filtering, metadata), while the JSONL file serves domain telemetry for post-mortem analysis.

### Architecture: Three-Concern Separation

| Concern            | Mechanism                      | Purpose                                     | Lifecycle                          |
| ------------------ | ------------------------------ | ------------------------------------------- | ---------------------------------- |
| **Ops logging**    | `print()` → journald           | Human debugging, `journalctl -u service -f` | Managed by journald (auto-rotated) |
| **Telemetry**      | JSONL file (`telemetry.jsonl`) | Structured audit trail, AI/LLM analysis     | Append-only, rotated by size       |
| **State recovery** | WAL file (optional)            | Crash recovery for irreversible operations  | Ephemeral, deleted on success      |

### Complete Lightweight Example

```python
"""Append-only JSONL telemetry logger with size-based rotation.

Zero external dependencies. Works with systemd journald for ops logging
and a separate JSONL file for structured machine-readable telemetry.
"""

import json
from datetime import datetime, timezone
from pathlib import Path

TELEMETRY_PATH = Path(__file__).parent / "telemetry.jsonl"
MAX_SIZE = 10 * 1024 * 1024  # 10 MB
BACKUP_COUNT = 3             # Keep 3 rotated backups (~30MB total)


def log_event(event_type: str, data: dict) -> None:
    """Append a structured JSON line to telemetry.jsonl."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        **data,
    }
    line = json.dumps(entry, separators=(",", ":")) + "\n"

    try:
        try:
            if TELEMETRY_PATH.stat().st_size > MAX_SIZE:
                _rotate()
        except FileNotFoundError:
            pass

        with open(TELEMETRY_PATH, "a") as f:
            f.write(line)
    except OSError as e:
        # Fallback to stderr (captured by journald)
        print(f"[telemetry] write failed: {e}", file=__import__("sys").stderr, flush=True)


def _rotate() -> None:
    """Rotate telemetry files: .jsonl → .jsonl.1 → .jsonl.2 → .jsonl.3"""
    for i in range(BACKUP_COUNT, 1, -1):
        src = TELEMETRY_PATH.with_suffix(f".jsonl.{i - 1}")
        dst = TELEMETRY_PATH.with_suffix(f".jsonl.{i}")
        if src.exists():
            dst.unlink(missing_ok=True)
            src.rename(dst)
    backup = TELEMETRY_PATH.with_suffix(".jsonl.1")
    backup.unlink(missing_ok=True)
    TELEMETRY_PATH.rename(backup)


# === Ops logging (goes to journald via stdout) ===

def log(msg: str) -> None:
    """Human-readable operational log line. Captured by journald."""
    ts = datetime.now(timezone.utc).strftime("%H:%M:%S")
    print(f"[{ts}] {msg}", flush=True)
```

**Usage:**

```python
# Operational (human reads via journalctl -u myservice -f)
log("Refreshing token for account X")
log("Switch: account A → account B (reason: 5h breach)")

# Telemetry (machine reads via jq/DuckDB/Claude Code)
log_event("token_refresh", {"account": "X", "expires_in_h": 8.0, "token_fp": "abc12345"})
log_event("account_switch", {"from": "A", "to": "B", "reason": "5h_breach"})
```

### Security: Token Fingerprinting (Not Regex Redaction)

**Never pass secrets through the logging pipeline.** Log only a non-reversible fragment:

```python
def _token_fingerprint(token: str) -> str:
    """Extract uniquely identifiable chars from a token's mid-section.

    The prefix (sk-ant-oat01-) and suffix (...AA) are common across tokens.
    Chars 14-22 (after the prefix) are the most unique per-token.
    Middle-slice avoids leaking type-prefix metadata that prefix-based
    approaches expose.
    """
    if len(token) > 25:
        return token[14:22]
    return token[:8] if token else ""

# Usage: log the fingerprint, never the token
log_event("token_refresh", {"account": name, "token_fp": _token_fingerprint(token)})
```

**Why this is superior to regex redaction filters:**

| Approach                                    | Security                                  | Maintenance                               | Failure mode                    |
| ------------------------------------------- | ----------------------------------------- | ----------------------------------------- | ------------------------------- |
| **Token fingerprinting** (log only a slice) | Secret never enters logging pipeline      | Zero — works with any token format        | Cannot fail — nothing to redact |
| Regex redaction filter                      | Secret passes through, filtered on output | Must update regexes for new token formats | Silent miss = secret in logs    |

This aligns with OWASP Logging Cheat Sheet: "Ensure that no sensitive data is included in log entries." Major platforms (AWS, Stripe, GitHub) use separate non-secret identifiers or partial token display — never full tokens with regex scrubbing.

Regex filters remain useful as a **defense-in-depth backstop**, not a primary control.

### Health Endpoints as Observability

For small deployments, **rich JSON health endpoints replace log aggregation**:

```python
@app.get("/api/status")
def status():
    """White-box monitoring — current state on demand."""
    return {"active_account": ..., "accounts": [...], "polled_at": ...}

@app.get("/api/vault-health")
def vault_health():
    """Token health for all accounts."""
    return {name: {"status": "healthy", "expires_in": "7.5h", ...} for ...}
```

This is the **Health Endpoint Monitoring Pattern** (Microsoft Azure Architecture Center) / **Health Check API Pattern** (microservices.io). The dashboard IS the monitoring tool — no Grafana/Prometheus needed.

When the service itself serves its own operational state as structured JSON, you get:

- **Real-time** current state (not delayed by log ingestion pipelines)
- **Zero infrastructure** (no log shipper, storage, or query engine)
- **AI-parseable** (Claude Code can `curl` and analyze directly)

### Post-Mortem with FOSS CLI Tools

No log aggregation stack needed. These single-binary tools work directly on JSONL:

```bash
# DuckDB — SQL analytics on JSONL (most powerful)
duckdb -c "SELECT type, count(*) FROM read_json_auto('telemetry.jsonl') GROUP BY 1 ORDER BY 2 DESC"

# jq — ad-hoc JSON filtering
jq 'select(.type == "token_refresh")' telemetry.jsonl

# journalctl — already exports JSONL natively
journalctl -u ccmax-switcher -o json --since "1h ago" | jq 'select(.PRIORITY == "3")'

# lnav — interactive terminal log viewer with SQL
lnav telemetry.jsonl

# llm (Simon Willison) — pipe to LLM for AI post-mortem
journalctl -u myservice --since "2h ago" --priority=err -o json | llm "analyze root cause"
```

### When to Upgrade Beyond Lightweight

Upgrade to loguru/structlog when any of these become true:

- **> 5 services** across multiple hosts (need trace IDs for correlation)
- **> 10 events/sec** sustained (need async sinks, `orjson`)
- **Multiple operators** who need per-module log level filtering
- **Compliance requirements** that mandate structured audit trails with signatures
- **Container/K8s deployment** (stdout JSON is the standard)

---

## Full-Featured: Loguru + JSONL Pattern

For CLI tools, scripts, and services that benefit from a logging library:

### Log Rotation (ALWAYS CONFIGURE for local/CLI apps)

```python
from loguru import logger

logger.add(
    log_path,
    rotation="10 MB",
    retention="7 days",
    compression="gz"
)

# stdlib alternative (zero-dep)
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    log_path,
    maxBytes=100 * 1024 * 1024,  # 100MB
    backupCount=5
)
```

> **Container/serverless apps**: Skip file rotation entirely. Log to **stdout/stderr as JSON**. Let the container runtime handle collection and rotation.

### JSONL Format (Machine-Readable)

```python
# One JSON object per line - jq-parseable
{"timestamp": "2026-01-14T12:45:23.456Z", "level": "info", "message": "..."}
```

**File extension**: Always use `.jsonl` (not `.json` or `.log`)

**Performance**: For >10k records/sec, use `orjson` instead of `json.dumps()`:

```python
import orjson

def json_formatter(record) -> str:
    log_entry = { ... }
    return orjson.dumps(log_entry).decode()
```

### Regex Redaction (Defense-in-Depth)

Use as a **backstop** alongside token fingerprinting, not as the primary control:

```python
import re

REDACT_PATTERNS = [
    (re.compile(r'AKIA[0-9A-Z]{16}'), '[REDACTED_AWS_KEY]'),
    (re.compile(r'sk-[a-zA-Z0-9]{48}'), '[REDACTED_API_KEY]'),
    (re.compile(r'(?i)bearer\s+[a-zA-Z0-9._~+/=-]+'), '[REDACTED_BEARER]'),
]

def redact_filter(record):
    for pattern, replacement in REDACT_PATTERNS:
        record["message"] = pattern.sub(replacement, record["message"])
    return True

logger.add(sink, filter=redact_filter)
```

### Shutdown — Always Flush Enqueued Messages

```python
import asyncio
from loguru import logger

async def main():
    logger.add("app.jsonl", enqueue=True)
    await logger.complete()

asyncio.run(main())
# Sync: logger.remove()
```

### Complete Loguru + JSONL Example

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.13"
# dependencies = ["loguru", "orjson"]
# ///

import re
import sys
from pathlib import Path
from uuid import uuid4

import orjson
from loguru import logger

REDACT_PATTERNS = [
    (re.compile(r'AKIA[0-9A-Z]{16}'), '[REDACTED_AWS_KEY]'),
    (re.compile(r'sk-[a-zA-Z0-9]{48}'), '[REDACTED_API_KEY]'),
]


def json_formatter(record) -> str:
    log_entry = {
        "timestamp": record["time"].strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "level": record["level"].name.lower(),
        "component": record["function"],
        "operation": record["extra"].get("operation", "unknown"),
        "operation_status": record["extra"].get("status", None),
        "trace_id": record["extra"].get("trace_id"),
        "message": record["message"],
        "context": {k: v for k, v in record["extra"].items()
                   if k not in ("operation", "status", "trace_id", "metrics")},
        "metrics": record["extra"].get("metrics", {}),
        "error": None
    }

    if record["exception"]:
        exc_type, exc_value, _ = record["exception"]
        log_entry["error"] = {
            "type": exc_type.__name__ if exc_type else "Unknown",
            "message": str(exc_value) if exc_value else "Unknown error",
        }

    return orjson.dumps(log_entry).decode()


def redact_filter(record):
    for pattern, replacement in REDACT_PATTERNS:
        record["message"] = pattern.sub(replacement, record["message"])
    return True


def setup_logger(app_name: str, log_dir: Path | None = None):
    logger.remove()
    logger.add(sys.stderr, format=json_formatter, filter=redact_filter, level="INFO")
    if log_dir is not None:
        log_dir.mkdir(parents=True, exist_ok=True)
        logger.add(
            str(log_dir / f"{app_name}.jsonl"),
            format=json_formatter,
            filter=redact_filter,
            rotation="10 MB",
            retention="7 days",
            compression="gz",
            level="DEBUG"
        )
    return logger
```

## Semantic Fields Reference

| Field               | Type          | Purpose                                                |
| ------------------- | ------------- | ------------------------------------------------------ |
| `timestamp` / `ts`  | ISO 8601      | Event ordering (millisecond precision minimum)         |
| `level` / `type`    | string        | Severity or event type                                 |
| `component` / `svc` | string        | Module, function, or service name                      |
| `operation`         | string        | What action is being performed                         |
| `operation_status`  | string        | started/success/failed/skipped                         |
| `trace_id`          | UUID4 or OTel | Correlation ID (OTel trace ID for production services) |
| `message`           | string        | Human-readable description                             |
| `context`           | object        | Operation-specific metadata                            |
| `metrics`           | object        | Quantitative data (counts, durations)                  |
| `error`             | object/null   | Exception details if failed                            |

## Related Resources

- [Health Endpoint Monitoring Pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring) - Microsoft Azure Architecture Center
- [OWASP Logging Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html) - Security best practices
- [Write-Ahead Log pattern](https://martinfowler.com/articles/patterns-of-distributed-systems/write-ahead-log.html) - Martin Fowler
- [DuckDB JSON support](https://duckdb.org/docs/data/json/overview) - SQL analytics on JSONL
- [lnav](https://lnav.org/) - Terminal log file navigator with SQL
- [llm CLI](https://github.com/simonw/llm) - Pipe logs to LLMs for analysis
- [structlog docs](https://www.structlog.org/) - Structured logging for production services
- [Pydantic Logfire](https://pydantic.dev/logfire) - AI/LLM observability built on OTel
- [Langfuse](https://langfuse.com/) - Open-source LLM observability (self-hostable)

## Anti-Patterns to Avoid

1. **Unbounded logs** - Always configure rotation (local) or stdout (container)
2. **Logging full secrets** - Use token fingerprinting; regex redaction is a backstop, not primary
3. **Adding loguru/structlog to < 5 low-volume services** - print + JSONL is sufficient; dependency is not free
4. **Bare except without logging** - Catch specific exceptions, log them
5. **Silent failures** - Log errors before suppressing
6. **`enqueue=True` without `logger.complete()`** - Silent log loss on shutdown
7. **`enqueue=True` with slow sinks** - Unbounded memory growth
8. **`json.dumps()` at >10k events/sec** - Use orjson for 2-10x speedup
9. **UUID4 trace IDs in OTel services** - Use OTel-propagated trace IDs
10. **Prometheus/Grafana for < 5 services** - Health endpoints + Uptime Kuma is sufficient
11. **Conflating WAL and telemetry** - WAL is for crash recovery (ephemeral), telemetry is for audit (permanent)

---

## Troubleshooting

| Issue                         | Cause                            | Solution                                             |
| ----------------------------- | -------------------------------- | ---------------------------------------------------- |
| loguru not found              | Not installed                    | Run `uv add loguru`                                  |
| Logs not appearing            | Wrong log level                  | Set level to DEBUG for troubleshooting               |
| Log rotation not working      | Missing rotation config          | Add rotation param to logger.add()                   |
| JSONL parse errors            | Malformed log line               | Check for unescaped special characters               |
| OOM with enqueue=True         | Unbounded internal queue         | Monitor RSS; use structlog or avoid slow sinks       |
| Lost logs on shutdown         | Missing logger.complete()        | Call `await logger.complete()` or `logger.remove()`  |
| Slow JSONL serialization      | Using stdlib json at high volume | Switch to `orjson.dumps().decode()`                  |
| Secrets in logs               | No fingerprinting                | Log token slices, not full values                    |
| journald not capturing output | Missing flush                    | Use `print(..., flush=True)` or `PYTHONUNBUFFERED=1` |
| No alerts when services crash | No external monitor              | Add Uptime Kuma or Gatus polling health endpoints    |

## Post-Execution Reflection

After this skill completes, check before closing:

1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure.
2. **Did parameters or output change?** — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.

Related Skills

python-workspace

from terrylica/cc-skills

Python workspace for MQL5 integration. TRIGGERS - MetaTrader 5 Python, mt5 package, MQL5-Python setup.

python-memory-safe-scripts

from terrylica/cc-skills

Memory-safe Python script patterns for long-running processes under systemd MemoryMax constraints. Covers allocator purge (mimalloc/glibc malloc_trim), HTTP response lifecycle, DataFrame cleanup, thread-local connection reuse, and periodic GC cadence. Battle-tested through 5 OOM optimization cycles on production GPU workstations. Use this skill proactively whenever writing or reviewing Python scripts that: run under systemd with MemoryMax, process data in loops (downloads, ETL, backfill), use ThreadPoolExecutor, or make repeated HTTP requests. Also use when diagnosing OOM kills, RSS creep, or fd exhaustion in Python services. TRIGGERS - memory optimization, OOM prevention, RSS reduction, malloc_trim, systemd MemoryMax, memory leak, allocator purge, memory-safe script, RSS creep, fd exhaustion, SIGKILL status 9, MemoryHigh, glibc arena, mimalloc purge, requests memory leak, ThreadPoolExecutor cleanup.

mlflow-python

from terrylica/cc-skills

MLflow experiment tracking via Python API. TRIGGERS - MLflow metrics, log backtest, experiment tracking, search runs.

voice-quality-audition

from terrylica/cc-skills

Audition Kokoro TTS voices to compare quality and grade. TRIGGERS - audition voices, kokoro voices, voice comparison, tts voice, voice quality, compare voices.

settings-and-tuning

from terrylica/cc-skills

Configure TTS voices, speed, timeouts, queue depth, and bot settings. TRIGGERS - configure tts, change voice, tts speed, queue depth, tts timeout, bot config, tune settings, adjust parameters.

full-stack-bootstrap

from terrylica/cc-skills

One-time bootstrap for Kokoro TTS engine, Telegram bot, and BotFather setup. TRIGGERS - setup tts, install kokoro, botfather, bootstrap tts-tg-sync, configure telegram bot, full stack setup.

diagnostic-issue-resolver

from terrylica/cc-skills

Diagnose and resolve TTS and Telegram bot issues. TRIGGERS - tts not working, bot not responding, kokoro error, audio not playing, lock stuck, telegram bot troubleshoot, diagnose issue.

component-version-upgrade

from terrylica/cc-skills

Upgrade Kokoro model, bot dependencies, or TTS components. TRIGGERS - upgrade kokoro, update model, upgrade bot, update dependencies, version bump, component update.

clean-component-removal

from terrylica/cc-skills

Remove TTS and Telegram sync components cleanly. TRIGGERS - uninstall tts, remove telegram bot, uninstall kokoro, clean tts, teardown, component removal.

send-message

from terrylica/cc-skills

Use when user wants to send a text message on Telegram as their personal account via MTProto, text someone, or message a contact by username, phone, or chat ID.

send-media

from terrylica/cc-skills

Use when user wants to send or upload a file, photo, video, voice note, or document on Telegram via their personal account.

search-messages

from terrylica/cc-skills

Use when user wants to search for messages across all Telegram chats or within a specific chat, find old messages by text, or look up Telegram message history filtered by sender.