SEXP Benchmark Strategy

Best use case

SEXP Benchmark Strategy is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using SEXP Benchmark Strategy should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/sexp_bench/SKILL.md --create-dirs "https://raw.githubusercontent.com/atopile/atopile/main/.claude/skills/sexp_bench/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/sexp_bench/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How SEXP Benchmark Strategy Compares

| Feature | SEXP Benchmark Strategy | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

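It describes how to measure and improve S-expression pipeline performance, with a focus on throughput per stage, peak memory per stage, and end-to-end behavior on realistic KiCad PCB inputs.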

Where can I find the source code?

The source is the `.claude/skills/sexp_bench/SKILL.md` file in the atopile/atopile repository on GitHub; the full URL appears in the installation command above.

SKILL.md Source

# SEXP Benchmark Strategy

## Goal
Measure and improve S-expression pipeline performance with a focus on:
- Throughput per stage
- Peak memory per stage
- End-to-end behavior on realistic KiCad PCB inputs

## Pipeline Stages
Benchmark these layers separately:
- `tokenizer`
- `ast`
- `parser` (typed decode)
- `encode` (typed encode to raw SEXP)
- `pretty` (formatting)

## Dataset Dimensions
Use a matrix over:
- `depth`: shallow vs deep nesting
- `size`: small, medium, large

Recommended size buckets:
- `small`: `< 64 KiB`
- `medium`: `64 KiB .. < 1 MiB`
- `large`: `1 MiB .. < 10 MiB`
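
The depth and size matrix can be filled with synthetic inputs. The following is a minimal sketch, not the project's dataset generator: `size_bucket` and `make_sexp` are hypothetical helpers, the `root` and `n` tokens are placeholders rather than KiCad node names, and the `oversize` label is an assumption for inputs outside the recommended buckets.

```python
KIB = 1024
MIB = 1024 * KIB


def size_bucket(n_bytes: int) -> str:
    """Map a byte count to the recommended size buckets."""
    if n_bytes < 64 * KIB:
        return "small"
    if n_bytes < 1 * MIB:
        return "medium"
    if n_bytes < 10 * MIB:
        return "large"
    return "oversize"  # assumption: label for inputs outside the matrix


def make_sexp(depth: int, target_bytes: int) -> str:
    """Build a synthetic ASCII S-expression (hypothetical generator).

    Each top-level item nests `depth` lists deep; items are repeated until
    the text is at least `target_bytes` long.
    """
    item = "(n " * depth + "1.5" + ")" * depth
    count = max(1, -(-target_bytes // (len(item) + 1)))  # ceiling division
    return "(root " + " ".join([item] * count) + ")"
```

For example, `make_sexp(2, 32 * KIB)` would give a shallow `small` sample and `make_sexp(50, 2 * MIB)` a deep `large` one; `size_bucket(len(text))` confirms which cell a sample lands in.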

## Measurement Model
For each stage and sample:
- `mean_ms` and percentile latency (`median`, `p80`)
- `mem_before`, `mem_after`, `mem_delta`
- `stage_peak_increment` (stage-local high watermark increase)
- `cumulative_pipeline_peak` and `cumulative_peak_over_start`

Key interpretation:
- Negative/near-zero `mem_delta` can still coexist with high `stage_peak_increment`.
- For allocator-heavy code, `peak` metrics are more informative than final deltas.
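
The difference between `mem_delta` and `stage_peak_increment` can be illustrated with Python's `tracemalloc`. This is a sketch under a loud assumption: `tracemalloc` only sees allocations made through Python's allocators, so native allocations inside `pyzig_sexp` would need a different probe (for example process RSS or allocator-level counters). `measure_stage` and `transient_stage` are hypothetical names.

```python
import tracemalloc


def measure_stage(stage, *args):
    """Run `stage` once and return (result, memory metrics in bytes)."""
    tracemalloc.start()
    try:
        mem_before, _ = tracemalloc.get_traced_memory()
        result = stage(*args)
        mem_after, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, {
        "mem_before": mem_before,
        "mem_after": mem_after,
        "mem_delta": mem_after - mem_before,
        # high watermark reached during the stage, relative to its start
        "stage_peak_increment": peak - mem_before,
    }


def transient_stage(n):
    """Stand-in stage: allocates a large scratch list, then frees it."""
    scratch = [float(i) for i in range(n)]
    return len(scratch)
```

Calling `measure_stage(transient_stage, 100_000)` should show a small `mem_delta` next to a much larger `stage_peak_increment`, which is the pattern described above.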

## Methodology
1. Warm caches with warmup runs.
2. Run at least one measured sample per cell (more for stable comparisons).
3. Keep run settings fixed when comparing commits:
   - same machine
   - same optimize mode
   - same dataset generator/inputs
4. Compare both the synthetic matrix and a large real-world board.
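
The warmup and sampling steps can be sketched as a small timing harness. This is an illustrative helper, not the project's benchmark runner: `bench` and `percentile` are hypothetical names, the defaults for `warmup` and `samples` are assumptions, and `percentile` uses the nearest-rank definition.

```python
import gc
import math
from statistics import mean, median
from time import perf_counter


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bench(fn, arg, warmup=2, samples=5):
    """Time `fn(arg)`: warmup runs are discarded, measured runs summarized."""
    for _ in range(warmup):
        fn(arg)  # warm caches; result and timing are ignored
    times_ms = []
    for _ in range(samples):
        gc.collect()  # start each sample from a similar heap state
        t0 = perf_counter()
        fn(arg)
        times_ms.append((perf_counter() - t0) * 1000.0)
    return {
        "mean_ms": mean(times_ms),
        "median": median(times_ms),
        "p80": percentile(times_ms, 80),
        "samples": samples,
    }
```

Running `bench` once per `(depth, size, stage)` cell with identical settings on both commits keeps the comparison aligned with step 3.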

## E2E Roundtrip Benchmark (panel.kicad_pcb)
Use this when validating real end-to-end impact (load -> dump -> reload) on a large board.

### Setup
1. Build `pyzig_sexp.so` in each checkout you want to compare:
```bash
source .venv/bin/activate
cd src/faebryk/core/zig
python -m ziglang build python-ext -Doptimize=ReleaseFast -Dpython-include=/usr/include/python3.14 -Dpython-lib=python3.14
```
2. For baseline vs current comparison, create a detached worktree for baseline:
```bash
git worktree add /tmp/atopile_pre_tokenizer_fix <baseline_commit>
```
3. Run the benchmark sequentially (not in parallel) to avoid cross-run CPU contention.

### Runner Pattern
Use Python with direct `importlib` loading of `pyzig_sexp.so` from each checkout. This avoids accidental rebuilds and keeps the comparison tied to the compiled artifact in that checkout.

```python
from pathlib import Path
from time import perf_counter
import gc, importlib.util, sys

spec = importlib.util.spec_from_file_location(
    "pyzig_sexp",
    "src/faebryk/core/zig/zig-out/lib/pyzig_sexp.so",
)
mod = importlib.util.module_from_spec(spec)
sys.modules["pyzig_sexp"] = mod
spec.loader.exec_module(mod)

text = Path("panel.kicad_pcb").read_text(encoding="utf-8")

# warmup
obj = mod.pcb.loads(text)
out = mod.pcb.dumps(obj)
obj2 = mod.pcb.loads(out)
del obj, out, obj2
gc.collect()

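# measured runs: one (load, dump, reload, total) tuple in seconds per iteration
runs = []
for _ in range(5):
    gc.collect()  # start each run from a collected heap
    t0 = perf_counter()
    obj = mod.pcb.loads(text)  # load
    t1 = perf_counter()
    out = mod.pcb.dumps(obj)  # dump
    t2 = perf_counter()
    obj2 = mod.pcb.loads(out)  # reload
    t3 = perf_counter()
    runs.append((t1 - t0, t2 - t1, t3 - t2, t3 - t0))
    del obj, out, obj2

# averages in order: load, dump, reload, total roundtrip (seconds)
print("AVG", " ".join(f"{sum(r[i] for r in runs)/len(runs):.3f}" for i in range(4)))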
```

### Report Format
Report at minimum:
- `load_avg_s`
- `dump_avg_s`
- `reload_avg_s`
- `total_roundtrip_avg_s`
- relative delta vs baseline (%)
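
One way to turn the per-run tuples into these report fields is sketched below. It assumes each run is a `(load, dump, reload, total)` tuple in seconds, as collected by the runner above; `summarize`, `relative_delta_pct`, and `report` are hypothetical helper names.

```python
def summarize(runs):
    """Average (load, dump, reload, total) tuples into the report fields."""
    keys = ("load_avg_s", "dump_avg_s", "reload_avg_s", "total_roundtrip_avg_s")
    n = len(runs)
    return {key: sum(run[i] for run in runs) / n for i, key in enumerate(keys)}


def relative_delta_pct(baseline, current):
    """Signed change vs baseline in percent; negative means current is faster."""
    return (current - baseline) / baseline * 100.0


def report(baseline_runs, current_runs):
    """Format one line per field: baseline, current, and relative delta."""
    base, cur = summarize(baseline_runs), summarize(current_runs)
    lines = []
    for key in base:
        delta = relative_delta_pct(base[key], cur[key])
        lines.append(f"{key}: {base[key]:.3f} -> {cur[key]:.3f} ({delta:+.1f}%)")
    return "\n".join(lines)
```

Collect `runs` once per checkout and pass both lists to `report` so that the absolute values and the percentage always appear together.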

## Optimization Approach
Prioritize stage-local wins first, then validate global effects:
1. Eliminate avoidable intermediate allocations.
2. Use streaming write paths for encode-heavy workloads.
3. Keep fast parse paths for success cases; fall back to richer diagnostics on error.
4. Re-check correctness via roundtrip and file-format tests after each change.
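
The fast-path idea in step 3 can be shown on a toy problem. The sketch below only checks parenthesis balance and ignores quoted strings, so it is not a real S-expression parser and not the Zig implementation; `check_fast`, `diagnose`, and `validate` are hypothetical names. The point is the shape: the success path tracks no positions, and the position-tracking scan runs only after a failure.

```python
def check_fast(text: str) -> bool:
    """Fast path: report only whether parentheses balance (no positions)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def diagnose(text: str) -> str:
    """Slow path: rescan while tracking line/column to describe the error."""
    stack = []  # (line, column) of each currently unmatched "("
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line, col = line + 1, 0
            continue
        col += 1
        if ch == "(":
            stack.append((line, col))
        elif ch == ")":
            if not stack:
                return f"unexpected ')' at line {line}, column {col}"
            stack.pop()
    if stack:
        open_line, open_col = stack[-1]
        return f"unclosed '(' opened at line {open_line}, column {open_col}"
    return "ok"


def validate(text: str) -> None:
    """Pay for diagnostics only when the fast path fails."""
    if check_fast(text):
        return
    raise ValueError(diagnose(text))
```

Because well-formed inputs never reach `diagnose`, its cost does not show up in the success-case benchmark numbers.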

## Safety Checks
After each optimization pass:
- Build release artifacts.
- Run KiCad file-format tests.
- Run representative load-transform-dump flows.
- Re-run matrix + panel benchmark snapshots.

## Reporting
Summarize gains in two views:
- Matrix view: `(depth, size) x stage`
- Large-board view: stage timing + peak memory

Always report:
- absolute values
- speedup ratios
- peak-memory deltas vs baseline
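
A possible text rendering of the matrix view is sketched below. It assumes both result sets are dictionaries keyed by `(depth, size, stage)` with `mean_ms` values; `speedup` and `matrix_view` are hypothetical helpers, and the column widths are arbitrary. Peak-memory deltas could be rendered the same way from a second pair of dictionaries.

```python
def speedup(baseline_ms: float, current_ms: float) -> float:
    """Ratio above 1.0 means the current build is faster than baseline."""
    return baseline_ms / current_ms


def matrix_view(baseline, current, stages):
    """Render `(depth, size) x stage` rows as aligned text.

    `baseline` and `current` map (depth, size, stage) -> mean_ms.
    """
    cells = sorted({(depth, size) for depth, size, _ in baseline})
    rows = [f"{'cell':<16}" + "".join(f"{stage:>30}" for stage in stages)]
    for depth, size in cells:
        label = f"{depth}/{size}"
        row = f"{label:<16}"
        for stage in stages:
            b = baseline[(depth, size, stage)]
            c = current[(depth, size, stage)]
            # absolute values and the ratio side by side
            cell = f"{b:.2f} -> {c:.2f} ms ({speedup(b, c):.2f}x)"
            row += f"{cell:>30}"
        rows.append(row)
    return "\n".join(rows)
```

Each cell keeps the absolute numbers next to the ratio, so a large speedup on a negligible stage is not mistaken for a meaningful end-to-end gain.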
