CLI-Agent Architecture Skill

> A single `run(command="...")` tool with Unix CLI commands outperforms typed function calls.

3,891 stars

Best use case

CLI-Agent Architecture Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using CLI-Agent Architecture Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/cli-agent-architecture/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/1477009639zw-blip/cli-agent-architecture/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cli-agent-architecture/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How CLI-Agent Architecture Skill Compares

Feature / Agent	CLI-Agent Architecture Skill	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

> A single `run(command="...")` tool with Unix CLI commands outperforms typed function calls.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# CLI-Agent Architecture Skill

> A single `run(command="...")` tool with Unix CLI commands outperforms typed function calls.

This skill teaches the **two-layer CLI architecture** derived from production lessons at Manus and r/LocalLLaMA research. It is the foundation for building robust, production-ready AI agents that execute shell commands.

---

## 1. Why CLI > Typed Functions

### The LLM-Native Interface

LLMs have seen **billions of Unix CLI examples** in training data. They understand:
- Pipes (`|`)
- Exit codes and conditional chaining (`$?`, `||`, `&&`)
- Redirection (`>`, `>>`, `2>&1`, `<`, `<<`)
- Globbing and expansion (`*`, `?`, `[...]`)

Typed function calls are **unfamiliar terrain** — a thin abstraction layer that maps poorly onto concepts LLMs already master.

### One Tool, Not Three

Typed functions for a file operation:
```
read_file(path) → content
analyze(content) → result
write_file(path, result)
```

CLI equivalent:
```
run(command="grep pattern file | jq '.key' > result.json")
```

The pipe chain replaces three function calls with one coherent primitive. LLMs already think in pipelines.

### Unified Namespace

- Typed functions create **context-switching overhead**: switching between "function call mode" and "shell mode"
- CLI provides a **single namespace** for all operations: files, processes, network, services, containers
- No schema drift, no SDK version mismatch, no function deprecation

---

## 2. Two-Layer Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      AGENT (LLM)                            │
│         Thinks in pipelines. Speaks shell natively.         │
└────────────────────────┬────────────────────────────────────┘
                         │ command="..."
                         ▼
┌─────────────────────────────────────────────────────────────┐
│               LAYER 1 — Unix Execution                       │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  exec.run(command)  →  (stdout, stderr, exit_code)  │    │
│  └─────────────────────────────────────────────────────┘    │
│  • Pure execution, no abstraction                           │
│  • Lossless — binary stdout passes through unchanged        │
│  • Metadata-free — Layer 2 adds all presentation logic      │
└────────────────────────┬────────────────────────────────────┘
                         │ raw output
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                 LAYER 2 — LLM Presentation                  │
│  ┌──────────┐  ┌──────────┐  ┌────────────┐  ┌──────────┐   │
│  │ Binary   │  │ Overflow │  │ stderr     │  │ Metadata │   │
│  │ Guard    │  │ Truncator│  │ Attachment │  │ Footer   │   │
│  └──────────┘  └──────────┘  └────────────┘  └──────────┘   │
│  • Binary detected → replaced with guidance                 │
│  • >200 lines → truncated, full output in temp file         │
│  • stderr attached on failure                               │
│  • Footer: [exit:N | duration]                              │
└────────────────────────┬────────────────────────────────────┘
                         │ optimized output
                         ▼
┌─────────────────────────────────────────────────────────────┐
│               AGENT (LLM) — receives processed view           │
└─────────────────────────────────────────────────────────────┘
```

### Why Separation Is Logically Necessary

Layer 1 must be **lossless** — it cannot make decisions about what to show the LLM, because it has no context about the task. Layer 2 is the **presentation layer** that adapts raw execution output for LLM consumption.

If Layer 1 filtered or truncated, it would make irreversible decisions without task context. If Layer 2 executed commands, it would mix concerns and lose the clarity of the pipeline.
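
A minimal sketch of what Layer 1 could look like, assuming Python's standard `subprocess` module and a POSIX shell. The name `exec_run`, the 60-second default, and the use of exit code 124 for timeouts (borrowed from GNU `timeout`) are assumptions for illustration, not part of the skill. The point it shows is that Layer 1 returns raw bytes and an exit code and makes no presentation decisions.

```python
import subprocess


def exec_run(command: str, timeout: float = 60.0) -> tuple[bytes, bytes, int]:
    """Layer 1 sketch: execute and return (stdout, stderr, exit_code) untouched."""
    try:
        proc = subprocess.run(
            command,
            shell=True,           # the agent sends a whole pipeline as one string
            capture_output=True,  # keep stdout and stderr as separate byte streams
            timeout=timeout,
        )
        return proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired as exc:
        # Assumption: report a timeout as exit code 124, as GNU timeout does.
        return exc.stdout or b"", exc.stderr or b"", 124
```

Because the return values are bytes, binary output survives intact, and decoding, truncation, and formatting are left entirely to Layer 2.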

---

## 3. Four Layer 2 Mechanisms

### 3A. Binary Guard

**Problem:** Binary data (images, PDFs, executables) blinds the LLM. A terminal full of PNG header bytes is meaningless and wastes context.

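**Detection:** Read the first 8KB of stdout. If more than 30% of the sampled bytes are non-printable (anything outside 0x20-0x7E other than tab 0x09, LF 0x0A, and CR 0x0D), treat the output as binary. Every byte of a multi-byte UTF-8 character falls outside that range, so predominantly non-ASCII text can also trip the guard.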

**Replacement message format:**
```
[Binary file detected — 182KB PNG image]
Use: see <temp_path>
Or:  file <path>
```

**Script:** `scripts/binary_guard.py`

### 3B. Overflow Mode

**Problem:** Large outputs (>200 lines) cause attention collapse. The LLM loses the signal in the noise.

**Truncation strategy:**
1. Show first 50 lines (context anchor)
2. Write full output to temp file
3. Replace middle with: `[... N lines truncated. Full output: /tmp/out_abc123 ...]`
4. Show last 20 lines (recent context)

**Threshold:** 200 lines (configurable). Below threshold, pass through unchanged.

**Script:** `scripts/truncator.py`

### 3C. Metadata Footer

**Purpose:** Always tell the LLM the exit code and execution duration.

**On success:**
```
[exit:0 | 1.23s]
```

**On failure (combined with stderr attachment):**
```
[exit:127 | 0.45s]
```

The LLM uses this to decide retry, different command, or escalation — without needing to parse raw output.
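
The footer format above can be sketched in a few lines. The function name `format_footer` is hypothetical; the output format follows the two examples in this section.

```python
def format_footer(exit_code: int, duration_s: float) -> str:
    """Return a metadata footer such as '[exit:0 | 1.23s]'."""
    return f"[exit:{exit_code} | {duration_s:.2f}s]"


print(format_footer(0, 1.234))   # [exit:0 | 1.23s]
print(format_footer(127, 0.45))  # [exit:127 | 0.45s]
```

The duration would typically come from wrapping the Layer 1 call with `time.monotonic()` before and after execution.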

### 3D. stderr Attachment

**Problem:** Silent `stderr` causes blind retries. The LLM sees exit code != 0 but has no clue what went wrong.

**Rule:** Never suppress stderr. On failure, always attach it.

**Format:**
```
--- stderr ---
/bin/grep: file: No such file or directory
--- end stderr ---
```

**On success:** stderr is discarded unless it contains warnings the LLM should know about (configurable).

**Script:** `scripts/stderr_capture.py`

---

## 4. Error Message Design

Every error message must have two parts:

1. **What went wrong** — concrete, specific
2. **What to do instead** — actionable next step

### Examples

| Command | Error | Good Message |
|---------|-------|--------------|
| `cat photo.png` | binary content | `[error] binary image (182KB PNG). Use: see photo.png` |
| `grep foo huge.log` | no match | `[error] no matches found in huge.log (0 results). Pattern: foo` |
| `rm -rf /` | permission denied | `[error] permission denied (exit:1). Do not run: rm -rf /. Use: rm file` |
| `nc -z host 443` | connection refused | `[error] connection refused to host:443. Check: is the service running?` |

### Anti-patterns

❌ `"error occurred"` — vague  
❌ `"command failed"` — no clue what went wrong  
❌ `"try again"` — no diagnostic info  
❌ `"file not found"` — no suggestion on what to try
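
One way to enforce the two-part rule is to make it impossible to build an error message without both parts. This is a sketch only; `format_error` is a hypothetical helper, and the `[error]` prefix simply mirrors the table above.

```python
def format_error(what: str, next_step: str) -> str:
    """Build '[error] <what went wrong>. <what to do instead>'."""
    what, next_step = what.strip(), next_step.strip()
    if not what or not next_step:
        # Refuse vague messages: both the cause and the next step are required.
        raise ValueError("an error message needs both a cause and a next step")
    return f"[error] {what.rstrip('.')}. {next_step}"


print(format_error(
    "connection refused to host:443",
    "Check: is the service running?",
))
```

A message such as `"command failed"` with no next step would raise `ValueError` instead of reaching the LLM.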

---

## 5. Progressive Disclosure

Don't dump all documentation at once. Reveal on demand.

### Level 0 — Always Injected (Start of Session)

```
Available commands (one-line summaries):
  run     — Execute shell command, returns stdout/stderr/exit
  see     — Render binary file (image/video/audio) inline
  search  — Full-text search across files
  read    — Read file contents (text only)
  write   — Write text to file
  list    — List directory contents
```

### Level 1 — On-Demand Usage (no args or --help)

```
$ run
Usage: run <command>
Executes a shell command and returns processed output.
  --timeout=N   Max execution time in seconds (default: 60)
  --env=KEY=VAL Inject environment variable
```

### Level 2 — Parameter Drilling (explicit request)

Full parameter documentation, examples, edge cases, and security notes.

---

## 6. Implementation Guide

### Directory Structure

```
cli-agent-architecture/
├── SKILL.md
├── scripts/
│   ├── binary_guard.py
│   ├── truncator.py
│   └── stderr_capture.py
└── examples/
    └── two_layer_execution.py   # reference implementation
```

### Binary Detection (`binary_guard.py`)

```python
#!/usr/bin/env python3
"""Detect binary data in byte stream. Returns (is_binary, guidance_message)."""
import mimetypes
import os
import stat
import sys


def detect_binary_stream(data: bytes, path: str | None = None) -> tuple[bool, str]:
    """Return (True, guidance) if data appears binary."""
    # Fast path: device files and FIFOs are never shown as text
    if path and os.path.exists(path):
        mode = os.stat(path).st_mode
        if stat.S_ISBLK(mode) or stat.S_ISCHR(mode) or stat.S_ISFIFO(mode):
            return True, f"[Binary device/fifo detected: {path}]"

    if not data:
        return False, ""

    # Sample first 8KB and count bytes outside printable ASCII, tab, LF, CR
    sample = data[:8192]
    non_printable = sum(
        1 for b in sample
        if b not in (9, 10, 13) and (b < 32 or b > 126)
    )
    ratio = non_printable / len(sample)

    if ratio > 0.30:
        # Try to identify the type from the file extension, if a path is known
        hint = ""
        if path:
            mime, _ = mimetypes.guess_type(path)
            if mime:
                hint = f" ({mime})"

        return True, (
            f"[Binary file detected — {len(data)} bytes{hint}]\n"
            f"Use: see {path or '<tempfile>'}\n"
            f"Or:  file {path or '<file>'}"
        )

    return False, ""


if __name__ == "__main__":
    data = sys.stdin.buffer.read()
    is_bin, msg = detect_binary_stream(data)
    if is_bin:
        print(msg, file=sys.stderr)
        sys.exit(1)
```

### Overflow Truncation (`truncator.py`)

```python
#!/usr/bin/env python3
"""Truncate large output, write full content to temp file."""
import sys
import os
import tempfile

MAX_LINES = 200
SHOW_HEAD = 50
SHOW_TAIL = 20

def truncate_output(stdout: str, stderr: str = "") -> tuple[str, str | None]:
    """
    If stdout has more than MAX_LINES lines, truncate it and write the full
    output to a temp file. The stderr argument is currently unused.
    Returns (processed_stdout, temp_file_path or None).
    """
    lines = stdout.splitlines()

    if len(lines) <= MAX_LINES:
        return stdout, None

    # Write full output to a temp file first so the marker can point at it
    fd, temp_path = tempfile.mkstemp(prefix="cli_out_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", errors="replace") as f:
        f.write(stdout)

    head = "\n".join(lines[:SHOW_HEAD])
    tail = "\n".join(lines[-SHOW_TAIL:])
    hidden = len(lines) - SHOW_HEAD - SHOW_TAIL
    truncated_mid = f"[... {hidden} lines truncated. Full output: {temp_path} ...]"

    return f"{head}\n{truncated_mid}\n{tail}", temp_path


if __name__ == "__main__":
    output = sys.stdin.read()
    truncated, path = truncate_output(output)
    print(truncated)
    if path:
        print(f"\n[Full output written to: {path}]", file=sys.stderr)
```

### stderr Capture (`stderr_capture.py`)

```python
#!/usr/bin/env python3
"""Capture and format stderr on command failure."""
import sys

def format_stderr_attachment(stderr: str, command: str = "") -> str:
    """Format stderr for display when a command fails."""
    if not stderr or not stderr.strip():
        return ""

    lines = stderr.strip().splitlines()
    # Limit to 30 lines to avoid flooding context
    if len(lines) > 30:
        lines = lines[:30] + ["[... additional stderr truncated ...]"]

    header = "--- stderr ---"
    if command:
        header += f" (command: {command})"
    footer = "--- end stderr ---"

    return "\n".join([header] + lines + [footer])


if __name__ == "__main__":
    stderr = sys.stdin.read()
    formatted = format_stderr_attachment(stderr)
    if formatted:
        print(formatted, file=sys.stderr)
```

---

## 7. When CLI Breaks Down

### Strongly-Typed Interactions

GraphQL APIs, complex DB queries with typed schemas, gRPC with protobuf — CLI's string-based interface loses type safety. Use typed function calls here, or build a thin CLI wrapper that validates types before passing to the underlying system.
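
As a rough illustration of the "thin CLI wrapper that validates types" option, the sketch below uses Python's standard `argparse` to reject badly typed arguments before anything reaches the underlying system. The command name `query-orders`, its flags, and the allowed status values are all hypothetical.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="query-orders")  # hypothetical wrapper
    parser.add_argument("--status", choices=["open", "shipped", "cancelled"], required=True)
    parser.add_argument("--limit", type=int, default=20)
    parser.add_argument("--since", type=date.fromisoformat, default=None)
    return parser


def parse_query(argv: list[str]) -> dict:
    """Validate string arguments and return typed values for the real API call."""
    # argparse prints a usage error and raises SystemExit(2) on bad input.
    args = build_parser().parse_args(argv)
    if args.limit < 1:
        raise ValueError("[error] --limit must be >= 1. Use: --limit=20")
    return {"status": args.status, "limit": args.limit, "since": args.since}


print(parse_query(["--status", "open", "--limit", "5", "--since", "2024-01-31"]))
```

The agent still issues a plain shell command, but the wrapper converts strings to an enum-like choice, an integer, and a date before the typed backend is called.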

### High-Security / Injection-Risk Environments

- SQL/shell injection risk with unsanitized user input
- Environments where arbitrary command execution is prohibited
- Audited systems where all actions must be logged and approved

In these cases, typed functions with explicit allowlists are preferable to unrestricted CLI access.
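
Where some CLI access is still wanted in such environments, a gate in front of Layer 1 can restrict it to an explicit allowlist. The sketch below is deliberately conservative and is an assumption about how one might do this, not part of the skill: the allowlist contents and the name `check_command` are hypothetical, and it rejects every shell operator, including ones that appear inside quotes.

```python
import shlex

ALLOWED = {"cat", "grep", "head", "ls", "tail", "wc"}  # hypothetical allowlist
SHELL_META = set("|&;<>`$(){}\n")                      # operators that enable chaining


def check_command(command: str) -> tuple[bool, str]:
    """Return (True, "") if the command may run, else (False, error message)."""
    if any(ch in SHELL_META for ch in command):
        return False, "[error] shell operators are not permitted. Use: one allowlisted command"
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"[error] could not parse command ({exc}). Use: balanced quotes"
    if not argv:
        return False, "[error] empty command. Use: one allowlisted command"
    if argv[0] not in ALLOWED:
        allowed = ", ".join(sorted(ALLOWED))
        return False, f"[error] '{argv[0]}' is not allowlisted. Use one of: {allowed}"
    return True, ""


print(check_command("grep foo file.txt"))
print(check_command("rm -rf /")[1])
```

A command that passes this check could then be executed without a shell (for example via an argument list) so that no further expansion takes place.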

### Native Multimodal (Audio/Video Processing)

When the task is **transcoding**, **audio analysis**, or **video editing**, CLI tools exist but the LLM cannot "see" the output. For these tasks, typed functions that call domain-specific APIs (FFmpeg wrappers, audio analysis libraries) outperform raw CLI.

---

## 8. Business Application

### AI Agent Production Readiness Audit

Help companies assess whether their AI agent infrastructure is production-ready.

**Audit Scope ($500–$2,000):**

| Area | Checks |
|------|--------|
| Binary handling | Does the agent crash on binary output? |
| stderr visibility | Are errors opaque or diagnostic? |
| Output truncation | Does large output cause context overflow? |
| Error messages | Are they actionable? |
| Progressive disclosure | Is help available without overwhelming? |

**Deliverable:** Written report with findings, severity ratings, and recommendations.

**Implementation ($2,000–$5,000):**

- Implement the two-layer architecture
- Deploy binary guard, overflow truncation, stderr attachment
- Tune thresholds for the client's workload
- Train team on progressive disclosure patterns

**Pitch:**

> "Your agent works in demos. Does it work at 3am with a 500MB log file and a cryptic 'command failed' error? I audit the gap between 'it works' and 'it's production-ready' — and close it."

---

## Reference: Complete Two-Layer Execution Flow

```
1. Agent decides: run("grep -r 'ERROR' /var/log/app/*.log | tail -50")
2. Layer 1 exec:  stdout, stderr, exit_code = exec.run("grep ...")
3. Layer 2 processing:
   a. Binary guard  → if binary: replace with guidance
   b. Overflow mode → if >200 lines: truncate + temp file
   c. stderr attach → if exit != 0: include stderr
   d. metadata footer → attach [exit:N | duration]
4. Processed output → Agent
5. Agent interprets and decides next action
```
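
The flow above can be condensed into one self-contained sketch. This is not the skill's `examples/two_layer_execution.py`, which is not reproduced here; the names `looks_binary`, `present`, and `run` are hypothetical, and the thresholds are the defaults stated in section 3.

```python
import subprocess
import tempfile
import time

MAX_LINES, SHOW_HEAD, SHOW_TAIL = 200, 50, 20


def looks_binary(data: bytes) -> bool:
    """Heuristic from section 3A: >30% non-printable bytes in the first 8KB."""
    sample = data[:8192]
    if not sample:
        return False
    bad = sum(1 for b in sample if b not in (9, 10, 13) and (b < 32 or b > 126))
    return bad / len(sample) > 0.30


def present(stdout: bytes, stderr: bytes, exit_code: int, duration: float) -> str:
    """Layer 2: turn raw execution results into the view the agent receives."""
    parts = []
    if looks_binary(stdout):  # step 3a: binary guard
        parts.append(f"[Binary output detected: {len(stdout)} bytes]")
    else:
        text = stdout.decode("utf-8", errors="replace")
        lines = text.splitlines()
        if len(lines) > MAX_LINES:  # step 3b: overflow mode
            with tempfile.NamedTemporaryFile(
                "w", prefix="cli_out_", suffix=".txt", delete=False, encoding="utf-8"
            ) as f:
                f.write(text)
            hidden = len(lines) - SHOW_HEAD - SHOW_TAIL
            marker = f"[... {hidden} lines truncated. Full output: {f.name} ...]"
            lines = lines[:SHOW_HEAD] + [marker] + lines[-SHOW_TAIL:]
        if lines:
            parts.append("\n".join(lines))
    if exit_code != 0 and stderr.strip():  # step 3c: stderr attachment
        err = stderr.decode("utf-8", errors="replace").strip()
        parts.append(f"--- stderr ---\n{err}\n--- end stderr ---")
    parts.append(f"[exit:{exit_code} | {duration:.2f}s]")  # step 3d: footer
    return "\n".join(parts)


def run(command: str) -> str:
    """The single tool exposed to the agent: Layer 1 execution, then Layer 2."""
    start = time.monotonic()
    proc = subprocess.run(command, shell=True, capture_output=True)  # Layer 1
    return present(proc.stdout, proc.stderr, proc.returncode, time.monotonic() - start)
```

Keeping `present` free of any execution logic means it can be tested with canned byte strings, which is the practical benefit of the separation argued for in section 2.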

---

## See Also

- `scripts/binary_guard.py` — binary detection implementation
- `scripts/truncator.py` — overflow truncation implementation
- `scripts/stderr_capture.py` — stderr formatting on failure
