Validating Inputs

Check all external inputs for validity - garbage in, nothing out, never garbage out

by obra

Best use case

Validating Inputs is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Validating Inputs should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/validating-inputs/SKILL.md --create-dirs "https://raw.githubusercontent.com/obra/clank/main/skills/coding/validating-inputs/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/validating-inputs/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Validating Inputs Compares

Feature / Agent	Validating Inputs	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Check all external inputs for validity - garbage in, nothing out, never garbage out

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Validating Inputs

## Overview

Professional-grade software never outputs garbage regardless of what it receives. "Garbage in, garbage out" is the mark of sloppy, insecure code.

**Core principle:** Check all data from external sources. Validate all routine parameters from untrusted sources. Decide consciously how to handle invalid data.

**Modern standard:** "Garbage in, nothing out" OR "Garbage in, error message out" OR "No garbage allowed in"

**Violating the letter of this rule is violating the spirit of defensive programming.**

## When to Use

**Always use when writing functions that receive:**
- User input (forms, command-line args, uploaded files)
- External API responses
- Database query results
- File contents
- Network data
- Configuration files
- Any data from outside your direct control

**Warning signs you need this:**
- Function assumes inputs are valid
- No validation beyond empty/null checks
- No assertions documenting assumptions
- Spec mentions constraints but code doesn't check them
- Silent failures or wrong results with bad data
- Security vulnerabilities (injection, overflow, etc.)
- Functions accept any input without question

**Don't skip when:**
- "Inputs will always be valid" (they won't)
- "Validation happens elsewhere" (defense in depth - check anyway)
- "It's just internal code" (today's internal is tomorrow's API)
- Under time pressure (validation prevents longer debugging)

## The Two-Level Defense

### Level 1: Assertions (Should NEVER Happen)

**Use for:** Conditions that indicate bugs in YOUR code

```python
def calculate_velocity(distance: float, time: float) -> float:
    # Preconditions: These should NEVER be violated if caller is correct
    assert distance >= 0, "distance cannot be negative"
    assert time > 0, "time must be positive"

    result = distance / time

    # Postcondition: Result should be reasonable
    assert result >= 0, f"velocity cannot be negative: {result}"

    return result
```

**Assertions are:**
- Executable documentation
- Often disabled in production (Python strips them when run with `-O`), so never the only guard on external data
- For catching programmer errors during development
- A signal of a bug when they fire: fix the code, not the input
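
In Python specifically, assertions are skipped only when the interpreter is started with `-O`. The sketch below illustrates that by running the same one-line program in two child interpreters; the snippet text and the `run_snippet` helper are made up for this illustration.

```python
import subprocess
import sys

# Hypothetical two-line program: the assert fails unless assertions are stripped.
SNIPPET = "assert False, 'boom'\nprint('assert was skipped')"


def run_snippet(optimized: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a child interpreter, optionally with -O."""
    # -I (isolated mode) keeps environment variables such as PYTHONOPTIMIZE
    # from influencing the comparison.
    args = [sys.executable, "-I"]
    if optimized:
        args.append("-O")
    args += ["-c", SNIPPET]
    return subprocess.run(args, capture_output=True, text=True)


normal = run_snippet(optimized=False)
optimized = run_snippet(optimized=True)

print("without -O, exit code is non-zero:", normal.returncode != 0)
print("with -O, stdout:", optimized.stdout.strip())
```

This is why checks that must survive in production belong in Level 2 error handling rather than in `assert` statements.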

### Level 2: Error Handling (MIGHT Happen)

**Use for:** Conditions you expect might occur in production

```python
def calculate_average_score(scores: list[float]) -> float:
    """Calculate average of test scores (must be 0-100)."""

    # Error handling: Validate external data
    if scores is None:
        raise ValueError("scores cannot be None")

    if not scores:
        raise ValueError("Cannot calculate average of empty score list")

    # Validate each score
    for i, score in enumerate(scores):
        if not isinstance(score, (int, float)):
            raise TypeError(f"Score {i} is not a number: {score}")
        if not 0 <= score <= 100:  # written this way so NaN is rejected too
            raise ValueError(f"Score {i} out of range [0-100]: {score}")

    result = sum(scores) / len(scores)

    # Postcondition: Verify result is valid
    assert 0 <= result <= 100, f"Calculated average out of range: {result}"

    return result
```

**Error handling:**
- Stays in production code
- Handles expected anomalies gracefully
- Validates external/untrusted data
- When it triggers, the input was bad, not the code: handle the error

## Quick Reference

| Situation | Approach | Example |
|-----------|----------|---------|
| **External data** | Validate everything | Check ranges, types, formats, lengths |
| **Routine parameters** | Check if from untrusted source | Validate or document assumptions |
| **Internal invariants** | Assert they hold | Assert postconditions, state assumptions |
| **Null/None** | Check explicitly | `if value is None: raise ValueError()` |
| **Empty collections** | Decide if valid or error | Raise an error or return a default? |
| **Type mismatches** | Check with isinstance | `if not isinstance(score, (int, float))` |
| **Range violations** | Check bounds | `if score < 0 or score > 100` |
| **Invalid formats** | Use regex/validators | Email, phone, URLs |
| **Security risks** | Validate aggressively | SQL injection, buffer overflow, path traversal |

## Validation Checklist

Before implementing any function receiving external data:

**1. Identify all inputs**
- [ ] What data comes from outside my control?
- [ ] Which parameters could be bad?
- [ ] What are the data sources? (user, API, DB, file, network)

**2. Document constraints**
- [ ] What are valid ranges? (0-100, positive only, etc.)
- [ ] What are valid types? (int, float, string)
- [ ] What are valid formats? (email, phone, date)
- [ ] What are valid lengths? (string max, array min/max)
- [ ] Are nulls allowed?
- [ ] Are empties allowed?

**3. Think "what could go wrong?"**
- [ ] Wrong type passed
- [ ] Null/None passed
- [ ] Empty collection passed
- [ ] Negative where positive expected
- [ ] Out of range values
- [ ] Invalid format (malformed email, etc.)
- [ ] Security attacks (injection, overflow)

**4. Implement validation**
- [ ] Check each constraint explicitly
- [ ] Use error handling for expected problems
- [ ] Use assertions for programmer errors
- [ ] Provide clear error messages
- [ ] Document assumptions in assertions

**5. Decide error response**
- [ ] Return neutral value? (0, empty string, None)
- [ ] Raise exception with clear message?
- [ ] Log and continue?
- [ ] Substitute closest valid value?
- [ ] Shut down? (safety-critical)
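
As a worked example, the checklist can be walked through for a single form field. This is a minimal sketch, not a prescribed implementation: `parse_age`, `MIN_AGE`, and `MAX_AGE` are hypothetical names, and the 0-150 range is an assumed constraint chosen for illustration.

```python
MIN_AGE = 0
MAX_AGE = 150  # assumed upper bound, for illustration only


def parse_age(raw: object) -> int:
    """Parse an untrusted form field into an age, or raise a clear error."""
    # Steps 1-2: the input is a user-supplied string; valid ages are
    # whole numbers in [MIN_AGE, MAX_AGE]; None and empty are not allowed.
    if not isinstance(raw, str):
        raise TypeError(f"age must be a string, got {type(raw).__name__}")

    text = raw.strip()
    if not text:
        raise ValueError("age is required")

    # Step 3: wrong format ("-5", "12.5", "abc") is caught here.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"age must be a whole number, got {raw!r}")

    # Step 4: check the range constraint explicitly.
    age = int(text)
    if not MIN_AGE <= age <= MAX_AGE:
        raise ValueError(f"age must be between {MIN_AGE} and {MAX_AGE}, got {age}")

    # Postcondition documents what callers may rely on.
    assert MIN_AGE <= age <= MAX_AGE
    return age


print(parse_age(" 42 "))
```

Step 5 is answered here with "raise an exception with a clear message"; a different domain might justify a different response.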

## Robustness vs Correctness

**Consciously choose based on domain:**

### Correctness (Never Return Wrong Answer)

**Prefer when:**
- Safety-critical (medical, aviation, financial)
- Security-critical
- Data integrity critical
- Wrong result is worse than no result

**Strategy:** Validate aggressively, fail fast with errors

```python
def calculate_radiation_dosage(params):
    # Medical system: wrong dosage could kill patient
    # Better to refuse than to guess
    if not all_params_valid(params):
        raise ValueError("Cannot calculate dosage with invalid parameters")
    # If ANY doubt, raise error
```

### Robustness (Keep Operating)

**Prefer when:**
- Consumer applications
- Non-critical features
- User convenience matters
- Some result better than crash

**Strategy:** Substitute reasonable values, log issues, continue

```python
def get_user_theme_color(color_code):
    # UI preference: wrong color annoying but not critical
    # Better to show default than crash
    if not is_valid_color(color_code):
        logger.warning(f"Invalid color code {color_code}, using default")
        return DEFAULT_COLOR
    return color_code
```

**Make this choice explicit in your design.** Don't just fall into one approach without thinking.
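
One way to make the choice visible is to put it in the function signature. The sketch below assumes a hypothetical `resolve_color` helper with a required `strict` flag, a made-up `DEFAULT_COLOR`, and a six-digit hex format for color codes.

```python
import logging
import re

logger = logging.getLogger(__name__)

DEFAULT_COLOR = "#000000"  # hypothetical default
_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")  # assumed format: #RRGGBB


def is_valid_color(color_code: object) -> bool:
    """Return True only for strings of the form #RRGGBB."""
    return isinstance(color_code, str) and _HEX_COLOR.fullmatch(color_code) is not None


def resolve_color(color_code: object, *, strict: bool) -> str:
    """Validate a color code; the caller must choose the failure policy."""
    if is_valid_color(color_code):
        return color_code
    if strict:
        # Correctness: refuse rather than guess.
        raise ValueError(f"Invalid color code: {color_code!r}")
    # Robustness: substitute a harmless value and leave a trace in the log.
    logger.warning("Invalid color code %r, using default %s", color_code, DEFAULT_COLOR)
    return DEFAULT_COLOR
```

Because `strict` is keyword-only and has no default, every call site states which policy it wants, for example `resolve_color(code, strict=False)` in a UI layer.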

## Common Input Validation Patterns

### Pattern 1: Validate Numeric Ranges

```python
def process_temperature(temp_celsius: float) -> float:
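    # Type validation
    if not isinstance(temp_celsius, (int, float)):
        raise TypeError(f"Temperature must be numeric, got {type(temp_celsius)}")

    # NaN passes both range checks below (every comparison is False), so reject it first
    if temp_celsius != temp_celsius:  # only NaN is unequal to itself
        raise ValueError("Temperature cannot be NaN")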

    if temp_celsius < -273.15:  # Absolute zero
        raise ValueError(f"Temperature cannot be below absolute zero: {temp_celsius}")

    if temp_celsius > 1000:  # Sanity check
        raise ValueError(f"Temperature seems unrealistic: {temp_celsius}")

    return temp_celsius + 273.15  # Convert to Kelvin
```

### Pattern 2: Validate String Formats

```python
import re

def send_email(email_address: str) -> None:
    # Format validation
    if not email_address or not isinstance(email_address, str):
        raise ValueError("Email address required")

    email_address = email_address.strip()

    if not re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', email_address):
        raise ValueError(f"Invalid email format: {email_address}")

    if len(email_address) > 254:  # RFC 5321 limit
        raise ValueError("Email address too long")

    # Proceed with valid email
    ...
```

### Pattern 3: Validate Collections

```python
def process_batch(items: list) -> None:
    # Collection validation
    if items is None:
        raise ValueError("items cannot be None")

    if not isinstance(items, list):
        raise TypeError(f"items must be a list, got {type(items)}")

    if not items:
        raise ValueError("items list cannot be empty")

    if len(items) > 1000:  # Sanity check
        raise ValueError(f"Batch too large: {len(items)} items (max 1000)")

    for i, item in enumerate(items):
        if item is None:
            raise ValueError(f"Item {i} cannot be None")
        # Validate each item...
```

### Pattern 4: Validate Required Fields

```python
def create_user(data: dict) -> None:
    # Required fields validation
    required_fields = ['username', 'email', 'password']

    for field in required_fields:
        if field not in data:
            raise ValueError(f"Missing required field: {field}")

        if not data[field] or not isinstance(data[field], str):
            raise ValueError(f"Field '{field}' must be non-empty string")

        if not data[field].strip():
            raise ValueError(f"Field '{field}' cannot be whitespace only")
```

### Pattern 5: Preconditions and Postconditions

```python
def withdraw_money(account_id: str, amount: float) -> float:
    # Preconditions (assertions for internal invariants)
    assert account_id, "account_id should never be empty"
    assert amount > 0, "amount should be positive (checked by caller)"

    # Validation (error handling for external data)
    balance = get_balance(account_id)

    if balance < amount:
        raise ValueError(f"Insufficient funds: balance {balance}, requested {amount}")

    new_balance = balance - amount

    # Postcondition (assertion for internal invariant)
    assert new_balance >= 0, "Balance should never be negative"
    assert new_balance == balance - amount, "Math error in withdrawal"

    update_balance(account_id, new_balance)
    return new_balance
```

## Security Validation

**Especially check for:**

- **SQL Injection:** Validate/sanitize database inputs, use parameterized queries
- **Command Injection:** Never pass user input directly to system calls
- **Path Traversal:** Resolve file paths and confirm they stay inside the allowed directory; checking for `../` alone is not enough
- **Buffer Overflow:** Check string/array lengths against limits
- **Integer Overflow:** Validate arithmetic won't overflow
- **XSS/HTML Injection:** Sanitize user content before display
- **XML/JSON Injection:** Validate structure and content

**Rule:** Be especially paranoid with anything that could attack your system.
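
Two of these checks can be sketched with the standard library alone. The `users` table, the `find_user` and `resolve_upload_path` helpers, and the upload directory are hypothetical; the sketch only shows the shape of a parameterized query and a resolved-path containment check.

```python
import sqlite3
from pathlib import Path


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a parameterized query."""
    # The ? placeholder passes username as data, never as SQL text.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


def resolve_upload_path(base_dir: Path, user_supplied: str) -> Path:
    """Return the resolved path, or raise if it leaves base_dir."""
    base = base_dir.resolve()
    candidate = (base / user_supplied).resolve()
    # Compare resolved paths instead of searching the string for "../":
    # this also covers absolute paths and "a/../../b" style inputs.
    if base not in candidate.parents:
        raise ValueError(f"Path escapes upload directory: {user_supplied!r}")
    return candidate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))

print(find_user(conn, "alice"))
print(find_user(conn, "alice' OR '1'='1"))
```

The injection attempt in the last line is treated as an ordinary (non-matching) username because it never becomes part of the SQL statement.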

## Common Mistakes

**❌ Only checking for null/empty:**
```python
if not scores:
    return 0.0
return sum(scores) / len(scores)  # Doesn't check constraints!
```

**✅ Check ALL constraints:**
```python
if not scores:
    raise ValueError("Cannot calculate average of empty list")
for score in scores:
    if score < 0 or score > 100:
        raise ValueError(f"Score out of range: {score}")
return sum(scores) / len(scores)
```

---

**❌ Assuming types are correct:**
```python
def add(a, b):
    return a + b  # What if a or b are strings? None? Lists?
```

**✅ Validate types:**
```python
def add(a: float, b: float) -> float:
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError(f"Arguments must be numeric: {type(a)}, {type(b)}")
    return a + b
```

---

**❌ Silent failure or wrong default:**
```python
if not scores:
    return 0.0  # Is 0.0 the right answer for empty? Or should it error?
```

**✅ Explicit decision:**
```python
if not scores:
    raise ValueError("Cannot calculate average of empty list")
    # OR if 0.0 is intentional:
    # return 0.0  # Intentionally return 0 for empty list per business rules
```

---

**❌ No error message context:**
```python
if age < 18:
    raise ValueError("Invalid age")  # Which age? What was the value?
```

**✅ Informative error messages:**
```python
if age < 18:
    raise ValueError(f"Age must be 18+, got {age}")
```

## Red Flags - STOP and Add Validation

**Before implementing:**
- Haven't thought "what could go wrong?"
- No validation code written yet
- Only checking null/empty
- Assuming inputs are valid
- "Validation happens elsewhere" (maybe, but check anyway)

**After implementing:**
- Function accepts any input without checking
- No assertions documenting assumptions
- Spec mentions constraints but code doesn't enforce them
- Could pass wrong type and function wouldn't catch it
- Security review would fail

**All of these mean: Add comprehensive validation now.**

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Inputs will always be valid" | They won't. Users make mistakes, APIs change, bugs happen. |
| "Validation happens elsewhere" | Defense in depth. Check at every layer. |
| "It's just internal code" | Today's internal is tomorrow's API. Validate anyway. |
| "Adds too much code" | 5 lines of validation prevents hours of debugging. |
| "Slows down the code" | Correctness > speed. Optimize later if needed. |
| "Trust the caller" | Trust but verify. Catch bugs at boundaries. |
| "Users know what they're doing" | Users make mistakes. Software should help, not crash. |
| "I'll add validation later" | Later never comes. Add it now. |

## Three Levels of Validation

### Level 1: Type Validation

Check data is the expected type:

```python
if not isinstance(value, expected_type):
    raise TypeError(f"Expected {expected_type}, got {type(value)}")
```

### Level 2: Constraint Validation

Check data meets business rules:

```python
if value < min_value or value > max_value:
    raise ValueError(f"Value {value} out of range [{min_value}, {max_value}]")
```

### Level 3: Format/Semantic Validation

Check data is semantically valid:

```python
if not re.match(email_pattern, email):
    raise ValueError(f"Invalid email format: {email}")
```

**Apply all three levels to external data.**
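
Applied in order to one field, the three levels might look like the sketch below. The `validate_username` function, the length limits, and the format rule are assumptions made up for this example.

```python
import re

MIN_LEN, MAX_LEN = 3, 20  # hypothetical business rule
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")  # hypothetical format rule


def validate_username(value: object) -> str:
    """Run type, constraint, and format checks, cheapest first."""
    # Level 1: type validation
    if not isinstance(value, str):
        raise TypeError(f"Expected str, got {type(value).__name__}")

    # Level 2: constraint validation
    if not MIN_LEN <= len(value) <= MAX_LEN:
        raise ValueError(
            f"Username length {len(value)} out of range [{MIN_LEN}, {MAX_LEN}]"
        )

    # Level 3: format validation (fullmatch requires the whole string to match)
    if USERNAME_PATTERN.fullmatch(value) is None:
        raise ValueError(f"Invalid username format: {value!r}")

    return value
```

Running the type check first means the later checks can safely call `len()` and regex methods without guarding against non-string input.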

## Assertions vs Error Handling

### Use Assertions When:

- Documenting internal invariants
- Checking preconditions from trusted callers
- Verifying postconditions you guarantee
- Catching programmer errors (bugs in YOUR code)
- Development/debugging checks (Python strips assertions when run with `-O`)

```python
def withdraw(self, amount):
    assert self.balance >= 0, "Balance invariant violated"  # Should never happen
    assert amount > 0, "Caller should have checked amount"   # Caller's bug
```

### Use Error Handling When:

- Validating external/untrusted data
- Handling expected anomalies
- User input could be wrong
- API might return bad data
- Production code must handle gracefully

```python
def withdraw(self, amount):
    if amount <= 0:  # User might request $0 or negative
        raise ValueError(f"Withdrawal amount must be positive, got {amount}")

    if amount > self.balance:  # User might request too much
        raise ValueError(f"Insufficient funds: {amount} requested, {self.balance} available")
```

**Rule:** Assertions for bugs, error handling for anomalies.
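
The two styles can live in the same class: error handling at the public boundary, assertions in the private helper behind it. This is a sketch under assumptions; the `Account` class, the `_apply_debit` helper, and storing the balance as an integer number of cents are illustrative choices, not part of the original examples.

```python
class Account:
    """Toy account; balance is an integer number of cents (assumption)."""

    def __init__(self, balance: int) -> None:
        if isinstance(balance, bool) or not isinstance(balance, int) or balance < 0:
            raise ValueError(f"Opening balance must be a non-negative int, got {balance!r}")
        self._balance = balance

    def withdraw(self, amount: object) -> int:
        # Error handling: amount comes from the user and may be anything.
        if isinstance(amount, bool) or not isinstance(amount, int):
            raise TypeError(f"Amount must be an int, got {type(amount).__name__}")
        if amount <= 0:
            raise ValueError(f"Withdrawal amount must be positive, got {amount}")
        if amount > self._balance:
            raise ValueError(
                f"Insufficient funds: {amount} requested, {self._balance} available"
            )
        self._apply_debit(amount)
        return self._balance

    def _apply_debit(self, amount: int) -> None:
        # Assertions: only withdraw() calls this, so a failure here is a bug
        # in this class rather than bad user input.
        assert 0 < amount <= self._balance, "withdraw() must validate amount first"
        self._balance -= amount
        assert self._balance >= 0, "Balance invariant violated"
```

If the assertions are stripped in production, the public checks in `withdraw` still protect the balance.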

## Validation Strategy by Source

| Data Source | Trust Level | Validation Approach |
|-------------|-------------|---------------------|
| **User input** | Untrusted | Validate everything aggressively |
| **External API** | Untrusted | Validate responses, handle failures |
| **Database** | Semi-trusted | Check for corruption, missing data |
| **Config file** | Semi-trusted | Validate format and values |
| **Internal parameters** | Trusted | Use assertions to document assumptions |
| **Your own methods** | Trusted | Assertions for preconditions |
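
For the untrusted rows of the table, validation usually starts with the shape of the payload. The sketch below assumes a hypothetical API that returns a JSON object with an integer `id` and a string `name`; both field names and the `parse_user_response` helper are invented for illustration.

```python
import json


def parse_user_response(raw_body: str) -> dict:
    """Validate an external API response before the rest of the code sees it."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError(f"Expected a JSON object, got {type(payload).__name__}")

    user_id = payload.get("id")
    # JSON true/false decode to bool, which is an int subclass, so exclude it.
    if isinstance(user_id, bool) or not isinstance(user_id, int) or user_id <= 0:
        raise ValueError(f"Field 'id' must be a positive integer, got {user_id!r}")

    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError(f"Field 'name' must be a non-empty string, got {name!r}")

    # Return only the fields that were checked, in normalized form.
    return {"id": user_id, "name": name.strip()}
```

Returning a fresh dictionary with only the validated fields keeps unchecked extras from leaking into trusted code.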

## Common Validation Scenarios

### Validating Numeric Input

```python
import math

# Check type, special values, range
if not isinstance(value, (int, float)):
    raise TypeError(f"Expected number, got {type(value)}")

if math.isnan(value) or math.isinf(value):
    raise ValueError(f"Value cannot be NaN or Inf: {value}")

if value < minimum or value > maximum:
    raise ValueError(f"Value {value} out of range [{minimum}, {maximum}]")
```

### Validating String Input

```python
# Check type, emptiness, length, format
if not isinstance(value, str):
    raise TypeError(f"Expected string, got {type(value)}")

value = value.strip()

if not value:
    raise ValueError("Value cannot be empty or whitespace only")

if len(value) > max_length:
    raise ValueError(f"Value too long: {len(value)} chars (max {max_length})")

if not pattern.fullmatch(value):  # fullmatch: the whole string must match, not just a prefix
    raise ValueError(f"Value doesn't match required format: {value}")
```

### Validating Collections

```python
# Check type, emptiness, size, element validity
if not isinstance(items, list):
    raise TypeError(f"Expected list, got {type(items)}")

if not items:
    raise ValueError("List cannot be empty")

if len(items) > max_items:
    raise ValueError(f"Too many items: {len(items)} (max {max_items})")

for i, item in enumerate(items):
    if item is None:
        raise ValueError(f"Item {i} cannot be None")
    # Validate each element...
```

## Error Response Strategies

Choose consciously based on domain:

### 1. Return Neutral Value

**When:** Non-critical, user convenience matters

```python
def get_color_preference(color_code):
    if not is_valid_color(color_code):
        return DEFAULT_COLOR  # Neutral, harmless
    return color_code
```

### 2. Substitute Valid Value

**When:** Can safely substitute without data loss

```python
def clamp_temperature(temp):
    # Thermometer calibrated 0-100°C
    if temp < 0:
        return 0  # Closest valid value
    if temp > 100:
        return 100
    return temp
```

### 3. Raise Exception

**When:** Caller must handle the error

```python
def charge_payment(amount):
    if amount <= 0:
        raise ValueError(f"Payment amount must be positive: {amount}")
    # Process payment
```

### 4. Log and Continue

**When:** Error isn't critical, want visibility

```python
def sync_data(data):
    if not is_valid(data):
        logger.warning(f"Invalid data encountered, skipping: {data}")
        return
    # Process valid data
```

### 5. Shut Down

**When:** Safety-critical, wrong result is dangerous

```python
def control_reactor(params):
    if not params_within_safe_limits(params):
        emergency_shutdown()
        raise CriticalError("Unsafe parameters detected, reactor shut down")
```

## Verification Before Shipping

Before marking validation complete:

- [ ] Identified ALL external data sources
- [ ] Validated ALL constraints from spec
- [ ] Used assertions for internal invariants
- [ ] Used error handling for external anomalies
- [ ] Provided clear, informative error messages
- [ ] Consciously chose: robustness vs correctness
- [ ] Tested with invalid inputs (not just valid ones)
- [ ] Security-reviewed for injection/overflow/attacks
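
The "tested with invalid inputs" item can be covered with a small table of bad inputs and the error each should raise. This is a sketch: `average_score` is a compact stand-in for the score-averaging example, and `INVALID_CASES` and `run_invalid_input_checks` are hypothetical names.

```python
def average_score(scores):
    """Average of test scores, each required to be a number in [0, 100]."""
    if not isinstance(scores, list):
        raise TypeError(f"scores must be a list, got {type(scores).__name__}")
    if not scores:
        raise ValueError("scores cannot be empty")
    for i, score in enumerate(scores):
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise TypeError(f"Score {i} is not a number: {score!r}")
        if not 0 <= score <= 100:  # also rejects NaN
            raise ValueError(f"Score {i} out of range [0-100]: {score}")
    return sum(scores) / len(scores)


# Each entry pairs a bad input with the exception type it should raise.
INVALID_CASES = [
    (None, TypeError),
    ([], ValueError),
    (["90"], TypeError),
    ([True], TypeError),
    ([-1], ValueError),
    ([101], ValueError),
    ([float("nan")], ValueError),
]


def run_invalid_input_checks() -> int:
    """Return how many invalid inputs were rejected with the expected error."""
    rejected = 0
    for bad_input, expected_error in INVALID_CASES:
        try:
            average_score(bad_input)
        except expected_error:
            rejected += 1
    return rejected
```

A run passes when `run_invalid_input_checks()` equals `len(INVALID_CASES)`; an unexpected exception type propagates and fails loudly instead of being counted.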

## Real-World Impact

From Code Complete and baseline testing:

**Baseline test results:**
- Agent only checked empty list (most basic edge case)
- Ignored spec constraint (scores must be 0-100)
- No type checking, no assertions, no comprehensive validation
- Grade: D- for defensive programming

**With validation:**
- Catches bad data at boundary (not deep in call stack)
- Clear error messages aid debugging
- Assertions catch programmer errors early
- Production code is robust and secure

**Industry impact:**
- Security vulnerabilities often stem from missing input validation
- Defensive programming prevents "impossible" errors
- Validating early is cheaper than debugging later

## Integration with Other Skills

**For multi-layer validation:** See skills/debugging/defense-in-depth for validating at every layer data passes through

**For systematic debugging:** If validation fails in production, see skills/debugging/systematic-debugging for root cause analysis