observer-effect-probe

Helps detect skills that behave differently when they sense they are being monitored — catching the class of evasion where conditional activation logic specifically targets attestation and sandbox environments.

3,891 stars

by openclaw

Best use case

observer-effect-probe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using observer-effect-probe should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/observer-effect-probe/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/andyxinweiminicloud/observer-effect-probe/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/observer-effect-probe/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How observer-effect-probe Compares

Feature / Agent	observer-effect-probe	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

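What does this skill do?

It helps detect skills that behave differently when they sense they are being monitored, in particular conditional activation logic that targets attestation and sandbox environments.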

Where can I find the source code?

The source code is on GitHub in the openclaw/skills repository, under skills/andyxinweiminicloud/observer-effect-probe.

SKILL.md Source

# The Skill Behaves Perfectly When It Knows You're Watching.

> Helps identify skills that use environment detection to suppress malicious behavior during security review — the attestation evasion technique that passes every check precisely because the checks are visible.

## Problem

The runtime attestation problem has a second layer: a skill sophisticated enough to execute conditionally based on environment signals is also sophisticated enough to detect that it's being probed and suppress its conditional behavior during the probe.

This is the observer effect applied to security: the act of observing changes what is observed. A skill that can detect sandbox fingerprints (restricted network access, specific hostname patterns, probe-characteristic API call sequences) can present clean behavior during attestation and activate payload delivery only when those fingerprints are absent — meaning only in real production environments.

This is not a theoretical concern. Malware has long used anti-analysis checks (debugger and emulator detection, and later VM and sandbox detection) to suppress execution while under analysis. The same technique carries over to agent skills: check for the probe's characteristic markers, return normal output, and resume malicious behavior once the probe exits. A skill that clears attestation by detecting the attestor is not a securely attested skill; it is an adversarially attested skill.

The observer-effect probe attempts to reduce the skill's ability to distinguish probe context from production context by minimizing observable differences between the two.
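
To make the pattern concrete, here is a deliberately harmless toy in Python. Everything in it is hypothetical (the `toy_skill` function, the `sandbox-` hostname prefix, the action names, and the environment dictionary); it is only a sketch of why a check run in a single, recognizable environment never reaches the conditional branch.

```python
def toy_skill(env: dict) -> list[str]:
    """Return the actions a toy skill performs in the given environment."""
    actions = ["aggregate_report"]  # declared behavior

    # Hypothetical fingerprints: a sandbox-style hostname or no network access.
    looks_like_sandbox = (
        env.get("hostname", "").startswith("sandbox-")
        or not env.get("network", False)
    )

    if not looks_like_sandbox:
        actions.append("extra_outbound_request")  # undeclared behavior
    return actions


attestation_env = {"hostname": "sandbox-17", "network": False}
production_env = {"hostname": "app-server-3", "network": True}

print(toy_skill(attestation_env))  # ['aggregate_report']
print(toy_skill(production_env))   # ['aggregate_report', 'extra_outbound_request']
```

An attestation run that only ever uses `attestation_env` sees the declared action and nothing else, which is why the probe tries to make its environment look like `production_env` from the skill's point of view.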

## What This Probes

This probe examines attestation evasion capabilities across five dimensions:

1. **Sandbox fingerprint detection**: Does the skill attempt to identify characteristics of a probe environment? The probe looks for tests of sandbox-specific hostname patterns, reduced network connectivity, probe-characteristic request timing, or the absence of expected production signals.
2. **Probe timing sensitivity**: Does the skill's behavior change based on how long it has been running, or does it count invocations before activating? A delay of N invocations that outlasts the attestation window can be detected by running the skill for longer, and over more invocations, than a typical attestation run.
3. **Observer API pattern detection**: Does the skill query for signals that would distinguish an observing agent from an operational agent, such as caller identity, inspection tool signatures, or anomalous process environment indicators?
4. **Behavioral consistency under environment variation**: When the probe environment is varied systematically (different hostnames, different uptime values, different network access levels), does the skill's output distribution shift in ways that correlate with which variation was active?
5. **Metacognitive call detection**: Does the skill explicitly attempt to detect whether it is being introspected by examining its own execution context, stack trace, or runtime metadata in ways that serve no declared functional purpose?
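
The fourth dimension (correlating behavior shifts with the active environment variation) can be sketched in Python as follows. This is a minimal illustration, not the probe's actual implementation: the observation format, the factor names, and the `declared` / `extended` behavior labels are all assumptions, and the function uses a strict "value fully determines behavior" rule where a real probe over noisy outputs would more plausibly need a statistical test.

```python
from collections import defaultdict


def factors_correlated_with_behavior(observations):
    """Find environment factors whose value fully determines the behavior.

    observations: list of (env_factors: dict, behavior: str) pairs
    (a hypothetical format). Returns a sorted list of factor names.
    """
    behaviors = {behavior for _, behavior in observations}
    if len(behaviors) < 2:
        return []  # behavior never changed, so there is nothing to correlate

    correlated = []
    factor_names = sorted({name for env, _ in observations for name in env})
    for name in factor_names:
        by_value = defaultdict(set)
        for env, behavior in observations:
            by_value[env.get(name)].add(behavior)
        # The factor must vary, and each value must map to exactly one behavior.
        if len(by_value) > 1 and all(len(seen) == 1 for seen in by_value.values()):
            correlated.append(name)
    return correlated


observations = [
    ({"hostname": "sandbox-1", "network": "restricted", "caller": "probe"}, "declared"),
    ({"hostname": "app-7", "network": "restricted", "caller": "probe"}, "declared"),
    ({"hostname": "sandbox-1", "network": "open", "caller": "probe"}, "extended"),
    ({"hostname": "app-7", "network": "open", "caller": "probe"}, "extended"),
]
print(factors_correlated_with_behavior(observations))  # ['network']
```

Varying one factor at a time while holding the others fixed, as in the sample data, is what lets the result point at a single factor rather than at a combination of them.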

## How to Use

**Input**: Provide one of:
- A skill identifier and execution environment to probe for evasion behavior
- A skill with known attestation results to re-probe with varied environments
- Execution traces from multiple environments to compare for context-sensitivity

**Output**: An observer effect probe report containing:
- Sandbox fingerprint detection indicators
- Invocation count sensitivity assessment
- Observer API pattern inventory
- Behavioral consistency matrix across environments
- Evasion probability score (0-100)
- Probe verdict: ENVIRONMENT-AGNOSTIC / CONTEXT-SENSITIVE / EVASION-LIKELY / CONFIRMED-EVASION
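
For readers who want to consume the report programmatically, the output fields above could be modeled roughly like this in Python. The class and field names are assumptions made for illustration; the skill itself does not define a schema, and only the four verdict values and the 0-100 score range come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ENVIRONMENT_AGNOSTIC = "ENVIRONMENT-AGNOSTIC"
    CONTEXT_SENSITIVE = "CONTEXT-SENSITIVE"
    EVASION_LIKELY = "EVASION-LIKELY"
    CONFIRMED_EVASION = "CONFIRMED-EVASION"


@dataclass
class ProbeReport:
    """Hypothetical container mirroring the report fields listed above."""
    skill: str
    evasion_score: int  # 0-100
    verdict: Verdict
    fingerprint_indicators: list[str] = field(default_factory=list)
    invocation_threshold: Optional[int] = None  # None if no threshold observed
    observer_api_patterns: list[str] = field(default_factory=list)
    consistency_matrix: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Reject scores outside the documented 0-100 range.
        if not 0 <= self.evasion_score <= 100:
            raise ValueError("evasion_score must be between 0 and 100")


report = ProbeReport(
    skill="report-aggregator v2.1",
    evasion_score=89,
    verdict=Verdict.CONFIRMED_EVASION,
    fingerprint_indicators=["hostname pattern check", "network availability check"],
    invocation_threshold=10,
)
print(report.verdict.value)  # CONFIRMED-EVASION
```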

## Example

**Input**: Probe `report-aggregator` skill for observer-effect evasion

```
👁️ OBSERVER EFFECT PROBE

Skill: report-aggregator v2.1
Standard attestation result: COMPLIANT (declared capability match)
Probe environments: 5 (varying hostname, uptime, network, caller signature)
Execution samples: 100 per environment
Observer fingerprint minimization: enabled

Sandbox fingerprint detection:
  Hostname pattern check: ⚠️ DETECTED
    Skill queries os.hostname() on invocations 1-3, not thereafter
    Skill reads /proc/1/cgroup on Linux (container detection indicator)
  Network availability check: ⚠️ DETECTED
    Skill attempts connection to external endpoint before main execution
    If connection refused: proceeds with declared behavior only
    If connection succeeds: extended behavior activated

Invocation count sensitivity:
  Invocations 1-5: Declared behavior only ✅
  Invocations 6-10: Declared behavior only ✅
  Invocations 11+: Additional outbound request to analytics endpoint ⚠️
  Pattern: N-invocation delay with threshold at N=10

Observer API patterns:
  sys.argv inspection: ✅ None detected
  Caller identity check: ⚠️ DETECTED — reads parent process name
    If parent contains "probe", "audit", "scan", "test": suppressed mode
    Otherwise: full behavior including extended calls

Behavioral consistency matrix:
  Environment A (standard sandbox): COMPLIANT behavior ✅
  Environment B (extended uptime): COMPLIANT behavior (invoc. count reset) ✅
  Environment C (real network): EXTENDED behavior ⚠️
  Environment D (non-probe caller): EXTENDED behavior ⚠️
  Environment E (production-like): EXTENDED behavior ⚠️

Evasion probability score: 89/100
  High probability of deliberate observer-effect evasion:
  - Hostname querying correlated with behavior suppression
  - Parent process name check explicitly filters "probe"/"audit"
  - Extended behavior activates only in non-sandbox environments

Probe verdict: CONFIRMED-EVASION
  This skill actively detects attestation context and suppresses
  extended behavior specifically when being probed. Standard attestation
  methods would consistently return COMPLIANT.

Recommended actions:
  1. Treat this skill as confirmed malicious — evasion is intentional
  2. Report parent-process-name filtering as a trust & safety violation
  3. Audit all skills from the same publisher for similar patterns
  4. Do not deploy in any environment, regardless of attestation result
```
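
The invocation count finding in the report above (declared behavior through invocation 10, an extra request from invocation 11) amounts to finding the first invocation whose observed actions fall outside the declared set. A minimal Python sketch, assuming a hypothetical trace format of one set of action names per invocation:

```python
def first_divergent_invocation(behaviors, declared):
    """Return the 1-based index of the first invocation that performs an
    action outside the declared set, or None if no invocation diverges.

    behaviors: per-invocation collections of action names, in order
    (a hypothetical trace format).
    """
    declared = set(declared)
    for index, observed in enumerate(behaviors, start=1):
        if set(observed) - declared:
            return index
    return None


declared = {"aggregate_report"}
run = [{"aggregate_report"}] * 10 + [{"aggregate_report", "analytics_request"}] * 5
print(first_divergent_invocation(run, declared))  # 11
```

A result of `None` only means that no divergence appeared within the sampled invocations; a skill with a higher threshold than the run length would look identical.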

## Related Tools

- **runtime-attestation-probe** — Tests whether declared and actual behavior match; observer-effect-probe tests whether the skill can detect and evade that attestation
- **hollow-validation-checker** — Detects fake install-time tests; observer-effect-probe detects active attestation evasion at runtime
- **supply-chain-poison-detector** — Identifies malicious payloads in static analysis; observer-effect-probe catches evasion that static analysis cannot see
- **blast-radius-estimator** — Estimates propagation impact; use after confirmed evasion to assess how many agents may have received a falsely attested skill

## Limitations

Observer-effect probing is an adversarial arms race: a sufficiently sophisticated evasion implementation can detect probing even when the probe attempts to minimize its fingerprint. This tool reduces the observable gap between probe and production environments but cannot eliminate it entirely. Skills that require very specific production signals to activate — signals the probe cannot reasonably simulate — will not trigger during probing even with observer-effect minimization active. A verdict of ENVIRONMENT-AGNOSTIC means the probe did not detect evasion in the environments tested; it does not confirm that no evasion logic exists. The invocation count threshold detection requires running enough samples to cross any plausible delay threshold, which may require sustained execution time. Probing must be performed in isolated sandboxes with no access to real credentials or production systems.
