assessing-external-test-risk

Assesses whether branch or PR changes are high-risk for externally hosted or embedded Streamlit usage and recommends whether external e2e coverage with `@pytest.mark.external_test` is needed. Use during code review, PR triage, or test planning when changes touch routing, auth, websocket/session behavior, embedding, assets, cross-origin behavior, SiS/Snowflake runtime, storage, or security headers.

About this skill

This agent skill helps Streamlit maintainers and contributors evaluate whether a code change or pull request puts externally hosted or embedded Streamlit applications at risk. It checks whether the change affects routing, authentication, websocket/session behavior, embedding, asset loading, cross-origin interactions, security headers, or behavior specific to Streamlit-in-Snowflake (SiS) and the Snowflake runtime. Based on that assessment, it recommends whether additional end-to-end (e2e) coverage with the `@pytest.mark.external_test` marker is needed. External deployments often run behind proxies, inside iframes, and under Content Security Policy (CSP) constraints and other browser security boundaries, so their behavior can diverge from local runs. The skill is intended for code review, PR triage, and test planning.

Best use case

To determine if a proposed code change in the Streamlit framework introduces a high risk to externally hosted or embedded Streamlit applications, and consequently, if additional external end-to-end test coverage is required to prevent regressions and security vulnerabilities.

A clear, justified recommendation from the AI agent on whether the current branch or pull request requires external end-to-end (e2e) test coverage using `@pytest.mark.external_test`, based on an analysis of potential impacts on externally hosted or embedded Streamlit usage.

Practical example

Example input

Assess the external test risk for the current branch (feature/websocket-auth-refactor). Do these changes require `@pytest.mark.external_test` coverage?

Example output

**Assessment for branch `feature/websocket-auth-refactor`:**

**Risk Areas Identified:**
*   **Auth:** Direct modifications to authentication mechanisms.
*   **Websocket/Session behavior:** Changes impacting how client sessions are managed over websockets.

**Recommendation:**
**YES**, this branch **requires** `@pytest.mark.external_test` coverage.

**Justification:**
The proposed changes directly impact core authentication and websocket session handling. These are critical components for secure and stable external Streamlit deployments, especially in scenarios involving proxies or custom authentication flows. Ensuring `@pytest.mark.external_test` coverage will validate these interactions under conditions relevant to externally hosted applications, mitigating risks of regressions or security vulnerabilities in real-world environments.

When to use this skill

  • Use this skill during code review, pull request (PR) triage, or test planning when changes touch core areas like routing, authentication, websocket or session behavior, embedding logic, asset management, cross-origin behavior, Streamlit-in-Snowflake (SiS)/Snowflake runtime specifics, storage interactions, or security headers. It's particularly useful when protecting deployments that involve proxies, embedded iframe contexts, CSP constraints, and other browser security boundaries.

When not to use this skill

  • Do not use this skill for assessing general code quality, performance issues unrelated to external integration, or changes that are purely internal to Streamlit's UI components or backend logic that have no bearing on how Streamlit interacts with its hosting environment or other origins. It is not intended for generating tests, but rather for assessing the *need* for specific external tests.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/assessing-external-test-risk/SKILL.md --create-dirs "https://raw.githubusercontent.com/streamlit/streamlit/main/.claude/skills/assessing-external-test-risk/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/assessing-external-test-risk/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How assessing-external-test-risk Compares

| Feature / Agent | assessing-external-test-risk | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

The source is the SKILL.md file in the streamlit/streamlit repository on GitHub, at `.claude/skills/assessing-external-test-risk/SKILL.md`.

SKILL.md Source

# Assessing external test risk

Use this skill to decide whether a branch or PR should include external e2e coverage using `@pytest.mark.external_test`.

This helps protect deployments that commonly involve proxies, embedded iframe contexts, CSP constraints, and other browser security boundaries.

This skill is for **risk assessment and recommendation**. It does not auto-mark tests unless explicitly requested.
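
For orientation, here is a minimal sketch of what a marked test might look like. Only the `@pytest.mark.external_test` marker comes from this skill. The fixture, the URL, and the assertion are hypothetical placeholders, and the sketch assumes the marker is registered in the project's pytest configuration.

```python
import pytest


@pytest.fixture
def embedded_app_url() -> str:
    # Hypothetical fixture: a real suite would supply the URL of an
    # externally hosted or embedded deployment here.
    return "https://host.example/apps/demo/"


@pytest.mark.external_test
def test_embedded_app_url_has_trailing_slash(embedded_app_url: str) -> None:
    # Placeholder assertion. A real external test would drive a browser
    # against the deployment (proxy, iframe, CSP) instead.
    assert embedded_app_url.endswith("/")
```

With the marker in place, `pytest -m external_test` selects only the tests that carry it.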

## Decision rule

Use an **any-hit** policy:

- If any checklist category is hit, output **Recommend external_test: Yes**
- If no categories are hit, output **Recommend external_test: No**
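
The any-hit policy reduces to a single `any()` over the per-category results. A minimal sketch, assuming the results are held in a dict keyed by category name (the function name and the dict shape are illustrative, not part of the skill):

```python
def recommend_external_test(hits: dict[str, bool]) -> str:
    """Apply the any-hit policy to per-category results."""
    # One True value is enough; an empty dict means nothing was hit.
    decision = "Yes" if any(hits.values()) else "No"
    return f"Recommend external_test: {decision}"


print(recommend_external_test({
    "Routing and URL behavior": False,
    "Client storage behavior": True,
}))
```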

## Inputs to review

- Branch or PR diff against its base branch
- Changed files and related tests
- PR description (if available)

## Assessment workflow

1. Gather the changed files and full diff against the base branch.
2. Evaluate each checklist category below as hit or not hit.
3. Record concrete evidence from file paths and diff snippets.
4. Produce a recommendation and specific external-test focus areas.
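
Step 1 of the workflow could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the base branch is supplied by the caller, and `git diff --name-status <base>...HEAD` is used to list what changed since the branch forked from its base.

```python
import subprocess


def parse_name_status(output: str) -> list[tuple[str, str]]:
    """Turn `git diff --name-status` output into (status, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        # Renames and copies list both old and new paths; keep the new one.
        # The status may carry a similarity score (e.g. R100); keep the letter.
        changes.append((parts[0][0], parts[-1]))
    return changes


def changed_files(base: str) -> list[tuple[str, str]]:
    """List files changed on the current branch since it forked from base."""
    result = subprocess.run(
        ["git", "diff", "--name-status", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_status(result.stdout)
```

The parsing is kept separate from the `git` call so it can be exercised on captured output without a repository.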

## Checklist categories

Evaluate all categories. A single hit is enough to recommend external coverage.

1. **Routing and URL behavior**
   - Hit when changes introduce or modify Starlette routes, `server.baseUrlPath`, catch-alls, request methods, URL resolution, redirects, or status codes.

2. **Auth, cookies, CSRF, and identity binding**
   - Hit when changes touch login/logout or OAuth flows, `_streamlit_user`, `_streamlit_xsrf`, CSRF/XSRF handling, `server.trustedUserHeaders`, or session-to-identity binding.

3. **Websocket handshake and session transport**
   - Hit when changes affect websocket handshake or subprotocols, session affinity, reconnect behavior, ping or timeout behavior, message size limits, or fragmentation.

4. **Embedding and iframe boundary**
   - Hit when changes modify host-to-guest communication (`postMessage`), iframe sizing or resize behavior, iframe sandbox or allow attributes, or permissions policy behavior in embedded contexts.

5. **Static and component asset serving**
   - Hit when changes alter asset handlers, cache headers, size limits, base paths (including `server.customComponentBaseUrlPath`), or proxying rules for static/component assets.

6. **Service worker, uploads, and downloads**
   - Hit when changes modify service worker registration, scope, or caching strategy; upload/download endpoints; JWT or CSRF wrapping; or download attribute behavior.

7. **Cross-origin behavior and external networking**
   - Hit when changes alter CORS allowlists, `crossOrigin` usage, external-origin fetches or other external-network behavior, or backend URL discovery via `window.__streamlit.*`.

8. **Cross-origin theming and resource discovery**
   - Hit when changes introduce or modify theme/resource loading across origins (fonts, images, theme globals), CSS isolation with host pages, or manifest/asset discovery when HTML is not served by Starlette.

9. **SiS and Snowflake runtime dependencies**
   - Hit when changes rely on or modify SiS/Snowflake runtime behavior, including `running_in_sis()`, `get_active_session()`, Snowflake connection/session semantics, or SiS-specific environment flags.

10. **Client storage behavior**
    - Hit when changes introduce or modify cookies, `localStorage`, or `sessionStorage` usage that may differ in embedded or third-party contexts.

11. **Security headers and browser policies**
    - Hit when changes adjust CSP, Referrer-Policy, Permissions-Policy, or related headers that can impact embedding or resource loading.
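
As a rough pre-filter before the manual review, a diff can be scanned for identifiers the checklist names. The sketch below is illustrative only: the pattern table is an assumption built from those identifiers, it covers a subset of the categories, and a match is a candidate hit to confirm by reading the diff, not a verdict.

```python
import re

# Assumed patterns, taken from identifiers mentioned in the checklist.
# Categories without a distinctive identifier are left to manual review.
CATEGORY_PATTERNS = {
    "1. Routing and URL behavior": r"server\.baseUrlPath",
    "2. Auth, cookies, CSRF, and identity binding": (
        r"_streamlit_user|_streamlit_xsrf|server\.trustedUserHeaders"
    ),
    "4. Embedding and iframe boundary": r"postMessage",
    "5. Static and component asset serving": r"server\.customComponentBaseUrlPath",
    "7. Cross-origin behavior and external networking": r"crossOrigin|window\.__streamlit",
    "9. SiS and Snowflake runtime dependencies": r"running_in_sis|get_active_session",
    "10. Client storage behavior": r"localStorage|sessionStorage",
    "11. Security headers and browser policies": (
        r"Content-Security-Policy|Referrer-Policy|Permissions-Policy"
    ),
}


def candidate_hits(diff_text: str) -> list[str]:
    """Return categories whose patterns appear in added or removed lines."""
    changed = [
        line[1:]
        for line in diff_text.splitlines()
        # Keep +/- lines but skip the ---/+++ file headers and context lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    body = "\n".join(changed)
    return [name for name, pat in CATEGORY_PATTERNS.items() if re.search(pat, body)]
```

Restricting the search to added and removed lines avoids flagging a category just because unchanged context happens to mention, say, `localStorage`.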

## Output format

Use this exact structure:

```markdown
## External test recommendation

- Recommend external_test: [Yes/No]
- Triggered categories: [List category numbers and names, or "None"]
- Evidence:
  - `<path>`: [short reason from diff]
  - `<path>`: [short reason from diff]
- Suggested external_test focus areas:
  - [Concrete scenario to validate externally]
  - [Concrete scenario to validate externally]
- Confidence: [High/Medium/Low]
- Assumptions and gaps: [Unknowns, missing context, or why confidence is reduced]
```
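
If the report is assembled programmatically, the template above can be filled in along these lines. The function and parameter names are hypothetical; only the field labels and their order come from the template.

```python
def render_report(
    hits: list[str],
    evidence: dict[str, str],
    focus_areas: list[str],
    confidence: str,
    gaps: str,
) -> str:
    """Fill the report template; any triggered category means Yes."""
    lines = [
        "## External test recommendation",
        "",
        f"- Recommend external_test: {'Yes' if hits else 'No'}",
        f"- Triggered categories: {', '.join(hits) if hits else 'None'}",
        "- Evidence:",
        # One bullet per changed path with its short reason.
        *[f"  - `{path}`: {reason}" for path, reason in evidence.items()],
        "- Suggested external_test focus areas:",
        *[f"  - {area}" for area in focus_areas],
        f"- Confidence: {confidence}",
        f"- Assumptions and gaps: {gaps}",
    ]
    return "\n".join(lines)
```

Deriving the Yes/No line from the list of triggered categories keeps the recommendation consistent with the any-hit decision rule.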

## Interpretation guidance

- Prefer evidence over intuition. Tie each hit to concrete diff details.
- When in doubt, err toward **Yes** if externally hosted or embedded behavior could diverge from local runs.
- Keep focus areas specific and testable (route, auth handshake, iframe boundary, asset loading, SiS runtime behavior).

## Examples

### Example yes recommendation

Diff includes:

- `lib/streamlit/web/server/starlette/starlette_routes.py` route changes
- Cookie/XSRF handling updates in request auth middleware
- Frontend embed code changing iframe `allow` attributes

Expected output:

- `Recommend external_test: Yes`
- Triggered categories include routing, auth/cookies/CSRF, and embedding boundary
- Focus areas include external host iframe embedding + auth/session continuity checks

### Example no recommendation

Diff includes:

- Pure refactor in internal utility functions with no network, auth, embedding, storage, or runtime integration impact
- Docs and test name cleanup only

Expected output:

- `Recommend external_test: No`
- Triggered categories: `None`
- Confidence is high if no indirect integration points are touched
