web-search

Web search and content extraction toolkit. Use for searching documentation, facts, current information, or extracting readable content from URLs. Supports multiple providers (ddgs keyless, brave_api with key), caching, and safe defaults. Prefer this over browser-tools when no interaction is needed.

Best use case

web-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using web-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/web-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/web-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/web-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How web-search Compares

| Feature / Agent | web-search | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It searches the web and extracts readable content from URLs or local HTML files. It supports multiple providers (ddgs, which is keyless, and brave_api, which needs a key), caching, and safe defaults. Prefer it over browser-tools when no interaction is needed.

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/development/web-search.

SKILL.md Source

# Web Search Toolkit

Search the web and extract readable content. Stable CLI with JSON output for agents.

## Setup

```bash
cd {baseDir}
uv sync  # Install dependencies (once)
```

Optional: Set `BRAVE_API_KEY` for better search reliability (ddgs is keyless but flaky).
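
A wrapper script could choose the provider from the environment, as in the minimal sketch below. `pick_provider` is a hypothetical helper, not part of the toolkit, and `--provider auto` may already behave this way.

```bash
# Hypothetical helper (not part of wstk): prefer brave_api when a key is
# present, otherwise fall back to the keyless ddgs provider.
pick_provider() {
  if [ -n "${BRAVE_API_KEY:-}" ]; then
    echo "brave_api"
  else
    echo "ddgs"
  fi
}
```

It could then be used as `{baseDir}/.venv/bin/wstk search "query" --provider "$(pick_provider)"`.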

## Commands

### Search

```bash
{baseDir}/.venv/bin/wstk search "query"                    # Default (10 results)
{baseDir}/.venv/bin/wstk search "query" -n 5 --plain       # URLs only, one per line
{baseDir}/.venv/bin/wstk search "query" --json             # Machine-readable
{baseDir}/.venv/bin/wstk search "query" --time-range w     # Last week
{baseDir}/.venv/bin/wstk search "site:docs.python.org asyncio"  # Site-scoped
```

Key flags:
- `-n, --max-results <N>` — Number of results (default: 10)
- `--time-range <d|w|m|y>` — Filter by recency
- `--provider <ddgs|brave_api|auto>` — Search provider
- `--plain` — Output URLs only (for piping)
- `--json` — Structured output

### Pipeline (search → extract)

```bash
{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --json
{baseDir}/.venv/bin/wstk pipeline "python asyncio tutorial" --plan --plain
```

Key flags:
- `--top-k <N>` — Search results to consider
- `--extract-k <N>` — Number of results to extract
- `--plan` — Return candidates without fetching
- `--method <http|browser|auto>` — Extraction method (default: http)
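
The flags above can be combined in a small wrapper. This is a sketch under assumptions: `run_pipeline` and the `WSTK` variable are hypothetical names, the script is assumed to run from the toolkit directory, and the rule that `--extract-k` should not exceed `--top-k` is a guess about sensible usage rather than documented behavior.

```bash
# Assumes the script runs from the toolkit directory (after `cd {baseDir}`).
WSTK="${WSTK:-./.venv/bin/wstk}"

# Hypothetical wrapper: consider top_k search results, extract extract_k of them.
run_pipeline() {
  local query="$1" top_k="${2:-10}" extract_k="${3:-3}"
  # Assumption: extracting more results than were considered is a usage error.
  if [ "$extract_k" -gt "$top_k" ]; then
    echo "extract-k ($extract_k) must not exceed top-k ($top_k)" >&2
    return 2
  fi
  "$WSTK" pipeline "$query" --top-k "$top_k" --extract-k "$extract_k" --json
}
```

For example, `run_pipeline "python asyncio tutorial" 10 3` would consider ten results and extract three.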

### Extract (fetch + extract readable content)

```bash
{baseDir}/.venv/bin/wstk extract https://example.com --plain     # Markdown output
{baseDir}/.venv/bin/wstk extract https://example.com --text      # Plain text
{baseDir}/.venv/bin/wstk extract https://example.com --json      # Full metadata
{baseDir}/.venv/bin/wstk extract ./local-file.html --plain       # From file
```

Key flags:
- `--markdown` / `--text` / `--both` — Output format
- `--strategy <auto|readability|docs>` — Extraction strategy
- `--max-chars <N>` — Truncate output
- `--allow-domain <domain>` — Restrict to specific domains (safety)

### Fetch (raw HTTP, no extraction)

```bash
{baseDir}/.venv/bin/wstk fetch https://example.com --json        # Metadata + status
{baseDir}/.venv/bin/wstk fetch https://example.com --plain       # Path to cached body
```

### List providers

```bash
{baseDir}/.venv/bin/wstk providers --plain
```

## Decision Guide

- `search` when you need discovery or candidate URLs.
- `pipeline` when you want a one-shot search → extract bundle.
- `fetch` when you need HTTP metadata or the cached body path (no extraction).
- `extract` when you want readable content from a URL or local HTML.
- `render` when a page is JS-only or blocked (or use `extract --method browser` for one-step extraction).

## Common Patterns

**Search → extract top result:**
```bash
url=$({baseDir}/.venv/bin/wstk search "python asyncio tutorial" --plain | head -1)
{baseDir}/.venv/bin/wstk extract "$url" --plain --max-chars 8000
```

**Search with JSON for programmatic use:**
```bash
{baseDir}/.venv/bin/wstk search "openai api reference" --json | jq '.data.results[0].url'
```

**Safe extraction (restrict domains):**
```bash
{baseDir}/.venv/bin/wstk extract https://docs.python.org/3/library/asyncio.html \
  --allow-domain docs.python.org --plain
```
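
The first and third patterns can be combined: search, keep only URLs on a trusted host, then extract with `--allow-domain`. The sketch below is illustrative. `extract_from_domain` and `WSTK` are hypothetical names, the match is on the exact host over HTTPS (subdomains are skipped), and the limit of three pages is arbitrary.

```bash
# Assumes the script runs from the toolkit directory (after `cd {baseDir}`).
WSTK="${WSTK:-./.venv/bin/wstk}"

# Hypothetical helper: search, keep only HTTPS URLs on the given host, and
# extract up to three of them with --allow-domain as a second safety check.
extract_from_domain() {
  local query="$1" domain="$2" count=0 url
  "$WSTK" search "$query" --plain | while IFS= read -r url; do
    case "$url" in
      "https://$domain"|"https://$domain"/*)
        # </dev/null keeps extract from consuming the remaining URLs on stdin.
        "$WSTK" extract "$url" --plain --allow-domain "$domain" --max-chars 8000 </dev/null
        count=$((count + 1))
        if [ "$count" -ge 3 ]; then break; fi
        ;;
    esac
  done
}
```

For example: `extract_from_domain "asyncio" docs.python.org`.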

## Output Formats

- `--plain` — Stable text for piping (URLs for search, content for extract)
- `--json` — Structured envelope: `{ "ok": bool, "data": {...}, "error": {...} }`
- Default — Human-readable with colors

## Agent Defaults

- Default to `--json` in agent wrappers; parse `ok`, `error.code`, and `warnings`.
- Surface concise diagnostics by relaying `error.message` and `error.details.reason` when `ok=false`.
- Use `--plain` only for piping, and add `--no-input` for non-interactive runs.
- Consider `--redact` when handling sensitive URLs or content.
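
These defaults could be sketched as a thin wrapper around the CLI. The function name `wstk_json` and the `WSTK` variable are hypothetical. The envelope fields (`ok`, `error.code`, `error.message`, `error.details.reason`) are the ones named above. `jq` is assumed to be installed, and the script is assumed to run from the toolkit directory.

```bash
# Assumes the script runs from the toolkit directory (after `cd {baseDir}`).
WSTK="${WSTK:-./.venv/bin/wstk}"

# Run a wstk subcommand with --json and --no-input. Print the envelope on
# success, or a one-line diagnostic on stderr when ok is not true.
wstk_json() {
  local envelope
  envelope=$("$WSTK" "$@" --json --no-input)
  if [ "$(printf '%s' "$envelope" | jq -r '.ok')" = "true" ]; then
    printf '%s\n' "$envelope"
    return 0
  fi
  # Missing fields fall back to placeholder text via jq's // operator.
  printf '%s' "$envelope" | jq -r \
    '"wstk failed [\(.error.code // "unknown")]: \(.error.message // "no message") (\(.error.details.reason // "no reason given"))"' >&2
  return 1
}
```

A caller might then write `wstk_json search "openai api reference" | jq -r '.data.results[0].url'` and only see a URL when the search succeeded.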

## Exit Codes

- `0` — Success
- `1` — Runtime failure (network, provider error)
- `2` — Invalid usage
- `3` — Not found / empty result
- `4` — Blocked / access denied
- `5` — Needs JS rendering (page is JS-only)
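
A script can branch on these codes instead of parsing output. The sketch below is one possible approach, not the toolkit's own logic. `wstk_exit_label`, `extract_with_fallback`, and `WSTK` are hypothetical names, the labels are illustrative, and the retry uses `extract --method browser` as mentioned in the Decision Guide.

```bash
# Assumes the script runs from the toolkit directory (after `cd {baseDir}`).
WSTK="${WSTK:-./.venv/bin/wstk}"

# Map a wstk exit code to a short label (labels are illustrative).
wstk_exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "runtime-failure" ;;
    2) echo "invalid-usage" ;;
    3) echo "not-found" ;;
    4) echo "blocked" ;;
    5) echo "needs-js-rendering" ;;
    *) echo "unknown" ;;
  esac
}

# Extract over plain HTTP first; retry with the browser method only when the
# exit code says the page needs JS rendering.
extract_with_fallback() {
  local url="$1" rc=0
  "$WSTK" extract "$url" --plain || rc=$?
  if [ "$rc" -eq 5 ]; then
    echo "retrying with browser: $(wstk_exit_label "$rc")" >&2
    "$WSTK" extract "$url" --plain --method browser
    return $?
  fi
  return "$rc"
}
```

Only code 5 triggers the retry here. Whether code 4 (blocked) should do the same is a policy choice left to the caller.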

## Global Flags

- `--timeout <seconds>` — Network timeout
- `--no-cache` — Disable caching
- `--fresh` — Bypass cache reads (still writes)
- `--quiet` — Minimal output
- `--verbose` — Debug diagnostics to stderr
- `--policy <standard|strict|permissive>` — Safety defaults

## References

- `references/troubleshooting.md` — 403/JS-only guidance and advanced fetch flags.
- `references/providers.md` — Provider selection and privacy notes.
- `docs/claude-code.md` — Claude Code wrapper usage.

## When to Use

- Searching for documentation or API references
- Looking up facts or current information
- Extracting content from known URLs
- Any task requiring web search without interactive browsing

Prefer `browser-tools` when you need: JS interaction, form filling, clicking, or visual inspection.
