web-research

Search the web to resolve context gaps — documentation, versions, CVEs, best practices. Auto-starts SearxNG Docker if available, falls back to WebSearch.

Best use case

web-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using web-research should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/web-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/gonzalezpazmonica/pm-workspace/main/.claude/skills/web-research/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/web-research/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How web-research Compares

| Feature / Agent | web-research | Standard Approach |
|-----------------|--------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Search the web to resolve context gaps — documentation, versions, CVEs, best practices. Auto-starts SearxNG Docker if available, falls back to WebSearch.

Where can I find the source code?

The source code is on GitHub in the gonzalezpazmonica/pm-workspace repository, under .claude/skills/web-research/.

SKILL.md Source

# Skill: Web Research

> 3-layer search: cache → SearxNG (Docker auto-start) → Claude WebSearch.
> Inspired by [FAIR-Perplexica](https://github.com/UB-Mannheim/FAIR-Perplexica).

## When to use

- User asks about external technology (versions, APIs, configs)
- Gap detected: CVE, deprecation, compatibility question
- Tech-research-agent needs web sources for investigation
- Developer encounters error from external library

## What it produces

1. **Search results** — reranked by relevance, cached locally
2. **Inline citations** — `[web:N]` with source URLs in footer
3. **Follow-up suggestions** — contextual next commands
4. **Gap detection** — automatic suggestion when external gap detected
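
As an illustration of the `[web:N]` citation style listed above, here is a minimal sketch. The `Result` class and `format_with_citations` function are hypothetical names, and the footer layout is an assumption; the actual `formatter.py` may number or arrange citations differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Result:
    title: str
    url: str


def format_with_citations(claims: List[Tuple[str, Result]]) -> str:
    """Render claims with inline [web:N] markers and a footer of source URLs.

    Hypothetical helper, not the skill's real formatter.
    """
    numbering: Dict[str, int] = {}  # url -> citation number, in first-seen order
    body_lines = []
    for text, source in claims:
        # Reuse the number if this URL was already cited, else assign the next one.
        n = numbering.setdefault(source.url, len(numbering) + 1)
        body_lines.append(f"{text} [web:{n}]")
    footer = [f"[web:{n}] {url}" for url, n in numbering.items()]
    return "\n".join(body_lines + ["", "Sources:"] + footer)
```

Numbering by URL rather than by result means that two claims drawn from the same page share one footer entry.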

## Prerequisites

```
1. Python 3.x available                    → always true in pm-workspace
2. Docker (optional) for SearxNG           → graceful fallback if missing
3. Internet connection (optional)           → cache-only mode if offline
```
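
One way to read these prerequisites is as a decision about which search layers are usable. The sketch below is an assumption about that logic, not code from the skill: `available_layers` and the layer labels are hypothetical, and it assumes SearxNG needs both Docker and a connection.

```python
import shutil
from typing import List, Optional


def available_layers(online: bool, docker: Optional[str]) -> List[str]:
    """Return the search layers usable under the stated prerequisites.

    Layer names are illustrative labels, not identifiers from the skill.
    """
    layers = ["cache"]              # the cache works even offline
    if online and docker:
        layers.append("searxng")    # assumed to need Docker and a connection
    if online:
        layers.append("websearch")  # fallback when SearxNG is unavailable
    return layers


# shutil.which returns the executable's path, or None if Docker is not on PATH.
layers = available_layers(online=True, docker=shutil.which("docker"))
```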

## Flow

```
User query or gap detected
  → Sanitize (strip PII, projects, emails, IPs)
  → Check cache (TTL by category)
  → If miss: try SearxNG (auto-start Docker)
  → If SearxNG unavailable: use Claude WebSearch
  → Rerank results (keyword + domain authority)
  → Cache results
  → Format with [web:N] citations
  → Show follow-up suggestions
```
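
The sanitize, cache, and fallback steps of this flow can be sketched as follows. This is a simplified illustration under stated assumptions: all names are hypothetical, the sanitizer only handles emails and IPv4 addresses, the cache is a plain dict without TTL, and reranking is omitted.

```python
import re
from typing import Callable, Dict, List, Optional

# A backend returns a list of results, or None when it is unavailable.
Backend = Callable[[str], Optional[List[str]]]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize(query: str) -> str:
    """Strip emails and IPv4 addresses; a real sanitizer would cover more PII."""
    query = EMAIL.sub("", query)
    query = IPV4.sub("", query)
    return " ".join(query.split())  # collapse the whitespace left behind


def search(query: str, cache: Dict[str, List[str]], backends: List[Backend]) -> List[str]:
    """Check the cache, then try each backend in order until one answers."""
    clean = sanitize(query)
    if clean in cache:
        return cache[clean]
    for backend in backends:
        results = backend(clean)
        if results is not None:
            cache[clean] = results
            return results
    return []  # every backend unavailable and nothing cached
```

Passing the backends as an ordered list keeps the fallback order (SearxNG first, then WebSearch) in the caller's hands. Note that the IPv4 pattern would also strip four-part version strings such as `1.2.3.4`, so a production sanitizer would need a more careful rule.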

## Key modules

| Module | Lines | Purpose |
|--------|-------|---------|
| `cache.py` | 137 | LRU cache, TTL, stats |
| `sanitizer.py` | 107 | PII removal, classification |
| `rerank.py` | 86 | Heuristic scoring |
| `formatter.py` | 88 | Citation formatting |
| `gap_detector.py` | 110 | External vs internal detection |
| `searxng.py` | 149 | Docker auto-start, cross-platform |
| `search.py` | 88 | 3-layer orchestrator |
| `suggestions.py` | 81 | Post-command follow-ups |
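
To make the "heuristic scoring" entry concrete, here is a minimal sketch of a keyword-plus-domain-authority reranker. The weights, the domain table, and the function names are all assumptions chosen for illustration; the real `rerank.py` may score differently.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical authority weights; the real module may use other domains or values.
DOMAIN_AUTHORITY = {"docs.python.org": 1.0, "github.com": 0.8, "stackoverflow.com": 0.6}


def score(query: str, title: str, snippet: str, url: str) -> float:
    """Fraction of query terms found in title/snippet, plus a domain bonus."""
    terms = set(query.lower().split())
    text = f"{title} {snippet}".lower()
    overlap = sum(1 for t in terms if t in text) / len(terms) if terms else 0.0
    host = urlparse(url).hostname or ""
    return overlap + DOMAIN_AUTHORITY.get(host, 0.0)


def rerank(query: str, results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort result dicts (keys: title, snippet, url) by descending score."""
    return sorted(
        results,
        key=lambda r: score(query, r["title"], r["snippet"], r["url"]),
        reverse=True,
    )
```

With equal keyword overlap, a result from a listed domain outranks one from an unknown host, which is the effect a domain-authority term is meant to have.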

## Scrapling enrichment (SE-061)

For URLs returned by SearxNG/WebSearch that need content extraction (beyond the snippet), invoke the adaptive wrapper `scripts/scrapling-fetch.sh`:

```bash
bash scripts/scrapling-fetch.sh "${URL}" --json --timeout 25
```

- `scrapling` backend if installed: native Cloudflare/DataDome bypass
- Transparent fallback to `curl` with user-agent `SaviaResearch/1.0`
- Exit codes 0/1/2; JSON output with `status|title|url_final|text|backend`

Use it when the WebFetch tool returns 403/429/503 or when the snippet is not enough. Do not use it for bulk fetching without respecting robots.txt; see `docs/rules/domain/research-stack.md`.
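
Calling the wrapper from Python might look like the sketch below. It assumes the JSON is written to stdout and that any non-zero exit code signals failure; the specific meaning of exit codes 1 and 2 is not stated here, so the sketch does not distinguish them. `build_command` and `fetch` are hypothetical helper names.

```python
import json
import subprocess
from typing import List


def build_command(url: str, timeout: int = 25) -> List[str]:
    """Build the argv for the wrapper, mirroring the shell invocation."""
    return ["bash", "scripts/scrapling-fetch.sh", url, "--json", "--timeout", str(timeout)]


def fetch(url: str, timeout: int = 25) -> dict:
    """Run the wrapper and decode its JSON output.

    Assumption: JSON goes to stdout and a non-zero exit means failure.
    """
    proc = subprocess.run(build_command(url, timeout), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"scrapling-fetch.sh exited with {proc.returncode}")
    return json.loads(proc.stdout)
```

Passing the URL as a separate argv element, rather than interpolating it into a shell string, avoids quoting problems with URLs that contain `&` or `?`.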

## References

- Spec: `docs/propuestas/SPEC-003-web-research-system.md`
- Scrapling backend: `docs/propuestas/SE-061-scrapling-research-backend.md`
- Config: `docs/rules/domain/web-research-config.md`
- Stack chain: `docs/rules/domain/research-stack.md`
- Docs ES: `docs/web-research.md`
- Docs EN: `docs/web-research.en.md`
- Tests: `tests/test-web-research.bats`

Related Skills

tech-research-agent

32
from gonzalezpazmonica/pm-workspace

Autonomous technical research agent: investigates topics, generates reports, and notifies the designated human

zoom-out

32
from gonzalezpazmonica/pm-workspace

Elevates perspective from trees to forest. Maps architecture, dependencies, and second-order effects before implementation decisions. Use when designing, when evaluating trade-offs, or at the start of design sessions.

workspace-integrity

32
from gonzalezpazmonica/pm-workspace

Catalog of integrity auditors: CLAUDE.md drift, rule manifest, orphan rules, agents catalog sync, baseline, agent size

wellbeing-guardian

32
from gonzalezpazmonica/pm-workspace

Proactive individual wellbeing system

voice-inbox

32
from gonzalezpazmonica/pm-workspace

Audio transcription and an audio→text→action flow for voice messages

verification-lattice

32
from gonzalezpazmonica/pm-workspace

Multi-layer verification pipeline beyond Code Review

topic-cluster

32
from gonzalezpazmonica/pm-workspace

BERTopic clustering: groups retros/PBIs/incidents/lessons into labeled thematic topics. Filters noise and discovers cross-project patterns

time-tracking-report

32
from gonzalezpazmonica/pm-workspace

Generates reports of logged hours in Excel/Word

tier3-probes

32
from gonzalezpazmonica/pm-workspace

Catalog of feasibility probes for Tier 3 champions: Scrapling, Oumi, Memvid, BERTopic, Reranker, PDF extract

test-architect

32
from gonzalezpazmonica/pm-workspace

Design and generate highest-quality tests across 16 languages and 14 test types

team-onboarding

32
from gonzalezpazmonica/pm-workspace

Onboarding and skills assessment for new team members

team-coordination

32
from gonzalezpazmonica/pm-workspace

Multi-team orchestration — create teams, assign members, detect cross-team blockers