web-research
Search the web to resolve context gaps — documentation, versions, CVEs, best practices. Auto-starts SearxNG Docker if available, falls back to WebSearch.
Best use case
web-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using web-research should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/web-research/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How web-research Compares
| Feature / Agent | web-research | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Search the web to resolve context gaps — documentation, versions, CVEs, best practices. Auto-starts SearxNG Docker if available, falls back to WebSearch.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Skill: Web Research
> 3-layer search: cache → SearxNG (Docker auto-start) → Claude WebSearch.
> Inspired by [FAIR-Perplexica](https://github.com/UB-Mannheim/FAIR-Perplexica).
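The 3-layer fallback can be pictured as "first layer with a non-empty answer wins". This is a minimal sketch, assuming each layer can be wrapped as a plain Python callable that returns a list of result dicts, or `None` on a miss. The name `layered_search` and the layer callables are hypothetical; in the actual skill the last layer is Claude's WebSearch tool, not a Python function.

```python
from typing import Callable, Optional

# Hypothetical layer signature: query -> list of result dicts, or None on a miss.
Layer = Callable[[str], Optional[list]]


def layered_search(query: str, layers: list[tuple[str, Layer]]) -> tuple[str, list]:
    """Try each (name, layer) in order; return the first non-empty result list."""
    for name, layer in layers:
        try:
            results = layer(query)
        except Exception:
            # A failing backend (e.g. SearxNG container down) counts as a miss.
            results = None
        if results:
            return name, results
    # Every layer missed or failed.
    return "none", []
```

Ordering the list as `[("cache", ...), ("searxng", ...), ("websearch", ...)]` reproduces cache → SearxNG → WebSearch; an empty cache result is treated the same as a miss.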
## When to use
- User asks about external technology (versions, APIs, configs)
- Gap detected: CVE, deprecation, compatibility question
- Tech-research-agent needs web sources for investigation
- Developer encounters error from external library
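The triggers above (versions, CVEs, deprecations, compatibility) lend themselves to a keyword heuristic for telling an external knowledge gap from an internal one. The sketch below is an illustration only: the category names and regular expressions are assumptions, not the actual rules in `gap_detector.py`.

```python
import re

# Hypothetical patterns per gap category; the real gap_detector.py rules may differ.
EXTERNAL_PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "version": re.compile(r"\b(latest|version|release|v?\d+\.\d+(\.\d+)?)\b", re.IGNORECASE),
    "deprecation": re.compile(r"\b(deprecat\w*|end[- ]of[- ]life|EOL)\b", re.IGNORECASE),
    "compatibility": re.compile(r"\b(compatib\w*|supports?)\b", re.IGNORECASE),
}


def detect_external_gap(text: str) -> list[str]:
    """Return matched external-gap categories; an empty list means 'treat as internal'."""
    return [name for name, pattern in EXTERNAL_PATTERNS.items() if pattern.search(text)]
```

A non-empty result would be the cue to suggest a web search; questions about in-repo code (for example "refactor our billing module") match nothing and stay local.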
## What it produces
1. **Search results** — reranked by relevance, cached locally
2. **Inline citations** — `[web:N]` with source URLs in footer
3. **Follow-up suggestions** — contextual next commands
4. **Gap detection** — automatic suggestion when external gap detected
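The `[web:N]` citation style pairs numbered inline markers with a footer of source URLs. A minimal sketch, assuming each source is a dict with `title` and `url` keys and that the footer is labeled "Sources:"; both are illustrative choices, not necessarily what `formatter.py` emits.

```python
def format_with_citations(answer: str, sources: list[dict]) -> str:
    """Append a footer that maps each [web:N] marker (1-based) to its source URL."""
    lines = [answer, "", "Sources:"]
    for n, src in enumerate(sources, start=1):
        lines.append(f"[web:{n}] {src['title']} - {src['url']}")
    return "\n".join(lines)
```

Numbering follows the order of `sources`, so the list should already be reranked before formatting.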
## Prerequisites
```
1. Python 3.x available → always true in pm-workspace
2. Docker (optional) for SearxNG → graceful fallback if missing
3. Internet connection (optional) → cache-only mode if offline
```
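The three prerequisites translate into which layers are worth attempting. This is a sketch under stated assumptions: Docker is detected only by the presence of a `docker` executable on `PATH` (not by checking the daemon), and "online" is approximated by opening a TCP connection to a public DNS resolver; the host `1.1.1.1:53` is an arbitrary illustrative choice.

```python
import shutil
import socket


def docker_available() -> bool:
    """True if a `docker` executable is on PATH (the daemon itself is not checked)."""
    return shutil.which("docker") is not None


def online(host: str = "1.1.1.1", port: int = 53, timeout: float = 2.0) -> bool:
    """Rough connectivity probe: can a TCP connection be opened to a public resolver?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def select_layers(has_docker: bool, is_online: bool) -> list[str]:
    """Map the prerequisites to the search layers that can be attempted, in order."""
    if not is_online:
        return ["cache"]  # offline: cache-only mode
    layers = ["cache"]
    if has_docker:
        layers.append("searxng")  # SearxNG only when Docker can start it
    layers.append("websearch")
    return layers
```

For example, `select_layers(docker_available(), online())` yields `["cache", "websearch"]` on a connected machine without Docker.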
## Flow
```
User query or gap detected
→ Sanitize (strip PII, projects, emails, IPs)
→ Check cache (TTL by category)
→ If miss: try SearxNG (auto-start Docker)
→ If SearxNG unavailable: use Claude WebSearch
→ Rerank results (keyword + domain authority)
→ Cache results
→ Format with [web:N] citations
→ Show follow-up suggestions
```
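Two steps of this flow, sanitizing the outgoing query and reranking by keyword plus domain authority, can be sketched as follows. The regular expressions, the `DOMAIN_AUTHORITY` weights and the `authority_weight` parameter are assumptions for illustration; the real `sanitizer.py` and `rerank.py` scoring is not shown in this document.

```python
import re
from urllib.parse import urlparse

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical authority weights per host.
DOMAIN_AUTHORITY = {"docs.python.org": 1.0, "nvd.nist.gov": 1.0, "github.com": 0.7}


def sanitize(query: str, private_terms: tuple[str, ...] = ()) -> str:
    """Strip emails, IPv4 addresses and caller-supplied private names (e.g. projects)."""
    query = EMAIL_RE.sub("", query)
    query = IPV4_RE.sub("", query)
    for term in private_terms:
        query = re.sub(re.escape(term), "", query, flags=re.IGNORECASE)
    return " ".join(query.split())  # collapse leftover whitespace


def rerank(query: str, results: list[dict], authority_weight: float = 0.5) -> list[dict]:
    """Sort results by query-term overlap plus a weighted domain-authority bonus."""
    terms = set(query.lower().split())

    def score(r: dict) -> float:
        words = set(f"{r.get('title', '')} {r.get('snippet', '')}".lower().split())
        overlap = len(terms & words) / max(len(terms), 1)
        host = urlparse(r.get("url", "")).hostname or ""
        return overlap + authority_weight * DOMAIN_AUTHORITY.get(host, 0.0)

    return sorted(results, key=score, reverse=True)
```

Sanitizing before the cache lookup also keeps private identifiers out of cache keys.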
## Key modules
| Module | Lines | Purpose |
|--------|-------|---------|
| `cache.py` | 137 | LRU cache, TTL, stats |
| `sanitizer.py` | 107 | PII removal, classification |
| `rerank.py` | 86 | Heuristic scoring |
| `formatter.py` | 88 | Citation formatting |
| `gap_detector.py` | 110 | External vs internal detection |
| `searxng.py` | 149 | Docker auto-start, cross-platform |
| `search.py` | 88 | 3-layer orchestrator |
| `suggestions.py` | 81 | Post-command follow-ups |
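The `cache.py` row combines three ideas: LRU eviction, a TTL that depends on the query category, and hit/miss stats. A minimal sketch, assuming hypothetical category names and TTL values (the real ones belong to `docs/rules/domain/web-research-config.md`); the injectable `clock` exists only to make expiry testable.

```python
import time
from collections import OrderedDict
from typing import Any, Optional

# Hypothetical TTLs in seconds per category.
TTL_BY_CATEGORY = {"cve": 6 * 3600, "version": 24 * 3600, "docs": 7 * 24 * 3600}
DEFAULT_TTL = 24 * 3600


class SearchCache:
    """LRU cache whose entries expire after a category-dependent TTL."""

    def __init__(self, max_entries: int = 256, clock=time.monotonic):
        self.max_entries = max_entries
        self.clock = clock
        self._data: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()  # key -> (expires_at, value)
        self.hits = self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key: str, value: Any, category: str = "docs") -> None:
        ttl = TTL_BY_CATEGORY.get(category, DEFAULT_TTL)
        self._data[key] = (self.clock() + ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def stats(self) -> dict:
        return {"entries": len(self._data), "hits": self.hits, "misses": self.misses}
```

Short TTLs for fast-moving categories such as CVEs and longer ones for reference docs is the design choice "TTL by category" suggests.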
## Scrapling enrichment (SE-061)
For URLs returned by SearxNG/WebSearch that need content extraction (beyond the snippet), invoke the adaptive wrapper `scripts/scrapling-fetch.sh`:
```bash
bash scripts/scrapling-fetch.sh "${URL}" --json --timeout 25
```
- `scrapling` backend if installed: native Cloudflare/DataDome bypass
- Transparent fallback to `curl` with user-agent `SaviaResearch/1.0`
- Exit codes 0/1/2; JSON output with fields `status`, `title`, `url_final`, `text`, `backend`
Use it when the WebFetch tool returns 403/429/503 or when the snippet is not enough. Do not use it for bulk fetching that ignores robots.txt; see `docs/rules/domain/research-stack.md`.
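From Python, the wrapper call and the robots.txt check might look like the sketch below. `allowed_by_robots` uses the standard library's `urllib.robotparser` on an already-downloaded robots.txt body. `scrapling_fetch` is a hypothetical helper that passes the flags shown above; it assumes (the document does not say) that the JSON is printed on stdout regardless of exit code, and it does not interpret the meaning of codes 0/1/2.

```python
import json
import subprocess
from urllib import robotparser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "SaviaResearch/1.0") -> bool:
    """Check a URL against a robots.txt body that has already been fetched."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


def scrapling_fetch(url: str, timeout: int = 25) -> dict:
    """Run the wrapper with --json/--timeout and return its parsed output.

    Assumption: JSON goes to stdout even on non-zero exits. The raw exit code is
    attached as-is; subprocess.TimeoutExpired propagates if the script hangs.
    """
    proc = subprocess.run(
        ["bash", "scripts/scrapling-fetch.sh", url, "--json", "--timeout", str(timeout)],
        capture_output=True, text=True, timeout=timeout + 5,
    )
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        data = {"status": "error", "text": proc.stderr.strip()}
    data["exit_code"] = proc.returncode
    return data
```

Checking `allowed_by_robots` before each `scrapling_fetch` call is one way to honor the robots.txt rule when fetching more than a handful of pages.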
## References
- Spec: `docs/propuestas/SPEC-003-web-research-system.md`
- Scrapling backend: `docs/propuestas/SE-061-scrapling-research-backend.md`
- Config: `docs/rules/domain/web-research-config.md`
- Stack chain: `docs/rules/domain/research-stack.md`
- Docs ES: `docs/web-research.md`
- Docs EN: `docs/web-research.en.md`
- Tests: `tests/test-web-research.bats`

Related Skills
tech-research-agent
Autonomous technical research agent: researches topics, generates reports, notifies the designated human
zoom-out
Elevates perspective from trees to forest. Maps architecture, dependencies, and second-order effects before implementation decisions. Use when designing, when evaluating trade-offs, or at the start of design sessions.
workspace-integrity
Catalog of integrity auditors: CLAUDE.md drift, rule manifest, orphan rules, agents catalog sync, baseline, agent size
wellbeing-guardian
Proactive individual wellbeing system
voice-inbox
Audio transcription and an audio→text→action flow for voice messages
verification-lattice
Multi-layer verification pipeline beyond Code Review
topic-cluster
BERTopic clustering: groups retros/PBIs/incidents/lessons into labeled thematic topics. Filters noise and discovers cross-project patterns
time-tracking-report
Generates time-tracking (logged hours) reports in Excel/Word
tier3-probes
Catalog of feasibility probes for Tier 3 champions: Scrapling, Oumi, Memvid, BERTopic, Reranker, PDF extract
test-architect
Design and generate highest-quality tests across 16 languages and 14 test types
team-onboarding
Onboarding and skills assessment for new team members
team-coordination
Multi-team orchestration — create teams, assign members, detect cross-team blockers