search-cluster

Aggregated search aggregator using Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.

3,891 stars

Complexity: medium

About this skill

The `search-cluster` skill acts as a comprehensive, multi-source search aggregator designed for high-availability and security within AI agent workflows. It unifies search capabilities across popular platforms, including Google Custom Search Engine (CSE), Google News RSS feeds, Wikipedia's OpenSearch API, Reddit's JSON search API, and a headless scraping service called Scrapling (powered by DuckDuckGo). This skill is particularly useful for AI agents or automated systems that require gathering extensive information from diverse sources efficiently without the complexity of managing individual APIs or parsing different response formats. It abstracts away the intricacies of each provider, offering a consistent, structured JSON output that includes the source, title, link, and a sanitized snippet for each result. Key features include robust security measures like subprocess isolation for query inputs, mandatory SSL verification for all providers, and an integrated input sanitizer. Optional features like result caching via Redis further enhance performance and efficiency, making it a powerful tool for automated data collection, research, and analysis tasks.

Best use case

The primary use case for `search-cluster` is to empower AI agents and automated systems to perform comprehensive, multi-source web research and data collection. It greatly benefits developers, researchers, content creators, and data analysts who require aggregated, structured information from diverse online platforms for tasks such as trend analysis, content generation, fact-checking, competitive intelligence, or background research, all accessible through a single, secure interface.

Aggregated search aggregator using Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.

Users should expect a structured JSON array of search results, each object containing the source, title, link, and a sanitized snippet, aggregated from the configured providers based on the provided query.

Practical example

Example input

Research the latest developments in AI ethics, specifically looking for new regulations, philosophical debates, and public opinion across news and community discussions.

Example output

[
  {"source": "gnews", "title": "EU Proposes New AI Ethics Regulations Amidst Public Debate", "link": "https://example.com/news/eu-ai-ethics", "snippet": "The European Union is advancing new legislation to govern AI, sparking discussions across various sectors..."},
  {"source": "wiki", "title": "Ethics of artificial intelligence - Wikipedia", "link": "https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence", "snippet": "The ethics of artificial intelligence is the branch of the ethics of technology specific to artificial intelligence..."},
  {"source": "reddit", "title": "r/ArtificialIntelligence - Latest on AI Ethics", "link": "https://reddit.com/r/ArtificialIntelligence/comments/thread", "snippet": "Community discussion on the practical implications of new AI ethical guidelines and recent controversies."}
]

When to use this skill

When an AI agent needs to perform secure, aggregated web research from multiple diverse sources.
When requiring structured JSON output for further automated processing of search results.
When consolidating access to Google, Wikipedia, Reddit, and Google News searches into one tool.
When deep dives into a topic require information from both traditional search engines and community discussions.

When not to use this skill

When a simple, single-source search (e.g., only Google) is sufficient for the task.
When real-time, highly specialized data from proprietary or internal databases is needed exclusively.
When the overhead of setting up multiple API keys and a Python virtual environment is undesirable for a one-off, casual query.
When offline or local file-system-based search is the sole requirement.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/search-cluster/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/1999azzar/search-cluster/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/search-cluster/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How search-cluster Compares

Feature / Agent	search-cluster	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	medium	N/A

Frequently Asked Questions

What does this skill do?

Aggregated search aggregator using Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

Best AI Skills for ChatGPT

Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

SKILL.md Source

# Search Cluster (Industrial Standard v3.1)

A multi-provider search aggregator designed for high-availability and security.

## Installation
The scrapling provider requires a dedicated virtual environment.
1. Create a venv: python3 -m venv venv/scrapling
2. Install scrapling: venv/scrapling/bin/pip install scrapling
3. Provide the path to the venv binary in SCRAPLING_PYTHON_PATH.

## Security Posture
- Subprocess Isolation: Query inputs are passed as arguments to stealth_fetch.py.
- Strict TLS: Mandatory SSL verification on all providers.
- Sanitization: Integrated native internal scrubber (Path Neutral).

## Requirements and Environment
Declare these variables in your environment or vault:

| Variable | Requirement | Description |
|---|---|---|
| GOOGLE_API_KEY | Optional | API Key for Google Custom Search. |
| GOOGLE_CSE_ID | Optional | Search Engine ID for Google CSE. |
| SCRAPLING_PYTHON_PATH | Optional | Path to the scrapling venv python binary. |
| REDIS_HOST | Optional | Host for result caching. |
| REDIS_PORT | Optional | Port for result caching (Default: 6379). |
| SEARCH_USER_AGENT | Optional | Custom User-Agent string. |

## Providers
- google: Official Google Custom Search.
- wiki: Wikipedia OpenSearch API.
- reddit: Reddit JSON search API.
- gnews: Google News RSS aggregator.
- scrapling: Headless stealth scraping (via DuckDuckGo).

## Included Scripts
- scripts/search-cluster.py: Main entry point.
- scripts/stealth_fetch.py: Scrapling fetcher (REQUIRED for scrapling provider).

## Workflow
1. Execute: scripts/search-cluster.py all "<query>"
2. Output is structured JSON with source, title, link, and sanitized snippet.

Data & Research