wikidata-search

Search for items and properties on Wikidata and retrieve entity details, claims, and external identifiers. Supports both keyword search (Wikidata Action API) and semantic/hybrid search (Wikidata Vector Database), plus direct entity retrieval (Special:EntityData) and structured querying (WDQS SPARQL).

by diegosouzapw

Best use case

wikidata-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using wikidata-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/wikidata-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/wikidata-search/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/wikidata-search/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How wikidata-search Compares

Feature / Agent	wikidata-search	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

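What does this skill do?

It searches Wikidata for items and properties and retrieves entity details, claims, and external identifiers through the Action API, the Wikidata Vector Database, Special:EntityData, and WDQS SPARQL.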

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Wikidata Search Skill

Search and retrieve data from Wikidata, the free knowledge base.

## Choosing An Access Method

Use the method that matches the task to reduce load and improve accuracy:

- Keyword search by label/alias/description: Action API `wbsearchentities`
- Semantic exploration / fuzzy concept search: Wikidata Vector Database (hybrid vector + keyword search, merged via Reciprocal Rank Fusion, RRF)
- Fetch a known entity's current JSON quickly: Special:EntityData
- Complex graph relations / reporting: Wikidata Query Service (WDQS) SPARQL

## API Endpoints

Base URL: `https://www.wikidata.org/w/api.php`

Entity JSON (often faster for current state): `https://www.wikidata.org/wiki/Special:EntityData/{ID}.json`

SPARQL endpoint: `https://query.wikidata.org/sparql`

Vector DB API: `https://wd-vectordb.wmcloud.org`

## Core Functions

### 1. Search Items (wbsearchentities)

Search for entities by label or alias.

```bash
curl 'https://www.wikidata.org/w/api.php?action=wbsearchentities&search=QUERY&language=en&format=json&type=item&limit=10'
```

Parameters:
- `search`: Search term (required)
- `language`: Language code (default: en)
- `type`: `item` (Q-entities) or `property` (P-entities)
- `limit`: Max results (1-50, default: 7)
- `continue`: Offset for pagination

Response fields per result:
- `id`: Entity ID (e.g., Q42)
- `label`: Primary label
- `description`: Short description
- `aliases`: Alternative names
- `url`: Wikidata page URL
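
The request and response fields above can be sketched in Python with only the standard library. This is a minimal illustration, not the skill's own script: the helper names `build_search_url`, `parse_search_results`, and `search`, and the User-Agent string, are assumptions made for the example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"
# Hypothetical User-Agent; replace with your own tool name and contact details.
USER_AGENT = "wikidata-search-example/0.1 (contact: you@example.org)"


def build_search_url(query, language="en", entity_type="item", limit=10, offset=0):
    """Build a wbsearchentities URL; `continue` carries the pagination offset."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "type": entity_type,
        "limit": limit,
        "continue": offset,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_search_results(payload):
    """Reduce each hit to (id, label, description); label/description may be absent."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]


def search(query, **kwargs):
    """Perform the HTTP request (needs network access)."""
    request = urllib.request.Request(
        build_search_url(query, **kwargs), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_search_results(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check against a saved response before any live request is sent.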

### 2. Get Entity Details (wbgetentities)

Retrieve full entity data including claims/identifiers.

```bash
curl 'https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json&props=labels|descriptions|aliases|claims'
```

Parameters:
- `ids`: Pipe-separated entity IDs (max 50)
- `props`: `labels|descriptions|aliases|claims|sitelinks|info`
- `languages`: Filter languages (e.g., `en|fr|de`)
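
Because `ids` accepts at most 50 entities per call, longer ID lists have to be split into batches. A minimal sketch of that batching, assuming the hypothetical helper names `chunk_ids` and `build_getentities_urls`:

```python
import urllib.parse

API_URL = "https://www.wikidata.org/w/api.php"
MAX_IDS_PER_CALL = 50  # limit stated above for wbgetentities


def chunk_ids(ids, size=MAX_IDS_PER_CALL):
    """Yield successive lists of at most `size` IDs, preserving order."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_getentities_urls(ids, props=("labels", "descriptions", "claims"), languages=("en",)):
    """Return one wbgetentities URL per batch of IDs."""
    urls = []
    for batch in chunk_ids(ids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),          # pipe-separated entity IDs
            "props": "|".join(props),        # request only what is needed
            "languages": "|".join(languages),
            "format": "json",
        }
        urls.append(API_URL + "?" + urllib.parse.urlencode(params))
    return urls
```

For 120 IDs this produces three URLs covering 50, 50, and 20 entities.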

### 3. Get Claims Only (wbgetclaims)

Retrieve the claims for a specific entity, optionally restricted to a single property.

```bash
curl 'https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P31&format=json'
```
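
For item-valued properties such as P31, each statement's value is an object that carries the target QID. The sketch below pulls those QIDs out of a parsed `wbgetclaims` response. The function name `item_values` is hypothetical, and skipping deprecated statements is a choice made for this example rather than API behavior.

```python
def item_values(payload, property_id):
    """Return QIDs of item-valued statements for `property_id`, skipping deprecated ones."""
    qids = []
    for statement in payload.get("claims", {}).get(property_id, []):
        snak = statement["mainsnak"]
        # "novalue"/"somevalue" snaks carry no datavalue, so only "value" snaks are read.
        if statement.get("rank") == "deprecated" or snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and "id" in value:
            qids.append(value["id"])
    return qids
```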

### 4. Semantic / Hybrid Search (Wikidata Vector Database)

When you don't know the exact label, or want "things like this" discovery, use the Vector DB.

Item search:
```bash
curl 'https://wd-vectordb.wmcloud.org/item/query/?query=QUERY&lang=all&K=20'
```

Property search:
```bash
curl 'https://wd-vectordb.wmcloud.org/property/query/?query=QUERY&lang=all&K=20&exclude_external_ids=false'
```

Optional parameters:
- `lang`: language code, or `all` for cross-language
- `K`: number of results
- `instanceof`: comma-separated QIDs to filter items by "instance of"
- `rerank`: `true` or `false`; reranking makes the query slower

Response fields:
- `QID` / `PID`
- `similarity_score`
- `rrf_score`
- `source`
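
A minimal sketch of building an item query and ranking the hits. It assumes the endpoint returns a JSON list of objects carrying the fields listed above; that response shape, the helper names `build_item_query_url` and `top_candidates`, and the similarity threshold are all assumptions for illustration.

```python
import urllib.parse

VECTOR_DB_URL = "https://wd-vectordb.wmcloud.org"


def build_item_query_url(query, lang="all", k=20, instanceof=(), rerank=False):
    """Build an /item/query/ URL from the parameters documented above."""
    params = {"query": query, "lang": lang, "K": k}
    if instanceof:
        params["instanceof"] = ",".join(instanceof)  # comma-separated QIDs
    if rerank:
        params["rerank"] = "true"
    return f"{VECTOR_DB_URL}/item/query/?" + urllib.parse.urlencode(params)


def top_candidates(results, min_similarity=0.0):
    """Drop hits below `min_similarity`, then sort by rrf_score, highest first.

    Assumes `results` is a list of dicts with `similarity_score` and `rrf_score`.
    """
    kept = [r for r in results if r.get("similarity_score", 0.0) >= min_similarity]
    return sorted(kept, key=lambda r: r.get("rrf_score", 0.0), reverse=True)
```

Candidates returned this way are still only suggestions; confirming each QID with `wbgetentities` before relying on it keeps fuzzy matches from leaking into results.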

### 5. Direct Entity JSON (Special:EntityData)

```bash
curl 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json?flavor=simple'
```

`flavor` (documented for the RDF output formats; it may have no effect on `.json` responses):
- `simple`: truthy statements + sitelinks/version
- `full`: full data (the default)
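
Since the entity ID is interpolated straight into the URL path, it is worth validating it first. A small sketch, assuming a hypothetical helper `entity_data_url` and an ID pattern that covers items (Q), properties (P), and lexemes (L):

```python
import re

ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData"
# Illustrative pattern: a Q, P, or L prefix followed by a positive integer.
ENTITY_ID = re.compile(r"[QPL][1-9]\d*")


def entity_data_url(entity_id, fmt="json"):
    """Return the Special:EntityData URL for a validated entity ID."""
    if not ENTITY_ID.fullmatch(entity_id):
        raise ValueError(f"not a Wikidata entity ID: {entity_id!r}")
    return f"{ENTITY_DATA_BASE}/{entity_id}.{fmt}"
```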

### 6. Structured Queries (WDQS SPARQL)

```bash
curl -G 'https://query.wikidata.org/sparql' --data-urlencode 'query=SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5' -H 'Accept: application/sparql-results+json'
```
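
With `Accept: application/sparql-results+json`, the response follows the standard SPARQL JSON results layout: variable names under `head.vars` and one object per row under `results.bindings`. The sketch below flattens that into plain dictionaries. The helper names `simplify_bindings` and `to_entity_id` are hypothetical.

```python
WD_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def simplify_bindings(sparql_json):
    """Turn SPARQL JSON results into a list of {variable: value} rows.

    Variables left unbound in a row are mapped to None.
    """
    variables = sparql_json["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in sparql_json["results"]["bindings"]
    ]


def to_entity_id(value):
    """Strip the entity URI prefix, e.g. '.../entity/Q5' becomes 'Q5'; other values pass through."""
    if value.startswith(WD_ENTITY_PREFIX):
        return value[len(WD_ENTITY_PREFIX):]
    return value
```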

## Extracting External Identifiers

External identifiers are stored as claims with datatype `external-id`. Common identifier properties:

| Property | Name                   | Example                |
| -------- | ---------------------- | ---------------------- |
| P214     | VIAF ID                | 75121530               |
| P227     | GND ID                 | 119033364              |
| P244     | Library of Congress ID | n79023811              |
| P213     | ISNI                   | 0000 0001 2144 9326    |
| P345     | IMDb ID                | nm0001354              |
| P646     | Freebase ID            | /m/0282x               |
| P349     | NDL ID                 | 00621256               |
| P268     | BnF ID                 | 11888092r              |
| P269     | IdRef ID               | 026927608              |
| P906     | SELIBR ID              | 182099                 |
| P396     | SBN author ID          | IT\\ICCU\\CFIV\\000163 |

To extract identifiers from `wbgetentities` response:
```python
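# `response` is the parsed JSON returned by wbgetentities for Q42.
claims = response["entities"]["Q42"]["claims"]
identifiers = {}
for pid, statements in claims.items():
    for snak in (s["mainsnak"] for s in statements):
        # Keep external-id snaks only; novalue/somevalue snaks have no datavalue.
        if snak.get("datatype") == "external-id" and "datavalue" in snak:
            identifiers.setdefault(pid, []).append(snak["datavalue"]["value"])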
```

## Python Script Usage

Use `scripts/wikidata_api.py` for programmatic access:

```python
from scripts.wikidata_api import WikidataAPI

wd = WikidataAPI()

# Search for items
results = wd.search("Albert Einstein", language="en", limit=5)

# Get entity with identifiers
entity = wd.get_entity("Q937", props=["labels", "descriptions", "claims"])

# Get external identifiers only (all values by default)
identifiers = wd.get_identifiers("Q937")
# Returns e.g.: {'P214': ['75121530', ...], 'P227': ['118529579'], ...}

# Semantic search (Vector DB)
candidates = wd.vector_search_items("a famous science fiction writer", lang="en", k=5)

# SPARQL
raw = wd.execute_sparql("SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5")
```

## Response Handling

### Search Response Structure
```json
{
  "searchinfo": {"search": "query"},
  "search": [
    {
      "id": "Q42",
      "label": "Douglas Adams",
      "description": "English writer and humorist",
      "aliases": ["Douglas Noël Adams"],
      "url": "//www.wikidata.org/wiki/Q42"
    }
  ]
}
```

### Entity Response Structure
```json
{
  "entities": {
    "Q42": {
      "type": "item",
      "id": "Q42",
      "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
      "descriptions": {"en": {"language": "en", "value": "..."}},
      "claims": {
        "P31": [...],  // instance of
        "P214": [{"mainsnak": {"datavalue": {"value": "113230702"}}}]  // VIAF
      }
    }
  }
}
```
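
Labels are keyed by language code, so a caller usually wants a fallback order rather than a single hard-coded language. A minimal sketch against the structure above, using a hypothetical helper `best_label`:

```python
def best_label(entity, languages=("en",)):
    """Return the first available label in `languages`, or the entity ID if none match."""
    labels = entity.get("labels", {})
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return entity.get("id")
```

The argument is one entry of the response's `entities` map, for example `response["entities"]["Q42"]`.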

## Best Practices

1. **Choose the right access method**: search vs vector search vs entity fetch vs SPARQL
2. **Rate limiting**: add 500ms-1s delay between requests
3. **Batch requests**: use pipe-separated IDs (max 50 per `wbgetentities` call)
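4. **Set User-Agent**: send a descriptive `User-Agent` header that names your tool and includes contact info
5. **Handle 429**: respect `Retry-After` and back off
6. **Action API etiquette**: pass `maxlag` (for example `maxlag=5`) on non-interactive requests and request only the `props` you need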
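
Practices 4 and 5 can be combined in one small request helper. This is a sketch under stated assumptions, not the skill's implementation: the names `retry_delay` and `fetch`, the User-Agent string, and the backoff constants are illustrative.

```python
import time
import urllib.error
import urllib.request

# Hypothetical User-Agent; replace with your own tool name and contact details.
USER_AGENT = "wikidata-search-example/0.1 (contact: you@example.org)"


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying.

    Uses a numeric Retry-After value when present; otherwise falls back to
    exponential backoff (base * 2**attempt, capped at `cap`).
    """
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff.
    return min(base * (2 ** attempt), cap)


def fetch(url, max_attempts=5):
    """GET `url` with a User-Agent header, retrying on HTTP 429 (needs network access)."""
    for attempt in range(max_attempts):
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Re-raise anything other than 429, and give up after the last attempt.
            if error.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(error.headers.get("Retry-After"), attempt))
```

The fixed 500ms-1s pause between successful requests from practice 2 would sit in the caller's loop, outside this helper.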
