wikidata-search
Search for items and properties on Wikidata and retrieve entity details, claims, and external identifiers. Supports both keyword search (Wikidata Action API) and semantic/hybrid search (Wikidata Vector Database), plus direct entity retrieval (Special:EntityData) and structured querying (WDQS SPARQL).
Best use case
wikidata-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using wikidata-search should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/wikidata-search/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How wikidata-search Compares
| Feature / Agent | wikidata-search | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Search for items and properties on Wikidata and retrieve entity details, claims, and external identifiers. Supports both keyword search (Wikidata Action API) and semantic/hybrid search (Wikidata Vector Database), plus direct entity retrieval (Special:EntityData) and structured querying (WDQS SPARQL).
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Wikidata Search Skill
Search and retrieve data from Wikidata, the free knowledge base.
## Choosing An Access Method
Use the method that matches the task to reduce load and improve accuracy:
- Keyword search by label/alias/description: Action API `wbsearchentities`
- Semantic exploration / fuzzy concept search: Wikidata Vector Database (hybrid vector + keyword via RRF)
- Fetch a known entity's current JSON quickly: Special:EntityData
- Complex graph relations / reporting: Wikidata Query Service (WDQS) SPARQL
## API Endpoints
Base URL: `https://www.wikidata.org/w/api.php`
Entity JSON (often faster for current state): `https://www.wikidata.org/wiki/Special:EntityData/{ID}.json`
SPARQL endpoint: `https://query.wikidata.org/sparql`
Vector DB API: `https://wd-vectordb.wmcloud.org`
## Core Functions
### 1. Search Items (wbsearchentities)
Search for entities by label or alias.
```bash
curl 'https://www.wikidata.org/w/api.php?action=wbsearchentities&search=QUERY&language=en&format=json&type=item&limit=10'
```
Parameters:
- `search`: Search term (required)
- `language`: Language code (default: en)
- `type`: `item` (Q-entities) or `property` (P-entities)
- `limit`: Max results (1-50, default: 7)
- `continue`: Offset for pagination
Response fields per result:
- `id`: Entity ID (e.g., Q42)
- `label`: Primary label
- `description`: Short description
- `aliases`: Alternative names
- `url`: Wikidata page URL
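The request above can also be assembled in code so that the search term is URL-encoded correctly. The following is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name chosen for illustration and is not part of the skill's script.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"

def build_search_url(term, language="en", entity_type="item", limit=10, offset=None):
    """Return a wbsearchentities URL for the given search term."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": entity_type,   # "item" or "property"
        "limit": limit,
        "format": "json",
    }
    if offset is not None:
        params["continue"] = offset  # pagination offset
    return f"{API_URL}?{urlencode(params)}"

print(build_search_url("Douglas Adams", limit=5))
```

Building the query string with `urlencode` avoids hand-escaping spaces and non-ASCII characters in labels.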
### 2. Get Entity Details (wbgetentities)
Retrieve full entity data including claims/identifiers.
```bash
curl 'https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json&props=labels|descriptions|aliases|claims'
```
Parameters:
- `ids`: Pipe-separated entity IDs (max 50)
- `props`: `labels|descriptions|aliases|claims|sitelinks|info`
- `languages`: Filter languages (e.g., `en|fr|de`)
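Because `ids` accepts at most 50 entities per call, longer ID lists need to be split into batches. A small sketch of one way to do that; `batch_ids` is a hypothetical helper, not a function from the skill's script.

```python
def batch_ids(ids, size=50):
    """Yield pipe-separated ID strings with at most `size` IDs each."""
    for start in range(0, len(ids), size):
        yield "|".join(ids[start:start + size])

# 120 IDs are split into batches of 50, 50 and 20.
ids = [f"Q{n}" for n in range(1, 121)]
batches = list(batch_ids(ids))
print(len(batches))  # 3
```

Each yielded string can be passed directly as the `ids` parameter of one `wbgetentities` request.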
### 3. Get Claims Only (wbgetclaims)
Retrieve the claims of a specific entity, optionally restricted to one property.
```bash
curl 'https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P31&format=json'
```
### 4. Semantic / Hybrid Search (Wikidata Vector Database)
When you don't know the exact label, or want "things like this" discovery, use the Vector DB.
Item search:
```bash
curl 'https://wd-vectordb.wmcloud.org/item/query/?query=QUERY&lang=all&K=20'
```
Property search:
```bash
curl 'https://wd-vectordb.wmcloud.org/property/query/?query=QUERY&lang=all&K=20&exclude_external_ids=false'
```
Optional parameters:
- `lang`: language code, or `all` for cross-language
- `K`: number of results
- `instanceof`: comma-separated QIDs to filter items by "instance of"
- `rerank`: `true|false` (enabling reranking is slower)
Response fields:
- `QID` / `PID`
- `similarity_score`
- `rrf_score`
- `source`
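The item and property query URLs above differ only in their path segment, so they can be generated by one function. This is a sketch under the assumption that the parameters behave as listed above; `build_vector_query_url` is a hypothetical name introduced here.

```python
from urllib.parse import urlencode

VECTOR_DB_URL = "https://wd-vectordb.wmcloud.org"

def build_vector_query_url(query, kind="item", lang="all", k=20,
                           instanceof=None, rerank=None):
    """Return a Vector DB query URL for items or properties."""
    if kind not in ("item", "property"):
        raise ValueError("kind must be 'item' or 'property'")
    params = {"query": query, "lang": lang, "K": k}
    if instanceof:
        # Comma-separated QIDs, as described in the parameter list above.
        params["instanceof"] = ",".join(instanceof)
    if rerank is not None:
        params["rerank"] = "true" if rerank else "false"
    return f"{VECTOR_DB_URL}/{kind}/query/?{urlencode(params)}"

# Restrict an item search to humans (Q5).
print(build_vector_query_url("science fiction writer", lang="en", k=5,
                             instanceof=["Q5"]))
```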
### 5. Direct Entity JSON (Special:EntityData)
```bash
curl 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json?flavor=simple'
```
`flavor` (documented for the RDF output formats; it may have no effect on `.json` output):
- `simple`: truthy statements + sitelinks/version
- `full`: full data
### 6. Structured Queries (WDQS SPARQL)
```bash
curl -G 'https://query.wikidata.org/sparql' --data-urlencode 'query=SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5' -H 'Accept: application/sparql-results+json'
```
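With the `application/sparql-results+json` header, WDQS returns the standard SPARQL JSON results layout, in which each row is found under `results.bindings`. The sketch below flattens that layout into plain dictionaries; `flatten_bindings` is a hypothetical helper and the sample document is hand-written for illustration rather than captured from the service.

```python
def flatten_bindings(sparql_json):
    """Convert SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        # Each cell is an object like {"type": "uri", "value": "..."}.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows

# Hand-written sample in the SPARQL JSON results format.
sample = {
    "head": {"vars": ["p", "o"]},
    "results": {"bindings": [
        {"p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P31"},
         "o": {"type": "uri", "value": "http://www.wikidata.org/entity/Q5"}},
    ]},
}
rows = flatten_bindings(sample)
print(rows[0]["o"])  # http://www.wikidata.org/entity/Q5
```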
## Extracting External Identifiers
External identifiers are stored as claims with datatype `external-id`. Common identifier properties:
| Property | Name | Example |
| -------- | ---------------------- | ---------------------- |
| P214 | VIAF ID | 75121530 |
| P227 | GND ID | 119033364 |
| P244 | Library of Congress ID | n79023811 |
| P213 | ISNI | 0000 0001 2144 9326 |
| P345 | IMDb ID | nm0001354 |
| P646 | Freebase ID | /m/0282x |
| P349 | NDL ID | 00621256 |
| P268 | BnF ID | 11888092r |
| P269 | IdRef ID | 026927608 |
| P906 | SELIBR ID | 182099 |
| P396 | SBN author ID | IT\\ICCU\\CFIV\\000163 |
To extract identifiers from `wbgetentities` response:
```python
# claims = response['entities']['Q42']['claims']
# For each property P:
# claims[P][0]['mainsnak']['datavalue']['value'] -> identifier string
```
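The access path above only reads the first statement of a property. A slightly fuller sketch collects every value for every `external-id` property and skips statements without a concrete value. `extract_external_ids` is a hypothetical helper, and the sample claims are a trimmed, hand-written excerpt that relies on the `snaktype` and `datatype` keys present in Action API snaks.

```python
def extract_external_ids(claims):
    """Map each external-id property to the list of its identifier strings."""
    identifiers = {}
    for pid, statements in claims.items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("datatype") != "external-id":
                continue
            if snak.get("snaktype") != "value":
                continue  # "novalue" / "somevalue" snaks carry no datavalue
            identifiers.setdefault(pid, []).append(snak["datavalue"]["value"])
    return identifiers

# Trimmed sample shaped like response['entities']['Q42']['claims'].
sample_claims = {
    "P31": [{"mainsnak": {"snaktype": "value", "datatype": "wikibase-item",
                          "datavalue": {"value": {"id": "Q5"},
                                        "type": "wikibase-entityid"}}}],
    "P214": [{"mainsnak": {"snaktype": "value", "datatype": "external-id",
                           "datavalue": {"value": "113230702",
                                         "type": "string"}}}],
}
print(extract_external_ids(sample_claims))  # {'P214': ['113230702']}
```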
## Python Script Usage
Use `scripts/wikidata_api.py` for programmatic access:
```python
from scripts.wikidata_api import WikidataAPI
wd = WikidataAPI()
# Search for items
results = wd.search("Albert Einstein", language="en", limit=5)
# Get entity with identifiers
entity = wd.get_entity("Q937", props=["labels", "descriptions", "claims"])
# Get external identifiers only (all values by default)
identifiers = wd.get_identifiers("Q937")
# Returns: {'P214': ['75121530', ...], 'P227': ['118529579'], ...}
# Semantic search (Vector DB)
candidates = wd.vector_search_items("a famous science fiction writer", lang="en", k=5)
# SPARQL
raw = wd.execute_sparql("SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5")
```
## Response Handling
### Search Response Structure
```json
{
"searchinfo": {"search": "query"},
"search": [
{
"id": "Q42",
"label": "Douglas Adams",
"description": "English writer and humorist",
"aliases": ["Douglas Noël Adams"],
"url": "//www.wikidata.org/wiki/Q42"
}
]
}
```
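A parsed search response of this shape can be reduced to the fields most callers need. A minimal sketch, assuming the response has already been decoded from JSON; `summarize_results` is a hypothetical helper name.

```python
def summarize_results(response):
    """Return (id, label, description) tuples from a wbsearchentities response."""
    return [
        # label and description may be absent for some entities
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]

sample = {
    "searchinfo": {"search": "query"},
    "search": [
        {
            "id": "Q42",
            "label": "Douglas Adams",
            "description": "English writer and humorist",
            "aliases": ["Douglas Noël Adams"],
            "url": "//www.wikidata.org/wiki/Q42",
        }
    ],
}
print(summarize_results(sample))
# [('Q42', 'Douglas Adams', 'English writer and humorist')]
```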
### Entity Response Structure
```json
{
"entities": {
"Q42": {
"type": "item",
"id": "Q42",
"labels": {"en": {"language": "en", "value": "Douglas Adams"}},
"descriptions": {"en": {"language": "en", "value": "..."}},
"claims": {
"P31": [...], // instance of
"P214": [{"mainsnak": {"datavalue": {"value": "113230702"}}}] // VIAF
}
}
}
}
```
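Labels are keyed by language code, so a lookup usually needs a fallback order for entities that lack a label in the preferred language. The sketch below shows one way to do this on an already-parsed entity; `get_label` is a hypothetical helper, not part of the skill's script.

```python
def get_label(entity, languages=("en",)):
    """Return the first available label among `languages`, or None."""
    labels = entity.get("labels", {})
    for lang in languages:
        label = labels.get(lang)
        if label:
            return label["value"]
    return None

# Trimmed entity shaped like response['entities']['Q42'].
entity = {
    "type": "item",
    "id": "Q42",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
}
print(get_label(entity, languages=("fr", "en")))  # Douglas Adams
```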
## Best Practices
1. **Choose the right access method**: search vs vector search vs entity fetch vs SPARQL
2. **Rate limiting**: add 500ms-1s delay between requests
3. **Batch requests**: use pipe-separated IDs (max 50 per `wbgetentities` call)
4. **Set User-Agent**: include contact info in headers
5. **Handle 429**: respect `Retry-After` and back off
6. **Action API etiquette**: use `maxlag` and request only needed `props`
Related Skills
squall-deep-research
Deep research via Codex web search and optionally Gemini deep research. Use when asked to 'deep research', 'squall deep research', 'research deeply', or when a question needs web-sourced evidence. Single-agent, not a swarm. (project)
searching-message-history
Search Telegram conversation history and stored links. Use when finding past messages, what someone said, or links shared in chats.
research-leads
Research new capabilities and changes for tracked AI coding agents. Use this skill when assigned a research-leads issue to discover new features, or when asked to revise a research PR.
research-deep
Read research outline, launch independent agent for each item for deep research. Disable task output.
research-cog
Deep research agent powered by CellCog. Market research, competitive analysis, stock analysis, investment research, academic research with citations. Your AI research analyst.
openrouter-research
Research OpenRouter API docs, available Grok model IDs, vision capability for the judge service, and integration patterns. Use when implementing openrouter_tool.py, when checking which Grok model supports vision/image input for judge_service.py, when OpenRouter returns unexpected errors, or when verifying model availability and context limits.
multi-ai-research
Comprehensive research and analysis using Claude (subagents), Gemini CLI, and Codex CLI. Multi-perspective research with cross-verification, iterative refinement, and 100% citation coverage. Use for security analysis, architecture research, code quality assessment, performance analysis, or any research requiring rigorous verification and multiple AI perspectives.
gpt-researcher
Run GPT-Researcher multi-agent deep research framework locally using OpenAI GPT-5.2. Replaces ChatGPT Deep Research with local control. Researches 100+ sources in parallel, provides comprehensive citations. Use for Phase 3 industry/technical research or comprehensive synthesis. Takes 6-20 min depending on report type. Supports multiple LLM providers.
deep-research
Web research with Graph-of-Thoughts for fast-changing topics. Use when user requests research, analysis, investigation, or comparison requiring current information. Features hypothesis testing, source triangulation, claim verification, Red Team, self-critique, and gap analysis. Supports Quick/Standard/Deep/Exhaustive tiers. Creative Mode for cross-industry innovation.
brutal-deepresearch
Structured deep research pipeline with confirmation gates and resume support. Generates outline, launches parallel research agents, produces validated JSON results and markdown report.
agent-market-researcher
Expert market researcher specializing in market analysis, consumer insights, and competitive intelligence. Masters market sizing, segmentation, and trend analysis with focus on identifying opportunities and informing strategic business decisions.
agent-data-researcher
Expert data researcher specializing in discovering, collecting, and analyzing diverse data sources. Masters data mining, statistical analysis, and pattern recognition with focus on extracting meaningful insights from complex datasets to support evidence-based decisions.