Best use case
dataverse-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Deposit and discover research datasets via Harvard Dataverse API
Teams using dataverse-api can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/dataverse-api/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How dataverse-api Compares
| Feature / Agent | dataverse-api | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Deposit and discover research datasets via Harvard Dataverse API
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Harvard Dataverse API
## Overview
Dataverse is an open-source research data repository platform developed by Harvard IQSS, hosting 150K+ datasets across 80+ installations worldwide. The Harvard Dataverse alone has 130K+ datasets covering social science, natural science, and humanities. The API supports search, metadata retrieval, file download, and dataset deposit. Read access to published data is free and requires no authentication; depositing data requires an API token.
## API Endpoints
### Base URL
```
https://dataverse.harvard.edu/api
```
### Search
```bash
# Search datasets
curl "https://dataverse.harvard.edu/api/search?q=climate+change&type=dataset&per_page=20"
# Search files within datasets
curl "https://dataverse.harvard.edu/api/search?q=temperature+data&type=file&per_page=20"
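# Filter by subject (-G sends the --data-urlencode pairs as a URL-encoded query string)
curl -G "https://dataverse.harvard.edu/api/search" \
  --data-urlencode "q=survey data" \
  --data-urlencode "type=dataset" \
  --data-urlencode 'fq=subject_ss:"Social Sciences"'
# Filter by publication date (encoding avoids curl globbing on the brackets)
curl -G "https://dataverse.harvard.edu/api/search" \
  --data-urlencode "q=genomics" \
  --data-urlencode "type=dataset" \
  --data-urlencode "fq=dateSort:[2024-01-01T00:00:00Z TO *]"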
# Sort by relevance or date
curl "https://dataverse.harvard.edu/api/search?q=machine+learning&type=dataset&\
sort=date&order=desc"
```
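Filter values contain quotes, spaces, and brackets, so they need URL encoding before they reach the API. The following is a minimal Python sketch of building such a search URL with the standard library. `build_search_url` is a hypothetical helper name, not part of any Dataverse client, and the sketch assumes the API accepts repeated `fq` parameters. It only constructs the URL and sends no request.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataverse.harvard.edu/api"


def build_search_url(query, item_type="dataset", filters=None, per_page=20):
    """Return a URL-encoded search URL (hypothetical helper, no request sent)."""
    params = [("q", query), ("type", item_type), ("per_page", per_page)]
    # Assumption: each Solr filter is passed as its own fq parameter
    for fq in filters or []:
        params.append(("fq", fq))
    return f"{BASE_URL}/search?{urlencode(params)}"


url = build_search_url("survey data", filters=['subject_ss:"Social Sciences"'])
print(url)
```

Passing the parameters as a list of pairs rather than a dict keeps their order stable and allows the same key to appear more than once.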
### Get Dataset Metadata
```bash
# By persistent ID (DOI)
curl "https://dataverse.harvard.edu/api/datasets/:persistentId/?persistentId=doi:10.7910/DVN/EXAMPLE"
# By dataset ID
curl "https://dataverse.harvard.edu/api/datasets/12345"
# Get dataset versions
curl "https://dataverse.harvard.edu/api/datasets/:persistentId/versions?persistentId=doi:10.7910/DVN/EXAMPLE"
```
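The metadata response nests descriptive fields fairly deeply. As a rough sketch, and assuming the JSON places citation metadata under `data.latestVersion.metadataBlocks.citation.fields` as a list of objects with `typeName` and `value` keys, the top-level fields can be flattened into a dictionary. `citation_fields` and the trimmed `sample` payload are hypothetical and exist only for illustration; check a real response from your installation before relying on this layout.

```python
def citation_fields(dataset_json):
    """Map each top-level citation field's typeName to its value."""
    version = dataset_json.get("data", {}).get("latestVersion", {})
    block = version.get("metadataBlocks", {}).get("citation", {})
    return {f["typeName"]: f["value"] for f in block.get("fields", [])}


# Hypothetical, heavily trimmed response used only for illustration
sample = {
    "status": "OK",
    "data": {
        "latestVersion": {
            "metadataBlocks": {
                "citation": {
                    "fields": [
                        {"typeName": "title", "value": "Example Dataset"},
                        {"typeName": "subject", "value": ["Social Sciences"]},
                    ]
                }
            }
        }
    },
}

fields = citation_fields(sample)
print(fields["title"])
```

Compound fields such as authors carry nested values, so they would need further unpacking than this sketch performs.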
### Download Files
```bash
# Download a specific file by ID (-L follows redirects, -J uses the server-supplied filename)
curl -L -O -J "https://dataverse.harvard.edu/api/access/datafile/67890"
# Download with original format
curl -L -O -J "https://dataverse.harvard.edu/api/access/datafile/67890?format=original"
# Download all files in a dataset (as zip); -o names the output because the URL path ends in "/"
curl -L -o dataset.zip "https://dataverse.harvard.edu/api/access/dataset/:persistentId/?persistentId=doi:10.7910/DVN/EXAMPLE"
```
### Query Parameters (Search)
| Parameter | Description | Example |
|-----------|-------------|---------|
| `q` | Search query | `q=voter+turnout` |
| `type` | Item type | `dataset`, `file`, `dataverse` |
| `per_page` | Results per page (max 1000) | `per_page=50` |
| `start` | Pagination offset | `start=50` |
| `sort` | Sort field | `name`, `date` |
| `order` | Sort order | `asc`, `desc` |
| `fq` | Filter query (Solr) | `fq=subject_ss:"Medicine"` |
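The `start` and `per_page` parameters combine into a simple paging loop. The sketch below is one possible way to walk through all results, assuming each response carries `data.items` and `data.total_count` as in the response structure shown in this document. `iter_search_items` and `fake_fetch` are hypothetical names; the fetch function is injected so the paging logic can be tried without network access.

```python
def iter_search_items(fetch_page, per_page=100, max_items=None):
    """Yield search items page by page.

    fetch_page(start, per_page) must return the parsed JSON of one
    search response; it is injected so this loop runs without network.
    """
    start = 0
    yielded = 0
    while True:
        data = fetch_page(start, per_page).get("data", {})
        items = data.get("items", [])
        if not items:
            return  # an empty page means there is nothing left
        for item in items:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        start += len(items)
        total = data.get("total_count")
        if total is not None and start >= total:
            return


def fake_fetch(start, per_page):
    """Stand-in for a real request: serves five made-up items."""
    names = [f"dataset-{i}" for i in range(5)]
    page = names[start:start + per_page]
    return {
        "status": "OK",
        "data": {"total_count": len(names),
                 "items": [{"name": n} for n in page]},
    }


names = [item["name"] for item in iter_search_items(fake_fetch, per_page=2)]
print(names)
```

In real use, `fetch_page` would wrap a GET request to the search endpoint that passes `start` and `per_page` alongside the query, and returns the decoded JSON.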
## Response Structure
```json
{
  "status": "OK",
  "data": {
    "q": "climate change",
    "total_count": 2450,
    "items": [
      {
        "name": "Global Temperature Dataset 2024",
        "type": "dataset",
        "url": "https://doi.org/10.7910/DVN/EXAMPLE",
        "global_id": "doi:10.7910/DVN/EXAMPLE",
        "description": "Monthly global temperature anomalies...",
        "published_at": "2024-03-15",
        "publisher": "Harvard Dataverse",
        "subjects": ["Earth and Environmental Sciences"],
        "fileCount": 12,
        "citation": "Smith, J. (2024). Global Temperature Dataset..."
      }
    ]
  }
}
```
## Python Usage
```python
from typing import Optional

import requests

BASE_URL = "https://dataverse.harvard.edu/api"
TIMEOUT = 30  # seconds; avoids hanging on a stalled connection


def search_datasets(query: str, per_page: int = 20,
                    subject: Optional[str] = None) -> list:
    """Search Harvard Dataverse for datasets, newest first."""
    params = {
        "q": query,
        "type": "dataset",
        "per_page": per_page,
        "sort": "date",
        "order": "desc",
    }
    if subject:
        # Solr filter query; the quotes keep multi-word subjects together
        params["fq"] = f'subject_ss:"{subject}"'

    resp = requests.get(f"{BASE_URL}/search", params=params, timeout=TIMEOUT)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for item in data.get("data", {}).get("items", []):
        results.append({
            "name": item.get("name"),
            "doi": item.get("global_id"),
            # "description" may be missing or null, so fall back to ""
            "description": (item.get("description") or "")[:300],
            "published": item.get("published_at"),
            "subjects": item.get("subjects", []),
            "files": item.get("fileCount", 0),
            "url": item.get("url"),
        })
    return results


def get_dataset_files(doi: str) -> list:
    """List files in the latest version of a dataset."""
    resp = requests.get(
        f"{BASE_URL}/datasets/:persistentId/",
        params={"persistentId": doi},
        timeout=TIMEOUT,
    )
    resp.raise_for_status()
    data = resp.json().get("data", {})

    files = []
    version = data.get("latestVersion", {})
    for f in version.get("files", []):
        df = f.get("dataFile", {})
        files.append({
            "id": df.get("id"),
            "filename": df.get("filename"),
            "size": df.get("filesize"),
            "content_type": df.get("contentType"),
            "md5": df.get("md5"),
        })
    return files


def download_file(file_id: int, output_path: str) -> None:
    """Download a file from Dataverse, streaming it to disk in chunks."""
    resp = requests.get(
        f"{BASE_URL}/access/datafile/{file_id}",
        stream=True,
        timeout=TIMEOUT,
    )
    resp.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)


if __name__ == "__main__":
    # Example: find social science datasets
    datasets = search_datasets("income inequality",
                               subject="Social Sciences")
    for ds in datasets:
        print(f"[{ds['published']}] {ds['name']} ({ds['files']} files)")
        print(f"  DOI: {ds['doi']}")

    # Example: list files in a dataset
    # files = get_dataset_files("doi:10.7910/DVN/EXAMPLE")
    # for f in files:
    #     print(f"  {f['filename']} ({f['size']} bytes)")
```
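Since `get_dataset_files` returns an `md5` value for each file, a download can be checked against it. The sketch below shows one way to do that with the standard library; `md5_matches` is a hypothetical helper, and it assumes the installation reports an MD5 checksum for the file. Some installations may be configured with a different checksum algorithm, in which case the `md5` field could be empty and this check would not apply.

```python
import hashlib
import os
import tempfile


def md5_matches(path, expected_md5, chunk_size=8192):
    """Return True if the file's MD5 hex digest equals expected_md5."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large data files are never held fully in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()


# Self-contained demo on a temporary file instead of a real download
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
try:
    ok = md5_matches(tmp.name, "5d41402abc4b2a76b9719d911017c592")
    bad = md5_matches(tmp.name, "0" * 32)
finally:
    os.remove(tmp.name)
print(ok, bad)
```

With the functions above, this would be called after `download_file`, passing the output path and the `md5` value from the matching entry of `get_dataset_files`.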
## Other Dataverse Installations
| Installation | URL | Focus |
|-------------|-----|-------|
| Harvard Dataverse | dataverse.harvard.edu | Multi-discipline |
| UNC Dataverse | dataverse.unc.edu | Social science |
| AUSSDA | data.aussda.at | Austrian social science |
| Borealis (Canada) | borealisdata.ca | Canadian research |
| DataverseNL | dataverse.nl | Dutch research |
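Because these installations run the same Dataverse software, the endpoints shown for Harvard should generally work against them once the host is swapped. A small sketch of that idea follows, assuming each installation serves its API under `https://<host>/api`. The short keys in `INSTALLATIONS` and the `api_base` helper are hypothetical names, and individual installations may run different software versions or restrict some endpoints.

```python
# Hosts copied from the table above; the short keys are made up for this sketch
INSTALLATIONS = {
    "harvard": "dataverse.harvard.edu",
    "unc": "dataverse.unc.edu",
    "aussda": "data.aussda.at",
    "borealis": "borealisdata.ca",
    "dataversenl": "dataverse.nl",
}


def api_base(installation):
    """Return the assumed API base URL for a known installation key."""
    try:
        host = INSTALLATIONS[installation]
    except KeyError:
        raise ValueError(f"unknown installation: {installation!r}") from None
    return f"https://{host}/api"


print(api_base("borealis"))
```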
## References
- [Harvard Dataverse](https://dataverse.harvard.edu/)
- [Dataverse API Guide](https://guides.dataverse.org/en/latest/api/)
- [Dataverse Project](https://dataverse.org/)
- King, G. (2007). "An Introduction to the Dataverse Network." *Sociological Methods & Research* 36(2).