academic-benchmark-researcher

Use this skill when the user requests information about academic benchmarks, datasets, or research papers, particularly in machine learning, deep learning, or logical reasoning. It enables systematic research of academic benchmarks by searching web sources, downloading and analyzing arXiv papers, extracting key metadata (number of tasks, training-set availability, difficulty levels), and compiling comparative summaries. It triggers on requests involving dataset comparisons, benchmark analysis, or academic paper research for table creation.

16 stars

by diegosouzapw


Best use case

academic-benchmark-researcher is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using academic-benchmark-researcher should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/academic-benchmark-researcher/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/academic-benchmark-researcher/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/academic-benchmark-researcher/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How academic-benchmark-researcher Compares

| Feature / Agent | academic-benchmark-researcher | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

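What does this skill do?

It researches academic benchmarks systematically: it searches web sources, downloads and analyzes arXiv papers, extracts key metadata (number of tasks, training-set availability, difficulty levels), and compiles comparative summaries.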

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/data-ai/academic-benchmark-researcher.

SKILL.md Source

# Instructions

## Primary Objective
Systematically research academic benchmarks, datasets, or research papers to extract and compile comparative information (e.g., into a summary table). The core workflow involves: 1) Identifying relevant sources, 2) Extracting key metadata, 3) Synthesizing findings into a structured output (like a LaTeX table).

## Core Workflow
1. **Clarify & Parse Request:** Identify the specific benchmarks/datasets/papers mentioned by the user. Note any required output format (e.g., LaTeX table with specific columns) and constraints (e.g., "no commented lines").
2. **Initial Information Gathering:** For each identified entity (dataset/paper):
   * Use `local-web_search` to find general information, official pages (GitHub, project sites), and relevant arXiv IDs.
   * For arXiv papers, use `arxiv_local-download_paper` or `fetch-fetch_markdown` to obtain the paper content.
   * Search for specific attributes requested by the user (e.g., "number of tasks," "training set," "difficulty levels").
3. **Deep Dive & Verification:** Read paper abstracts, introductions, and methodology sections (using `arxiv_local-read_paper` or parsed markdown) to confirm key details. Cross-reference information from multiple sources (official site, paper, blog posts) for accuracy.
4. **Information Synthesis:** Compile the extracted metadata into a structured format aligned with the user's request. Resolve any ambiguities (e.g., whether a "task" count refers to broad categories or individual instances) using the most authoritative source (typically the original paper).
5. **Output Generation:** Create the final deliverable (e.g., a `.tex` file). Ensure it strictly adheres to the user's formatting specifications. Optionally, provide a concise textual summary of the findings.
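
The cross-referencing in steps 3 and 4 can be sketched as a small resolver. This is a minimal illustration, not part of the skill: the source labels, their priority order (paper first, then official site, then blog), and the function name `resolve_attribute` are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Assumed ordering, most authoritative first; adjust to the task at hand.
SOURCE_PRIORITY = ("paper", "official_site", "blog")


def resolve_attribute(
    claims: Dict[str, object],
) -> Tuple[Optional[object], Optional[str], bool]:
    """Pick the value from the most authoritative source that reported one.

    `claims` maps a source label to the value that source gave.
    Returns (value, source, conflict), where `conflict` is True when
    any other source reported a different value.
    """
    for source in SOURCE_PRIORITY:
        if source in claims:
            value = claims[source]
            conflict = any(other != value for other in claims.values())
            return value, source, conflict
    # No recognized source reported this attribute.
    return None, None, False


# Hypothetical example: a blog post and the paper disagree on a task count.
value, source, conflict = resolve_attribute({"blog": 23, "paper": 5})
print(value, source, conflict)  # prints: 5 paper True
```

When `conflict` is true, the disagreement is worth mentioning in the textual summary so the user can see which source was trusted.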

## Key Metadata to Extract
When researching a benchmark/dataset, prioritize finding:
* **Full Name & Acronym**
* **Number of Tasks/Categories:** Distinguish between broad task categories and individual task instances.
* **Training Data Availability:** Does it include a dedicated training set, or is it for evaluation only?
* **Difficulty Levels:** Does it feature adjustable or tiered difficulty levels?
* **Core Purpose/Description**
* **Primary Source (arXiv ID, GitHub repo)**
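
One way to keep these fields together while researching is a small record type. The sketch below is illustrative only: the class name `BenchmarkRecord`, its field names, and the placeholder values are assumptions, and `None` is used to mean "not stated in the sources consulted".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkRecord:
    """Hypothetical container for the metadata fields listed above."""

    full_name: str
    acronym: Optional[str] = None
    task_categories: Optional[int] = None        # broad task categories
    task_instances: Optional[int] = None         # individual task instances
    has_training_set: Optional[bool] = None      # None = not stated
    has_difficulty_levels: Optional[bool] = None
    description: str = ""
    arxiv_id: Optional[str] = None
    repo_url: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of fields that still need a source, in declaration order."""
        return [name for name, value in vars(self).items() if value is None]


# Placeholder values, not taken from any real benchmark.
record = BenchmarkRecord(full_name="Example Benchmark", acronym="EB", task_categories=5)
print(record.missing_fields())
```

Keeping category and instance counts in separate fields avoids conflating the two when a paper reports both.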

## Tool Usage Guidelines
* `local-web_search`: Use for initial discovery and finding high-level descriptions. Employ specific queries combining the dataset name and target attributes (e.g., "BBH training set few-shot examples").
* `arxiv_local-download_paper` / `fetch-fetch_markdown`: Use to access the canonical source for detailed information. Prefer `arxiv_local-download_paper` when full-text analysis is needed.
* `filesystem-write_file` / `filesystem-read_file`: Use for creating and verifying final output files in the workspace.
* `local-claim_done`: Use only after successfully delivering the requested output and providing a final summary.
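
The query pattern described for `local-web_search` (dataset name plus target attribute) could be generated along these lines. The helper `build_queries` is hypothetical and only builds the query strings; it does not call any search tool.

```python
from typing import Iterable, List


def build_queries(dataset: str, attributes: Iterable[str]) -> List[str]:
    """Combine a dataset name with each target attribute.

    Blank attributes and duplicate queries are skipped; order is preserved.
    """
    queries: List[str] = []
    for attribute in attributes:
        attribute = attribute.strip()
        if not attribute:
            continue
        query = f"{dataset.strip()} {attribute}"
        if query not in queries:
            queries.append(query)
    return queries


print(build_queries("BBH", ["training set few-shot examples", "number of tasks"]))
# prints: ['BBH training set few-shot examples', 'BBH number of tasks']
```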

## Output Standards
* **LaTeX Tables:** Ensure the output contains only the specified table content, without extra comments, document headers, or unrelated text.
* **Summaries:** Be concise but complete, highlighting the sourced information for each dataset.
* **Accuracy:** Base conclusions on the original paper or official project documentation where possible. Acknowledge if information is not explicitly stated.
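
As an illustration of the LaTeX standard above, the following sketch renders rows into a bare `tabular` environment with no comment lines or document preamble. The column set, the `Row` layout, and the function names are assumptions for the example; `\ding{51}` and `\ding{55}` are the check and cross marks from the `pifont` package, which the including document would need to load.

```python
from typing import Iterable, List, Tuple

CHECK = r"\ding{51}"  # check mark (pifont)
CROSS = r"\ding{55}"  # cross mark (pifont)

# Assumed row layout: (name, task count, has training set, has difficulty levels)
Row = Tuple[str, int, bool, bool]


def latex_escape(text: str) -> str:
    """Escape the characters most likely to break a table cell."""
    for char in ("&", "%", "_", "#"):
        text = text.replace(char, "\\" + char)
    return text


def render_table(rows: Iterable[Row]) -> str:
    """Return only the tabular environment, with no comments or preamble."""
    lines: List[str] = [
        r"\begin{tabular}{lccc}",
        r"\hline",
        r"Benchmark & Tasks & Training Set & Difficulty Levels \\",
        r"\hline",
    ]
    for name, tasks, has_training, has_levels in rows:
        marks = [CHECK if flag else CROSS for flag in (has_training, has_levels)]
        lines.append(f"{latex_escape(name)} & {tasks} & {marks[0]} & {marks[1]} \\\\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Placeholder row, not real benchmark data.
print(render_table([("Example_Bench", 5, False, True)]))
```

Escaping `%` in cell text matters here because an unescaped `%` would start a LaTeX comment and silently drop the rest of the row.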

## Common Pitfalls & Resolutions
* **Ambiguous Task Counts:** If a paper mentions "5 task categories" (like KOR-Bench), report that as the task count unless the user specifies otherwise. Clarify in the summary if needed.
* **Missing Information:** If a key attribute (e.g., training set) is not mentioned in primary sources, infer it from the benchmark type (e.g., many evaluation benchmarks lack training sets) and mark it with `\ding{55}` (the cross mark from the LaTeX `pifont` package). State the assumption in your summary.
* **arXiv Paper Processing:** If `arxiv_local-download_paper` returns a "converting" status, use `fetch-fetch_markdown` on the arXiv abstract page as a reliable fallback to get the paper's metadata and abstract.
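
The arXiv fallback can be sketched as a small decision helper. This assumes the download tool reports its state as a plain string such as "converting" (only that value appears in the text above) and that the paper uses a new-style identifier; the function names and the returned (tool name, argument) pair are hypothetical and do not reflect the tools' actual call signatures.

```python
import re
from typing import Tuple

# New-style arXiv identifiers only (YYMM.NNNN or YYMM.NNNNN, optional version).
# Old-style IDs such as "cs/0112017" are not handled in this sketch.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def abstract_url(arxiv_id: str) -> str:
    """Build the abstract-page URL for a new-style arXiv ID."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"https://arxiv.org/abs/{arxiv_id}"


def next_tool_call(download_status: str, arxiv_id: str) -> Tuple[str, str]:
    """Choose the next tool and its argument after a download attempt."""
    if download_status == "converting":
        # Paper not ready yet: fall back to the abstract page.
        return "fetch-fetch_markdown", abstract_url(arxiv_id)
    return "arxiv_local-read_paper", arxiv_id


# "0000.00000" is a placeholder ID used only to show the format.
print(next_tool_call("converting", "0000.00000"))
# prints: ('fetch-fetch_markdown', 'https://arxiv.org/abs/0000.00000')
```

The abstract page yields only metadata and the abstract, so details that live in the methodology section may still need a later retry of the full download.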
