academic-search

Search academic paper repositories (arXiv, Semantic Scholar) for scholarly articles in physics, mathematics, computer science, quantitative biology, AI/ML, and related fields

16 stars

by diegosouzapw

View on GitHub Installation ↓

Best use case

academic-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using academic-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/academic-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/academic-search/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/academic-search/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


Frequently Asked Questions

What does this skill do?

Search academic paper repositories (arXiv, Semantic Scholar) for scholarly articles in physics, mathematics, computer science, quantitative biology, AI/ML, and related fields

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Academic Search Skill

This skill provides access to academic paper repositories, primarily arXiv, for searching scholarly articles. arXiv is a free distribution service and open-access archive for preprints in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

## When to Use This Skill

Use this skill when you need to:

- **Find cutting-edge research**: Access preprints and recent papers before formal journal publication
- **Search AI/ML papers**: Find machine learning, deep learning, and artificial intelligence research
- **Explore computational methods**: Search for algorithms, theoretical frameworks, and mathematical foundations
- **Research interdisciplinary topics**: Find papers spanning computer science, biology, physics, and mathematics
- **Support literature reviews**: Collect relevant papers for comprehensive topic overviews
- **Track state-of-the-art**: Find the latest advances in rapidly evolving fields

### Ideal Use Cases

| Scenario | Example Query |
|----------|---------------|
| Understanding new architectures | "transformer attention mechanism" |
| Exploring applications | "large language models code generation" |
| Finding benchmarks | "image classification benchmark ImageNet" |
| Surveying methods | "reinforcement learning robotics" |
| Technical deep-dives | "backpropagation neural networks" |

## How to Use

The skill provides a Python script that searches arXiv and returns formatted results with titles and abstracts.

### Basic Usage

**Note:** Always use the absolute path from your skills directory.

If running from a virtual environment:
```bash
.venv/bin/python [YOUR_SKILLS_DIR]/academic-search/arxiv_search.py "your search query"
```

Or for system Python:
```bash
python3 [YOUR_SKILLS_DIR]/academic-search/arxiv_search.py "your search query"
```

Replace `[YOUR_SKILLS_DIR]` with the absolute skills directory path from your system prompt.

### Command-Line Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `query` | Yes | - | The search query string |
| `--max-papers` | No | 10 | Maximum number of papers to retrieve |
| `--output-format` | No | text | Output format: `text`, `json`, or `markdown` |
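The script's source is not reproduced here, so the following is only a minimal sketch of how an `argparse` parser could mirror the three documented arguments. The function name `build_parser` is hypothetical; the argument names, defaults, and choices come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arxiv_search.py arguments."""
    parser = argparse.ArgumentParser(description="Search arXiv and print formatted results.")
    # Positional: the free-text (or field-prefixed) arXiv query string.
    parser.add_argument("query", help="The search query string")
    # Optional cap on results; argparse exposes it as args.max_papers.
    parser.add_argument("--max-papers", type=int, default=10,
                        help="Maximum number of papers to retrieve")
    # Optional output format, restricted to the three documented choices.
    parser.add_argument("--output-format", choices=["text", "json", "markdown"],
                        default="text", help="Output format")
    return parser


args = build_parser().parse_args(["transformer attention", "--max-papers", "5"])
print(args.query, args.max_papers, args.output_format)  # transformer attention 5 text
```

Using `choices` means an unsupported value such as `--output-format csv` is rejected with a usage error before any network request is made.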

### Examples

**Search for transformer architecture papers:**
```bash
python3 arxiv_search.py "attention is all you need transformer" --max-papers 5
```

**Search for reinforcement learning papers:**
```bash
python3 arxiv_search.py "deep reinforcement learning continuous control" --max-papers 10
```

**Search for LLM papers with JSON output:**
```bash
python3 arxiv_search.py "large language model reasoning" --output-format json
```

**Search for specific author or topic:**
```bash
python3 arxiv_search.py 'au:Hinton AND all:"deep learning"'
```

**Search in specific arXiv categories:**
```bash
python3 arxiv_search.py "cat:cs.LG neural network pruning"
```

## Step-by-Step Workflow

### 1. Formulate Your Query

- Use specific, technical terms (e.g., "convolutional neural network image segmentation" not "AI for pictures")
- Include key authors if known: `au:Bengio` (the arXiv API's author field prefix is `au:`, not `author:`)
- Specify arXiv categories for focused results: `cat:cs.CL` (Computation and Language)
- Combine terms for intersection: `"graph neural network" AND "molecular property"`
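To make these rules concrete, here is a small sketch of a hypothetical helper, `build_arxiv_query`, that assembles a query string from phrases, author surnames, and categories. It uses the arXiv API's field prefixes `all:`, `au:`, and `cat:` and joins every clause with `AND`; it is an illustration, not part of the skill's script.

```python
def build_arxiv_query(phrases=(), authors=(), categories=()) -> str:
    """Combine search clauses with AND using arXiv API field prefixes.

    phrases    -> all:"..."  (exact phrase matched across all fields)
    authors    -> au:...     (author surname)
    categories -> cat:...    (e.g. cs.CL)
    """
    clauses = [f'all:"{p}"' for p in phrases]
    clauses += [f"au:{a}" for a in authors]
    clauses += [f"cat:{c}" for c in categories]
    if not clauses:
        raise ValueError("at least one phrase, author, or category is required")
    return " AND ".join(clauses)


print(build_arxiv_query(phrases=["graph neural network", "molecular property"],
                        categories=["cs.LG"]))
# all:"graph neural network" AND all:"molecular property" AND cat:cs.LG
```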

### 2. Execute the Search

```bash
python3 [YOUR_SKILLS_DIR]/academic-search/arxiv_search.py "your refined query" --max-papers 10
```
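For orientation, this is roughly what the search step could look like inside a Python script built on the `arxiv` package listed under Dependencies. It is a sketch assuming the package's 2.x API (`arxiv.Client`, `arxiv.Search`, `arxiv.SortCriterion`); the function name `search_arxiv` is hypothetical and is not taken from the actual `arxiv_search.py`.

```python
def search_arxiv(query: str, max_papers: int = 10) -> list:
    """Return up to max_papers arxiv.Result objects for query, most relevant first.

    Sketch only: assumes the third-party `arxiv` package (version 2.x).
    """
    if max_papers < 1:
        raise ValueError("max_papers must be at least 1")
    # Imported lazily so argument errors surface even without the dependency.
    import arxiv

    # The default Client waits between page requests and retries failed ones.
    client = arxiv.Client()
    search = arxiv.Search(
        query=query,
        max_results=max_papers,
        sort_by=arxiv.SortCriterion.Relevance,
    )
    return list(client.results(search))
```

A call such as `search_arxiv("transformer attention", 5)` needs network access and returns `arxiv.Result` objects whose `title`, `authors`, `published`, `entry_id`, and `summary` attributes feed the fields described in the next step.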

### 3. Review Results

The output includes:
- **Title**: Full paper title
- **Authors**: List of paper authors
- **Published**: Date the first version was posted to arXiv
- **arXiv ID**: Unique identifier (useful for citing)
- **URL**: Direct link to the paper
- **Summary**: Abstract text
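As an illustration of where these fields come from, the sketch below maps an `arxiv.Result`-like object onto them. The helper name `to_record` is hypothetical, and stripping the version suffix (`v1`, `v2`, ...) from the ID is an assumption chosen to match the unversioned IDs in the output examples; the real script may keep it.

```python
import re
from datetime import datetime
from types import SimpleNamespace


def to_record(result) -> dict:
    """Map an arxiv.Result-like object onto the output fields listed above.

    Expects .title, .authors (objects with .name), .published (datetime),
    .entry_id (e.g. "http://arxiv.org/abs/1706.03762v1"), and .summary.
    """
    # Everything after "/abs/" is the versioned ID; old-style IDs keep their archive prefix.
    versioned_id = result.entry_id.split("/abs/", 1)[-1]
    arxiv_id = re.sub(r"v\d+$", "", versioned_id)  # drop a trailing version such as "v1"
    return {
        "title": result.title,
        "authors": [author.name for author in result.authors],
        "published": result.published.date().isoformat(),
        "arxiv_id": arxiv_id,
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        # Abstracts may contain hard line breaks; collapse all whitespace runs.
        "summary": " ".join(result.summary.split()),
    }


# A stand-in object with the same attributes, so the mapping runs offline.
fake = SimpleNamespace(
    title="Attention Is All You Need",
    authors=[SimpleNamespace(name="Ashish Vaswani"), SimpleNamespace(name="Noam Shazeer")],
    published=datetime(2017, 6, 12),
    entry_id="http://arxiv.org/abs/1706.03762v1",
    summary="The dominant sequence transduction models\nare based on complex ...",
)
print(to_record(fake)["arxiv_id"])  # 1706.03762
```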

### 4. Iterate if Needed

- Too many irrelevant results? Add more specific terms or use category filters
- Too few results? Broaden the query or remove restrictive terms
- Looking for recent work? Results are ranked by relevance, not by date, so compare the **Published** field and refine the query toward the newer work you want
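The documented CLI exposes no sort option, so one workaround is to re-order a fetched batch locally. This minimal sketch assumes paper dicts shaped like the JSON output format; `newest_first` and the placeholder IDs are hypothetical. It only re-orders what the relevance search already returned and cannot surface newer papers that fell outside that batch (the `arxiv` package itself also offers `arxiv.SortCriterion.SubmittedDate` for server-side date sorting).

```python
def newest_first(records: list[dict]) -> list[dict]:
    """Re-order paper dicts by their "published" date, newest first.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    return sorted(records, key=lambda r: r["published"], reverse=True)


papers = [  # placeholder records for illustration
    {"arxiv_id": "A", "published": "2017-06-12"},
    {"arxiv_id": "B", "published": "2023-01-05"},
    {"arxiv_id": "C", "published": "2020-11-30"},
]
print([p["arxiv_id"] for p in newest_first(papers)])  # ['B', 'C', 'A']
```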

### 5. Save and Synthesize

Save relevant findings to your research workspace for later synthesis:
```
research_workspace/
  papers/
    topic_findings.md
```
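One way to accumulate findings in that layout is to append a one-line Markdown entry per paper. This is a sketch under two assumptions: `topic` in `topic_findings.md` is treated as a placeholder for your topic name, and the records are dicts shaped like the JSON output. The helper `append_findings` is hypothetical.

```python
from pathlib import Path


def append_findings(records: list[dict], topic: str,
                    workspace: Path = Path("research_workspace")) -> Path:
    """Append one Markdown bullet per paper to <workspace>/papers/<topic>_findings.md."""
    out_file = workspace / "papers" / f"{topic}_findings.md"
    out_file.parent.mkdir(parents=True, exist_ok=True)  # create papers/ on first use
    # Append mode, so findings from repeated searches accumulate in one file.
    with out_file.open("a", encoding="utf-8") as fh:
        for r in records:
            fh.write(f"- [{r['title']}]({r['url']}) "
                     f"(arXiv:{r['arxiv_id']}, {r['published']})\n")
    return out_file
```

Keeping the arXiv ID in every entry makes the file easy to deduplicate and cite later.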

## Output Formats

### Text Format (Default)
```
================================================================================
Title: Attention Is All You Need
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, ...
Published: 2017-06-12
arXiv ID: 1706.03762
URL: https://arxiv.org/abs/1706.03762
--------------------------------------------------------------------------------
Summary: The dominant sequence transduction models are based on complex
recurrent or convolutional neural networks...
================================================================================
```
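A formatter for this layout could look like the sketch below. It assumes a paper dict shaped like the JSON output; the name `format_text` is hypothetical, and wrapping the abstract at 80 columns is a guess, since the script's exact wrapping rule is not documented.

```python
import textwrap

RULE, THIN = "=" * 80, "-" * 80


def format_text(paper: dict) -> str:
    """Render one paper dict in the plain-text layout shown above."""
    header = [
        RULE,
        f"Title: {paper['title']}",
        f"Authors: {', '.join(paper['authors'])}",
        f"Published: {paper['published']}",
        f"arXiv ID: {paper['arxiv_id']}",
        f"URL: {paper['url']}",
        THIN,
    ]
    # Wrap the abstract to the rule width (an assumption about the real layout).
    summary = textwrap.fill(f"Summary: {paper['summary']}", width=80)
    return "\n".join(header + [summary, RULE])
```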

### JSON Format
```json
{
  "query": "transformer attention",
  "total_results": 5,
  "papers": [
    {
      "title": "Attention Is All You Need",
      "authors": ["Ashish Vaswani", "Noam Shazeer", ...],
      "published": "2017-06-12",
      "arxiv_id": "1706.03762",
      "url": "https://arxiv.org/abs/1706.03762",
      "summary": "The dominant sequence transduction models..."
    }
  ]
}
```
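Producing this envelope takes little more than `json.dumps`. In the sketch below, `format_json` is a hypothetical name, and setting `total_results` to the number of returned papers is an assumption: the documentation does not say whether the real script reports that count or arXiv's overall hit count.

```python
import json


def format_json(query: str, papers: list[dict]) -> str:
    """Wrap paper dicts in the envelope shown above: query, total_results, papers."""
    payload = {
        "query": query,
        "total_results": len(papers),  # assumption: count of papers returned
        "papers": papers,
    }
    # ensure_ascii=False keeps non-ASCII author names readable in the output.
    return json.dumps(payload, indent=2, ensure_ascii=False)
```

Because the output is plain JSON, downstream tools can read `papers[*].arxiv_id` directly instead of parsing the text layout.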

### Markdown Format
```markdown
## Attention Is All You Need

**Authors:** Ashish Vaswani, Noam Shazeer, ...
**Published:** 2017-06-12
**arXiv ID:** [1706.03762](https://arxiv.org/abs/1706.03762)

### Abstract
The dominant sequence transduction models are based on complex...
```
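The Markdown layout can be produced the same way. This is a sketch with a hypothetical `format_markdown` helper that reproduces the example's lines as-is, taking a paper dict shaped like the JSON output.

```python
def format_markdown(paper: dict) -> str:
    """Render one paper dict in the Markdown layout shown above."""
    lines = [
        f"## {paper['title']}",
        "",
        f"**Authors:** {', '.join(paper['authors'])}",
        f"**Published:** {paper['published']}",
        f"**arXiv ID:** [{paper['arxiv_id']}]({paper['url']})",
        "",
        "### Abstract",
        paper["summary"],
    ]
    return "\n".join(lines)
```

In CommonMark, the three consecutive metadata lines render as one paragraph; ending each with a backslash or two trailing spaces turns them into hard line breaks if you want them on separate lines.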

## arXiv Category Reference

Common categories for AI/ML research:

| Category | Description |
|----------|-------------|
| `cs.LG` | Machine Learning |
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.NE` | Neural and Evolutionary Computing |
| `cs.RO` | Robotics |
| `stat.ML` | Machine Learning (Statistics) |
| `q-bio` | Quantitative Biology |
| `math.OC` | Optimization and Control |

## Best Practices

### Query Construction

1. **Be specific**: "graph attention network node classification" > "graph neural network"
2. **Use quotation marks**: For exact phrases: `"self-supervised learning"`
3. **Combine operators**: `cat:cs.CV AND "object detection"` (a bare year such as `2023` is matched as a search term, not applied as a date filter)
4. **Include variations**: Search for both "LLM" and "large language model"
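Instead of running separate searches for each spelling, you can OR the variants together in one query. This sketch assumes the arXiv API's support for `OR` and parenthesized grouping; the helper name `any_of` is hypothetical.

```python
def any_of(*variants: str, field: str = "all") -> str:
    """OR together spelling variants of one concept as a parenthesized group."""
    return "(" + " OR ".join(f'{field}:"{v}"' for v in variants) + ")"


query = any_of("LLM", "large language model") + ' AND all:"code generation"'
print(query)
# (all:"LLM" OR all:"large language model") AND all:"code generation"
```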

### Research Workflow Integration

1. **Start broad, then narrow**: Begin with general queries, refine based on initial results
2. **Track paper IDs**: Save arXiv IDs for citing and revisiting
3. **Follow references**: A paper's bibliography often points to the foundational work it builds on
4. **Note publication dates**: Preprints may be superseded by updated versions
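When tracking IDs across several searches, the same paper can show up with different version suffixes. This sketch collapses a list of tracked IDs to the highest version seen per paper. `latest_versions` is a hypothetical helper, and it handles only new-style IDs (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by `vN`); old-style IDs such as `hep-th/...` are rejected.

```python
import re

# New-style arXiv ID with an optional version suffix, e.g. 1706.03762v1.
NEW_STYLE = re.compile(r"^(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")


def latest_versions(ids: list[str]) -> dict[str, int]:
    """Collapse tracked IDs to {base_id: highest version seen}; unversioned counts as 0."""
    latest: dict[str, int] = {}
    for raw in ids:
        m = NEW_STYLE.match(raw.strip())
        if m is None:
            raise ValueError(f"not a new-style arXiv ID: {raw!r}")
        base = m.group("base")
        version = int(m.group("version") or 0)
        latest[base] = max(latest.get(base, 0), version)
    return latest


print(latest_versions(["1706.03762v1", "1706.03762v5", "1810.04805"]))
# {'1706.03762': 5, '1810.04805': 0}
```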

### Limitations to Consider

- **Preprint status**: Papers may not be peer-reviewed
- **Version updates**: Check for newer versions (v2, v3, etc.)
- **Coverage gaps**: Not all fields are well-represented on arXiv
- **Rate limiting**: Space out queries; arXiv's API terms of use ask for no more than one request every three seconds

## Dependencies

This skill requires the `arxiv` Python package:

```bash
# Virtual environment (recommended)
.venv/bin/python -m pip install arxiv

# System-wide
python3 -m pip install arxiv
```

The script will detect if the package is missing and display installation instructions.
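A check like this can be done without triggering an `ImportError`, by using `importlib.util.find_spec`. The sketch below is an assumption about how such detection might work, not the script's actual code, and the message wording is hypothetical. Suggesting `sys.executable -m pip` targets whichever interpreter is running the script, so the hint is correct for both the virtual-environment and the system-Python setups.

```python
import importlib.util
import sys
from typing import Optional


def missing_dependency_message(module: str = "arxiv") -> Optional[str]:
    """Return an install hint if `module` cannot be found, else None."""
    if importlib.util.find_spec(module) is not None:
        return None
    return (f"Error: {module} package not installed\n"
            f"Install it with: {sys.executable} -m pip install {module}")
```

A script could call this at startup and pass any returned message to `sys.exit(...)`, which prints it to stderr and exits with status 1.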

## Troubleshooting

### "Error: arxiv package not installed"
Install the arxiv package as shown in the Dependencies section.

### No results returned
- Try broader search terms
- Remove category restrictions
- Check for typos in technical terms

### Rate limiting errors
- Wait a few seconds between queries
- Reduce `--max-papers` value
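If you drive several searches from one Python process (for example, a batch of queries), a small throttle keeps the calls spaced out. This is a minimal sketch with a hypothetical `Throttle` class and a 3-second default interval. The clock and sleep functions are injectable so the timing can be tested offline. It does not coordinate across separate processes.

```python
import time


class Throttle:
    """Enforce a minimum interval (seconds) between successive calls to wait()."""

    def __init__(self, min_interval: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock, self._sleep = clock, sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # block only for the part of the interval left
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each search; the first call returns at once, and later calls sleep only as long as needed.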

### Connection errors
- Check internet connectivity
- arXiv API may have temporary outages
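For short outages, retrying with exponential backoff is often enough. The sketch below uses a hypothetical `with_retries` wrapper. By default it retries only Python's built-in `ConnectionError` and `TimeoutError`, so pass the exception types your HTTP client actually raises. Note that the `arxiv` package's `Client` also retries failed pages on its own.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 3.0,
                 retry_on: tuple = (ConnectionError, TimeoutError),
                 sleep=time.sleep) -> T:
    """Call fn(), retrying listed errors with exponential backoff (3 s, 6 s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```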

## Integration with Research Workflow

This skill works well with the web-research skill for comprehensive research:

1. **Use academic-search** for foundational/theoretical papers
2. **Use web-research** for current implementations, tutorials, and practical guides
3. **Synthesize** findings from both sources in your research report

## Notes

- arXiv is particularly strong for:
  - Computer Science (cs.*)
  - Physics (physics.*, hep-*, cond-mat.*)
  - Mathematics (math.*)
  - Quantitative Biology (q-bio.*)
  - Statistics (stat.*)
- Results are sorted by relevance by default
- The arXiv API is free and requires no authentication
- Consider checking cited papers for deeper understanding
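Because the API needs no key, a request is just a URL. The sketch below builds one for the public endpoint `http://export.arxiv.org/api/query` using its documented `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. `api_query_url` is a hypothetical helper. Fetching the URL (for example with `urllib.request.urlopen`) returns an Atom XML feed.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://export.arxiv.org/api/query"


def api_query_url(query: str, max_results: int = 10, start: int = 0) -> str:
    """Build a raw arXiv API request URL; no API key or login is required."""
    params = {
        "search_query": query,
        "start": start,              # offset, for paging through results
        "max_results": max_results,
        "sortBy": "relevance",       # also accepts lastUpdatedDate or submittedDate
        "sortOrder": "descending",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(api_query_url("cat:cs.LG AND all:pruning", max_results=5))
# http://export.arxiv.org/api/query?search_query=cat%3Acs.LG+AND+all%3Apruning&start=0&max_results=5&sortBy=relevance&sortOrder=descending
```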

Related Skills

gpt-researcher

from diegosouzapw/awesome-omni-skill

Run GPT-Researcher multi-agent deep research framework locally using OpenAI GPT-5.2. Replaces ChatGPT Deep Research with local control. Researches 100+ sources in parallel, provides comprehensive citations. Use for Phase 3 industry/technical research or comprehensive synthesis. Takes 6-20 min depending on report type. Supports multiple LLM providers.

deep-research

from diegosouzapw/awesome-omni-skill

Web research with Graph-of-Thoughts for fast-changing topics. Use when user requests research, analysis, investigation, or comparison requiring current information. Features hypothesis testing, source triangulation, claim verification, Red Team, self-critique, and gap analysis. Supports Quick/Standard/Deep/Exhaustive tiers. Creative Mode for cross-industry innovation.

brutal-deepresearch

from diegosouzapw/awesome-omni-skill

Structured deep research pipeline with confirmation gates and resume support. Generates outline, launches parallel research agents, produces validated JSON results and markdown report.

agent-market-researcher

from diegosouzapw/awesome-omni-skill

Expert market researcher specializing in market analysis, consumer insights, and competitive intelligence. Masters market sizing, segmentation, and trend analysis with focus on identifying opportunities and informing strategic business decisions.

agent-data-researcher

from diegosouzapw/awesome-omni-skill

Expert data researcher specializing in discovering, collecting, and analyzing diverse data sources. Masters data mining, statistical analysis, and pattern recognition with focus on extracting meaningful insights from complex datasets to support evidence-based decisions.

agency-researcher

from diegosouzapw/awesome-omni-skill

Find and qualify real estate agencies in a given suburb

add-search-engine

from diegosouzapw/awesome-omni-skill

Integrate a new LLM search provider into Mentha

academic-data-integration

from diegosouzapw/awesome-omni-skill

When the user needs to integrate multiple data sources (Canvas API, user memory, file systems) to create comprehensive academic reports. This skill combines course information, assignment details, submission status, and user context to generate actionable insights. Triggers include requests that involve cross-referencing multiple data sources or creating consolidated academic reports from disparate systems.

academic-course-setup-automator

from diegosouzapw/awesome-omni-skill

When the user needs to set up multiple academic courses in a learning management system (Canvas/LMS) from structured data sources. This skill automates the entire workflow: extracting course schedules from emails/attachments, matching instructors from CSV files, creating courses, enrolling teachers, publishing announcements with class details, uploading syllabi, enabling resource sharing for instructors teaching multiple courses, and publishing all courses. Triggers include course schedule setup, Canvas/LMS administration, academic term preparation, instructor assignment, syllabus distribution, and multi-course management.

academic-benchmark-researcher

from diegosouzapw/awesome-omni-skill

When the user requests information about academic benchmarks, datasets, or research papers, particularly in machine learning, deep learning, or logical reasoning domains. This skill enables systematic research of academic benchmarks by searching web sources, downloading and analyzing arXiv papers, extracting key metadata (number of tasks, training availability, difficulty levels), and compiling comparative summaries. It triggers on requests involving dataset comparisons, benchmark analysis, or academic paper research for table creation.

content-research-writer

from diegosouzapw/awesome-omni-skill

Assists in writing high-quality content by conducting research, adding citations, improving hooks, iterating on outlines, and providing real-time feedback on each section. Transforms your writing process from solo effort to collaborative partnership.

Automate YouTube Top-Ten Video Creation with OpenAI and Safe Image Search

from diegosouzapw/awesome-omni-skill

Integrates OpenAI API for content generation, Bing Image Search API for safe image retrieval, and Pexels API for video footage. Handles authentication via Bearer token, enforces safe search, formats ChatGPT responses into a top-ten list, and includes error handling for API failures.