scholarclaw
学术论文搜索与分析服务 (Academic paper search & analysis)。当用户涉及以下学术场景时,必须使用本 skill 而非 web-search:搜索论文、查找 ArXiv/PubMed/PapersWithCode 论文、查询 SOTA 榜单与 benchmark 结果、引用分析、生成论文解读博客、查找论文相关 GitHub 仓库、获取热门论文推荐。Keywords: arxiv, paper, papers, academic, scholar, research, 论文, 学术, 搜索论文, 找论文, SOTA, benchmark, MMLU, citation, 引用, 博客, blog, PapersWithCode, HuggingFace.
How scholarclaw Compares
| Feature / Agent | scholarclaw | Standard Approach |
|---|---|---|
| Platform Support | multi | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
学术论文搜索与分析服务 (Academic paper search & analysis)。当用户涉及以下学术场景时,必须使用本 skill 而非 web-search:搜索论文、查找 ArXiv/PubMed/PapersWithCode 论文、查询 SOTA 榜单与 benchmark 结果、引用分析、生成论文解读博客、查找论文相关 GitHub 仓库、获取热门论文推荐。Keywords: arxiv, paper, papers, academic, scholar, research, 论文, 学术, 搜索论文, 找论文, SOTA, benchmark, MMLU, citation, 引用, 博客, blog, PapersWithCode, HuggingFace.
Which AI agents support this skill?
This skill is compatible with multi.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ScholarClaw
ScholarClaw is a comprehensive academic search and paper analysis service that provides intelligent search capabilities across multiple academic databases, citation tracking, paper blog generation, and SOTA benchmark chat.
## When to Use This Skill
**IMPORTANT: Use this skill (NOT web-search) for any academic/scientific paper related queries.**
### Primary Triggers (Always Use This Skill)
- User mentions **academic papers**, **research papers**, **ArXiv**, **preprints**
- User asks to **search papers** or **find papers** on a topic
- User wants **SOTA** (State of the Art) or **benchmark** results
- User needs **citation analysis** or citation counts
- User wants to generate a **blog post** from a paper
- User mentions **ArXiv IDs** (e.g., "2303.14535")
### Automatic Trigger Keywords
- arxiv, paper, papers, academic, scholar, scientific, research article
- SOTA, benchmark, MMLU, GPQA, GSM8K, HumanEval
- citation, citations, cited by
- paper blog, blog from paper
- PapersWithCode, Semantic Scholar, Google Scholar
### When NOT to Use This Skill
- General web search for non-academic content
- Current news, events, or general information
- Product comparisons or reviews
### Academic Paper Search
- User wants to search for academic papers, research articles, or preprints
- User asks about papers on a specific topic (e.g., "Find papers about transformers")
- User needs literature review or related work information
- User mentions ArXiv, PubMed, NeurIPS, CVPR, or academic databases
- User asks to find "latest" or "recent" papers on a topic
### SOTA/Benchmark Queries
- User asks about SOTA (State of the Art) results on any benchmark
- User mentions specific benchmarks: MMLU, GPQA, GSM8K, HumanEval, MATH, etc.
- User wants to compare model performance on benchmarks
- User asks "What is the best model for..." or "What's the SOTA for..."
- User wants to know about benchmark datasets or evaluation metrics
### Citation Analysis
- User wants to find papers citing a specific paper
- User asks about citation count or impact of a paper
- User needs to find related work through citation networks
- User provides an ArXiv ID and asks about citations
### Paper Analysis & Blog Generation
- User wants a summary or blog-style explanation of a paper
- User asks to "explain this paper" or "write about this paper"
- User wants to generate content from academic papers
- User provides an ArXiv ID and asks for detailed analysis
### Research Recommendations
- User wants trending or popular papers
- User asks for paper recommendations
- User wants to find GitHub repositories related to a paper
### Key Trigger Phrases
- "Search for papers about..."
- "What's the SOTA for..."
- "Find citations of..."
- "Latest research on..."
- "Compare models on..."
- "Benchmark results for..."
- "ArXiv paper..."
- "Generate blog from paper..."
- "Trending papers..."
- "What is the best performing model on..."
## Execution Guidelines
**CRITICAL: API calls require waiting for responses. Do NOT return to user until the API call completes.**
All ScholarClaw API calls are blocking operations that require waiting for the server to process and return results. The agent must not assume immediate completion or return placeholder responses.
### Response Time Expectations
Different operations have different expected response times. Configure appropriate timeouts to avoid premature cancellation:
| Operation | Expected Time | Recommended Timeout | Notes |
|-----------|---------------|---------------------|-------|
| Basic Search (`/search`) | 5-15 seconds | 30 seconds | Fast, direct database queries |
| Scholar Search (`/scholar/search`) | 15-45 seconds | 60 seconds | Includes AI query analysis and reranking |
| SOTA Chat (`/api/benchmark/chat`) | 30-90 seconds | 120 seconds | May involve tool calls and data retrieval |
| SOTA Chat Stream (`/api/benchmark/chat/stream`) | 30-90 seconds | 120 seconds | SSE streaming, same processing time |
| Blog Generation (`/api/blog`) | 2-5 minutes | 300-600 seconds | Long-running task, use async mode |
| Citation Query (`/citations`, `/openalex`) | 5-20 seconds | 30 seconds | External API dependent |
### Streaming Response Handling
For the `/api/benchmark/chat/stream` SSE endpoint:
1. **Parse each line as a JSON event** - Lines starting with `data:` contain JSON payloads
2. **Extract content from specific event types only**:
- `final_response` - Complete response, use this for final result
- `response_chunk` - Incremental text chunks for streaming display
3. **Ignore intermediate events** - These are for internal processing:
- `session_start` - Session initialization
- `tool_call_start` - Tool call beginning
- `tool_call_result` - Tool execution results
- `tool_call_end` - Tool call completion
Example SSE parsing:
```
data: {"type": "session_start", "session_id": "xxx"} # Ignore
data: {"type": "tool_call_start", "tool": "search"} # Ignore
data: {"type": "tool_call_result", "result": {...}} # Ignore
data: {"type": "response_chunk", "content": "The SOTA..."} # Extract content
data: {"type": "final_response", "response": "..."} # Use as final result
```
### Async Operations (Blog Generation)
**IMPORTANT: Blog generation takes 2-5 minutes. Always use async mode (3-step process). Never use synchronous `blog.sh` without `--no-wait`, as it will timeout.**
For blog generation, use async mode:
1. **Submit task** - Use `blog_submit.sh` or `blog.sh --no-wait`
```bash
./scripts/blog_submit.sh -i 2303.14535
# Returns: {"task_id": "blog_abc123def456", "status": "pending"}
```
2. **Poll status** - Check status every 10-15 seconds
```bash
./scripts/blog_status.sh -i blog_abc123def456
# Returns: {"status": "processing", "progress": 50}
```
3. **Fetch result** - When status is `completed`
```bash
./scripts/blog_result.sh -i blog_abc123def456
# Returns: {"status": "completed", "content": "..."}
```
**Recommended polling strategy:**
- Poll interval: 10-15 seconds
- Max attempts: 40 (for 600s total timeout)
- Abort on `failed` or `error` status
## Best Practices
### Error Handling
| Status Code | Meaning | Action |
|-------------|---------|--------|
| `200` | Success | Process response normally |
| `400` | Bad Request | Check parameters, do NOT retry - fix the request |
| `404` | Not Found | Resource doesn't exist, inform user |
| `500` | Internal Error | Log error, inform user, may retry once |
| `503` | Service Unavailable | Retry with exponential backoff (2^n seconds) |
| `504` | Gateway Timeout | Increase timeout or use async mode |
### Retry Strategy
For transient errors (503, 504, network issues):
1. **First retry**: Wait 2 seconds
2. **Second retry**: Wait 4 seconds
3. **Third retry**: Wait 8 seconds
4. **Max retries**: 3 attempts
5. **After max retries**: Inform user of service unavailability
Do NOT retry on:
- 400 errors (client-side issues)
- 404 errors (resource not found)
- Validation errors in response
### Response Parsing
| Endpoint | Primary Field | Notes |
|----------|--------------|-------|
| `/search` | `results` array | List of search results |
| `/scholar/search` | `results` array + `summary` | Includes AI-generated summary |
| `/api/benchmark/chat` | `response` string | Chat response text |
| `/api/benchmark/chat/stream` | `final_response.response` | From SSE stream |
| `/citations` | `results` array | List of citing papers |
| `/api/blog/result` | `content` string | Generated blog content |
**Pagination handling:**
- Check `has_next` field to determine if more pages exist
- Use `page` and `page_size` parameters for pagination
- Total results available in `total` field
### Timeout Configuration
When making HTTP requests, always set appropriate timeouts:
```bash
# Example with curl
curl --max-time 60 "${SCHOLARCLAW_SERVER_URL}/scholar/search" ...
# Example with curl for long operations
curl --max-time 300 "${SCHOLARCLAW_SERVER_URL}/api/blog/submit" ...
```
## Capabilities
| Capability | Endpoint | Description |
|------------|----------|-------------|
| Unified Search | `/search` | Multi-engine search (arxiv, pubmed, google, kuake, bocha, cache) |
| Scholar Search | `/scholar/search` | Intelligent academic search with query analysis, citation expansion, and reranking |
| Citation Analysis | `/citations` | ArXiv paper citation statistics and listing |
| OpenAlex Citations | `/openalex` | OpenAlex citation query and paper discovery |
| Paper Blog | `/api/blog` | Generate blog articles from papers |
| SOTA Chat | `/api/benchmark/chat` | SOTA/Benchmark query via chat API |
| Recommendations | `/api/recommend` | HuggingFace trending papers and GitHub repos |
## Configuration
API Key 为可选配置。部分高级功能可能需要鉴权,如需申请 API Key,请前往 [ScholarClaw 网站](https://scholarclaw.youdao.com/) 申请。
### Configuration File (Recommended)
Create a configuration file at `~/.scholarclaw/config.json`:
```json
{
"apiKey": "your-api-key",
"serverUrl": "https://scholarclaw.youdao.com",
"timeout": 30000,
"maxRetries": 3,
"debug": false
}
```
### Environment Variables
```bash
export SCHOLARCLAW_SERVER_URL="https://scholarclaw.youdao.com"
export SCHOLARCLAW_API_KEY="your-api-key" # 可选,前往 https://scholarclaw.youdao.com/ 申请
export SCHOLARCLAW_DEBUG="false"
```
### OpenClaw Config (config.yaml)
```yaml
skills:
- name: scholarclaw
enabled: true
config:
serverUrl: "https://scholarclaw.youdao.com"
apiKey: "your-api-key" # 可选,前往 https://scholarclaw.youdao.com/ 申请
timeout: 30000
maxRetries: 3
debug: false
```
### Configuration Priority
The skill loads configuration in the following order (highest priority first):
1. Environment variables
2. OpenClaw skill config
3. Configuration file (`~/.scholarclaw/config.json`)
4. Default values
## Usage Examples
**IMPORTANT: Use `./scripts/<script>.sh` to invoke commands. Do NOT use `scholarclaw` command as it requires separate installation.**
### 1. Unified Search
```bash
# Search arXiv for transformer papers
./scripts/search.sh -q "transformer attention mechanism" -e arxiv -l 20
# Search PubMed with AI mode
./scripts/search.sh -q "COVID-19 vaccine efficacy" -e pubmed --mode ai
# Search with time range preset
./scripts/search.sh -q "LLM reasoning" -e google --time-range month
# Search with custom date range
./scripts/search.sh -q "transformer" -e arxiv --time-range custom --start-date 2023-01-01 --end-date 2024-01-01
```
### 2. Scholar Search (Intelligent Academic Search)
```bash
# Smart academic search with query analysis
./scripts/scholar.sh -q "What are the latest advances in multimodal learning?"
# Limit results count
./scripts/scholar.sh -q "RAG retrieval augmented generation" -l 15
# With conversation context
./scripts/scholar.sh -q "What about their computational efficiency?" --context '[{"role":"user","content":"Tell me about vision transformers"}]'
```
### 3. Citation Analysis
```bash
# Get citation statistics for an ArXiv paper
./scripts/citations_stats.sh --arxiv-id 2303.14535
# List papers citing an ArXiv paper
./scripts/citations.sh --arxiv-id 2303.14535 --page 1 --page-size 20
```
### 4. OpenAlex Citations
```bash
# Find paper by title and get citations
./scripts/openalex_find.sh --title "Attention Is All You Need" --author "Vaswani"
# Get citations by OpenAlex work ID
./scripts/openalex_cited.sh --work-id "W2741809807"
```
### 5. Blog Generation
```bash
# Async mode (submit only, recommended for skill usage)
./scripts/blog_submit.sh -i 2303.14535
# Check status later for async tasks
./scripts/blog_status.sh -i blog_abc123def456
# Get result when ready
./scripts/blog_result.sh -i blog_abc123def456
# Save blog to file
./scripts/blog_result.sh -i blog_abc123def456 -o blog.md --content-only
```
### 6. SOTA Chat
Query SOTA/Benchmark information via chat API.
```bash
# Simple question
./scripts/benchmark_chat.sh -m "What is the SOTA for MMLU benchmark?"
# With conversation history
./scripts/benchmark_chat.sh -m "What about GPQA?" -H '[{"role":"user","content":"Tell me about MMLU"}]'
# Streaming mode (for long responses)
./scripts/benchmark_chat.sh -m "List recent SOTA results for reasoning benchmarks" -s
# Save to file
./scripts/benchmark_chat.sh -m "Compare GPT-4 and Claude on various benchmarks" -o result.json
```
### 7. Recommendations
```bash
# Get trending papers from HuggingFace
./scripts/recommend_papers.sh --limit 12
# Get recommended blogs
./scripts/recommend_blogs.sh --limit 10
# Get GitHub repos for a paper
./scripts/paper_repos.sh --arxiv-id 2303.14535
```
## API Reference
### Search Endpoints
#### GET /search
Unified search across multiple engines.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| q | string | required | Search query |
| engine | string | bocha | Search engine: arxiv, pubmed, google, kuake, bocha, cache, nips |
| limit | int | 100 | Total results to fetch |
| page | int | 1 | Page number (1-indexed) |
| page_size | int | 10 | Results per page |
| time_range | string | null | Time range preset: week, month, year, custom |
| start_date | string | null | Start date (YYYY-MM-DD), used with time_range=custom |
| end_date | string | null | End date (YYYY-MM-DD), used with time_range=custom |
| mode | string | simple | Search mode: simple, ai |
| sort_by | string | relevance | Sort by: relevance, date |
#### POST /scholar/search
Intelligent academic search with query analysis.
```json
{
"query": "What are the latest advances in multimodal learning?",
"messages": [{"role": "user", "content": "..."}],
"max_results": 20,
"search_engine": "arxiv",
"enable_citation_expansion": true,
"enable_rerank": true
}
```
### Citation Endpoints
#### GET /citations
List papers citing an ArXiv paper.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| arxiv_id | string | required | ArXiv paper ID |
| page | int | 1 | Page number |
| page_size | int | 20 | Results per page |
| sort_by | string | citation_count | Sort by: citation_count, date |
#### GET /citations/stats
Get citation statistics for an ArXiv paper.
| Parameter | Type | Description |
|-----------|------|-------------|
| arxiv_id | string | ArXiv paper ID |
### OpenAlex Endpoints
#### GET /openalex/find_and_cited_by
Find paper by title and get citations.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| title | string | required | Paper title |
| author_name | string | "" | Author name (optional) |
| limit | int | 20 | Max results |
| fetch_citing_works | bool | false | Fetch citing works list |
### Blog Endpoints
#### POST /api/blog/submit
Submit blog generation task.
```bash
curl -X POST "${SCHOLARCLAW_SERVER_URL}/api/blog/submit" \
-F "arxiv_ids=2303.14535" \
-F "views_content=Optional user views"
```
#### GET /api/blog/result/{task_id}
Get blog generation result.
### SOTA Chat Endpoints
#### POST /api/benchmark/chat
Send a chat message for SOTA/Benchmark queries.
```json
{
"message": "What is the SOTA for MMLU benchmark?",
"history": [{"role": "user", "content": "..."}]
}
```
Response:
```json
{
"response": "The current SOTA for MMLU is...",
"tool_calls": [...]
}
```
#### POST /api/benchmark/chat/stream
Streaming chat endpoint (SSE).
Same request format, returns Server-Sent Events.
### Recommendation Endpoints
#### GET /api/recommend/papers
Get trending papers from HuggingFace.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| limit | int | 12 | Number of papers (1-50) |
#### GET /api/recommend/blogs
Get recommended blog articles.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| limit | int | 10 | Number of blogs (1-50) |
## Response Formats
### Search Result
```json
{
"results": [
{
"id": "2303.14535",
"title": "Paper Title",
"abstract": "Paper abstract...",
"authors": "Author 1, Author 2",
"year": 2023,
"url": "https://arxiv.org/abs/2303.14535",
"pdf_url": "https://arxiv.org/pdf/2303.14535.pdf",
"source": "arxiv"
}
],
"total": 100,
"page": 1,
"page_size": 10,
"total_pages": 10,
"has_next": true
}
```
### Scholar Search Result
```json
{
"query": "Original query",
"results": [...],
"summary": "AI-generated summary of findings",
"analysis": {
"core_question": "Extracted core question",
"keyword_queries": ["keyword1", "keyword2"],
"semantic_queries": ["semantic query 1"],
"search_engine": "arxiv"
},
"total_results": 20
}
```
## Error Handling
All endpoints return standard HTTP status codes:
- `200` - Success
- `400` - Bad request (invalid parameters)
- `404` - Not found
- `500` - Internal server error
- `503` - Service unavailable
- `504` - Gateway timeout
Error response format:
```json
{
"detail": "Error message describing the issue"
}
```
## Dependencies
- Requires the ScholarClaw API service (default: https://scholarclaw.youdao.com)
- curl for HTTP requests
- jq (optional) for JSON formatting