ai-agent-papers-guide

Curated 2024-2026 AI agent research papers collection

Best use case

ai-agent-papers-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-agent-papers-guide should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ai-agent-papers-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ai-agent-papers-guide/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-agent-papers-guide Compares

Feature / Agent	ai-agent-papers-guide	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

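It gives your agent a curated collection of 2024-2026 AI agent research papers, organized by category, along with Python snippets for tracking new arXiv papers and comparing agent frameworks.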

Where can I find the source code?

The source is in the wentorai/research-plugins repository on GitHub (https://github.com/wentorai/research-plugins), under skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md.

SKILL.md Source

# AI Agent Papers Guide (2024-2026)

## Overview

A focused collection of AI agent research papers from 2024-2026, tracking the latest developments in LLM-based agent systems. Unlike broader collections, it concentrates on recent work: new architectures, benchmarks, multi-agent coordination, and real-world applications. The collection is updated frequently because the field evolves rapidly.

## Paper Categories

```
Recent AI Agent Research
├── Agent Architectures
│   ├── Planning (o1-style reasoning, search-augmented)
│   ├── Memory (long-term, episodic, working)
│   └── Tool use (function calling, code execution)
├── Multi-Agent Systems
│   ├── Collaboration (task decomposition, debate)
│   ├── Competition (red team, adversarial)
│   └── Emergence (self-organization, culture)
├── Evaluation
│   ├── Benchmarks (SWE-bench, WebArena, GAIA)
│   ├── Safety (jailbreak, misuse, alignment)
│   └── Reliability (error recovery, hallucination)
├── Applications
│   ├── Software engineering (coding agents)
│   ├── Scientific research (lab automation)
│   ├── Web automation (browsing, form-filling)
│   └── Enterprise (workflow, data analysis)
└── Infrastructure
    ├── Frameworks (LangGraph, CrewAI, AutoGen)
    ├── Protocols (MCP, A2A, tool standards)
    └── Deployment (scaling, monitoring, cost)
```
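
The tree above can also be used to tag paper titles by top-level category. The following is a minimal sketch: the category names come from the tree, but the keyword lists and the `categorize` helper are illustrative assumptions, not part of the collection itself.

```python
# Hypothetical keyword map. Category names mirror the tree above;
# the keyword lists are illustrative assumptions chosen from its leaves.
CATEGORY_KEYWORDS = {
    "Agent Architectures": ["planning", "memory", "tool use", "function calling"],
    "Multi-Agent Systems": ["multi-agent", "debate", "collaboration", "adversarial"],
    "Evaluation": ["benchmark", "safety", "jailbreak", "hallucination"],
    "Applications": ["software engineering", "coding", "web", "scientific"],
    "Infrastructure": ["framework", "protocol", "deployment", "monitoring"],
}


def categorize(title: str) -> list[str]:
    """Return every top-level category whose keywords appear in the title."""
    lowered = title.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


print(categorize("A Benchmark for Multi-Agent Debate"))
# ['Multi-Agent Systems', 'Evaluation']
```

Matching is by plain substring, so a short keyword such as "web" will also match longer words that contain it. A real tagger would likely need word-boundary matching or abstract-level signals.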

## Highlighted Papers (2024-2025)

| Paper | Venue | Key Contribution |
|-------|-------|-----------------|
| SWE-agent | NeurIPS 2024 | Agent-computer interface design for software engineering |
| OpenHands | 2024 | Open platform for coding agents |
| AgentBench | ICLR 2024 | Multi-environment agent benchmark |
| GAIA | ICLR 2024 | General AI assistant benchmark |
| Voyager | TMLR 2024 | Lifelong learning in Minecraft |
| OS-Copilot | 2024 | Self-improving computer agent |
| AutoGen | 2024 | Multi-agent conversation framework |
| Agent-FLAN | ACL 2024 | Agent fine-tuning methodology |

## Tracking New Papers

```python
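import arxiv
from datetime import datetime, timedelta, timezone

def find_recent_agent_papers(days=14):
    """Print agent papers published on arXiv within the last `days` days."""
    # Multi-word phrases are quoted so the field prefix applies to the whole phrase.
    queries = [
        'ti:agent AND (ti:LLM OR ti:"language model")',
        'abs:"autonomous agent" AND abs:"tool use"',
        'ti:multi-agent AND abs:"large language"',
        'abs:"coding agent" OR abs:"software agent"',
    ]

    # The date window is enforced here rather than through query keywords.
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    client = arxiv.Client()  # Search.results() is deprecated in arxiv 2.x
    seen = set()
    papers = []

    for q in queries:
        search = arxiv.Search(
            query=q, max_results=15,
            sort_by=arxiv.SortCriterion.SubmittedDate,
        )
        for r in client.results(search):
            # Skip papers outside the window and ones another query already returned.
            if r.published < cutoff or r.entry_id in seen:
                continue
            seen.add(r.entry_id)
            papers.append({
                "title": r.title,
                "date": r.published.strftime("%Y-%m-%d"),
                "url": r.entry_id,
            })

    # ISO-formatted dates sort correctly as plain strings.
    papers.sort(key=lambda x: x["date"], reverse=True)
    for p in papers[:20]:
        print(f"[{p['date']}] {p['title']}")
        print(f"  {p['url']}")

find_recent_agent_papers()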
```
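
For repeated tracking runs, it can help to remember which papers earlier runs already reported. The sketch below assumes the script above is adapted to return its `papers` list instead of only printing it. `filter_new` and the `seen_papers.json` state file are hypothetical names introduced here for illustration.

```python
import json
from pathlib import Path


def filter_new(papers, state_file):
    """Return papers whose URL no earlier run recorded, then record them."""
    path = Path(state_file)
    # Load the URLs stored by previous runs, if the state file exists.
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    fresh = [p for p in papers if p["url"] not in seen]
    seen.update(p["url"] for p in fresh)
    path.write_text(json.dumps(sorted(seen)))
    return fresh


if __name__ == "__main__":
    import tempfile

    # Placeholder records standing in for real search results.
    batch = [{"url": "placeholder-1"}, {"url": "placeholder-2"}]
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "seen_papers.json"
        print(len(filter_new(batch, state)))  # first run: both records are new
        print(len(filter_new(batch, state)))  # second run: nothing is new
```

Keying on the URL keeps the state file human-readable. If version suffixes matter for your workflow, the key could be normalized before storing it.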

## Framework Comparison

```python
frameworks = {
    "LangGraph": {
        "paradigm": "Graph-based workflows",
        "persistence": "Built-in checkpointing",
        "multi_agent": "Yes",
        "language": "Python/JS",
    },
    "CrewAI": {
        "paradigm": "Role-based agents",
        "persistence": "Memory module",
        "multi_agent": "Yes (crew)",
        "language": "Python",
    },
    "AutoGen": {
        "paradigm": "Conversational agents",
        "persistence": "Chat history",
        "multi_agent": "Yes (group chat)",
        "language": "Python/.NET",
    },
    "OpenHands": {
        "paradigm": "Computer use agent",
        "persistence": "Workspace state",
        "multi_agent": "No",
        "language": "Python",
    },
}

for name, info in frameworks.items():
    print(f"\n{name}:")
    for k, v in info.items():
        print(f"  {k}: {v}")
```
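
The comparison data can also drive a simple shortlist for framework selection. This is a minimal sketch: the `shortlist` helper is a hypothetical name, the dictionary is a condensed copy of the data above so the block runs on its own, and it assumes the "Yes"/"No" string convention used in that data.

```python
# Condensed copy of the comparison data above, so the sketch is self-contained.
frameworks = {
    "LangGraph": {"multi_agent": "Yes", "language": "Python/JS"},
    "CrewAI": {"multi_agent": "Yes (crew)", "language": "Python"},
    "AutoGen": {"multi_agent": "Yes (group chat)", "language": "Python/.NET"},
    "OpenHands": {"multi_agent": "No", "language": "Python"},
}


def shortlist(catalog, language=None, multi_agent=None):
    """Return framework names matching the language and multi-agent needs."""
    matches = []
    for name, info in catalog.items():
        languages = info["language"].split("/")
        supports_multi = info["multi_agent"].startswith("Yes")
        if language is not None and language not in languages:
            continue
        if multi_agent is not None and supports_multi != multi_agent:
            continue
        matches.append(name)
    return matches


print(shortlist(frameworks, language="JS"))
# ['LangGraph']
print(shortlist(frameworks, language="Python", multi_agent=True))
# ['LangGraph', 'CrewAI', 'AutoGen']
```

Passing `None` for a criterion means "no preference", so the same helper covers both single-criterion and combined filters.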

## Use Cases

1. **Literature tracking**: Stay current on agent research
2. **Framework selection**: Compare agent development tools
3. **Research planning**: Identify open problems and trends
4. **Course material**: Teach cutting-edge agent systems
5. **Benchmark tracking**: Compare agent capabilities

## References

- [awesome-ai-agent-papers](https://github.com/VoltAgent/awesome-ai-agent-papers)
- [VoltAgent Framework](https://github.com/VoltAgent/voltagent)