ai-agent-papers-guide

Curated 2024-2026 AI agent research papers collection

191 stars

Best use case

ai-agent-papers-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Curated 2024-2026 AI agent research papers collection

Teams using ai-agent-papers-guide should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ai-agent-papers-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/ai-ml/ai-agent-papers-guide/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ai-agent-papers-guide/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ai-agent-papers-guide Compares

Feature / Agentai-agent-papers-guideStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Curated 2024-2026 AI agent research papers collection

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# AI Agent Papers Guide (2024-2026)

## Overview

A focused collection of AI agent research papers from 2024-2026, tracking the latest developments in LLM-based agent systems. Unlike broader collections, this focuses on recent breakthroughs — new architectures, benchmarks, multi-agent coordination, and real-world applications. Updated frequently as the field evolves rapidly.

## Paper Categories

```
Recent AI Agent Research
├── Agent Architectures
│   ├── Planning (o1-style reasoning, search-augmented)
│   ├── Memory (long-term, episodic, working)
│   └── Tool use (function calling, code execution)
├── Multi-Agent Systems
│   ├── Collaboration (task decomposition, debate)
│   ├── Competition (red team, adversarial)
│   └── Emergence (self-organization, culture)
├── Evaluation
│   ├── Benchmarks (SWE-bench, WebArena, GAIA)
│   ├── Safety (jailbreak, misuse, alignment)
│   └── Reliability (error recovery, hallucination)
├── Applications
│   ├── Software engineering (coding agents)
│   ├── Scientific research (lab automation)
│   ├── Web automation (browsing, form-filling)
│   └── Enterprise (workflow, data analysis)
└── Infrastructure
    ├── Frameworks (LangGraph, CrewAI, AutoGen)
    ├── Protocols (MCP, A2A, tool standards)
    └── Deployment (scaling, monitoring, cost)
```

## Highlighted Papers (2024-2025)

| Paper | Venue | Key Contribution |
|-------|-------|-----------------|
| SWE-agent | ICLR 2025 | Agent interface design for SE |
| OpenHands | 2024 | Open platform for coding agents |
| AgentBench | ICLR 2024 | Multi-environment agent benchmark |
| GAIA | ICLR 2024 | General AI assistant benchmark |
| Voyager | NeurIPS 2024 | Lifelong learning in Minecraft |
| OS-Copilot | 2024 | Self-improving computer agent |
| AutoGen | 2024 | Multi-agent conversation framework |
| Agent-FLAN | ACL 2024 | Agent fine-tuning methodology |

## Tracking New Papers

```python
import arxiv
from datetime import datetime, timedelta

def find_recent_agent_papers(days=14):
    """Find cutting-edge agent papers."""
    queries = [
        "ti:agent AND (ti:LLM OR ti:language model)",
        "abs:autonomous agent AND abs:tool use AND abs:2024",
        "ti:multi-agent AND abs:large language",
        "abs:coding agent OR abs:software agent",
    ]

    seen = set()
    papers = []

    for q in queries:
        search = arxiv.Search(
            query=q, max_results=15,
            sort_by=arxiv.SortCriterion.SubmittedDate,
        )
        for r in search.results():
            if r.entry_id not in seen:
                seen.add(r.entry_id)
                papers.append({
                    "title": r.title,
                    "date": r.published.strftime("%Y-%m-%d"),
                    "url": r.entry_id,
                })

    papers.sort(key=lambda x: x["date"], reverse=True)
    for p in papers[:20]:
        print(f"[{p['date']}] {p['title']}")
        print(f"  {p['url']}")

find_recent_agent_papers()
```

## Framework Comparison

```python
frameworks = {
    "LangGraph": {
        "paradigm": "Graph-based workflows",
        "persistence": "Built-in checkpointing",
        "multi_agent": "Yes",
        "language": "Python/JS",
    },
    "CrewAI": {
        "paradigm": "Role-based agents",
        "persistence": "Memory module",
        "multi_agent": "Yes (crew)",
        "language": "Python",
    },
    "AutoGen": {
        "paradigm": "Conversational agents",
        "persistence": "Chat history",
        "multi_agent": "Yes (group chat)",
        "language": "Python/.NET",
    },
    "OpenHands": {
        "paradigm": "Computer use agent",
        "persistence": "Workspace state",
        "multi_agent": "No",
        "language": "Python",
    },
}

for name, info in frameworks.items():
    print(f"\n{name}:")
    for k, v in info.items():
        print(f"  {k}: {v}")
```

## Use Cases

1. **Literature tracking**: Stay current on agent research
2. **Framework selection**: Compare agent development tools
3. **Research planning**: Identify open problems and trends
4. **Course material**: Teach cutting-edge agent systems
5. **Benchmark tracking**: Compare agent capabilities

## References

- [awesome-ai-agent-papers](https://github.com/VoltAgent/awesome-ai-agent-papers)
- [VoltAgent Framework](https://github.com/VoltAgent/voltagent)