agent-communication-debugger

Diagnoses and debugs A2A agent communication issues including agent status, message routing, transport connectivity, and log analysis. Use when agents aren't responding, messages aren't being delivered, routing is incorrect, or when debugging orchestrator, coder-agent, tester-agent communication problems.

242 stars

byaiskillstore

View on GitHub Installation ↓

Best use case

agent-communication-debugger is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Diagnoses and debugs A2A agent communication issues including agent status, message routing, transport connectivity, and log analysis. Use when agents aren't responding, messages aren't being delivered, routing is incorrect, or when debugging orchestrator, coder-agent, tester-agent communication problems.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "agent-communication-debugger" skill to help with this workflow task. Context: Diagnoses and debugs A2A agent communication issues including agent status, message routing, transport connectivity, and log analysis. Use when agents aren't responding, messages aren't being delivered, routing is incorrect, or when debugging orchestrator, coder-agent, tester-agent communication problems.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agent-communication-debugger/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/egadams/agent-communication-debugger/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agent-communication-debugger/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How agent-communication-debugger Compares

Feature / Agent	agent-communication-debugger	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# Agent Communication Debugger

Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.

## Prerequisites

- A2A agent system located in `a2a_communicating_agents/`
- Python 3.10+ environment
- Access to agent logs in `logs/` directory
- Agent configurations in respective `agent.json` files

## Instructions

### 1. Check Agent Status

First, determine which agents are running:

```bash
# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester|websocket)_agent|main.py" | grep -v grep
```

Look for:
- `orchestrator_agent/main.py`
- `coder_agent/main.py`
- `tester_agent/main.py`
- `websocket_server.py`

**Common issues:**
- Agent process not found → Agent isn't running, needs to be started
- Multiple instances → Duplicate processes causing conflicts

### 2. Inspect Agent Configurations

Read the agent configuration files to verify capabilities and topics:

```bash
# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json

# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json

# View tester agent config (if exists)
cat a2a_communicating_agents/tester_agent/agent.json
```

**Verify:**
- Agent names match expected values
- Topics are correctly defined
- Capabilities describe what the agent does
- No JSON syntax errors

### 3. Check Agent Logs

Examine logs for errors and message flow:

```bash
# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log

# View all logs with timestamps
tail -f logs/*.log

# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log

# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log
```

**Look for:**
- Connection errors
- Routing decisions showing wrong agent selection
- JSON parsing errors
- Message processing failures

### 4. Verify Message Transport

Check if the message transport (WebSocket or RAG board) is working:

```bash
# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765

# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/

# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"
```

**Expected:**
- WebSocket server on port 8765 (if using WebSocket transport)
- Recent messages in storage/message_board.jsonl (if using RAG transport)
- No permission errors accessing storage

### 5. Test Message Sending

Use the provided test script to send a message and verify delivery:

```bash
# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py
```

This script will:
1. Send a test message to the orchestrator topic
2. Wait for response
3. Show message delivery status
4. Display any responses received

### 6. Diagnose Routing Issues

If messages reach orchestrator but route to wrong agent:

**Check orchestrator's routing logic:**
```bash
# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py
```

**Check priority keyword mappings:**
```bash
# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
```

**Verify agent discovery:**
```bash
# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5
```

**Common routing issues:**
- Agent not discovered → Check agent.json exists and is valid
- Wrong agent selected → Keywords don't match, update priority_mappings
- Null target → No suitable agent found, check agent topics/capabilities

### 7. Check Environment Variables

Verify API keys and configuration:

```bash
# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'

# Check model configuration
grep -E "(model|MODEL)" .env 2>/dev/null | sed 's/=.*/=***HIDDEN***/' || echo "No .env file"
```

**Required environment variables:**
- `OPENAI_API_KEY` - For LLM-based routing and code generation
- `ORCHESTRATOR_MODEL` or `OPENAI_MODEL` - Model to use (default: gpt-5-mini)
- `CODER_MODEL` - Model for coder agent (optional, defaults to OPENAI_MODEL)

### 8. Restart Agents (if needed)

If agents are stuck or not responding:

```bash
# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"

# Wait a moment
sleep 2

# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &

# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|websocket)" | grep -v grep
```

### 9. Common Issues and Solutions

See [common_issues.md](common_issues.md) for a detailed troubleshooting guide covering:
- Messages not being delivered
- Routing to wrong agent
- Agent not generating responses
- Duplicate message processing
- Transport connectivity problems

## Quick Diagnostic Checklist

Run through this checklist systematically:

- [ ] All required agents are running (orchestrator, coder, tester)
- [ ] WebSocket server is running (if using WebSocket transport)
- [ ] Agent configuration files are valid JSON
- [ ] Orchestrator discovered all agents (check logs)
- [ ] OPENAI_API_KEY is set in environment
- [ ] Recent log entries show activity
- [ ] No Python exceptions in logs
- [ ] Test message sends and receives successfully
- [ ] Routing decisions select correct agent

## Examples

### Example 1: Agent Not Responding to Messages

User problem:
```
I'm sending messages to the orchestrator but getting no response
```

Debug workflow:

1. Check if orchestrator is running:
   ```bash
   ps aux | grep orchestrator_agent | grep -v grep
   ```
   Result: No process found → Orchestrator isn't running

2. Check logs for crash:
   ```bash
   tail -50 logs/orchestrator.log
   ```
   Result: ImportError for OpenAI package

3. Solution: Install missing dependency
   ```bash
   pip install openai
   ```

4. Restart orchestrator:
   ```bash
   cd a2a_communicating_agents
   nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
   ```

5. Verify it's running:
   ```bash
   ps aux | grep orchestrator_agent | grep -v grep
   tail -10 logs/orchestrator.log
   ```

### Example 2: Messages Routing to Wrong Agent

User problem:
```
I asked for code but it routed to dashboard-agent instead of coder-agent
```

Debug workflow:

1. Check orchestrator discovered coder-agent:
   ```bash
   grep "Discovered.*agents" logs/orchestrator.log | tail -1
   ```
   Result: Shows coder-agent in list ✓

2. Check routing decision in logs:
   ```bash
   grep -A 5 "please write.*code" logs/orchestrator.log
   ```
   Result: Shows routing to dashboard-agent

3. Check routing logic:
   ```bash
   grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
   ```
   Result: Keywords look correct

4. Check LLM routing decision:
   ```bash
   grep "Error in decision making" logs/orchestrator.log
   ```
   Result: LLM routing failed, falling back to heuristic

5. Check API key:
   ```bash
   env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'
   ```
   Result: Variable not set

6. Solution: Set API key and restart orchestrator:
   ```bash
   export OPENAI_API_KEY="your-key-here"
   # Or add to .env file
   echo "OPENAI_API_KEY=your-key-here" >> .env
   ```

7. Restart orchestrator to pick up new environment

### Example 3: Coder Agent Acknowledges But Doesn't Generate Code

User problem:
```
Coder agent receives the message but only acknowledges, doesn't generate code
```

Debug workflow:

1. Check coder agent logs:
   ```bash
   grep -i "generate\|code" logs/coder.log | tail -20
   ```
   Result: "OpenAI package not available. Code generation will be limited."

2. Check if OpenAI is installed:
   ```bash
   python -c "import openai; print(openai.__version__)" 2>&1
   ```
   Result: ModuleNotFoundError

3. Install OpenAI package:
   ```bash
   pip install openai
   ```

4. Restart coder agent:
   ```bash
   pkill -f "coder_agent/main.py"
   cd a2a_communicating_agents
   nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
   ```

5. Verify initialization:
   ```bash
   grep "Initialized with model" logs/coder.log | tail -1
   ```
   Result: Should show model name (e.g., gpt-5-mini)

6. Send test message and verify code generation

### Example 4: Complete System Health Check

User request:
```
Run a complete diagnostic on the agent system
```

Complete diagnostic workflow:

1. Check all agents running:
   ```bash
   echo "=== Agent Processes ==="
   ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep
   ```

2. Check agent configs:
   ```bash
   echo "=== Agent Configurations ==="
   for agent in orchestrator_agent coder_agent tester_agent; do
     if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
       echo "--- $agent ---"
       cat "a2a_communicating_agents/$agent/agent.json" | python -m json.tool
     fi
   done
   ```

3. Check environment:
   ```bash
   echo "=== Environment Variables ==="
   env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'
   ```

4. Check recent logs:
   ```bash
   echo "=== Recent Log Activity ==="
   tail -5 logs/*.log 2>/dev/null
   ```

5. Check for errors:
   ```bash
   echo "=== Recent Errors ==="
   grep -i "error\|exception" logs/*.log | tail -10
   ```

6. Test message sending:
   ```bash
   echo "=== Message Transport Test ==="
   python .claude/skills/agent-debug/scripts/test_message.py
   ```

7. Provide summary report with:
   - Agent status (running/stopped)
   - Configuration validity
   - Environment completeness
   - Recent error count
   - Transport test result

## Related Tools

- `orchestrator_chat.py` - Interactive chat interface for testing
- `send_agent_message.py` - Send messages programmatically
- Agent start/stop scripts in `a2a_communicating_agents/`

## Summary

This skill provides systematic debugging for the A2A agent communication system. Use it whenever:
- Agents aren't communicating
- Messages aren't being delivered
- Routing is incorrect
- System behavior is unexpected

Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues are:
1. Agent not running
2. Missing dependencies
3. Missing API keys
4. Invalid configurations
5. Routing logic issues

Start with the Quick Diagnostic Checklist and drill down based on what fails.

Related Skills

professional-communication

242

from aiskillstore/marketplace

Guide technical communication for software developers. Covers email structure, team messaging etiquette, meeting agendas, and adapting messages for technical vs non-technical audiences. Use when drafting professional messages, preparing meeting communications, or improving written communication.

twilio-communications

242

from aiskillstore/marketplace

Build communication features with Twilio: SMS messaging, voice calls, WhatsApp Business API, and user verification (2FA). Covers the full spectrum from simple notifications to complex IVR systems and multi-channel authentication. Critical focus on compliance, rate limits, and error handling. Use when: twilio, send SMS, text message, voice call, phone verification.

debugger

242

from aiskillstore/marketplace

Debugging specialist for errors, test failures, and unexpected behavior. Use proactively when encountering any issues.

azure-communication-sms-java

242

from aiskillstore/marketplace

Send SMS messages with Azure Communication Services SMS Java SDK. Use when implementing SMS notifications, alerts, OTP delivery, bulk messaging, or delivery reports.

azure-communication-common-java

242

from aiskillstore/marketplace

Azure Communication Services common utilities for Java. Use when working with CommunicationTokenCredential, user identifiers, token refresh, or shared authentication across ACS services.

azure-communication-chat-java

242

from aiskillstore/marketplace

Build real-time chat applications with Azure Communication Services Chat Java SDK. Use when implementing chat threads, messaging, participants, read receipts, typing notifications, or real-time chat features.

azure-communication-callingserver-java

242

from aiskillstore/marketplace

Azure Communication Services CallingServer (legacy) Java SDK. Note - This SDK is deprecated. Use azure-communication-callautomation instead for new projects. Only use this skill when maintaining legacy code.

azure-communication-callautomation-java

242

from aiskillstore/marketplace

Build call automation workflows with Azure Communication Services Call Automation Java SDK. Use when implementing IVR systems, call routing, call recording, DTMF recognition, text-to-speech, or AI-powered call flows.

when-debugging-ml-training-use-ml-training-debugger

242

from aiskillstore/marketplace

Debug ML training issues and optimize performance including loss divergence, overfitting, and slow convergence

ml-training-debugger

242

from aiskillstore/marketplace

Diagnose machine learning training failures including loss divergence, mode collapse, gradient issues, architecture problems, and optimization failures. This skill spawns a specialist ML debugging ...

ios-debugger-agent

242

from aiskillstore/marketplace

Use XcodeBuildMCP to build, run, launch, and debug the current iOS project on a booted simulator. Trigger when asked to run an iOS app, interact with the simulator UI, inspect on-screen state, capture logs/console output, or diagnose runtime behavior using XcodeBuildMCP tools.

azure-quotas

242

from aiskillstore/marketplace

Check/manage Azure quotas and usage across providers. For deployment planning, capacity validation, region selection. WHEN: "check quotas", "service limits", "current usage", "request quota increase", "quota exceeded", "validate capacity", "regional availability", "provisioning limits", "vCPU limit", "how many vCPUs available in my subscription".

DevOps & Infrastructure