multiAI Summary Pending

skywork-excel

Excel generator with AI-powered data analysis, charts, formulas, and web search. Create spreadsheets, analyze CSV/Excel/PDF files, generate HTML reports, and get real-time data.

231 stars

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/skywork-excel/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/skyworkai/skywork-excel/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/skywork-excel/SKILL.md inside your project
  3. Restart your AI agent β€” it will auto-discover the skill

How skywork-excel Compares

Feature / Agentskywork-excelStandard Approach
Platform SupportmultiLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Excel generator with AI-powered data analysis, charts, formulas, and web search. Create spreadsheets, analyze CSV/Excel/PDF files, generate HTML reports, and get real-time data.

Which AI agents support this skill?

This skill is compatible with multi.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Excel Generator
## ⚠️ Multi-Turn Session Rule (CRITICAL)

**When the user's request is a continuation of a previous Excel task, you MUST use the `--session` parameter.**

**How to detect a continuation task:**
- User says: "continue", "modify", "optimize", "adjust", "based on the previous...", "improve the last one..."
- User references previous output: "that report", "the previous Excel", "the analysis above"
- User asks for changes to existing work: "add a chart", "change to purple", "add a column"

**How to use session_id:**
1. Look for the previous task's output: `πŸ’‘ To continue this conversation, use: --session xxx`
2. Add `--session xxx` to the command:
```bash
python3 scripts/excel_api_client.py "user's follow-up request" \
  --session abc123def456 \
  --language zh-CN \
  --log-path "$EXCEL_LOG" \
  > /dev/null 2>&1 &
```

**β›” If you don't pass `--session`, the agent will start fresh and lose all previous context!**

---


## 🚫 CRITICAL: DO NOT READ USER FILES

**NEVER use the `read` tool on user-provided files (Excel, PDF, CSV, images, etc.).**

The backend agent will read and process files itself. You only need to:
1. **Upload files** using the CLI script `--files` parameter
2. **Pass the user's query** directly without modification

Reading files wastes time and causes timeouts. Just upload and let the backend handle everything.

---

## ⚠️ IMPORTANT: Background Execution with Progress Monitoring

Excel tasks take 5-25 minutes. **You MUST run the script in background and poll the log every 5 seconds** to keep the UI responsive and avoid timeouts.

### Step 1: Start script in background

```bash
EXCEL_LOG=/tmp/excel_$(date +%s).log

python3 scripts/excel_api_client.py "user query" \
  --files "/path/to/file1.xlsx" "/path/to/file2.pdf" \
  --language zh-CN \
  --log-path "$EXCEL_LOG" \
  > /dev/null 2>&1 &

echo "Log: $EXCEL_LOG"
```

### Step 2: Monitor progress (REQUIRED)

> **STRICT RULES β€” no exceptions:**
> 1. **Poll every 60 seconds by calling exec tool repeatedly.** Excel tasks take 10-25 minutes.
> 2. **Only show the LAST TASK PROGRESS UPDATE.** Do NOT output full log or summarize it.
> 3. **NEVER restart the task.** The Excel agent handles errors internally and will auto-recover.
> 4. **Ignore all errors in the log** (e.g., `❌`, `Missing parameter`, `jupyter_execute` errors). These are normal β€” the agent will retry automatically.

**Every 60 seconds, run this exact sequence (call exec tool each time, do NOT use while loop):**
```bash
# Step 1: Wait 60 seconds
sleep 60
# Step 2: Check progress
echo "=== Progress Update ==="
grep -A8 "TASK PROGRESS UPDATE" "$EXCEL_LOG" | tail -10
# Step 3: Check if done
grep -E "\[DONE\]|All done" "$EXCEL_LOG" | tail -1
```

- If log contains `[DONE]` or `βœ… All done!` β†’ **stop polling**, read final output with `tail -30 "$EXCEL_LOG"`, then deliver results.
- If NOT done β†’ **report progress to user**, then **call exec tool again** with the same command after 60 seconds.
- **Repeat until done** β€” you must keep calling exec tool every 60 seconds until you see `[DONE]` or `All done`.

### Rules

- **Call exec tool repeatedly** β€” do NOT use a while loop (it blocks output). Call exec every 60 seconds yourself.
- **NEVER restart the task** even if you see errors. The agent handles errors internally.
- **Do NOT summarize or interpret the log** β€” just show the raw TASK PROGRESS UPDATE block.

### What to report to user

> **CRITICAL: Output ONLY the current status. Do NOT repeat or accumulate previous status messages. Each update should be a single, fresh line.**

**After each log read, output ONLY ONE LINE showing the current status:**
```
[Main stage] | [current action] | Elapsed: Xs
```
**Example (output only this single line, nothing else):**
```
Data Processing | Generating charts | Elapsed: 120s
```

**Map TASK PROGRESS UPDATE to main stages:**

| Progress contains | Main stage |
|------------------|------------|
| "读取" / "read" / "load" | Loading data |
| "εˆ†ζž" / "analysis" | Data analysis |
| "图葨" / "chart" / "visualization" | Generating charts |
| "Excel" / "xlsx" | Creating Excel file |
| "HTML" / "ζŠ₯ε‘Š" / "report" | Generating report |
| "保存" / "save" / "output" | Saving output |

**Example status updates:**
```
Loading data | Reading Excel files | Elapsed: 30s
Data analysis | Processing stock data | Elapsed: 90s
Generating report | Creating HTML analysis | Elapsed: 180s
```

**Do NOT:**
- Output `tail -50` or full log content
- Mention errors or heartbeat messages
- Restart the task for any reason
- **Repeat or accumulate previous status messages** β€” each update must be fresh, not appended to previous ones

A professional skill for generating and updating high-quality Excel files using a sophisticated backend service with AI-powered data analysis, charting, formula validation, and report generation capabilities.

---

## Authentication (Required First)

Before using this skill, authentication must be completed. Run the auth script first:

```bash
# Authenticate: checks env token / cached token / browser login
python3 <skill-dir>/scripts/skywork_auth.py || exit 1
```

**Token priority**:
1. Environment variable `SKYBOT_TOKEN` β†’ if set, use directly
2. Cached token file `~/.skywork_token` β†’ validate via API, if valid, use it
3. No valid token β†’ opens browser for login, polls until complete, saves token

**IMPORTANT - Login URL handling**: If script output contains a line starting with `[LOGIN_URL]`, you **MUST** immediately send that URL to the user in a clickable message (e.g. "Please open this link to log in: <url>"). The user may be in an environment where the browser cannot open automatically, so always surface the login URL.

---

## When to Use This Skill

Use this skill when the user wants to:

- **Create Excel files from scratch** with data, formulas, charts, and professional formatting
- **Analyze existing data files** (Excel, CSV, PDF) and generate summary reports or visualizations
- **Update or transform Excel files** (add calculations, charts, pivot tables, formatting)
- **Generate data-driven reports** - outputs can be Excel (.xlsx) or professional HTML reports for viewing/sharing
- **Perform complex data analysis** requiring pandas, numpy, or statistical operations
- **Create dashboards or visualizations** with charts, conditional formatting, and styled tables
- **Extract and structure data** from uploaded documents into Excel format
- **Search the web for data** - the agent can search for real-time information to include in generated outputs (no external search tools required from your side)

### Output Format

The agent supports multiple output formats:
- **Excel (.xlsx)** - for data manipulation and editing
- **HTML reports** - for viewing and sharing

The backend agent automatically chooses the appropriate format based on the user's request. Just pass the user's natural language request directly.

The backend service is particularly powerful for tasks that benefit from specialized Excel knowledge, formula validation, and visual quality assurance.

## How It Works

The skill uses a ReAct agent loop that:

1. **Accepts user requests** via natural language (in English or Chinese)
2. **Processes uploaded files** (Excel, CSV, PDF) if provided
3. **Streams real-time progress** showing LLM reasoning and tool execution
4. **Executes specialized tools** like `jupyter_execute` for data manipulation, `validate_excel_formulas`, `validate_excel_charts`, etc.
5. **Produces output files** in `/workspace/output/` (automatically registered for download)
6. **Supports multi-turn conversations** for iterative refinement

### ⚠️ IMPORTANT: Preserve User's Original Query (Strict No-Rewrite Policy)

When sending requests to the Excel Agent:

- **Keep the user's original query exactly as-is** - do NOT rewrite, expand, or reinterpret the query
- **Pass the query as-is** to the backend agent, which has its own understanding capabilities
- **Only TWO modifications are allowed:**
  1. **Time info**: For time-sensitive queries (e.g., "latest data", "this year", "this quarter"), prepend current time. **Only add if you can reliably obtain the real time:**
     ```
     [Current time: 2026-03-14] User request: Get Xiaomi stock price this week...
     ```
  2. **File paths**: Replace absolute paths with filenames only (e.g., `/Users/xxx/report.xlsx` β†’ `report.xlsx`)
- **All files mentioned in the query MUST be uploaded** - use `upload_file()` for each file before calling `run_agent()`. If you cannot find the file at the specified path, **ask the user to provide the correct file path** before proceeding. Pass the returned `file_ids` to `run_agent()` so the backend can access the uploaded files
- **DO NOT read file contents to modify the query** - just upload the files directly. The backend agent will read and process the files itself. In your query, only provide the mapping between `file_id` and filename (e.g., "file_id abc123 is sales_data.xlsx")
- **NO other modifications allowed** - do not add extra instructions, do not expand requirements, do not "optimize" user's wording

## Core Workflow

### Step 1: Health Check (Required)

Always start by checking if the backend service is healthy:

```python
from excel_api_client import ExcelAgentClient

# Auto-login: will prompt browser login if no token available
client = ExcelAgentClient()

if not client.health_check():
    print("Service unavailable or authentication failed")
    exit(1)

print("Service is ready!")
```

### Step 2: Upload Files (If Needed)

If the user mentions existing files or you have files to analyze, upload them first:

```python
from excel_api_client import ExcelAgentClient

client = ExcelAgentClient()  # Auto-login

# Upload file
file_id = client.upload_file("/path/to/data.xlsx")
print(f"Uploaded: {file_id}")
```

### Step 3: Call the Excel Agent

Send the user's request to the backend via SSE streaming endpoint:

```python
from excel_api_client import ExcelAgentClient

client = ExcelAgentClient()  # Auto-login

# Run agent with streaming progress
output_files = client.run_agent(
    message="Create a sales report with charts",
    file_ids=["uploaded_file_id"],  # Optional
    language="zh-CN"  # or "en-US"
)

# output_files contains: [{"file_id": "...", "name": "...", "size": ...}, ...]
```

The client handles all SSE streaming internally and displays progress to stdout.

### Step 4: Download Generated Files

After the agent completes, download the output files for the user:

```python
from excel_api_client import ExcelAgentClient

client = ExcelAgentClient()  # Auto-login

# Download all output files
for f in output_files:
    client.download_file(f["file_id"], f"./{f['name']}")
```

**CLI output includes both paths** β€” when using the command-line script, it automatically outputs:
- `πŸ“ Local:` β€” the absolute local file path where the file was downloaded
- `☁️ OSS:` β€” the cloud download URL for sharing

**When summarizing results to the user**, always include BOTH the local file path AND the OSS download link from the script output.

## Important Implementation Notes

### Progress Streaming

The SSE endpoint returns real-time progress updates. Always display these to the user so they understand what's happening:

- **`progress` events**: Show the agent's reasoning and thought process
- **`tool_start` events**: Indicate when tools like `jupyter_execute` start running
- **`tool_result` events**: Show whether tools succeeded and their output summaries

This transparency is crucial because Excel generation can take 30-120 seconds for complex tasks.

### Multi-Turn Conversations (IMPORTANT)

**The backend fully supports multi-turn sessions via `session_id`.** This is critical for iterative refinement tasks.

#### How Multi-Turn Works

1. **Generate a unique `session_id`** at the start of a conversation (e.g., `uuid.uuid4()[:12]`)
2. **Pass the same `session_id`** to ALL subsequent `run_agent()` calls in the same conversation
3. The agent automatically:
   - Remembers previous conversation history (up to 40 messages)
   - Preserves Python variables in the Jupyter namespace
   - Keeps output files in the same `/workspace/<session_id>/output/` directory

#### Multi-Turn Example (Recommended: Let Server Generate session_id)

```python
from excel_api_client import ExcelAgentClient

client = ExcelAgentClient()

# First turn: don't pass session_id, server generates one and returns it
output_files, session_id = client.run_agent(
    message="Create a sales report with Q1 data"
)
# session_id is now available, e.g., "a1b2c3d4e5f6"

# Second turn: pass the returned session_id
output_files, _ = client.run_agent(
    message="Add a pie chart showing product category breakdown",
    session_id=session_id  # ← Use the returned session_id
)

# Third turn: continue with same session_id
output_files, _ = client.run_agent(
    message="Change the chart colors to blue theme",
    session_id=session_id
)
```

#### Multi-Turn Example (Alternative: Pre-generate session_id)

```python
import uuid

# Generate session_id upfront
session_id = str(uuid.uuid4())[:12]

# All calls use the same session_id
client.run_agent(message="Create a report", session_id=session_id)
client.run_agent(message="Add charts", session_id=session_id)
```

#### CLI Multi-Turn Example

```bash
# First turn (no --session, server generates one)
python scripts/excel_api_client.py "Create a sales report"
# Output: πŸ’‘ To continue this conversation, use: --session abc123def456

# Second turn (use the printed session_id)
python scripts/excel_api_client.py "Add charts to the report" --session abc123def456
```

#### Clear Session (Start Fresh)

To clear conversation history and start fresh with the same session_id:

```python
# Option 1: Use new_session=True (clears history but keeps session_id)
output_files, _ = client.run_agent(message="...", session_id=session_id, new_session=True)

# Option 2: Don't pass session_id to get a fresh one
output_files, new_session_id = client.run_agent(message="...")
```

#### ⚠️ Common Mistakes

- **Using different `session_id` values**: Agent treats each call as independent
- **Not capturing the returned `session_id`**: If you don't pass one, capture the returned value for subsequent calls

### Error Handling

Common issues and how to handle them:

1. **Authentication failed (401)**: Token is invalid or expired - ask user to provide a valid token
2. **File upload fails**: Ensure file paths are correct, check file size limits
3. **Agent timeout**: Complex tasks may hit the 300s timeout - inform user and suggest breaking into smaller steps
4. **Clarification needed**: When `clarification_needed` event fires, pause and get user input before continuing
5. **Insufficient benefit**: Script or log may show e.g. `Insufficient benefit. Please upgrade your account at {url}` β€” reply per "How to reply when benefit is insufficient" below

#### How to reply when benefit is insufficient

When you detect the above, **reply in the user's current language** β€” do not echo the English message. Use this pattern:

- Convey: "Sorry, Excel/report generation failed. This skill requires upgrading your Skywork membership to use." then a single call-to-action link.
- **Format**: One short sentence in the user's language + a link like `[Upgrade now β†’](url)` or the equivalent in their language.
- **URL**: Extract the upgrade URL from the log/script output (e.g. the `at https://...` part).

### Language Selection

The backend supports both Chinese and English:

- `"language": "zh-CN"` - Chinese prompts and output (default)
- `"language": "en-US"` - English prompts and output

Choose based on the user's language or their explicit preference.

## Backend Capabilities (Tools Available to Agent)

The backend agent has access to these powerful tools:

- **`jupyter_execute`**: Run Python code with pandas, openpyxl, matplotlib, etc. for data manipulation and Excel generation
- **`validate_excel_formulas`**: Verify Excel formulas are syntactically correct before saving
- **`validate_excel_charts`**: Render charts as images to visually verify they look correct
- **`excel_data_llm_analysis`**: Perform semantic analysis on large datasets (translation, classification, summarization)
- **`grep_by_keyword`**: Search uploaded files for specific content
- **`read_document_pages`**: Extract text from PDF/DOCX files
- **`excel_visual_agent`**: Extract structured data from images/PDFs into Excel
- **`parallel_search_full`**: Search the web for data to include in reports
- **`browse_urls`**: Fetch content from specific URLs
- **`todo_write`**: Maintain task lists to prevent goal drift during complex multi-step tasks

You don't need to explicitly call these tools - the agent automatically decides which tools to use based on the user's request.

## Example Usage Patterns

Common scenarios (see Core Workflow for full code):

| Pattern | Description | Key Points |
|---------|-------------|------------|
| **Create from Scratch** | Create Excel from scratch | Pass message directly, no file_ids needed |
| **Analyze Existing File** | Analyze an existing file | Call `upload_file()` first, then pass `file_ids` |
| **Generate HTML Report** | Generate an HTML report | Ideal for sharing and presentation, format auto-selected |
| **Multi-Turn Refinement** | Iterative multi-turn edits | Keep the same `session_id` |
| **Merge Multiple Files** | Merge multiple files | Upload multiple files, process in one request |

**Example requests:**
- "Create a monthly expense tracker with Date, Category, Amount columns"
- "Analyze sales.xlsx, show top 10 customers by revenue with bar chart"
- "Generate an report summarizing the quarterly sales data"
- "Merge jan.csv, feb.csv, mar.csv into one workbook with summary sheet"

## Output File Handling

All output files are saved to `/workspace/<session_id>/output/` on the backend server. The agent automatically:

1. Registers each output file in the file registry
2. Uploads files to OSS for CDN access (xlsx, csv, html, pdf, png, jpg, zip)
3. Returns file metadata in the `output_files` event:
   ```json
   {
     "file_id": "abc123xyz",
     "name": "report.xlsx",
     "size": 15360,
     "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
     "path": "/tmp/excel_agent_workspace/session_123/output/report.xlsx",
     "oss_url": "https://xxx.oss-cn-xxx.aliyuncs.com/excel-agent/session_123/report.xlsx"
   }
   ```
4. Makes files available via `/api/download/{file_id}` (fallback if OSS unavailable)

### ⚠️ IMPORTANT: Present Download Links to User

**After the agent completes, you MUST display the OSS download links to the user:**

- **Show the raw OSS URL directly** (do NOT use sandbox:// or other formats)
- If the file was downloaded locally, also provide the local path
- Example response to user:
  ```
  βœ… Report generated successfully!
  
  πŸ“₯ Download link:
  - report.xlsx: https://picture-search.skywork.ai/skills/upload/2026-03-14/xxx.xlsx
  
  πŸ’Ύ Local file: /Users/xxx/.openclaw/workspace/report.xlsx
  ```
- **Do NOT use** `sandbox://` or `[filename](sandbox://...)` format - these are not clickable
- If `oss_url` is not available, inform user the file was saved locally and provide the full path

## Tips for Best Results

1. **Be specific in requests**: The more detail you provide, the better the output
   - ❌ "Make a sales report"
   - βœ… "Create a sales report with columns: Date, Product, Quantity, Revenue. Include a pivot table summarizing by product category and a bar chart of top 5 products."

2. **Use the helper script**: For convenience, use `scripts/excel_api_client.py` which handles SSE streaming, file upload/download, and error handling

3. **Monitor progress**: Always display progress events to the user - Excel generation can take time for complex tasks

4. **Handle clarifications**: If the agent sends a `clarification_needed` event, pause and get user input before continuing

5. **Session management**: Use consistent session_ids for related tasks to maintain context

6. **Verify outputs**: After downloading files, inform the user of the file location and suggest they open it to verify results

## Troubleshooting

**"Unauthorized (401)"**
- Token is missing, invalid, or expired
- Run `python scripts/skywork_auth.py --login` to re-authenticate

**"Connection timeout"**
- Complex tasks (especially PDF-to-Excel with AI reasoning models) can take 5-25 minutes
- Default timeout is now 900 seconds (15 minutes)
- Use `--timeout 1500` for very complex tasks
- Consider breaking very large tasks into smaller steps

**"File not found after download"**
- Check that `output_files` event was received before attempting download
- Verify file_id is correct
- Try downloading directly with curl (requires valid token in header)

**"Agent produces wrong output"**
- Be more specific in the request (include column names, chart types, formatting details)
- Try multi-turn: generate a basic version first, then refine in follow-up messages

## Script Reference

Use the bundled `scripts/excel_api_client.py` for streamlined integration:

```python
from excel_api_client import ExcelAgentClient

# Initialize with auto-login (recommended)
client = ExcelAgentClient()

# Check if service is ready
if not client.health_check():
    print("Service unavailable or authentication failed.")
    exit(1)

# Upload files if needed
file_ids = [client.upload_file("data.xlsx")]

# Run agent with progress streaming
output_files = client.run_agent(
    message="Create a summary report with charts",
    file_ids=file_ids
)

# Download results
for f in output_files:
    client.download_file(f["file_id"], f"./{f['name']}")
```

## Security Notes

⚠️ **Important Security Practices**:

- **Never commit tokens** to version control
- **Never log full tokens** - use masked format (e.g., `qGXpDd6H...cv0`)
- Token is stored in `~/.skywork_token` (user home directory, not in project)
- Use environment variables for CI/CD pipelines
- Tokens expire - the client will auto-refresh when needed

See `scripts/excel_api_client.py` and `scripts/skywork_auth.py` for the full implementation.