ai-tools

Google AI tools integration. Modules: Gemini API (multimodal: audio/image/video/PDF, 2M context), Gemini CLI (second opinions, Google Search, code review), NotebookLM (source-grounded Q&A). Capabilities: transcription, OCR, video analysis, image generation, web search, document queries. Actions: transcribe, analyze, extract, generate, query, search with Google AI. Keywords: Gemini, Gemini API, Gemini CLI, NotebookLM, audio transcription, image captioning, video analysis, PDF extraction, Google Search, second opinion, source-grounded, multimodal, web research. Use when: processing media files, needing second AI opinion, searching current web info, querying uploaded documents, generating images.

16 stars

Best use case

ai-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-tools should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ai-tools/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/ai-tools/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ai-tools/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ai-tools Compares

| Feature / Agent | ai-tools | Standard Approach |
|-----------------|----------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It integrates three Google AI tools: the Gemini API for multimodal work (audio, image, video, and PDF analysis, plus image generation), the Gemini CLI for second opinions, code review, and Google Search-grounded web research, and NotebookLM for source-grounded Q&A over uploaded documents.

Where can I find the source code?

The source code is in the diegosouzapw/awesome-omni-skill repository on GitHub, under skills/content-media/ai-tools/.

SKILL.md Source

# Google AI Tools

Unified integration for Google's AI ecosystem: Gemini API (multimodal), Gemini CLI, and NotebookLM.

## Module Selection

| Need | Module | When to Use |
|------|--------|-------------|
| **Media Processing** | Gemini API | Audio/image/video/PDF analysis, generation |
| **Second Opinion** | Gemini CLI | Code review, cross-validation, alternative perspective |
| **Web Research** | Gemini CLI | Current info via Google Search grounding |
| **Doc-Grounded Q&A** | NotebookLM | Questions from uploaded documents |

---

## Gemini API (Multimodal)

Process audio, images, videos, documents, and generate images.

### Prerequisites

```bash
export GEMINI_API_KEY="your-key"  # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
```
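
A missing key is the most likely cause of the 401 error listed later, so it can help to fail early. The following is a minimal sketch, assuming the key is read from the `GEMINI_API_KEY` environment variable exported above; the helper name `require_api_key` is hypothetical and not part of the skill's scripts.

```python
import os


def require_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; create a key at "
            "https://aistudio.google.com/apikey and export it first."
        )
    return key
```

Calling `require_api_key()` at the top of a script turns a late authentication failure into an immediate, readable error.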

### Quick Commands

**Transcribe Audio:**
```bash
python scripts/gemini_batch_process.py --files audio.mp3 --task transcribe --model gemini-2.5-flash
```

**Analyze Image:**
```bash
python scripts/gemini_batch_process.py --files image.jpg --task analyze --prompt "Describe this" --output output.md
```

**Process Video:**
```bash
python scripts/gemini_batch_process.py --files video.mp4 --task analyze --prompt "Summarize with timestamps"
```

**Extract from PDF:**
```bash
python scripts/gemini_batch_process.py --files doc.pdf --task extract --prompt "Extract tables as JSON" --format json
```

**Generate Image:**
```bash
python scripts/gemini_batch_process.py --task generate --prompt "A futuristic city" --model gemini-2.5-flash-image
```
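
The internals of `gemini_batch_process.py` are not shown here, so the following is only a sketch of what a single-file call could look like with the `google-genai` SDK installed above. The `DEFAULT_PROMPTS` table and the `build_prompt` and `process_file` names are hypothetical; the SDK calls used are `genai.Client`, `client.files.upload`, and `client.models.generate_content`.

```python
import os

# Hypothetical per-task default prompts; the real script's prompts are not documented here.
DEFAULT_PROMPTS = {
    "transcribe": "Transcribe this audio verbatim.",
    "analyze": "Describe this file.",
    "extract": "Extract the key content from this document.",
}


def build_prompt(task, prompt=None):
    """Use the caller's prompt if given, otherwise a default for the task."""
    if prompt:
        return prompt
    if task not in DEFAULT_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return DEFAULT_PROMPTS[task]


def process_file(path, task, prompt=None, model="gemini-2.5-flash"):
    """Upload one file and ask the model about it (needs network access and an API key)."""
    # Imported lazily so build_prompt stays usable without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    uploaded = client.files.upload(file=str(path))
    response = client.models.generate_content(
        model=model,
        contents=[build_prompt(task, prompt), uploaded],
    )
    return response.text
```

For example, `process_file("audio.mp3", "transcribe")` would roughly correspond to the transcription command above, under these assumptions.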

### Model Selection

| Model | Use Case | Context |
|-------|----------|---------|
| gemini-2.5-flash | General (best price/perf) | 1-2M tokens |
| gemini-2.5-pro | Highest quality | 1-2M tokens |
| gemini-2.5-flash-image | Image generation | - |

### Supported Formats

- **Audio:** WAV, MP3, AAC, FLAC, OGG (up to 9.5 hrs)
- **Images:** PNG, JPEG, WEBP, HEIC (up to 3,600 images)
- **Video:** MP4, MOV, AVI, WebM (up to 6 hrs)
- **Documents:** PDF (up to 1,000 pages)
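
Before sending a batch, a file's category can be guessed from its extension. This is a small sketch based only on the format list above; the `MEDIA_TYPES` table and `media_type` function are hypothetical names, and `.jpg` is included as the common alias for JPEG.

```python
from pathlib import Path

# Extensions taken from the supported-format list; ".jpg" is the usual JPEG alias.
MEDIA_TYPES = {
    "audio": {".wav", ".mp3", ".aac", ".flac", ".ogg"},
    "image": {".png", ".jpeg", ".jpg", ".webp", ".heic"},
    "video": {".mp4", ".mov", ".avi", ".webm"},
    "document": {".pdf"},
}


def media_type(path):
    """Return 'audio', 'image', 'video', 'document', or None for unsupported files."""
    ext = Path(path).suffix.lower()
    for kind, extensions in MEDIA_TYPES.items():
        if ext in extensions:
            return kind
    return None
```

Files that return `None` could be skipped or first run through `document_converter.py`.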

**References:** `references/audio-processing.md`, `references/vision-understanding.md`, `references/video-analysis.md`, `references/document-extraction.md`, `references/image-generation.md`

---

## Gemini CLI

Orchestrate Gemini for code review, web search, and parallel tasks.

### Verify Installation

```bash
command -v gemini || which gemini
```

### Quick Commands

**Code Generation:**
```bash
gemini "Create [description]. Output complete file." --yolo -o text
```

**Code Review:**
```bash
gemini "Review [file] for bugs and security issues" -o text
```

**Web Research:**
```bash
gemini "What are the latest [topic]? Use Google Search." -o text
```

**Architecture Analysis:**
```bash
gemini "Use codebase_investigator to analyze this project" -o text
```

**Faster Model:**
```bash
gemini "[prompt]" -m gemini-2.5-flash -o text
```

### Key Flags

- `--yolo` / `-y`: Auto-approve tool calls
- `-o text`: Human-readable output
- `-o json`: Structured output
- `-m gemini-2.5-flash`: Faster model
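
When the CLI is driven from a script, the flags above can be assembled into an argument list. The sketch below uses only the flags documented in this section; the function name `gemini_command` and its parameters are hypothetical.

```python
import shlex


def gemini_command(prompt, output="text", model=None, auto_approve=False):
    """Build an argv list for the gemini CLI from the documented flags."""
    if output not in ("text", "json"):
        raise ValueError("output must be 'text' or 'json'")
    argv = ["gemini", prompt]
    if auto_approve:
        argv.append("--yolo")  # auto-approve tool calls
    if model:
        argv += ["-m", model]
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(gemini_command("Review app.py for bugs and security issues")))
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)`. Building a list rather than a shell string avoids quoting problems in prompts.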

### When to Use

- ✅ Second opinion on code
- ✅ Current web information
- ✅ Codebase architecture analysis
- ✅ Parallel code generation

- ❌ Simple quick tasks
- ❌ Interactive refinement

**References:** `references/gemini-reference.md`, `references/gemini-patterns.md`, `references/gemini-templates.md`, `references/gemini-tools.md`

---

## NotebookLM

Query uploaded documents with source-grounded answers.

### Prerequisites

```bash
python scripts/run.py auth_manager.py status  # Check auth
python scripts/run.py auth_manager.py setup   # One-time setup (browser visible)
```

### Quick Commands

**List Notebooks:**
```bash
python scripts/run.py notebook_manager.py list
```

**Add Notebook:**
```bash
python scripts/run.py notebook_manager.py add \
  --url "https://notebooklm.google.com/notebook/..." \
  --name "Name" --description "What it contains" --topics "topic1,topic2"
```

**Ask Question:**
```bash
python scripts/run.py ask_question.py --question "Your question" --notebook-id ID
```

**Search Notebooks:**
```bash
python scripts/run.py notebook_manager.py search --query "keyword"
```

### Critical Notes

1. **Always use the `run.py` wrapper:** it handles the virtual environment automatically.
2. **Keep the browser visible during auth:** Google login requires it.
3. **Ask follow-up questions:** don't stop at the first answer.
4. **Rate limit:** 50 queries/day on free accounts.
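
Two of these notes are easy to enforce in code: routing every call through `run.py` and staying under the daily query limit. The following is a sketch under those assumptions; `notebooklm_command`, `ask_command`, and `QueryBudget` are hypothetical helpers, and the counter is in-memory only, so it resets when the process exits.

```python
import sys

# Script names as listed for the run.py wrapper in this skill.
WRAPPED_SCRIPTS = {
    "auth_manager.py",
    "notebook_manager.py",
    "ask_question.py",
    "cleanup_manager.py",
}


def notebooklm_command(script, *args, python=None):
    """Build an argv that always goes through scripts/run.py."""
    if script not in WRAPPED_SCRIPTS:
        raise ValueError(f"unknown NotebookLM script: {script}")
    return [python or sys.executable, "scripts/run.py", script, *args]


def ask_command(question, notebook_id, python=None):
    """Build the argv for a single question against one notebook."""
    return notebooklm_command(
        "ask_question.py", "--question", question, "--notebook-id", notebook_id,
        python=python,
    )


class QueryBudget:
    """Count queries against the 50/day free-account limit (in-memory only)."""

    def __init__(self, limit=50):
        self.limit = limit
        self.used = 0

    def spend(self):
        """Record one query and return how many remain; raise when exhausted."""
        if self.used >= self.limit:
            raise RuntimeError("daily NotebookLM query budget exhausted")
        self.used += 1
        return self.limit - self.used
```

Calling `budget.spend()` before each `subprocess.run(ask_command(...))` keeps follow-up questions from silently running into the rate limit.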

**References:** `references/notebooklm-api.md`, `references/notebooklm-troubleshooting.md`

---

## Scripts Overview

### Gemini API Scripts (in `scripts/`)

| Script | Purpose |
|--------|---------|
| `gemini_batch_process.py` | Batch process media files |
| `media_optimizer.py` | Prepare media for API limits |
| `document_converter.py` | Convert docs to PDF |

### NotebookLM Scripts (via `run.py`)

| Script | Purpose |
|--------|---------|
| `auth_manager.py` | Authentication management |
| `notebook_manager.py` | Library CRUD |
| `ask_question.py` | Query interface |
| `cleanup_manager.py` | Data cleanup |

---

## Cost Optimization

### Gemini API Pricing

| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|-------|---------------------------|----------------------------|
| 2.5 Flash | $1.00 | $0.10 |
| 2.5 Pro | $3.00 | $12.00 |
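
A per-request cost follows directly from these rates: multiply each token count by its rate and divide by one million. The sketch below copies the figures from the table as written; the `PRICES` dictionary and `estimate_cost` function are hypothetical, and the rates are passed as a parameter so they can be replaced with current pricing.

```python
# USD per 1M tokens, copied from the pricing table in this document.
PRICES = {
    "2.5 Flash": {"input": 1.00, "output": 0.10},
    "2.5 Pro": {"input": 3.00, "output": 12.00},
}


def estimate_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Return the estimated cost in USD for one request."""
    rate = prices[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000
```

For instance, `estimate_cost("2.5 Pro", 1_000_000, 500_000)` combines $3.00 of input with $6.00 of output under the table's rates.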

### Token Rates

- Audio: 32 tokens/sec (1 min = 1,920 tokens)
- Video: ~300 tokens/sec
- PDF: 258 tokens/page
- Image: 258-1,548 tokens
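
These rates make it possible to estimate input size before uploading. A minimal sketch using the audio, video, and PDF figures above; the function names are hypothetical, and the video rate is approximate, as noted in the list.

```python
AUDIO_TOKENS_PER_SEC = 32
VIDEO_TOKENS_PER_SEC = 300  # approximate, per the list above
PDF_TOKENS_PER_PAGE = 258


def audio_tokens(seconds):
    """Estimated tokens for an audio clip of the given length in seconds."""
    return round(seconds * AUDIO_TOKENS_PER_SEC)


def video_tokens(seconds):
    """Rough token estimate for a video of the given length in seconds."""
    return round(seconds * VIDEO_TOKENS_PER_SEC)


def pdf_tokens(pages):
    """Estimated tokens for a PDF with the given page count."""
    return pages * PDF_TOKENS_PER_PAGE
```

By these figures a one-hour recording is about 115,200 tokens (3,600 s × 32), while ten minutes of video is roughly 180,000, which is why trimming video matters more than trimming audio.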

### Best Practices

1. Use `gemini-2.5-flash` for most tasks
2. Use File API for files >20MB
3. Optimize media before upload
4. Process specific segments, not full videos
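
The 20MB rule can be applied automatically when choosing how to send a file. This sketch assumes "20MB" means 20 × 1024 × 1024 bytes and that the only alternative to the File API is sending the bytes inline with the request; `upload_strategy` and `strategy_for_path` are hypothetical names.

```python
import os

# Assumption: the 20MB threshold is interpreted as 20 MiB.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def upload_strategy(size_bytes):
    """Return 'file_api' for files over the threshold, otherwise 'inline'."""
    return "file_api" if size_bytes > INLINE_LIMIT_BYTES else "inline"


def strategy_for_path(path):
    """Look up the file's size on disk and pick a strategy for it."""
    return upload_strategy(os.path.getsize(path))
```

A file that lands on `file_api` is also a good candidate for `media_optimizer.py` before upload.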

---

## Error Handling

| Error | Solution |
|-------|----------|
| 401 | Check API key |
| 429 | Rate limit - wait or use flash model |
| ModuleNotFoundError | Use `run.py` wrapper |
| Auth fails | Browser must be visible |
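
The "wait" advice for 429 responses is usually implemented as exponential backoff. The following is a generic sketch rather than the skill's own handling: `ApiError` is a hypothetical exception carrying an HTTP status, and a real SDK's error type would need to be mapped onto it.

```python
import time


class ApiError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying only on 429 with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except ApiError as err:
            # Other errors (such as 401) and the final failed attempt are re-raised.
            if err.status != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Only rate-limit errors are retried; a 401 surfaces immediately, since waiting will not fix a bad API key.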

---

## References

### Gemini API
- `references/audio-processing.md`
- `references/vision-understanding.md`
- `references/video-analysis.md`
- `references/document-extraction.md`
- `references/image-generation.md`

### Gemini CLI
- `references/gemini-reference.md`
- `references/gemini-patterns.md`
- `references/gemini-templates.md`
- `references/gemini-tools.md`

### NotebookLM
- `references/notebooklm-api.md`
- `references/notebooklm-troubleshooting.md`
- `references/notebooklm-usage.md`

---

## Resources

- [Gemini API Key](https://aistudio.google.com/apikey)
- [Gemini API Docs](https://ai.google.dev/gemini-api)
- [NotebookLM](https://notebooklm.google.com)
