ai-tools
Google AI tools integration. Modules: Gemini API (multimodal: audio/image/video/PDF, 2M context), Gemini CLI (second opinions, Google Search, code review), NotebookLM (source-grounded Q&A). Capabilities: transcription, OCR, video analysis, image generation, web search, document queries. Actions: transcribe, analyze, extract, generate, query, search with Google AI. Keywords: Gemini, Gemini API, Gemini CLI, NotebookLM, audio transcription, image captioning, video analysis, PDF extraction, Google Search, second opinion, source-grounded, multimodal, web research. Use when: processing media files, needing second AI opinion, searching current web info, querying uploaded documents, generating images.
Best use case
ai-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ai-tools should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/ai-tools/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
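The manual steps above can also be scripted. The following is a minimal sketch, assuming SKILL.md has already been downloaded to a local path; the function name `install_skill` and its parameters are hypothetical and not part of the skill itself.

```python
import shutil
from pathlib import Path


def install_skill(skill_md: Path, project_root: Path, skill_name: str = "ai-tools") -> Path:
    """Copy a downloaded SKILL.md into <project_root>/.claude/skills/<skill_name>/.

    Hypothetical helper: it only mirrors the manual installation steps.
    """
    if not skill_md.is_file():
        raise FileNotFoundError(f"SKILL.md not found at {skill_md}")

    # Create .claude/skills/<skill_name>/ if it does not exist yet.
    target_dir = project_root / ".claude" / "skills" / skill_name
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "SKILL.md"
    shutil.copyfile(skill_md, target)
    return target
```

After the copy, restart the agent so that it can discover the skill, as described in the last step.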
How ai-tools Compares
| Feature / Agent | ai-tools | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It integrates three Google AI tools: the Gemini API for multimodal processing (audio, image, video, PDF) and image generation, the Gemini CLI for second opinions, code review, and Google Search-grounded web research, and NotebookLM for source-grounded Q&A over uploaded documents.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Google AI Tools

Unified integration for Google's AI ecosystem: Gemini API (multimodal), Gemini CLI, and NotebookLM.

## Module Selection

| Need | Module | When to Use |
|------|--------|-------------|
| **Media Processing** | Gemini API | Audio/image/video/PDF analysis, generation |
| **Second Opinion** | Gemini CLI | Code review, cross-validation, alternative perspective |
| **Web Research** | Gemini CLI | Current info via Google Search grounding |
| **Doc-Grounded Q&A** | NotebookLM | Questions from uploaded documents |

---

## Gemini API (Multimodal)

Process audio, images, videos, documents, and generate images.

### Prerequisites

```bash
export GEMINI_API_KEY="your-key"  # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
```

### Quick Commands

**Transcribe Audio:**

```bash
python scripts/gemini_batch_process.py --files audio.mp3 --task transcribe --model gemini-2.5-flash
```

**Analyze Image:**

```bash
python scripts/gemini_batch_process.py --files image.jpg --task analyze --prompt "Describe this" --output output.md
```

**Process Video:**

```bash
python scripts/gemini_batch_process.py --files video.mp4 --task analyze --prompt "Summarize with timestamps"
```

**Extract from PDF:**

```bash
python scripts/gemini_batch_process.py --files doc.pdf --task extract --prompt "Extract tables as JSON" --format json
```

**Generate Image:**

```bash
python scripts/gemini_batch_process.py --task generate --prompt "A futuristic city" --model gemini-2.5-flash-image
```

### Model Selection

| Model | Use Case | Context |
|-------|----------|---------|
| gemini-2.5-flash | General (best price/perf) | 1-2M tokens |
| gemini-2.5-pro | Highest quality | 1-2M tokens |
| gemini-2.5-flash-image | Image generation | - |

### Supported Formats

- **Audio:** WAV, MP3, AAC, FLAC, OGG (up to 9.5 hrs)
- **Images:** PNG, JPEG, WEBP, HEIC (up to 3,600 images)
- **Video:** MP4, MOV, AVI, WebM (up to 6 hrs)
- **Documents:** PDF (up to 1,000 pages)

**References:** `references/audio-processing.md`, `references/vision-understanding.md`, `references/video-analysis.md`, `references/document-extraction.md`, `references/image-generation.md`

---

## Gemini CLI

Orchestrate Gemini for code review, web search, and parallel tasks.

### Verify Installation

```bash
command -v gemini || which gemini
```

### Quick Commands

**Code Generation:**

```bash
gemini "Create [description]. Output complete file." --yolo -o text
```

**Code Review:**

```bash
gemini "Review [file] for bugs and security issues" -o text
```

**Web Research:**

```bash
gemini "What are the latest [topic]? Use Google Search." -o text
```

**Architecture Analysis:**

```bash
gemini "Use codebase_investigator to analyze this project" -o text
```

**Faster Model:**

```bash
gemini "[prompt]" -m gemini-2.5-flash -o text
```

### Key Flags

- `--yolo` / `-y`: Auto-approve tool calls
- `-o text`: Human-readable output
- `-o json`: Structured output
- `-m gemini-2.5-flash`: Faster model

### When to Use

- ✅ Second opinion on code
- ✅ Current web information
- ✅ Codebase architecture analysis
- ✅ Parallel code generation
- ❌ Simple quick tasks
- ❌ Interactive refinement

**References:** `references/gemini-reference.md`, `references/gemini-patterns.md`, `references/gemini-templates.md`, `references/gemini-tools.md`

---

## NotebookLM

Query uploaded documents with source-grounded answers.

### Prerequisites

```bash
python scripts/run.py auth_manager.py status  # Check auth
python scripts/run.py auth_manager.py setup   # One-time setup (browser visible)
```

### Quick Commands

**List Notebooks:**

```bash
python scripts/run.py notebook_manager.py list
```

**Add Notebook:**

```bash
python scripts/run.py notebook_manager.py add \
  --url "https://notebooklm.google.com/notebook/..." \
  --name "Name" --description "What it contains" --topics "topic1,topic2"
```

**Ask Question:**

```bash
python scripts/run.py ask_question.py --question "Your question" --notebook-id ID
```

**Search Notebooks:**

```bash
python scripts/run.py notebook_manager.py search --query "keyword"
```

### Critical Notes

1. **Always use `run.py` wrapper** - Handles venv automatically
2. **Browser visible for auth** - Required for Google login
3. **Follow-up questions** - Don't stop at first answer
4. **Rate limit:** 50 queries/day on free accounts

**References:** `references/notebooklm-api.md`, `references/notebooklm-troubleshooting.md`

---

## Scripts Overview

### Gemini API Scripts (in `scripts/`)

| Script | Purpose |
|--------|---------|
| `gemini_batch_process.py` | Batch process media files |
| `media_optimizer.py` | Prepare media for API limits |
| `document_converter.py` | Convert docs to PDF |

### NotebookLM Scripts (via `run.py`)

| Script | Purpose |
|--------|---------|
| `auth_manager.py` | Authentication management |
| `notebook_manager.py` | Library CRUD |
| `ask_question.py` | Query interface |
| `cleanup_manager.py` | Data cleanup |

---

## Cost Optimization

### Gemini API Pricing

| Model | Input | Output |
|-------|-------|--------|
| 2.5 Flash | $1.00/1M | $0.10/1M |
| 2.5 Pro | $3.00/1M | $12.00/1M |

### Token Rates

- Audio: 32 tokens/sec (1 min = 1,920 tokens)
- Video: ~300 tokens/sec
- PDF: 258 tokens/page
- Image: 258-1,548 tokens

### Best Practices

1. Use `gemini-2.5-flash` for most tasks
2. Use File API for files >20MB
3. Optimize media before upload
4. Process specific segments, not full videos

---

## Error Handling

| Error | Solution |
|-------|----------|
| 401 | Check API key |
| 429 | Rate limit - wait or use flash model |
| ModuleNotFoundError | Use `run.py` wrapper |
| Auth fails | Browser must be visible |

---

## References

### Gemini API

- `references/audio-processing.md`
- `references/vision-understanding.md`
- `references/video-analysis.md`
- `references/document-extraction.md`
- `references/image-generation.md`

### Gemini CLI

- `references/gemini-reference.md`
- `references/gemini-patterns.md`
- `references/gemini-templates.md`
- `references/gemini-tools.md`

### NotebookLM

- `references/notebooklm-api.md`
- `references/notebooklm-troubleshooting.md`
- `references/notebooklm-usage.md`

---

## Resources

- [Gemini API Key](https://aistudio.google.com/apikey)
- [Gemini API Docs](https://ai.google.dev/gemini-api)
- [NotebookLM](https://notebooklm.google.com)
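The token rates and pricing table in the Cost Optimization section of SKILL.md can be combined into a rough cost estimate before uploading media. The sketch below hard-codes the figures exactly as SKILL.md lists them, so they may not match current Google pricing. The names `PRICE_PER_MILLION`, `estimate_input_tokens`, and `estimate_cost` are hypothetical helpers, not part of the skill's scripts.

```python
# Token rates as listed in SKILL.md; the video rate is given there as approximate.
AUDIO_TOKENS_PER_SEC = 32
VIDEO_TOKENS_PER_SEC = 300
PDF_TOKENS_PER_PAGE = 258

# USD per 1M tokens as (input, output), copied from the SKILL.md pricing table.
# Assumption: the short names "2.5 Flash" / "2.5 Pro" map to these model IDs.
PRICE_PER_MILLION = {
    "gemini-2.5-flash": (1.00, 0.10),
    "gemini-2.5-pro": (3.00, 12.00),
}


def estimate_input_tokens(audio_sec: float = 0, video_sec: float = 0, pdf_pages: int = 0) -> int:
    """Approximate input tokens for a mix of audio, video, and PDF content."""
    total = (
        audio_sec * AUDIO_TOKENS_PER_SEC
        + video_sec * VIDEO_TOKENS_PER_SEC
        + pdf_pages * PDF_TOKENS_PER_PAGE
    )
    return round(total)


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Approximate cost in USD using the table above."""
    price_in, price_out = PRICE_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# One hour of audio at 32 tokens/sec.
tokens = estimate_input_tokens(audio_sec=3600)
print(tokens)  # 115200
print(estimate_cost("gemini-2.5-flash", tokens))
```

An estimate like this is mainly useful for deciding whether to trim media to specific segments first, in line with the best practices listed in SKILL.md.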
Related Skills
cli-modern-tools
Auto-suggest modern CLI tool alternatives (bat, eza, fd, ripgrep) for faster, more efficient command-line operations with 50%+ speed improvements
chrome-devtools
Control Chrome browser programmatically using chrome-devtools-mcp. Use when user asks to automate Chrome, debug web pages, take screenshots, evaluate JavaScript, inspect network requests, or interact with browser DevTools. Also use when asked about browser automation, web scraping, or testing websites.
api-tools
API testing, documentation, and development tools
ai-dev-tools-sync
Synchronize and update Claude Code and GitHub Copilot development tool configurations to work similarly. Use when asked to update Claude Code setup, update Copilot setup, sync AI dev tools, add new skills/prompts/agents across both platforms, or ensure Claude and Copilot configurations are aligned. Covers skills, prompts, agents, instructions, workflows, and chat modes.
agent-tools
Reference for configuring tool permissions when launching Claude Code agents. Use when setting up --allowedTools flags, restricting file access, or configuring agent permissions.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
moai-lang-r
R 4.4+ best practices with testthat 3.2, lintr 3.2, and data analysis patterns.
moai-lang-python
Python 3.13+ development specialist covering FastAPI, Django, async patterns, data science, testing with pytest, and modern Python features. Use when developing Python APIs, web applications, data pipelines, or writing tests.
moai-icons-vector
Vector icon libraries ecosystem guide covering 10+ major libraries with 200K+ icons, including React Icons (35K+), Lucide (1000+), Tabler Icons (5900+), Iconify (200K+), Heroicons, Phosphor, and Radix Icons with implementation patterns, decision trees, and best practices.
moai-foundation-trust
Complete TRUST 4 principles guide covering Test First, Readable, Unified, Secured. Validation methods, enterprise quality gates, metrics, and November 2025 standards. Enterprise v4.0 with 50+ software quality standards references.
moai-foundation-memory
Persistent memory across sessions using MCP Memory Server for user preferences, project context, and learned patterns
moai-foundation-core
MoAI-ADK's foundational principles - TRUST 5, SPEC-First TDD, delegation patterns, token optimization, progressive disclosure, modular architecture, agent catalog, command reference, and execution rules for building AI-powered development workflows