gemini-assistant
General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversations, or help with general tasks using Google's Gemini models with audio/text capabilities.
Best use case
gemini-assistant is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversations, or help with general tasks using Google's Gemini models with audio/text capabilities.
Teams using gemini-assistant should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/gemini-assistant/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How gemini-assistant Compares
| Feature / Agent | gemini-assistant | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversations, or help with general tasks using Google's Gemini models with audio/text capabilities.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
Top AI Agents for Productivity
See the top AI agent skills for productivity, workflow automation, operational systems, documentation, and everyday task execution.
SKILL.md Source
# Gemini Assistant
A general-purpose AI assistant powered by Google's Gemini API. Supports both text and voice interactions.
## Usage
### Text Mode
```bash
cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py "Your question or message"
```
### Voice Mode
```bash
cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py --audio /path/to/audio.ogg "optional context"
```
## Response Format
The handler returns a JSON response:
```json
{
"message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
"text": "Text response from Gemini"
}
```
## Configuration
Set your Gemini API key:
```bash
export GEMINI_API_KEY="your-api-key-here"
```
Or create a `.env` file in the skill directory:
```
GEMINI_API_KEY=your-api-key-here
```
## Model Options
The default model is `gemini-2.5-flash-native-audio-preview-12-2025` for audio support.
To use a different model, edit `handler.py`:
```python
MODEL = "gemini-2.0-flash-exp" # For text-only
```
## Requirements
- `google-genai>=1.0.0`
- `numpy>=1.24.0`
- `soundfile>=0.12.0`
- `librosa>=0.10.0` (for audio input)
- FFmpeg (for audio conversion)
## Features
- 🎙️ Voice input/output support
- 💬 Text conversations
- 🔧 Configurable system instructions
- ⚡ Fast responses with Gemini FlashRelated Skills
Contract Review Assistant
Analyze business contracts for risks, unfavorable terms, and missing clauses. Get a plain-English summary of what you're signing.
AI Coding Toolkit — Master Every AI Coding Assistant
> The complete methodology for 10X productivity with AI-assisted development. Covers Cursor, Windsurf, Cline, Aider, Claude Code, GitHub Copilot, and more — tool-agnostic principles that work everywhere.
seo-assistant
A client-facing SEO assistant grounded in Google's official SEO Starter Guide. Use this skill whenever a user mentions SEO, search rankings, Google visibility, meta descriptions, title tags, page titles, alt text, sitemaps, duplicate content, URL structure, or asks how to improve their website's presence in search results. Also trigger when a user shares a URL or webpage content and wants feedback, or asks for help writing any web content that needs to perform well in search. This skill covers auditing, content writing, and answering SEO questions — use it proactively even if the user only hints at wanting more website traffic or better Google rankings.
anime-assistant
二次元创作全能助手 - 专注于动漫、插画、角色设计、漫画创作、动画制作等二次元内容创作
accounting-assistant
Buchhaltungs-Automatisierung mit EÜR-Erstellung, DATEV-Export, PDF-Beleganalyse und Steuer-Vorbereitung. Ideal für Freelancer und KMU.
rag-knowledge-assistant
基于向量数据库的 RAG(检索增强生成) 知识库助手。支持语义检索、多格式文档 (PDF/Word/Excel/Markdown) 处理、智能问答。使用 Chroma 向量库 + BGE-M3 Embedding 模型。适用于从 knowledge 目录快速检索信息、回答基于文档的问题。触发词:"从知识库查"、"检索文档"、"RAG 查询"、"向量搜索"、"语义检索"等。
icd10-cpt-coding-assistant
Automatically recommend ICD-10 diagnosis codes and CPT procedure codes from clinical notes. Trigger when: user provides clinical notes, patient encounter summaries, discharge summaries, or asks for medical coding assistance. Use for healthcare providers, medical coders, and billing professionals who need accurate code recommendations.
amber-voice-assistant
AI phone assistant and virtual receptionist for OpenClaw. Answers inbound phone calls, screens callers, makes outbound phone calls, and books appointments — all over Twilio + OpenAI Realtime voice. Full telephone workflow: phone call screening, live call transcripts, CRM contact memory, calendar integration. Ideal for anyone who wants an AI to answer their phone, handle call screening, or make phone calls autonomously. Includes interactive setup wizard, live call dashboard, and human-in-the-loop escalation. Also ships as a Claude Desktop MCP plugin — dial phone numbers, check call history, query CRM, and manage calendar directly from Claude Desktop.
feishu-voice-assistant
Sends voice messages (audio) to Feishu chats using Duby TTS.
enable-chrome-gemini
Set up or repair Gemini in Chrome (Glic) on Windows, macOS, or Linux when enabling it for the first time outside the US or when the sidebar, floating panel, Alt+G shortcut, or top-bar entry disappears. Back up and patch Chrome Local State, restore region/eligibility fields, and check the required Glic flags and Chrome language.
PDF OCR using Gemini LLM
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
gemini-deep-research
Perform complex, long-running research tasks using Gemini Deep Research Agent. Use when asked to research topics requiring multi-source synthesis, competitive analysis, market research, or comprehensive technical investigations that benefit from systematic web search and analysis.