gemini-api
Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.
Best use case
gemini-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using gemini-api should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/gemini-api/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How gemini-api Compares
| Feature / Agent | gemini-api | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Gemini API
Generate text from text, images, video, and audio using Google's Gemini API.
## Models
| Model | Code | I/O | Context | Thinking |
|-------|------|-----|---------|----------|
| **Gemini 3 Pro** | `gemini-3-pro-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 3 Flash** | `gemini-3-flash-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 2.5 Pro** | `gemini-2.5-pro` | Text/Image/Video/Audio/PDF -> Text | 1M/65K | Yes |
| **Gemini 2.5 Flash** | `gemini-2.5-flash` | Text/Image/Video/Audio -> Text | 1M/65K | Yes |
| **Nano Banana** | `gemini-2.5-flash-image` | Text/Image -> Image | - | No |
| **Nano Banana Pro** | `gemini-3-pro-image-preview` | Text/Image -> Image (up to 4K) | 65K/32K | Yes |
| **Veo 3.1** | `veo-3.1-generate-preview` | Text/Image/Video -> Video+Audio | - | - |
| **Veo 3** | `veo-3-generate-preview` | Text/Image -> Video+Audio | - | - |
| **Veo 2** | `veo-2.0-generate-001` | Text/Image -> Video (silent) | - | - |
| **Lyria RealTime** | `lyria-realtime-exp` | Text -> Music (streaming) | - | - |
| **Embeddings** | `gemini-embedding-001` | Text -> Embeddings | 2K | No |
**Free Tier**: Flash models only (no free tier for `gemini-3-pro-preview` in API). **Default Temperature**: 1.0 (do not change for Gemini 3).
**Pricing (per 1M tokens)**:
- Gemini 3 Pro: $2/$12 (<200k), $4/$18 (>200k)
- Gemini 3 Flash: $0.50/$3
- Nano Banana Pro: $2 (text) / $0.134 (image)
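As a rough illustration of how the per-token prices above translate into a request cost, here is a minimal sketch for Gemini 3 Pro and Gemini 3 Flash. It assumes that each pair is the input price followed by the output price, and that the 200k threshold is applied to the prompt (input) token count. `PRICES`, `LONG_PROMPT_THRESHOLD`, and `estimate_cost` are hypothetical names, not part of any SDK.

```python
# Prices in USD per 1M tokens, copied from the list above.
# Assumption: each pair is (input price, output price).
PRICES = {
    "gemini-3-pro-preview": {"short": (2.00, 12.00), "long": (4.00, 18.00)},
    "gemini-3-flash-preview": {"short": (0.50, 3.00), "long": (0.50, 3.00)},
}

# Assumption: the pricing tier is selected by the size of the prompt.
LONG_PROMPT_THRESHOLD = 200_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in USD."""
    tier = "long" if input_tokens > LONG_PROMPT_THRESHOLD else "short"
    input_price, output_price = PRICES[model][tier]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    cost = estimate_cost("gemini-3-pro-preview", 10_000, 2_000)
    print(f"Estimated cost: ${cost:.4f}")
```

Treat the result as an estimate only; check the official pricing page before budgeting.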
---
## Basic Text Generation
### Python
```python
from google import genai

# The client reads the API key from the GEMINI_API_KEY environment variable.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="How does AI work?",
)
print(response.text)
```
### JavaScript
```javascript
import { GoogleGenAI } from "@google/genai";

// With an empty options object, the SDK reads GEMINI_API_KEY from the environment.
const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "How does AI work?",
});
console.log(response.text);
```
### REST
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "How does AI work?"}]}]}'
```
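The same REST call can be assembled from Python with only the standard library, which is sometimes handy where the SDK is not installed. The sketch below builds the request without sending it and shows one way to read text from a response body. `build_generate_request` and `extract_text` are hypothetical helper names, and the response shape (`candidates[0].content.parts[].text`) is an assumption to verify against the API reference.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate (assumed response shape)."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# Sending the request needs a valid key and network access:
# request = build_generate_request("gemini-3-flash-preview", "How does AI work?", api_key)
# with urllib.request.urlopen(request) as resp:
#     print(extract_text(json.load(resp)))
```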
---
## System Instructions
```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant."
    ),
    contents="Hello",
)
```
```javascript
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello",
  config: { systemInstruction: "You are a helpful assistant." },
});
```
---
## Streaming
```python
for chunk in client.models.generate_content_stream(
    model="gemini-3-flash-preview",
    contents="Tell me a story",
):
    print(chunk.text, end="")
```
```javascript
const response = await ai.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story",
});
for await (const chunk of response) {
  console.log(chunk.text);
}
```
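When the full text is needed after streaming (for logging or caching, say), the chunks can be accumulated while they are displayed. The sketch below is one way to do that. It assumes that a chunk's `text` attribute may be `None` or empty for chunks that carry no text; `collect_stream` is a hypothetical helper, and the stand-in chunks let the example run without an API key.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def collect_stream(chunks: Iterable, on_text: Optional[Callable[[str], None]] = None) -> str:
    """Concatenate chunk.text values, skipping chunks without text."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if not text:
            continue  # assumption: some chunks may carry no text
        if on_text is not None:
            on_text(text)  # e.g. print incrementally
        pieces.append(text)
    return "".join(pieces)


# Stand-in chunks; with the SDK, pass the generate_content_stream(...) iterator instead.
fake_chunks = [
    SimpleNamespace(text="Once "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="upon a time"),
]
story = collect_stream(fake_chunks, on_text=lambda t: print(t, end=""))
```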
---
## Multi-turn Chat
```python
chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("I have 2 dogs.")
print(response.text)
response = chat.send_message("How many paws total?")
print(response.text)
```
```javascript
const chat = ai.chats.create({ model: "gemini-3-flash-preview" });
const response = await chat.sendMessage({ message: "I have 2 dogs." });
console.log(response.text);
```
---
## Multimodal (Image)
```python
from PIL import Image

image = Image.open("/path/to/image.png")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Describe this image"],
)
```
```javascript
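import { createUserContent, createPartFromUri } from "@google/genai";

// Upload the image through the Files API, then reference it by URI.
const image = await ai.files.upload({ file: "/path/to/image.png" });
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    createUserContent([
      "Describe this image",
      createPartFromUri(image.uri, image.mimeType),
    ]),
  ],
});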
```
---
## Document Processing (PDF)
Process PDFs with native vision understanding (up to 1000 pages).
```python
from google.genai import types
import pathlib

filepath = pathlib.Path('document.pdf')
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
        "Summarize this document",
    ],
)
```
```javascript
import * as fs from 'fs';

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    { text: "Summarize this document" },
    {
      inlineData: {
        mimeType: 'application/pdf',
        // readFileSync already returns a Buffer, so encode it directly
        data: fs.readFileSync("document.pdf").toString("base64"),
      },
    },
  ],
});
```
**For large PDFs**, use Files API (stored 48 hours):
```python
uploaded_file = client.files.upload(file=pathlib.Path('large.pdf'))
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[uploaded_file, "Summarize this document"],
)
```
See [references/documents.md](references/documents.md) for Files API, multiple PDFs, and best practices.
---
## Image Generation (Nano Banana)
Generate and edit images conversationally.
```python
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Create a picture of a sunset over mountains",
)
# Image bytes arrive as inline data parts alongside any text parts
for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("generated.png")
```
```javascript
import * as fs from "fs";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "Create a picture of a sunset over mountains",
});
for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    // Inline image data is base64-encoded
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated.png", buffer);
  }
}
```
**Nano Banana Pro** (`gemini-3-pro-image-preview`): 4K output, Google Search grounding, up to 14 reference images, conversational editing with thought signatures.
See [references/image-generation.md](references/image-generation.md) for editing, multi-turn, and advanced features.
See [references/gemini-3.md](references/gemini-3.md#nano-banana-pro-image-generation) for Gemini 3 image capabilities.
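When calling the REST endpoint directly rather than the SDK, generated images come back as base64 strings inside the JSON response. The sketch below saves every inline image it finds. It assumes the response shape `candidates[0].content.parts[].inlineData.{mimeType, data}`; `save_inline_images` and the `EXTENSIONS` mapping are hypothetical, not part of the API.

```python
import base64
from pathlib import Path

# Assumption: a small mapping of MIME types the model might return.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_inline_images(response_json: dict, out_dir: str, stem: str = "generated") -> list:
    """Decode each inline image part and write it to out_dir; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    parts = response_json["candidates"][0]["content"]["parts"]
    for part in parts:
        inline = part.get("inlineData")
        if not inline:
            continue  # text parts have no inlineData
        suffix = EXTENSIONS.get(inline.get("mimeType", ""), ".bin")
        path = out / f"{stem}_{len(saved)}{suffix}"
        path.write_bytes(base64.b64decode(inline["data"]))
        saved.append(path)
    return saved
```

Numbering the files keeps multiple images from one response from overwriting each other, which the single `generated.png` filename in the examples above does not guard against.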
---
## Video Generation (Veo)
Generate 8-second 720p, 1080p, or 4K videos with native audio using Veo.
```python
import time
from google import genai

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A cinematic shot of a majestic lion in the savannah at golden hour",
)

# Poll until complete (video generation is async)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the video
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("lion.mp4")
```
```javascript
let operation = await ai.models.generateVideos({
  model: "veo-3.1-generate-preview",
  prompt: "A cinematic shot of a majestic lion in the savannah at golden hour",
});
// Poll until complete (video generation is async)
while (!operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 10000));
  operation = await ai.operations.getVideosOperation({ operation });
}
// Wait for the download to finish before the script exits
await ai.files.download({
  file: operation.response.generatedVideos[0].video,
  downloadPath: "lion.mp4",
});
```
**Veo 3.1 features**: Portrait (9:16), video extension (up to 148s), 4K resolution, native audio with dialogue/SFX.
See [references/veo.md](references/veo.md) for image-to-video, reference images, video extension, and prompting guide.
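The polling loops above run until the operation finishes, with no upper bound. A minimal sketch of a bounded variant follows; `wait_for_operation` is a hypothetical helper, and the 600-second default timeout is an arbitrary assumption rather than a documented limit. The stand-in operation lets the example run without calling the API.

```python
import time
from types import SimpleNamespace


def wait_for_operation(operation, refresh, interval=10.0, timeout=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until operation.done is true; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError(f"operation not done after {timeout} seconds")
        sleep(interval)
        operation = refresh(operation)  # fetch the latest operation state
    return operation


# Stand-in operation that completes after three refreshes.
state = {"calls": 0}


def fake_refresh(op):
    state["calls"] += 1
    return SimpleNamespace(done=state["calls"] >= 3)


finished = wait_for_operation(
    SimpleNamespace(done=False), fake_refresh, interval=0, sleep=lambda s: None
)
```

With the Python SDK, the call would presumably look like `operation = wait_for_operation(operation, client.operations.get)`, reusing the refresh call from the example above.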
---
## Music Generation (Lyria RealTime)
Generate continuous instrumental music in real-time with dynamic steering.
```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def main():
    async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
        # Set prompts and config
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text='minimal techno', weight=1.0)]
        )
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
        )
        # Start streaming
        await session.play()
        # Receive audio chunks
        async for message in session.receive():
            if message.server_content and message.server_content.audio_chunks:
                audio_data = message.server_content.audio_chunks[0].data
                # Process audio...

asyncio.run(main())
```
```javascript
const session = await ai.live.music.connect({
  model: "models/lyria-realtime-exp",
  callbacks: {
    onmessage: (message) => {
      if (message.serverContent?.audioChunks) {
        for (const chunk of message.serverContent.audioChunks) {
          const audioBuffer = Buffer.from(chunk.data, "base64");
          // Process audio...
        }
      }
    },
  },
});
await session.setWeightedPrompts({
  weightedPrompts: [{ text: "minimal techno", weight: 1.0 }],
});
await session.setMusicGenerationConfig({
  musicGenerationConfig: { bpm: 90, temperature: 1.0 },
});
await session.play();
```
**Output**: 48kHz stereo 16-bit PCM. **Instrumental only**. Configurable BPM, scale, density, brightness.
See [references/lyria.md](references/lyria.md) for steering music, configuration, and prompting guide.
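The "Process audio..." placeholder in the examples above could, for instance, write the stream to a WAV file. The sketch below uses the standard-library `wave` module with the output format stated above (48 kHz, stereo, 16-bit PCM). It assumes each chunk is raw interleaved PCM bytes; `write_wav` is a hypothetical helper name.

```python
import wave

SAMPLE_RATE = 48_000  # 48 kHz, from the output format above
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # 16-bit samples are 2 bytes each


def write_wav(path: str, pcm_chunks) -> int:
    """Write raw PCM chunks to a WAV file and return the number of frames written."""
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)  # assumption: chunk is raw interleaved PCM bytes
            total_bytes += len(chunk)
    # One frame holds one sample per channel.
    return total_bytes // (CHANNELS * SAMPLE_WIDTH)
```

In a real session the chunks would be collected inside the receive loop and passed to `write_wav` once streaming stops.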
---
## Embeddings
Generate text embeddings for semantic similarity, search, and classification.
```python
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?",
)
print(result.embeddings)
```
```javascript
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'What is the meaning of life?',
});
console.log(response.embeddings);
```
**Task types**: `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `RETRIEVAL_DOCUMENT`, `RETRIEVAL_QUERY`
**Output dimensions**: 768, 1536, 3072 (default)
See [references/embeddings.md](references/embeddings.md) for batch processing, task types, and normalization.
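Semantic similarity between two embeddings is commonly measured with cosine similarity. The sketch below is a minimal pure-Python version that operates on plain lists of floats. It assumes the vector for one input can be read from the SDK result as something like `result.embeddings[0].values`; `normalize` and `cosine_similarity` are hypothetical helper names.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Normalizing before the dot product keeps the comparison independent of vector length, which is relevant when requesting a reduced output dimension.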
---
## Thinking (Gemini 3)
Control reasoning depth with `thinking_level`: `minimal` (Flash only), `low`, `medium` (Flash only), `high` (default).
```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this math problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)
```
```javascript
import { ThinkingLevel } from "@google/genai";

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Solve this math problem...",
  config: { thinkingConfig: { thinkingLevel: ThinkingLevel.HIGH } },
});
```
**Note**: Cannot mix `thinking_level` with legacy `thinking_budget` (returns 400 error).
For Gemini 2.5, use `thinking_budget` (0-32768) instead. See [references/thinking.md](references/thinking.md).
For complete Gemini 3 features (thought signatures, media resolution, etc.), see [references/gemini-3.md](references/gemini-3.md).
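The rules in this section (levels for Gemini 3, budgets for Gemini 2.5, never both, `minimal` and `medium` on Flash only) can be checked before a request is sent. The sketch below is one possible validation helper. `build_thinking_config` is a hypothetical name, the model-name prefix checks are an assumption about naming, and the policy of accepting only `thinking_level` for Gemini 3 is a choice made for this sketch, not an API rule.

```python
from typing import Optional

ALL_LEVELS = {"minimal", "low", "medium", "high"}
FLASH_ONLY_LEVELS = {"minimal", "medium"}


def build_thinking_config(model: str, level: Optional[str] = None,
                          budget: Optional[int] = None) -> dict:
    """Return keyword arguments for a thinking config, or raise ValueError."""
    if level is not None and budget is not None:
        raise ValueError("thinking_level and thinking_budget cannot be combined")
    if model.startswith("gemini-3"):
        if budget is not None:
            raise ValueError("this sketch only accepts thinking_level for Gemini 3")
        level = level or "high"  # high is the documented default
        if level not in ALL_LEVELS:
            raise ValueError(f"unknown thinking level: {level}")
        if level in FLASH_ONLY_LEVELS and "flash" not in model:
            raise ValueError(f"{level} is only available on Flash models")
        return {"thinking_level": level}
    if model.startswith("gemini-2.5"):
        if level is not None:
            raise ValueError("use thinking_budget with Gemini 2.5")
        if budget is None:
            return {}
        if not 0 <= budget <= 32768:
            raise ValueError("thinking_budget must be between 0 and 32768")
        return {"thinking_budget": budget}
    raise ValueError(f"unrecognized model: {model}")
```

The returned dictionary is meant to be unpacked into the SDK's config object, for example `types.ThinkingConfig(**build_thinking_config(model, level="low"))`.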
---
## Structured Outputs
Generate JSON responses adhering to a schema.
```python
from pydantic import BaseModel
from typing import List

class Recipe(BaseModel):
    name: str
    ingredients: List[str]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract: chocolate chip cookies need flour, sugar, chips",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)
# Parse and validate the JSON text against the schema
recipe = Recipe.model_validate_json(response.text)
```
```javascript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const recipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Extract: chocolate chip cookies need flour, sugar, chips",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(recipeSchema),
  },
});
```
See [references/structured-outputs.md](references/structured-outputs.md) for advanced patterns.
---
## Built-in Tools (Gemini 3)
**Available**: Google Search, File Search, Code Execution, URL Context, Function Calling
**Not supported**: Google Maps grounding, Computer Use (use Gemini 2.5 for these)
```python
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the latest news on AI?",
    config={"tools": [{"google_search": {}}]},
)
```
```javascript
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: "What's the latest news on AI?",
  config: { tools: [{ googleSearch: {} }] },
});
```
**Structured outputs + tools**: Gemini 3 supports combining JSON schemas with built-in tools (Google Search, URL Context, Code Execution). See [references/gemini-3.md](references/gemini-3.md#structured-outputs-with-built-in-tools).
See [references/tools.md](references/tools.md) for all tool patterns.
---
## Function Calling
Connect models to external tools and APIs. The model determines when to call functions and provides parameters.
```python
from google.genai import types

# Define function
get_weather = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])]
    ),
)

# Check for function call
if response.function_calls:
    fc = response.function_calls[0]
    print(f"Call {fc.name} with {fc.args}")
```
```javascript
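import { Type } from "@google/genai";

// Function declaration mirroring the Python example
const getWeather = {
  name: "get_weather",
  description: "Get weather for a location",
  parameters: {
    type: Type.OBJECT,
    properties: { location: { type: Type.STRING, description: "City name" } },
    required: ["location"],
  },
};
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "What's the weather in Tokyo?",
  config: { tools: [{ functionDeclarations: [getWeather] }] },
});
if (response.functionCalls && response.functionCalls.length > 0) {
  const { name, args } = response.functionCalls[0];
  // Execute function and send result back
}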
```
**Automatic function calling (Python)**: Pass functions directly as tools for automatic execution.
See [references/function-calling.md](references/function-calling.md) for execution modes, compositional calling, multimodal responses, MCP integration, and best practices.
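Once the model returns a function call, the application still has to run the matching local function. The sketch below shows one way to route calls through a registry. It assumes only that a function call object exposes `name` and `args`, as in the example above; `REGISTRY`, `dispatch`, and the canned `get_weather` implementation are hypothetical.

```python
from types import SimpleNamespace


def get_weather(location: str) -> dict:
    """Hypothetical local implementation that returns canned data."""
    return {"location": location, "forecast": "sunny", "temperature_c": 22}


# Maps declared function names to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch(function_call) -> dict:
    """Run the function the model asked for and return a result payload."""
    handler = REGISTRY.get(function_call.name)
    if handler is None:
        return {"error": f"unknown function: {function_call.name}"}
    try:
        return {"result": handler(**dict(function_call.args or {}))}
    except TypeError as exc:
        # Usually raised when the arguments do not match the handler signature.
        return {"error": str(exc)}


# Stand-in for response.function_calls[0]
fake_call = SimpleNamespace(name="get_weather", args={"location": "Tokyo"})
payload = dispatch(fake_call)
```

Returning an error payload instead of raising lets the model see what went wrong and retry. The payload would then be sent back to the model as a function response part in a follow-up request.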
---
## Quick Reference
| Feature | Python | JavaScript |
|---------|--------|------------|
| Generate | `generate_content()` | `generateContent()` |
| Stream | `generate_content_stream()` | `generateContentStream()` |
| Chat | `chats.create()` | `chats.create()` |
| Structured | `response_json_schema=` | `responseJsonSchema:` |
| Image Gen | `gemini-2.5-flash-image` | `gemini-2.5-flash-image` |
| Video Gen | `generate_videos()` | `generateVideos()` |
| Music Gen | `live.music.connect()` | `live.music.connect()` |
| Function Call | `function_declarations` | `functionDeclarations` |
| Embeddings | `embed_content()` | `embedContent()` |
| Files API | `files.upload()` | `files.upload()` |
---
## Gemini 3 Specific Features
For advanced Gemini 3 features, see [references/gemini-3.md](references/gemini-3.md):
- **Thinking levels**: Control reasoning depth (`minimal`, `low`, `medium`, `high`)
- **Media resolution**: Fine-grained multimodal processing (`media_resolution_low` to `ultra_high`)
- **Thought signatures**: Required for function calling and image editing context
- **Structured outputs + tools**: Combine JSON schemas with Google Search, URL Context
- **Multimodal function responses**: Return images in tool responses
---
## Resources
- [Gemini 3 Guide](https://ai.google.dev/gemini-api/docs/gemini-3)
- [Models Overview](https://ai.google.dev/gemini-api/docs/models)
- [Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)
- [Thought Signatures](https://ai.google.dev/gemini-api/docs/thought-signatures)
- [Structured Outputs](https://ai.google.dev/gemini-api/docs/structured-output)
- [Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)
- [Video Generation (Veo)](https://ai.google.dev/gemini-api/docs/video)
- [Music Generation (Lyria)](https://ai.google.dev/gemini-api/docs/music-generation)
- [Function Calling](https://ai.google.dev/gemini-api/docs/function-calling)
- [Document Processing](https://ai.google.dev/gemini-api/docs/document-processing)
- [Embeddings](https://ai.google.dev/gemini-api/docs/embeddings)
- [Google AI Studio](https://aistudio.google.com)
- [Gemini 3 Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-preview)
- [Gemini 3 Flash in AI Studio](https://aistudio.google.com?model=gemini-3-flash-preview)
- [Nano Banana Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-image-preview)
- [Veo Studio](https://aistudio.google.com/apps/bundled/veo_studio)
Related Skills
imagegen-gemini
Generate/edit images via Gemini API (Nano Banana). Triggers: generate image, create picture, AI art, edit image, make illustration.
gemini-image-generator
Generate and edit images using Google Gemini. Use when the user asks to generate, create, edit, or modify images.
gemini-api-dev
Use this skill when building applications with Gemini models, Gemini API, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or n...
ask-gemini
This skill should be used when the user asks to "ask Gemini", "get Gemini's opinion", "have Gemini review", "improve writing style", "make less AI-sounding", "get feedback on article", "review this draft", or needs a second opinion on content, writing, code, or design. Supports text questions and up to 10 images.
gemini-system-prompt-best-practices
Applies official Google best practices when writing or editing Gemini system prompts (systemInstruction). Use when creating or changing system prompts for Gemini (e.g. transcription, Dictate Prompt, Prompt & Read), when reviewing prompt text in AppConstants or SpeechService, or when the user asks about Gemini prompt design.
gemini-svg-creator
Create professional SVG graphics powered by Gemini 3.1 Pro via the Gemini MCP server. Generates logos, icons, illustrations, infographics, patterns, animated SVGs, and UI elements with a dual-model refinement loop (Claude orchestrates + Gemini generates). Gemini 3.1 Pro has SOTA animated SVG capabilities and advanced reasoning. Use this skill when the user asks to: create an SVG, design a logo, make an icon, draw an illustration, create an infographic, design a pattern, make an animated SVG, generate vector graphics, create SVG art, or any request involving SVG creation or generation. Also triggers on: 'generate SVG', 'draw me', 'design graphic', 'create vector', 'SVG illustration', 'SVG icon', 'SVG animation', 'create badge', 'design emblem', 'make a diagram'.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
mcp-create-declarative-agent
Skill converted from mcp-create-declarative-agent.prompt.md
MCP Architecture Expert
Design and implement Model Context Protocol servers for standardized AI-to-data integration with resources, tools, prompts, and security best practices
mathem-shopping
Automates logging in to Mathem.se, searching for and adding items from a list or recipe, handling substitutions according to policy, and reserving a delivery slot, but leaves the cart ready for manual checkout.
math-modeling
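This skill should be used when the user asks for "mathematical modeling", "modeling competition", "math modeling paper", "mathematical modeling contest", "modeling analysis", or "modeling solution", or mentions tasks related to mathematical modeling. It applies to the China Undergraduate Mathematical Contest in Modeling (CUMCM), the Mathematical Contest in Modeling and Interdisciplinary Contest in Modeling (MCM/ICM), and other mathematical modeling competitions.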
matchms
Mass spectrometry analysis. Process mzML/MGF/MSP, spectral similarity (cosine, modified cosine), metadata harmonization, compound ID, for metabolomics and MS data processing.