gemini-api

Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.

by diegosouzapw

Best use case

gemini-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gemini-api can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/gemini-api/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/gemini-api/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gemini-api/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gemini-api Compares

Feature / Agent	gemini-api	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

SKILL.md Source

# Gemini API

Generate text from text, images, video, and audio using Google's Gemini API.

## Models

| Model | Code | I/O | Context (input/output tokens) | Thinking |
|-------|------|-----|---------|----------|
| **Gemini 3 Pro** | `gemini-3-pro-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 3 Flash** | `gemini-3-flash-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 2.5 Pro** | `gemini-2.5-pro` | Text/Image/Video/Audio/PDF -> Text | 1M/65K | Yes |
| **Gemini 2.5 Flash** | `gemini-2.5-flash` | Text/Image/Video/Audio -> Text | 1M/65K | Yes |
| **Nano Banana** | `gemini-2.5-flash-image` | Text/Image -> Image | - | No |
| **Nano Banana Pro** | `gemini-3-pro-image-preview` | Text/Image -> Image (up to 4K) | 65K/32K | Yes |
| **Veo 3.1** | `veo-3.1-generate-preview` | Text/Image/Video -> Video+Audio | - | - |
| **Veo 3** | `veo-3-generate-preview` | Text/Image -> Video+Audio | - | - |
| **Veo 2** | `veo-2.0-generate-001` | Text/Image -> Video (silent) | - | - |
| **Lyria RealTime** | `lyria-realtime-exp` | Text -> Music (streaming) | - | - |
| **Embeddings** | `gemini-embedding-001` | Text -> Embeddings | 2K | No |

**Free Tier**: Flash models only (no free tier for `gemini-3-pro-preview` in API). **Default Temperature**: 1.0 (do not change for Gemini 3).

**Pricing (USD per 1M tokens, input/output unless noted)**:
- Gemini 3 Pro: $2/$12 (prompts <200k tokens), $4/$18 (prompts >200k tokens)
- Gemini 3 Flash: $0.50/$3
- Nano Banana Pro: $2 (text) / $0.134 (image)
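
To make the price list concrete, here is a rough cost estimator built only from the figures above. `PRICES` and `estimate_cost` are illustrative names, not part of any SDK. The sketch assumes the first price in each pair is for input tokens and the second for output tokens, and that the Gemini 3 Pro tier is chosen by prompt size. How a prompt of exactly 200k tokens is billed is also an assumption, since the list does not say.

```python
# Illustrative cost estimator; prices copied from the list above (USD per 1M tokens).
# PRICES and estimate_cost are hypothetical helpers, not SDK functions.

PRO_TIER_THRESHOLD = 200_000  # prompt tokens; boundary handling is an assumption

# (input price, output price) for prompts below and above the threshold
PRICES = {
    "gemini-3-pro-preview": {"small": (2.00, 12.00), "large": (4.00, 18.00)},
    "gemini-3-flash-preview": {"small": (0.50, 3.00), "large": (0.50, 3.00)},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an approximate request cost in USD."""
    tier = "large" if input_tokens > PRO_TIER_THRESHOLD else "small"
    input_price, output_price = PRICES[model][tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, `estimate_cost("gemini-3-pro-preview", 100_000, 10_000)` comes to about $0.32 under these assumptions.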

---

## Basic Text Generation

### Python
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="How does AI work?"
)
print(response.text)
```

### JavaScript
```javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "How does AI work?",
});
console.log(response.text);
```

### REST
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "How does AI work?"}]}]}'
```

---

## System Instructions

```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant."
    ),
    contents="Hello"
)
```

```javascript
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello",
  config: { systemInstruction: "You are a helpful assistant." },
});
```

---

## Streaming

```python
for chunk in client.models.generate_content_stream(
    model="gemini-3-flash-preview",
    contents="Tell me a story"
):
    print(chunk.text, end="")
```

```javascript
const response = await ai.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story",
});
for await (const chunk of response) {
  console.log(chunk.text);
}
```

---

## Multi-turn Chat

```python
chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("I have 2 dogs.")
print(response.text)
response = chat.send_message("How many paws total?")
print(response.text)
```

```javascript
const chat = ai.chats.create({ model: "gemini-3-flash-preview" });
const response = await chat.sendMessage({ message: "I have 2 dogs." });
console.log(response.text);
```

---

## Multimodal (Image)

```python
from PIL import Image

image = Image.open("/path/to/image.png")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Describe this image"]
)
```

```javascript
import { createUserContent, createPartFromUri } from "@google/genai";

const image = await ai.files.upload({ file: "/path/to/image.png" });
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    createUserContent([
      "Describe this image",
      createPartFromUri(image.uri, image.mimeType),
    ]),
  ],
});
```

---

## Document Processing (PDF)

Process PDFs with native vision understanding (up to 1000 pages).

```python
from google.genai import types
import pathlib

filepath = pathlib.Path('document.pdf')
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
        "Summarize this document"
    ]
)
```

```javascript
import * as fs from 'fs';

const response = await ai.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: [
        { text: "Summarize this document" },
        {
            inlineData: {
                mimeType: 'application/pdf',
                data: Buffer.from(fs.readFileSync("document.pdf")).toString("base64")
            }
        }
    ]
});
```

**For large PDFs**, use Files API (stored 48 hours):

```python
uploaded_file = client.files.upload(file=pathlib.Path('large.pdf'))
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[uploaded_file, "Summarize this document"]
)
```

See [references/documents.md](references/documents.md) for Files API, multiple PDFs, and best practices.

---

## Image Generation (Nano Banana)

Generate and edit images conversationally.

```python
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Create a picture of a sunset over mountains",
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("generated.png")
```

```javascript
import * as fs from "node:fs";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "Create a picture of a sunset over mountains",
});

for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated.png", buffer);
  }
}
```

**Nano Banana Pro** (`gemini-3-pro-image-preview`): 4K output, Google Search grounding, up to 14 reference images, conversational editing with thought signatures.

See [references/image-generation.md](references/image-generation.md) for editing, multi-turn, and advanced features.
See [references/gemini-3.md](references/gemini-3.md#nano-banana-pro-image-generation) for Gemini 3 image capabilities.

---

## Video Generation (Veo)

Generate 8-second 720p, 1080p, or 4K videos with native audio using Veo.

```python
import time
from google import genai

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A cinematic shot of a majestic lion in the savannah at golden hour",
)

# Poll until complete (video generation is async)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the video
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("lion.mp4")
```

```javascript
let operation = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview",
    prompt: "A cinematic shot of a majestic lion in the savannah at golden hour",
});

while (!operation.done) {
    await new Promise(resolve => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({ operation });
}

await ai.files.download({
    file: operation.response.generatedVideos[0].video,
    downloadPath: "lion.mp4",
});
```

**Veo 3.1 features**: Portrait (9:16), video extension (up to 148s), 4K resolution, native audio with dialogue/SFX.
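
The polling loops above run until the operation finishes, with no upper bound on the wait. A minimal sketch of a bounded variant follows. `wait_for_operation` is a hypothetical helper, and `refresh` stands in for whatever call re-fetches the operation (in the Python example above, `client.operations.get`). The only thing it assumes about the operation object is the `done` attribute used above; the 10-minute default timeout is an arbitrary choice.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    operation: Any,
    refresh: Callable[[Any], Any],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,  # arbitrary default, not a documented limit
) -> Any:
    """Poll until `operation.done` is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while not operation.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not done after {timeout_s} seconds")
        time.sleep(interval_s)
        # Re-fetch the operation so that `done` reflects the latest state.
        operation = refresh(operation)
    return operation
```

With the Python example above, the loop would become `operation = wait_for_operation(operation, client.operations.get)`.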

See [references/veo.md](references/veo.md) for image-to-video, reference images, video extension, and prompting guide.

---

## Music Generation (Lyria RealTime)

Generate continuous instrumental music in real-time with dynamic steering.

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def main():
    async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
        # Set prompts and config
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text='minimal techno', weight=1.0)]
        )
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
        )

        # Start streaming
        await session.play()

        # Receive audio chunks
        async for message in session.receive():
            if message.server_content and message.server_content.audio_chunks:
                audio_data = message.server_content.audio_chunks[0].data
                # Process audio...

asyncio.run(main())
```

```javascript
const session = await ai.live.music.connect({
    model: "models/lyria-realtime-exp",
    callbacks: {
        onmessage: (message) => {
            if (message.serverContent?.audioChunks) {
                for (const chunk of message.serverContent.audioChunks) {
                    const audioBuffer = Buffer.from(chunk.data, "base64");
                    // Process audio...
                }
            }
        },
    },
});

await session.setWeightedPrompts({
    weightedPrompts: [{ text: "minimal techno", weight: 1.0 }],
});

await session.setMusicGenerationConfig({
    musicGenerationConfig: { bpm: 90, temperature: 1.0 },
});

await session.play();
```

**Output**: 48kHz stereo 16-bit PCM. **Instrumental only**. Configurable BPM, scale, density, brightness.
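
Given that output format, the received chunks can be collected into a playable file with the standard-library `wave` module. This is a sketch only: `write_wav` is a hypothetical helper, and it assumes each chunk is already raw little-endian interleaved PCM bytes (decode base64 first if your client delivers it that way, as in the JavaScript example).

```python
import wave
from typing import Iterable

SAMPLE_RATE_HZ = 48_000  # 48kHz, from the output format stated above
CHANNELS = 2             # stereo
SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM


def write_wav(path: str, pcm_chunks: Iterable[bytes]) -> int:
    """Write raw PCM chunks to a WAV file and return the number of frames written."""
    frame_size = CHANNELS * SAMPLE_WIDTH_BYTES
    total_bytes = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        for chunk in pcm_chunks:
            wav_file.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // frame_size
```

Because Lyria RealTime streams continuously, a real session would need to stop collecting chunks at some point before closing the file.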

See [references/lyria.md](references/lyria.md) for steering music, configuration, and prompting guide.

---

## Embeddings

Generate text embeddings for semantic similarity, search, and classification.

```python
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?"
)
print(result.embeddings)
```

```javascript
const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: 'What is the meaning of life?',
});
console.log(response.embeddings);
```

**Task types**: `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `RETRIEVAL_DOCUMENT`, `RETRIEVAL_QUERY`

**Output dimensions**: 768, 1536, 3072 (default)
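
Embeddings are typically compared with cosine similarity. The following dependency-free sketch shows the calculation; `normalize` and `cosine_similarity` are illustrative helpers, and extracting the float lists from the API response is left out. Normalizing first is harmless when the vectors are already unit length and keeps the comparison valid when they are not.

```python
import math
from typing import Sequence


def normalize(vector: Sequence[float]) -> list[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    unit_a, unit_b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(unit_a, unit_b))
```

Values near 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions.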

See [references/embeddings.md](references/embeddings.md) for batch processing, task types, and normalization.

---

## Thinking (Gemini 3)

Control reasoning depth with `thinking_level`: `minimal` (Flash only), `low`, `medium` (Flash only), `high` (default).

```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this math problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)
```

```javascript
import { ThinkingLevel } from "@google/genai";

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Solve this math problem...",
  config: { thinkingConfig: { thinkingLevel: ThinkingLevel.HIGH } },
});
```

**Note**: Cannot mix `thinking_level` with legacy `thinking_budget` (returns 400 error).

For Gemini 2.5, use `thinking_budget` (0-32768) instead. See [references/thinking.md](references/thinking.md).
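
The rules above (no mixing of the two settings, `minimal` and `medium` on Flash only, a budget between 0 and 32768) can be checked before a request is sent. A minimal sketch, assuming a hypothetical `build_thinking_config` helper that returns a plain dict using the field names above; detecting Flash models by the substring `flash` in the model code is also an assumption.

```python
from typing import Optional

ALL_LEVELS = {"minimal", "low", "medium", "high"}
FLASH_ONLY_LEVELS = {"minimal", "medium"}
MAX_BUDGET = 32768  # upper bound quoted above for Gemini 2.5


def build_thinking_config(
    model: str,
    level: Optional[str] = None,
    budget: Optional[int] = None,
) -> dict:
    """Validate thinking settings locally and return them as a dict."""
    if level is not None and budget is not None:
        raise ValueError("use either thinking_level or thinking_budget, not both")
    if budget is not None:
        if not 0 <= budget <= MAX_BUDGET:
            raise ValueError(f"thinking_budget must be between 0 and {MAX_BUDGET}")
        return {"thinking_budget": budget}
    if level is None:
        return {}  # leave the model's default in place
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown thinking_level: {level!r}")
    if level in FLASH_ONLY_LEVELS and "flash" not in model:
        raise ValueError(f"thinking_level {level!r} is listed as Flash only")
    return {"thinking_level": level}
```

Catching these cases locally avoids a round trip that would end in a 400 error.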

For complete Gemini 3 features (thought signatures, media resolution, etc.), see [references/gemini-3.md](references/gemini-3.md).

---

## Structured Outputs

Generate JSON responses adhering to a schema.

```python
from pydantic import BaseModel
from typing import List

class Recipe(BaseModel):
    name: str
    ingredients: List[str]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract: chocolate chip cookies need flour, sugar, chips",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)
recipe = Recipe.model_validate_json(response.text)
```

```javascript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const recipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Extract: chocolate chip cookies need flour, sugar, chips",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(recipeSchema),
  },
});
```

See [references/structured-outputs.md](references/structured-outputs.md) for advanced patterns.

---

## Built-in Tools (Gemini 3)

**Available**: Google Search, File Search, Code Execution, URL Context, Function Calling

**Not supported**: Google Maps grounding, Computer Use (use Gemini 2.5 for these)

```python
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the latest news on AI?",
    config={"tools": [{"google_search": {}}]},
)
```

```javascript
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: "What's the latest news on AI?",
  config: { tools: [{ googleSearch: {} }] },
});
```

**Structured outputs + tools**: Gemini 3 supports combining JSON schemas with built-in tools (Google Search, URL Context, Code Execution). See [references/gemini-3.md](references/gemini-3.md#structured-outputs-with-built-in-tools).

See [references/tools.md](references/tools.md) for all tool patterns.

---

## Function Calling

Connect models to external tools and APIs. The model determines when to call functions and provides parameters.

```python
from google.genai import types

# Define function
get_weather = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])]
    ),
)

# Check for function call
if response.function_calls:
    fc = response.function_calls[0]
    print(f"Call {fc.name} with {fc.args}")
```

```javascript
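import { Type } from "@google/genai";
// Same declaration as the Python example, written with the SDK's Type enum
const getWeather = {
  name: "get_weather",
  description: "Get weather for a location",
  parameters: {
    type: Type.OBJECT,
    properties: { location: { type: Type.STRING, description: "City name" } },
    required: ["location"],
  },
};

const response = await ai.models.generateContent({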
  model: "gemini-3-flash-preview",
  contents: "What's the weather in Tokyo?",
  config: {
    tools: [{ functionDeclarations: [getWeather] }],
  },
});

if (response.functionCalls) {
  const { name, args } = response.functionCalls[0];
  // Execute function and send result back
}
```

**Automatic function calling (Python)**: Pass functions directly as tools for automatic execution.
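
A sketch of what that could look like. The `get_weather` body returns canned data and `ask_with_tools` is a hypothetical wrapper. The sketch assumes the SDK accepts a plain Python callable in `tools` and builds the declaration from its type hints and docstring, so check references/function-calling.md before relying on it.

```python
def get_weather(location: str) -> dict:
    """Get weather for a location.

    Args:
        location: City name.
    """
    # Canned data for illustration; a real implementation would call a weather service.
    return {"location": location, "temperature_c": 21, "conditions": "clear"}


def ask_with_tools(client, prompt: str) -> str:
    """Send `prompt` with get_weather passed directly as a tool."""
    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=prompt,
        config={"tools": [get_weather]},  # assumption: callables are accepted here
    )
    return response.text
```

With a real client this would be called as `ask_with_tools(genai.Client(), "What's the weather in Tokyo?")`.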

See [references/function-calling.md](references/function-calling.md) for execution modes, compositional calling, multimodal responses, MCP integration, and best practices.

---

## Quick Reference

| Feature | Python | JavaScript |
|---------|--------|------------|
| Generate | `generate_content()` | `generateContent()` |
| Stream | `generate_content_stream()` | `generateContentStream()` |
| Chat | `chats.create()` | `chats.create()` |
| Structured | `response_json_schema=` | `responseJsonSchema:` |
| Image Gen | `gemini-2.5-flash-image` | `gemini-2.5-flash-image` |
| Video Gen | `generate_videos()` | `generateVideos()` |
| Music Gen | `live.music.connect()` | `live.music.connect()` |
| Function Call | `function_declarations` | `functionDeclarations` |
| Embeddings | `embed_content()` | `embedContent()` |
| Files API | `files.upload()` | `files.upload()` |

---

## Gemini 3 Specific Features

For advanced Gemini 3 features, see [references/gemini-3.md](references/gemini-3.md):

- **Thinking levels**: Control reasoning depth (`minimal`, `low`, `medium`, `high`)
- **Media resolution**: Fine-grained multimodal processing (`media_resolution_low` to `ultra_high`)
- **Thought signatures**: Required for function calling and image editing context
- **Structured outputs + tools**: Combine JSON schemas with Google Search, URL Context
- **Multimodal function responses**: Return images in tool responses

---

## Resources

- [Gemini 3 Guide](https://ai.google.dev/gemini-api/docs/gemini-3)
- [Models Overview](https://ai.google.dev/gemini-api/docs/models)
- [Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)
- [Thought Signatures](https://ai.google.dev/gemini-api/docs/thought-signatures)
- [Structured Outputs](https://ai.google.dev/gemini-api/docs/structured-output)
- [Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)
- [Video Generation (Veo)](https://ai.google.dev/gemini-api/docs/video)
- [Music Generation (Lyria)](https://ai.google.dev/gemini-api/docs/music-generation)
- [Function Calling](https://ai.google.dev/gemini-api/docs/function-calling)
- [Document Processing](https://ai.google.dev/gemini-api/docs/document-processing)
- [Embeddings](https://ai.google.dev/gemini-api/docs/embeddings)
- [Google AI Studio](https://aistudio.google.com)
- [Gemini 3 Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-preview)
- [Gemini 3 Flash in AI Studio](https://aistudio.google.com?model=gemini-3-flash-preview)
- [Nano Banana Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-image-preview)
- [Veo Studio](https://aistudio.google.com/apps/bundled/veo_studio)
