gemini-api-integration

Use when integrating Google Gemini API into projects. Covers model selection, multimodal inputs, streaming, function calling, and production best practices.

31,392 stars
Complexity: easy

About this skill

This skill equips an AI agent to guide users through integrating the Google Gemini API into applications. It provides instructions and best practices for common environments such as Node.js, Python, and browser projects, covering the full Gemini SDK lifecycle: initial setup and model selection, multimodal inputs (text, image, audio, video), streaming responses for a better user experience, function calling for tool integration, and production-grade patterns for robust, scalable applications.

Best use case

To enable AI agents to provide detailed guidance and code examples for integrating the Google Gemini API into software projects, covering its full range of capabilities from basic text generation to complex multimodal interactions and function calling.


The AI agent will provide clear, actionable steps, code snippets, and architectural guidance for integrating the Google Gemini API. Users will successfully implement various Gemini API features (e.g., text generation, multimodal inputs, streaming, function calling) into their applications following the agent's instructions. The integrated solution will adhere to best practices for production readiness.

Practical example

Example input

I need help integrating the Google Gemini Pro Vision model into my Python Flask application. I want to be able to send both text prompts and an image, and get a streamed text response back. Can you guide me through the steps, including environment setup and a basic code example?

Example output

Okay, let's get your Google Gemini Pro Vision integration set up in your Python Flask application with multimodal input and streaming output. Here are the steps and code snippets you'll need:

**Step 1: Google Cloud Project & API Key**
First, you'll need a Google Cloud project with the Gemini API enabled and an API key. Go to the Google Cloud Console, create a new project (if you don't have one), enable the 'Generative Language API', and then create an API key via 'APIs & Services > Credentials'.

**Step 2: Install Google AI Python SDK**
Install the official Google Generative AI Python SDK:
```bash
pip install -q -U google-generativeai
```

**Step 3: Basic Flask Application Structure**
Here's a minimal Flask app to handle file uploads and text input:
```python
import os
from flask import Flask, request, jsonify, Response
import google.generativeai as genai

app = Flask(__name__)

# Configure Gemini API (use environment variable for production)
genai.configure(api_key=os.environ.get("GEMINI_API_KEY"))

@app.route('/integrate-gemini-vision', methods=['POST'])
def integrate_gemini_vision():
    if 'image' not in request.files or 'prompt' not in request.form:
        return jsonify({'error': 'Image file and text prompt are required'}), 400

    image_file = request.files['image']
    text_prompt = request.form['prompt']

    # Convert image to bytes
    image_bytes = image_file.read()

    # Prepare content for Gemini API
    image_part = {
        'mime_type': image_file.mimetype,
        'data': image_bytes
    }
    
    # Note: 'gemini-pro-vision' has since been deprecated; newer SDK versions
    # use 'gemini-1.5-flash' (or later) for vision inputs
    model = genai.GenerativeModel('gemini-pro-vision')

    def generate():
        try:
            contents = [
                text_prompt,
                image_part
            ]
            response_stream = model.generate_content(contents, stream=True)
            
            for chunk in response_stream:
                # Ensure 'text' attribute exists before trying to access
                if hasattr(chunk, 'text'):
                    # Escape outside the f-string: backslashes are not allowed
                    # inside f-string expressions before Python 3.12
                    escaped = chunk.text.replace('\n', '\\n')
                    yield f"data: {escaped}\n\n"  # SSE format
                else:
                    # Handle cases where a chunk might not have 'text' (e.g., safety ratings)
                    print(f"Skipping chunk without text: {chunk}")

        except Exception as e:
            escaped = str(e).replace('\n', '\\n')
            yield f"data: ERROR: {escaped}\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    # For local development, load API key from .env or set directly
    # os.environ["GEMINI_API_KEY"] = "YOUR_GEMINI_API_KEY"
    app.run(debug=True, port=5000)
```

**Step 4: Prepare Multimodal Input (Covered in Step 3 Code)**
The code demonstrates creating `image_part` with `mime_type` and `data` from the uploaded file, and combining it with the `text_prompt` in the `contents` list.

**Step 5: Make Streaming API Call (Covered in Step 3 Code)**
The `model.generate_content(contents, stream=True)` call is used to get a streaming response.

**Step 6: Stream Responses to Client (Covered in Step 3 Code)**
The `generate()` function yields chunks of text in Server-Sent Events (SSE) format, allowing the client to receive the response as it's generated.
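The SSE framing used in `generate()` can be factored into a small helper. A minimal sketch; the name `sse_format` is illustrative and not part of the SDK:

```python
def sse_format(text: str) -> str:
    """Escape newlines and wrap text as a single Server-Sent Events data frame.

    SSE treats a blank line as the end of an event, so literal newlines in
    a chunk must be escaped before framing.
    """
    escaped = text.replace("\n", "\\n")
    return f"data: {escaped}\n\n"

# A two-line chunk becomes one well-formed SSE frame
frame = sse_format("Hello\nworld")
```

Inside the streaming loop you would then `yield sse_format(chunk.text)`, keeping the escaping logic in one place.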

**Step 7: Production Best Practices**
*   **Environment Variables**: Always store your `GEMINI_API_KEY` in environment variables (`os.environ.get("GEMINI_API_KEY")`) rather than hardcoding it. For local development, consider using a `.env` file and a library like `python-dotenv`.
*   **Error Handling**: Implement more robust error handling for API calls, including retry mechanisms and specific error message handling.
*   **Rate Limits**: Be aware of Gemini API rate limits and implement strategies like exponential backoff if you anticipate high request volumes.
*   **Security**: Ensure proper validation and sanitization of user inputs (e.g., file types, sizes) to prevent vulnerabilities.
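The exponential backoff mentioned above can be sketched as a small wrapper. This is a minimal illustration: `fn` stands in for any SDK call, and a production version would catch only rate-limit errors rather than bare `Exception`:

```python
import random
import time

def with_backoff(fn, max_retries=4, base_delay=1.0):
    """Retry fn() with exponential backoff plus proportional jitter.

    Intended for transient failures such as HTTP 429 rate-limit errors;
    fn is any zero-argument callable that raises on failure.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so concurrent clients do not retry in lockstep
            time.sleep(delay + random.random() * delay)
```

In the Flask example you would wrap the call as `with_backoff(lambda: model.generate_content(contents, stream=True))`.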

You can test this by sending a POST request with an image file and a 'prompt' form field to `http://127.0.0.1:5000/integrate-gemini-vision` using tools like `curl` or Postman, or by building a simple HTML form with JavaScript to handle the streaming response.

When to use this skill

  • When a user needs to set up Google Gemini API integration for the first time in a Node.js, Python, or browser-based project.
  • When implementing advanced Gemini features like multimodal inputs (combining text with images, audio, or video).
  • When adding streaming responses to an application using the Gemini API to improve perceived performance and user experience.
  • When integrating custom tools or functions with Gemini via its function calling capabilities.

When not to use this skill

  • When the project does not involve integrating the Google Gemini API.
  • When a simpler, non-Gemini AI model or a different API is sufficient for the task.
  • When the agent's task is solely to *use* an existing Gemini integration, rather than to *guide the integration process*.
  • If the required integration is for an AI model other than Google Gemini.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/gemini-api-integration/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/gemini-api-integration/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/gemini-api-integration/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How gemini-api-integration Compares

| Feature / Agent | gemini-api-integration | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

Use when integrating Google Gemini API into projects. Covers model selection, multimodal inputs, streaming, function calling, and production best practices.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub in the sickn33/antigravity-awesome-skills repository.

SKILL.md Source

# Gemini API Integration

## Overview

This skill guides AI agents through integrating Google Gemini API into applications — from basic text generation to advanced multimodal, function calling, and streaming use cases. It covers the full Gemini SDK lifecycle with production-grade patterns.

## When to Use This Skill

- Use when setting up Gemini API for the first time in a Node.js, Python, or browser project
- Use when implementing multimodal inputs (text + image/audio/video)
- Use when adding streaming responses to improve perceived latency
- Use when implementing function calling / tool use with Gemini
- Use when optimizing model selection (Flash vs Pro vs Ultra) for cost and performance
- Use when debugging Gemini API errors, rate limits, or quota issues

## Step-by-Step Guide

### 1. Installation & Setup

**Node.js / TypeScript:**
```bash
npm install @google/generative-ai
```

**Python:**
```bash
pip install google-generativeai
```

Set your API key securely:
```bash
export GEMINI_API_KEY="your-api-key-here"
```

### 2. Basic Text Generation

**Node.js:**
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const result = await model.generateContent("Explain async/await in JavaScript");
console.log(result.response.text());
```

**Python:**
```python
import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content("Explain async/await in JavaScript")
print(response.text)
```

### 3. Streaming Responses

```javascript
const result = await model.generateContentStream("Write a detailed blog post about AI");

for await (const chunk of result.stream) {
  process.stdout.write(chunk.text());
}
```

### 4. Multimodal Input (Text + Image)

```javascript
import fs from "fs";

const imageData = fs.readFileSync("screenshot.png");
const imagePart = {
  inlineData: {
    data: imageData.toString("base64"),
    mimeType: "image/png",
  },
};

const result = await model.generateContent(["Describe this image:", imagePart]);
console.log(result.response.text());
```

### 5. Function Calling / Tool Use

```javascript
const tools = [{
  functionDeclarations: [{
    name: "get_weather",
    description: "Get current weather for a city",
    parameters: {
      type: "OBJECT",
      properties: {
        city: { type: "STRING", description: "City name" },
      },
      required: ["city"],
    },
  }],
}];

const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro", tools });
const result = await model.generateContent("What's the weather in Mumbai?");

const call = result.response.functionCalls()?.[0];
if (call) {
  // Execute the actual function
  const weatherData = await getWeather(call.args.city);
  // Send result back to model
}
```

### 6. Multi-turn Chat

```javascript
const chat = model.startChat({
  history: [
    { role: "user", parts: [{ text: "You are a helpful coding assistant." }] },
    { role: "model", parts: [{ text: "Sure! I'm ready to help with code." }] },
  ],
});

const response = await chat.sendMessage("How do I reverse a string in Python?");
console.log(response.response.text());
```

### 7. Model Selection Guide

| Model | Best For | Speed | Cost |
|-------|----------|-------|------|
| `gemini-1.5-flash` | High-throughput, cost-sensitive tasks | Fast | Low |
| `gemini-1.5-pro` | Complex reasoning, long context | Medium | Medium |
| `gemini-2.0-flash` | Latest fast model, multimodal | Very Fast | Low |
| `gemini-2.0-pro` | Most capable, advanced tasks | Slow | High |

## Best Practices

- ✅ **Do:** Use `gemini-1.5-flash` for most tasks — it's fast and cost-effective
- ✅ **Do:** Always stream responses for user-facing chat UIs to reduce perceived latency
- ✅ **Do:** Store API keys in environment variables, never hard-code them
- ✅ **Do:** Implement exponential backoff for rate limit (429) errors
- ✅ **Do:** Use `systemInstruction` to set persistent model behavior
- ❌ **Don't:** Use `gemini-pro` for simple tasks — Flash is cheaper and faster
- ❌ **Don't:** Send large base64 images inline for files > 20MB — use File API instead
- ❌ **Don't:** Ignore safety ratings in responses for production apps

## Error Handling

```javascript
async function generateWithRetry(prompt, maxRetries = 3) {
  for (let retryCount = 0; retryCount <= maxRetries; retryCount++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (error) {
      if (error.status === 429 && retryCount < maxRetries) {
        // Rate limited — wait and retry with exponential backoff
        await new Promise(r => setTimeout(r, 2 ** retryCount * 1000));
      } else if (error.status === 400) {
        // Invalid request — check prompt or parameters
        console.error("Invalid request:", error.message);
        throw error;
      } else {
        throw error;
      }
    }
  }
}
```

## Troubleshooting

**Problem:** `API_KEY_INVALID` error
**Solution:** Ensure `GEMINI_API_KEY` environment variable is set and the key is active in Google AI Studio.
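A small startup check can surface a missing key before the first API call fails. A minimal sketch; the helper name `require_api_key` is illustrative:

```python
import os

def require_api_key(var: str = "GEMINI_API_KEY") -> str:
    """Fail fast with a clear message if the API key is missing or blank."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set; create a key in Google AI Studio and export it"
        )
    return key
```

Calling this once at application startup turns a vague mid-request `API_KEY_INVALID` into an immediate, actionable error.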

**Problem:** Response blocked by safety filters
**Solution:** Check `result.response.promptFeedback.blockReason` and adjust your prompt or safety settings.

**Problem:** Slow response times
**Solution:** Switch to `gemini-1.5-flash` and enable streaming. Consider caching repeated prompts.

**Problem:** `RESOURCE_EXHAUSTED` (quota exceeded)
**Solution:** Check your quota in Google Cloud Console. Implement request queuing and exponential backoff.
