vertex-ai-gemini
Google Cloud Vertex AI for enterprise Gemini deployments — production scaling, fine-tuning, and MLOps. Use when deploying Gemini in GCP-native environments, running fine-tuning jobs, needing enterprise IAM controls, VPC isolation, batch prediction at scale, or production ML pipelines on Google Cloud.
Best use case
vertex-ai-gemini is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using vertex-ai-gemini should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it at `.claude/skills/vertex-ai-gemini/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How vertex-ai-gemini Compares
| Feature / Agent | vertex-ai-gemini | Standard Approach |
|---|---|---|
| Platform Support | Claude Code, Cursor, Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Low (single SKILL.md file) | N/A |
Frequently Asked Questions
What does this skill do?
Google Cloud Vertex AI for enterprise Gemini deployments — production scaling, fine-tuning, and MLOps. Use when deploying Gemini in GCP-native environments, running fine-tuning jobs, needing enterprise IAM controls, VPC isolation, batch prediction at scale, or production ML pipelines on Google Cloud.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Vertex AI — Gemini on Google Cloud
## Overview
Vertex AI is Google Cloud's enterprise ML platform. It provides access to the same Gemini models as Google AI Studio, but with enterprise-grade features: IAM-based auth (no API keys), VPC Service Controls for data isolation, audit logging, fine-tuning capabilities, batch prediction jobs, and integration with GCP data services like BigQuery and Cloud Storage.
## Vertex AI vs Google AI Studio
| Feature | Google AI Studio | Vertex AI |
|---|---|---|
| Auth | API Key | Service Account / IAM |
| Data residency | Limited | GCP regions |
| VPC isolation | ❌ | ✅ |
| Audit logging | ❌ | ✅ Cloud Audit Logs |
| Fine-tuning | ❌ | ✅ |
| Batch prediction | ❌ | ✅ |
| Pricing | Per token | Per token (different rates) |
| Quotas | Shared | Project-level quotas |
> **Naming note:** "Vertex AI" is being rebranded to **Agent Platform** (full name: Gemini Enterprise Agent Platform). The endpoints, IAM roles, and SDKs are the same product — most documentation still uses the legacy "Vertex AI" name.
## SDK Choice — Use the Unified Gen AI SDK
Google now ships a single `google-genai` SDK that targets both Agent Platform (Vertex) and Google AI Studio with the same code. **Use this for all new code.** The legacy `google-cloud-aiplatform` and `vertexai` modules are deprecated.
| New (use this) | Legacy (deprecated) |
|---|---|
| `google-genai` (Python) | `google-cloud-aiplatform`, `google-generativeai` |
| `@google/genai` (JS/TS) | `@google-cloud/vertexai` |
| `google.golang.org/genai` (Go) | `cloud.google.com/go/vertexai` |
| `com.google.genai:google-genai` (Java) | — |
| `Google.GenAI` (.NET) | — |
```bash
# Recommended: unified Gen AI SDK
pip install google-genai
```
```python
import os
from google import genai
os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "global" # routes to nearest region
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
client = genai.Client() # picks up env vars
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Explain containerization in simple terms.",
)
print(response.text)
```
Use `location="global"` by default — routes to the region with available capacity. Pin to a specific region (`us-central1`, `europe-west4`) only when data residency requires it.
## Setup (Legacy SDK — only for existing code)
```bash
pip install google-cloud-aiplatform
```
```bash
# Authenticate
gcloud auth application-default login
# Or use service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```
```bash
# Set project and location
export GOOGLE_CLOUD_PROJECT=my-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
```
## Instructions
> The examples below use the legacy `google-cloud-aiplatform` SDK. For new code, prefer the unified `google-genai` SDK shown above — same capabilities, cross-platform, current best practice.
### Basic Gemini Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content("Explain containerization in simple terms.")
print(response.text)
```
### Multi-Modal Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part
import base64
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")
# Analyze image from Cloud Storage
gcs_image = Part.from_uri(
uri="gs://my-bucket/product-photo.jpg",
mime_type="image/jpeg",
)
response = model.generate_content(["Describe this product:", gcs_image])
print(response.text)
# Analyze local image
with open("chart.png", "rb") as f:
image_data = f.read()
local_image = Part.from_data(data=image_data, mime_type="image/png")
response = model.generate_content(["What trends does this chart show?", local_image])
print(response.text)
```
### Streaming Responses
```python
import vertexai
from vertexai.generative_models import GenerativeModel
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")
for chunk in model.generate_content("Write a product description for a smartwatch.", stream=True):
    print(chunk.text, end="", flush=True)
print()
```
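For comparison, a sketch of the same streaming call on the unified `google-genai` SDK, which uses `generate_content_stream` (project ID is a placeholder):
```python
from google import genai

client = genai.Client(vertexai=True, project="my-project-id", location="global")
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a product description for a smartwatch.",
):
    print(chunk.text, end="", flush=True)
print()
```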
### Chat Session
```python
import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel(
model_name="gemini-2.0-flash-001",
system_instruction="You are a GCP expert. Provide concise, actionable answers.",
)
chat = model.start_chat()
print(chat.send_message("How do I set up Cloud Run?").text)
print(chat.send_message("What about environment variables?").text)
```
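A sketch of the equivalent chat session on the unified SDK, where the system instruction travels in the generation config:
```python
from google import genai
from google.genai.types import GenerateContentConfig

client = genai.Client(vertexai=True, project="my-project-id", location="global")
chat = client.chats.create(
    model="gemini-2.5-flash",
    config=GenerateContentConfig(
        system_instruction="You are a GCP expert. Provide concise, actionable answers."
    ),
)
print(chat.send_message("How do I set up Cloud Run?").text)
print(chat.send_message("What about environment variables?").text)
```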
### Function Calling
```python
import vertexai
from vertexai.generative_models import (
FunctionDeclaration,
GenerativeModel,
Tool,
)
vertexai.init(project="my-project-id", location="us-central1")
get_bq_query = FunctionDeclaration(
name="run_bigquery_query",
description="Run a SQL query on BigQuery and return results",
parameters={
"type": "object",
"properties": {
"query": {"type": "string", "description": "SQL query to execute"},
"dataset": {"type": "string", "description": "BigQuery dataset name"},
},
"required": ["query"],
},
)
tool = Tool(function_declarations=[get_bq_query])
model = GenerativeModel("gemini-2.0-flash-001", tools=[tool])
response = model.generate_content("How many users signed up last week?")
if response.candidates[0].function_calls:
    fc = response.candidates[0].function_calls[0]
    print(f"Function: {fc.name}, Args: {dict(fc.args)}")
```
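To complete the round trip, execute the function yourself and return the result with `Part.from_function_response`. A sketch using a chat session so the SDK tracks history; the BigQuery result here is a stand-in:
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Reuses the `tool` defined in the example above
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001", tools=[tool])
chat = model.start_chat()

response = chat.send_message("How many users signed up last week?")
fc = response.candidates[0].function_calls[0]

# Stand-in result; a real app would run fc.args["query"] against BigQuery
result = {"rows": [{"signups": 1423}]}
response = chat.send_message(
    Part.from_function_response(name=fc.name, response={"content": result})
)
print(response.text)  # natural-language answer grounded in the function result
```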
### Fine-Tuning Gemini
```python
import vertexai
from vertexai.tuning import sft
vertexai.init(project="my-project-id", location="us-central1")
# Prepare training data in JSONL format in GCS:
# {"messages": [{"role": "user", "content": "..."}, {"role": "model", "content": "..."}]}
tuning_job = sft.train(
source_model="gemini-2.0-flash-001",
train_dataset="gs://my-bucket/training-data.jsonl",
validation_dataset="gs://my-bucket/validation-data.jsonl",
tuned_model_display_name="my-fine-tuned-gemini",
epochs=3,
learning_rate_multiplier=1.0,
)
print(f"Tuning job: {tuning_job.resource_name}")
print(f"State: {tuning_job.state}")
# Wait for completion
tuning_job.wait()
print(f"Tuned model: {tuning_job.tuned_model_name}")
```
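Once the job completes, the tuned model can be called like any other. A sketch, assuming the job exposes its serving endpoint via `tuned_model_endpoint_name`:
```python
from vertexai.generative_models import GenerativeModel

# Assumption: SupervisedTuningJob exposes the deployed endpoint resource name
tuned_model = GenerativeModel(tuning_job.tuned_model_endpoint_name)
response = tuned_model.generate_content("Summarize this support ticket: ...")
print(response.text)
```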
### Batch Prediction
```python
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.batch_prediction import BatchPredictionJob
vertexai.init(project="my-project-id", location="us-central1")
# Input JSONL format in GCS:
# {"request": {"contents": [{"role": "user", "parts": [{"text": "Translate: Hello"}]}]}}
job = BatchPredictionJob.submit(
source_model="gemini-2.0-flash-001",
input_dataset="gs://my-bucket/batch-inputs.jsonl",
output_uri_prefix="gs://my-bucket/batch-outputs/",
)
print(f"Batch job: {job.resource_name}")
job.wait()
print(f"Output: {job.output_location}")
```
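A minimal sketch of producing that input JSONL locally before uploading it to GCS (the prompts are placeholders):
```python
import json

# Placeholder prompts; in practice these come from BigQuery or GCS
prompts = ["Translate to French: Hello", "Translate to French: Goodbye"]

with open("batch-inputs.jsonl", "w") as f:
    for prompt in prompts:
        f.write(json.dumps({
            "request": {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
        }) + "\n")
# Upload with: gsutil cp batch-inputs.jsonl gs://my-bucket/
```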
### IAM Setup for Service Account
```bash
# Create a service account for your app
gcloud iam service-accounts create gemini-app-sa \
--display-name="Gemini App Service Account"
# Grant Vertex AI User role
gcloud projects add-iam-policy-binding my-project-id \
--member="serviceAccount:gemini-app-sa@my-project-id.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
# Download key (for non-GCP environments)
gcloud iam service-accounts keys create key.json \
--iam-account=gemini-app-sa@my-project-id.iam.gserviceaccount.com
```
### VPC Service Controls (Enterprise Isolation)
```python
# When VPC SC is enabled, all API calls must originate from within the perimeter
# Configure the SDK to use private endpoints:
import vertexai
vertexai.init(
project="my-project-id",
location="us-central1",
api_endpoint="us-central1-aiplatform.googleapis.com", # Regional endpoint
)
```
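On the unified `google-genai` SDK, the analogous control is an HTTP base-URL override. A sketch, assuming the `base_url` field on `HttpOptions`:
```python
from google import genai
from google.genai.types import HttpOptions

# Assumption: base_url is the HttpOptions field for overriding the API endpoint
client = genai.Client(
    vertexai=True,
    project="my-project-id",
    location="us-central1",
    http_options=HttpOptions(base_url="https://us-central1-aiplatform.googleapis.com"),
)
```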
### Available Gemini Models on Vertex AI
| Model ID | Notes |
|---|---|
| `gemini-2.0-flash-001` | Latest Flash, fast + capable |
| `gemini-1.5-pro-002` | 2M context, most capable |
| `gemini-1.5-flash-002` | 1M context, balanced |
| `text-embedding-005` | Latest embeddings (768 dims) |
Use `gemini-2.0-flash-001` (version pinned) in production to avoid unexpected model changes.
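To confirm which models a project can reach in a given region, the unified SDK can enumerate them. A short sketch:
```python
from google import genai

client = genai.Client(vertexai=True, project="my-project-id", location="us-central1")
for model in client.models.list():
    print(model.name)
```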
## Examples
### Example 1 — Migrate a Python service from `google-cloud-aiplatform` to `google-genai`
User has a recommendation service running on Cloud Run that uses the legacy `google-cloud-aiplatform` SDK. Replace `pip install google-cloud-aiplatform` with `pip install google-genai`, swap `vertexai.init(...)` + `GenerativeModel(...)` for `genai.Client()` (with `GOOGLE_GENAI_USE_VERTEXAI=true`), update `model.generate_content(...)` to `client.models.generate_content(model=..., contents=...)`. Keep the existing service account and IAM bindings — same auth, new SDK. Pin to `gemini-2.5-flash` for cost, validate parity in staging before cutover.
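A condensed before/after sketch of that swap (identifiers are placeholders):
```python
# Before: legacy google-cloud-aiplatform
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
print(GenerativeModel("gemini-2.0-flash-001").generate_content("Recommend a laptop.").text)

# After: unified google-genai (GOOGLE_GENAI_USE_VERTEXAI=true set on the service)
from google import genai

client = genai.Client()
print(client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Recommend a laptop.",
).text)
```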
### Example 2 — Run nightly batch translation of 5M product titles
User has 5M product titles in BigQuery to translate into 4 languages. Streaming inference would be slow and expensive. Format input as JSONL in GCS (`{"request": {"contents": [...]}}`) per row × language, submit a `BatchPredictionJob` against `gemini-2.5-flash`, and let it run unattended. Output JSONL lands in GCS, load it back into BigQuery via `bq load`. Cost is roughly half of streaming, runtime is hours not days.
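With the unified SDK, the submission itself is a few lines. A sketch using `client.batches.create` with placeholder bucket paths:
```python
from google import genai
from google.genai.types import CreateBatchJobConfig

client = genai.Client(vertexai=True, project="my-project-id", location="us-central1")
job = client.batches.create(
    model="gemini-2.5-flash",
    src="gs://my-bucket/batch-inputs.jsonl",  # placeholder bucket
    config=CreateBatchJobConfig(dest="gs://my-bucket/batch-outputs/"),
)
print(job.name, job.state)  # poll until the job reaches a terminal state
```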
## Guidelines
- **Use `google-genai` for all new code** — `google-cloud-aiplatform` and `google-generativeai` are deprecated.
- Always pin model versions in production for stability — `gemini-2.5-flash` is fine for non-prod, but production should target a specific build.
- Use Application Default Credentials (`gcloud auth application-default login`) during development.
- In GKE or Cloud Run, use Workload Identity — no service account keys needed.
- Default `location="global"` for the Gen AI SDK; pin to a region only for data residency.
- Fine-tuning requires a training JSONL with `messages` format and at least 100 examples.
- Batch prediction is cost-effective for offline bulk inference (no streaming).
- Enable Cloud Audit Logs on the `aiplatform.googleapis.com` service for compliance.
Related Skills
vertex-media-generation
Generate images with Imagen and videos with Veo using the Vercel AI SDK Google Vertex provider. Use when the user wants to generate images, edit images (inpainting, outpainting, background swap), generate videos, or build media generation pipelines with @ai-sdk/google-vertex. Covers Imagen 4.0/3.0 and Veo 3.1/3.0/2.0 models.
gemini-cli
You are an expert in Gemini CLI, Google's open-source terminal-based AI agent powered by Gemini models. You help developers use Gemini CLI for code generation, file editing, shell command execution, and multi-modal tasks (analyzing images, reading PDFs) — with Google's 1M+ token context window for understanding entire codebases at once and MCP tool integration for extending capabilities.