vertex-ai-media-master

Automatic activation for ALL Google Vertex AI multimodal operations: video processing, audio generation, image creation, and marketing campaigns.

**TRIGGER PHRASES:**
- "vertex ai", "gemini multimodal", "process video", "generate audio", "create images", "marketing campaign"
- "imagen", "video understanding", "multimodal", "content generation", "media assets"

**AUTO-INVOKES FOR:**
- Video processing and understanding (up to 6 hours)
- Audio generation and transcription
- Image generation with Imagen 4
- Marketing campaign automation
- Social media content creation
- Ad creative generation
- Multimodal content workflows

by ComeOnOliver

Best use case

vertex-ai-media-master is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using vertex-ai-media-master should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vertex-media-master/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/vertex-media-master/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/vertex-media-master/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How vertex-ai-media-master Compares

| Feature / Agent | vertex-ai-media-master | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Vertex AI Media Master - Comprehensive Multimodal AI Operations

This Agent Skill provides comprehensive mastery of Google Vertex AI multimodal capabilities for video, audio, image, and text processing with focus on marketing applications.

## Core Capabilities

### 🎥 Video Processing (Gemini 2.0/2.5)
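- **Video Understanding**: Process long videos; the limits of up to 6 hours at low resolution or 2 hours at default resolution apply to models with a 2M-token context window, and a 1M-token window supports roughly half of that
- **Long Context Window**: Gemini 2.5 Pro accepts about 1M input tokens, enough for roughly an hour of video at default resolution in a single request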
- **Audio Track Processing**: Automatic audio transcription from video
- **Multi-video Analysis**: Process multiple videos in single request
- **Video Summarization**: Extract key moments, scenes, and insights
- **Marketing Use Cases**:
  - Analyze competitor video ads
  - Extract highlights from long-form content
  - Generate video summaries for social media
  - Transcribe and caption video content
  - Identify brand mentions and product placements
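
The duration limits above follow from token budgets. A rough sketch, assuming the approximate per-second rates that Google's Gemini documentation gives for video (about 300 tokens per second at default media resolution and about 100 at low resolution); the rates and the function name `max_video_hours` are assumptions for illustration:

```python
# Approximate token cost per second of video (frames plus audio).
# These rates are assumptions based on Gemini documentation and may change.
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def max_video_hours(context_tokens: int, resolution: str = "default") -> float:
    """Return the longest video, in hours, that fits in the context window."""
    seconds = context_tokens / TOKENS_PER_SECOND[resolution]
    return seconds / 3600


if __name__ == "__main__":
    for window in (1_000_000, 2_000_000):
        for res in ("default", "low"):
            hours = max_video_hours(window, res)
            print(f"{window:>9,} tokens, {res:>7}: {hours:.1f} h")
```

With a 2M-token window this works out to roughly 1.9 hours at default resolution and 5.6 hours at low resolution, which is where the rounded 2-hour and 6-hour figures come from. The prompt text and the response also consume tokens, so practical limits are somewhat lower.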

### 🎵 Audio Generation & Processing
- **Lyria Model (2025)**: Native audio and music generation
- **Speech-to-Text**: Transcribe audio with speaker diarization
- **Text-to-Speech**: Generate natural voiceovers
- **Music Composition**: Background music for campaigns
- **Audio Enhancement**: Noise reduction and quality improvement
- **Marketing Use Cases**:
  - Generate podcast scripts and voiceovers
  - Create audio ads and radio spots
  - Produce background music for video campaigns
  - Transcribe customer interviews
  - Generate multilingual voiceovers

### 🖼️ Image Generation (Imagen 4 & Gemini 2.5 Flash Image)
- **Imagen 4**: Highest quality text-to-image generation
- **Gemini 2.5 Flash Image**: Interleaved image generation with text
- **Style Transfer**: Apply brand styles to generated images
- **Product Visualization**: Generate product mockups
- **Campaign Assets**: Create ad creatives and social media graphics
- **Marketing Use Cases**:
  - Generate personalized ad images (Adios solution)
  - Create social media graphics at scale
  - Produce product lifestyle images
  - Generate A/B test variations
  - Create branded campaign visuals

### 📢 Marketing Campaign Automation
- **ViGenAiR**: Convert long-form video ads to short formats automatically
- **Adios**: Generate personalized ad images tailored to audience context
- **Campaign Asset Generation**: Photos, soundtracks, voiceovers from prompts
- **Content Pipeline**: Email copy, blog posts, social media, PMax assets
- **Catalog Enrichment**: Multi-agent workflow for product onboarding
- **Marketing Use Cases**:
  - Automated campaign asset production
  - Personalized content at scale
  - Multi-channel content distribution
  - Product catalog enhancement
  - Visual merchandising automation

### 🔧 Technical Implementation

**API Integration:**
```python
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part
from vertexai.preview.vision_models import ImageGenerationModel

# Initialize Vertex AI
vertexai.init(project="your-project", location="us-central1")

# Gemini 2.5 Pro for video
model = GenerativeModel("gemini-2.5-pro")

# Reference the video in Cloud Storage; a bare string would be read as prompt text.
# The bucket path is a placeholder.
video_file = Part.from_uri("gs://your-bucket/video.mp4", mime_type="video/mp4")

# Process video with audio (length limits depend on model and media resolution)
response = model.generate_content([
    "Analyze this video and extract key marketing insights",
    video_file,
])
print(response.text)

# Imagen 4 for image generation
# "imagen-4" is shorthand; check Model Garden for the exact versioned model ID
imagen = ImageGenerationModel.from_pretrained("imagen-4")
images = imagen.generate_images(
    prompt="Professional product photo, studio lighting, white background",
    number_of_images=4
)
```

**Gemini 2.5 Flash Image (Interleaved Generation):**
```python
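# Generate images within text responses
model = GenerativeModel("gemini-2.5-flash-image")
response = model.generate_content([
    "Create a 5-step recipe with images for each step"
])
# The response is expected to interleave text parts and image parts.
# Depending on the SDK version, image output may have to be requested explicitly
# in the generation config; check the model documentation.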
```

**Audio Generation (Lyria):**
```python
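# NOTE: treat this block as pseudocode. The `vertexai.preview.audio_models` module and
# the `AudioGenerationModel` class are assumptions and may not exist in the published SDK;
# check the Lyria documentation for the supported request format.
from vertexai.preview.audio_models import AudioGenerationModel
lyria = AudioGenerationModel.from_pretrained("lyria")
audio = lyria.generate_audio(
    prompt="Upbeat background music for product launch video, 30 seconds",
    duration=30
)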
```

### 📊 Marketing Workflow Automation

**1. Multi-Channel Campaign Creation:**
```python
# One prompt drafts the copy and a brief for each asset;
# the images and music themselves come from separate Imagen and Lyria calls
campaign = model.generate_content([
    """Create a product launch campaign for [product]:
    - Hero image (1920x1080)
    - 3 social media graphics (1080x1080)
    - 30-second video script
    - Background music description
    - Email marketing copy
    - Instagram caption"""
])
```

**2. Video Repurposing Pipeline:**
```python
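# Long-form to short-form conversion (ViGenAiR approach)
from vertexai.preview.generative_models import Part

# Wrap the URI in a Part so it is sent as video rather than as prompt text
long_video = Part.from_uri("gs://bucket/original-ad-60s.mp4", mime_type="video/mp4")
response = model.generate_content([
    "Suggest 3 engaging 15-second clips from this video for TikTok/Reels, with start and end timestamps",
    long_video
])
# The response lists suggested clips as text; cutting and reformatting happen in a separate editing step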
```
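
Gemini returns text, so a repurposing pipeline still has to turn the model's suggestions into cut points for an editing tool. A minimal sketch, assuming the prompt asks for one `MM:SS-MM:SS` range per clip; that output format and the helper name `parse_clips` are assumptions, not part of ViGenAiR:

```python
import re

# Matches ranges such as "00:12-00:27"; the format is whatever the prompt requests.
CLIP_PATTERN = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def parse_clips(text: str, max_seconds: int = 15) -> list[tuple[int, int]]:
    """Extract (start, end) second offsets, dropping invalid or overlong clips."""
    clips = []
    for m in CLIP_PATTERN.finditer(text):
        start = int(m.group(1)) * 60 + int(m.group(2))
        end = int(m.group(3)) * 60 + int(m.group(4))
        # Keep only clips with positive length within the target duration.
        if 0 < end - start <= max_seconds:
            clips.append((start, end))
    return clips


if __name__ == "__main__":
    reply = "Clip 1: 00:03-00:18\nClip 2: 00:20-00:50\nClip 3: 00:41-00:55"
    print(parse_clips(reply))
```

For the sample reply, the 30-second second clip is rejected and the result is `[(3, 18), (41, 55)]`. The offsets can then be handed to whatever cutting tool the pipeline uses.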

**3. Personalized Ad Generation:**
```python
# Context-aware image generation (Adios approach)
# Assumes `audiences`, `product`, and `imagen` are already defined
for audience in audiences:
    ad_image = imagen.generate_images(
        prompt=f"Product ad for {product}, targeting {audience.demographics}, {audience.style_preference}",
        aspect_ratio="16:9"
    )
```
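
Building the prompt in a small helper keeps the per-audience variations consistent and easy to review before any images are generated. A sketch under stated assumptions: the `Audience` record and `build_ad_prompt` are hypothetical names that mirror the fields used in the loop above, and the sample audiences are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audience:
    """Hypothetical audience record; field names mirror the loop above."""
    demographics: str
    style_preference: str


def build_ad_prompt(product: str, audience: Audience) -> str:
    """Compose one image prompt from product and audience attributes."""
    parts = [
        f"Product ad for {product}",
        f"targeting {audience.demographics}",
        audience.style_preference,
    ]
    # Skip empty attributes so the prompt has no dangling commas.
    return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    audiences = [
        Audience("urban commuters aged 25-34", "clean minimalist style"),
        Audience("outdoor enthusiasts", ""),
    ]
    for audience in audiences:
        print(build_ad_prompt("trail running shoes", audience))
```

Each returned string can be passed as the `prompt` argument of `generate_images`.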

### 🎯 Best Practices for Jeremy

**1. Project Setup:**
```bash
# Set environment variables
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"

# Install SDK (the Vertex AI SDK ships inside google-cloud-aiplatform)
pip install --upgrade google-cloud-aiplatform
```

**2. Rate Limits & Quotas:**
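- Gemini 2.5 Pro: 2M tokens/min (video processing); indicative only, since quotas vary by project and region
- Imagen 4: 100 images/min; indicative only
- Monitor usage and check current limits on the Quotas page in Cloud Console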
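
When a quota is exceeded, the usual client-side remedy is to retry with exponential backoff. A minimal, SDK-independent sketch; the helper name `call_with_backoff` and its parameters are assumptions for illustration:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable errors with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Base delay doubles each attempt (1x, 2x, 4x, ...), plus up to 50% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

In practice `fn` would wrap a `generate_content` or `generate_images` call, and `is_retryable` would test for quota errors such as `google.api_core.exceptions.ResourceExhausted`. The `sleep` parameter is injectable so the helper can be tested without waiting.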

**3. Cost Optimization:**
- Use Gemini 2.5 Flash for faster, cheaper operations
- Batch image generation requests
- Use context caching when the same video is analyzed repeatedly
- Use low-resolution video setting when appropriate

**4. Security & Compliance:**
- Keep API keys in Secret Manager, never in code
- Use service accounts with minimal permissions
- Enable VPC Service Controls to limit data exfiltration, and pin requests to a regional endpoint for data residency
- Log all API calls for audit trails

### 🚀 Advanced Marketing Use Cases

**1. Campaign Performance Analysis:**
```python
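# Analyze competitor campaigns
from vertexai.preview.generative_models import Part

competitor_uris = ["gs://bucket/competitor1.mp4", "gs://bucket/competitor2.mp4"]
# Wrap each URI in a Part so the files are sent as video, not as prompt text
competitor_videos = [Part.from_uri(uri, mime_type="video/mp4") for uri in competitor_uris]
analysis = model.generate_content([
    "Compare these competitor videos: themes, messaging, CTAs, production quality",
    *competitor_videos
])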
```
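
Media stored in Cloud Storage is passed to the model as a URI plus a MIME type (for example via `Part.from_uri(uri, mime_type=...)`), so mixed file formats need a lookup step. A small sketch of that lookup; the helper name `video_part_args` and the extension table are assumptions, and the table should be checked against the formats the chosen model supports:

```python
from pathlib import PurePosixPath

# Extension-to-MIME table for common video containers; extend as needed.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".mpeg": "video/mpeg",
}


def video_part_args(uri: str) -> dict[str, str]:
    """Return the uri and mime_type keyword arguments for a Cloud Storage video."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    suffix = PurePosixPath(uri).suffix.lower()
    if suffix not in VIDEO_MIME_TYPES:
        raise ValueError(f"unsupported video extension {suffix!r}")
    return {"uri": uri, "mime_type": VIDEO_MIME_TYPES[suffix]}


if __name__ == "__main__":
    print(video_part_args("gs://bucket/competitor1.mp4"))
```

The returned dictionary can be unpacked directly, as in `Part.from_uri(**video_part_args(uri))`.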

**2. Content Localization:**
```python
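# Generate multilingual campaigns
# Assumes `campaign_brief` (text) and `hero_image` (an image Part) are already defined
localized_content = {}
for lang in ["en", "es", "fr", "de", "ja"]:
    # Store each result by language code so earlier results are not overwritten
    localized_content[lang] = model.generate_content([
        f"Translate and culturally adapt this campaign for the {lang} market:",
        campaign_brief,
        hero_image
    ])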
```

**3. A/B Test Generation:**
```python
# Generate variations automatically
variations = []
for style in ["minimalist", "bold", "luxury", "playful"]:
    variation = imagen.generate_images(
        prompt=f"Product ad, {style} style, {brand_guidelines}",
        number_of_images=1
    )
    variations.append(variation)
```

### 📚 Reference Documentation

**Official Documentation:**
- Vertex AI Multimodal: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
- Gemini 2.5 Pro: https://cloud.google.com/vertex-ai/generative-ai/docs/models
- Imagen 4: https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
- Video Understanding: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding

**Marketing Solutions:**
- GenAI for Marketing: https://github.com/GoogleCloudPlatform/genai-for-marketing
- ViGenAiR (video repurposing)
- Adios (personalized ad images)

**Pricing:**
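- Gemini 2.5 Pro: about $1.25/1M input tokens and $10/1M output tokens for prompts up to 200K tokens, with higher rates for longer prompts (prices change; confirm on the Vertex AI pricing page)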
- Imagen 4: $0.04/image
- Video processing: Included in Gemini token pricing
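
Token and per-image prices combine into a simple per-campaign estimate. A sketch with every rate passed in as a parameter; the function name is hypothetical, and the token rates in the example call are placeholders rather than current list prices:

```python
def estimate_campaign_cost(
    input_tokens: int,
    output_tokens: int,
    images: int,
    input_price_per_million: float,
    output_price_per_million: float,
    price_per_image: float,
) -> float:
    """Return the estimated spend in dollars for one campaign run."""
    token_cost = (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )
    return round(token_cost + images * price_per_image, 4)


if __name__ == "__main__":
    # Placeholder token rates for illustration only; substitute current prices.
    cost = estimate_campaign_cost(
        input_tokens=600_000,
        output_tokens=20_000,
        images=50,
        input_price_per_million=2.00,
        output_price_per_million=10.00,
        price_per_image=0.04,
    )
    print(f"${cost:.2f}")
```

Because video is billed as input tokens, long videos tend to dominate the token side of the estimate.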

## When This Skill Activates

This skill automatically activates when you mention:
- Video processing, analysis, or understanding
- Audio generation, music composition, or voiceovers
- Image generation, ad creatives, or visual content
- Marketing campaigns, content automation, or asset production
- Gemini multimodal capabilities
- Vertex AI media operations
- Social media content, email marketing, or PMax campaigns

## Integration with Other Tools

**Google Cloud Services:**
- Cloud Storage for media asset management
- BigQuery for campaign analytics
- Cloud Functions for automation triggers
- Vertex AI Pipelines for content workflows

**Third-Party Integrations:**
- Social media APIs (LinkedIn, Twitter, Instagram)
- Marketing automation platforms (HubSpot, Marketo)
- CMS integrations (WordPress, Contentful)
- DAM systems (Bynder, Cloudinary)

## Success Metrics

**Track These KPIs:**
- Asset generation speed (baseline: 5 images/min)
- Content approval rate (target: >80%)
- Campaign personalization scale (target: 1000+ variants)
- Cost per asset (target: <$0.10/image)
- Time saved vs manual production (target: 90% reduction)
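
The KPIs above reduce to a few ratios over per-run counters. A minimal sketch, assuming the counters are collected elsewhere; the `CampaignRun` record and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CampaignRun:
    """Hypothetical per-run counters; field names are illustrative."""
    assets_generated: int
    assets_approved: int
    total_cost_usd: float
    elapsed_minutes: float
    manual_minutes_estimate: float


def kpis(run: CampaignRun) -> dict[str, float]:
    """Compute the KPIs from raw counters (all counters must be non-zero)."""
    return {
        "assets_per_minute": run.assets_generated / run.elapsed_minutes,
        "approval_rate": run.assets_approved / run.assets_generated,
        "cost_per_asset_usd": run.total_cost_usd / run.assets_generated,
        "time_saved_fraction": 1 - run.elapsed_minutes / run.manual_minutes_estimate,
    }


if __name__ == "__main__":
    run = CampaignRun(200, 170, 8.0, 20.0, 400.0)
    for name, value in kpis(run).items():
        print(f"{name}: {value:.2f}")
```

The results can be compared directly with the targets listed above, for example an approval rate above 0.80 and a cost per asset below $0.10.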

---

**This skill makes Jeremy a Vertex AI multimodal expert with instant access to video processing, audio generation, image creation, and marketing automation capabilities.**

Related Skills

yaml-master

from ComeOnOliver/skillshub

Execute proactive YAML intelligence: automatically activates when working with YAML files. Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

vertex-infra-expert

from ComeOnOliver/skillshub

Terraform infrastructure specialist for Vertex AI services and Gemini deployments. Provisions Model Garden, endpoints, vector search, pipelines, and enterprise AI infrastructure. Triggers: "vertex ai terraform", "gemini deployment terraform", "model garden infrastructure", "vertex ai endpoints"

vertex-engine-inspector

from ComeOnOliver/skillshub

Inspect and validate Vertex AI Agent Engine deployments including Code Execution Sandbox, Memory Bank, A2A protocol compliance, and security posture. Generates production readiness scores. Use when asked to inspect, validate, or audit an Agent Engine deployment. Trigger with "inspect agent engine", "validate agent engine deployment", "check agent engine config", "audit agent engine security", "agent engine readiness check", "vertex engine health", or "reasoning engine status".

vertex-ai-pipeline-creator

from ComeOnOliver/skillshub

Vertex AI Pipeline Creator: auto-activating skill for GCP Skills. Triggers on: vertex ai pipeline creator. Part of the GCP Skills skill category.

vertex-ai-endpoint-config

from ComeOnOliver/skillshub

Vertex AI Endpoint Config: auto-activating skill for GCP Skills. Triggers on: vertex ai endpoint config. Part of the GCP Skills skill category.

vertex-ai-deployer

from ComeOnOliver/skillshub

Vertex AI Deployer: auto-activating skill for ML Deployment. Triggers on: vertex ai deployer. Part of the ML Deployment skill category.

vertex-agent-builder

from ComeOnOliver/skillshub

Build and deploy production-ready generative AI agents using Vertex AI, Gemini models, and Google Cloud infrastructure with RAG, function calling, and multi-modal capabilities

hypermedia-link-generator

from ComeOnOliver/skillshub

Hypermedia Link Generator: auto-activating skill for API Development. Triggers on: hypermedia link generator. Part of the API Development skill category.

Google Cloud Agent SDK Master

from ComeOnOliver/skillshub

Execute automatic activation for all Google Cloud Agent Development Kit (ADK) operations. Use when appropriate context is detected. Trigger with relevant phrases based on skill purpose.

firebase-vertex-ai

from ComeOnOliver/skillshub

Execute firebase platform expert with Vertex AI Gemini integration for Authentication, Firestore, Storage, Functions, Hosting, and AI-powered features. Use when asked to "setup firebase", "deploy to firebase", or "integrate vertex ai with firebase". Trigger with relevant phrases based on skill purpose.

creating-oracle-to-postgres-master-migration-plan

from ComeOnOliver/skillshub

Discovers all projects in a .NET solution, classifies each for Oracle-to-PostgreSQL migration eligibility, and produces a persistent master migration plan. Use when starting a multi-project Oracle-to-PostgreSQL migration, creating a migration inventory, or assessing which .NET projects contain Oracle dependencies.

cli-mastery

from ComeOnOliver/skillshub

Interactive training for the GitHub Copilot CLI. Guided lessons, quizzes, scenario challenges, and a full reference covering slash commands, shortcuts, modes, agents, skills, MCP, and configuration. Say "cliexpert" to start.