azure-ai-vision-imageanalysis-java

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

31,392 stars
Complexity: medium

About this skill

This skill empowers AI agents to integrate with the Azure AI Vision service using its Java SDK. It provides comprehensive capabilities for understanding and extracting information from images. Agents can utilize this skill to generate descriptive captions for images, perform optical character recognition (OCR) to extract text, detect and identify objects within images, apply relevant tags for categorization, and intelligently crop images to highlight key areas. It's ideal for applications requiring deep visual content understanding and processing.

Best use case

Processing user-submitted images, enhancing multimedia content, automating content moderation, extracting data from scanned documents, building visual search engines, and generating accessible descriptions for images.

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

Structured data or insights derived from image content, such as a textual description (caption), extracted text, a list of detected objects with their bounding boxes and confidence scores, relevant tags, or optimized smart-cropped image coordinates.

Practical example

Example input

```json
{
  "imageUrl": "https://example.com/image_of_a_cat.jpg",
  "features": ["caption", "objects", "read", "tags", "smartCrops"]
}
```

Example output

```json
{
  "metadata": {
    "width": 1024,
    "height": 768,
    "format": "Jpeg"
  },
  "caption": {
    "text": "A cat sitting on a couch in a living room.",
    "confidence": 0.96
  },
  "objects": [
    {
      "rectangle": {
        "x": 300,
        "y": 250,
        "w": 200,
        "h": 300
      },
      "object": "cat",
      "confidence": 0.99
    }
  ],
  "read": {
    "blocks": [
      {
        "lines": [
          {
            "text": "Welcome Home",
            "words": [
              {"text": "Welcome", "confidence": 0.98},
              {"text": "Home", "confidence": 0.97}
            ]
          }
        ]
      }
    ]
  },
  "tags": [
    {"name": "cat", "confidence": 0.99},
    {"name": "indoor", "confidence": 0.98},
    {"name": "couch", "confidence": 0.95}
  ],
  "smartCrops": [
    {
      "aspectRatio": 1.33,
      "boundingBox": {
        "x": 280,
        "y": 230,
        "w": 240,
        "h": 320
      }
    }
  ]
}
```

When to use this skill

  • When a task requires understanding the content of an image, extracting textual information from visuals, identifying objects or people, categorizing images, or intelligently resizing/cropping images based on their content. Use this skill when working within a Java ecosystem or when the agent needs to generate or interact with Java code to perform computer vision tasks.

When not to use this skill

  • When the task does not involve image analysis, when a different computer vision service (e.g., Google Vision AI, AWS Rekognition) is preferred, when a non-Java SDK/API integration is required, or when purely text-based analysis is sufficient.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/azure-ai-vision-imageanalysis-java/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/azure-ai-vision-imageanalysis-java/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/azure-ai-vision-imageanalysis-java/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How azure-ai-vision-imageanalysis-java Compares

Feature / Agentazure-ai-vision-imageanalysis-javaStandard Approach
Platform SupportClaudeLimited / Varies
Context Awareness High Baseline
Installation ComplexitymediumN/A

Frequently Asked Questions

What does this skill do?

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Azure AI Vision Image Analysis SDK for Java

Build image analysis applications using the Azure AI Vision Image Analysis SDK for Java.

## Installation

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-vision-imageanalysis</artifactId>
    <version>1.1.0-beta.1</version>
</dependency>
```

## Client Creation

### With API Key

```java
import com.azure.ai.vision.imageanalysis.ImageAnalysisClient;
import com.azure.ai.vision.imageanalysis.ImageAnalysisClientBuilder;
import com.azure.core.credential.KeyCredential;

String endpoint = System.getenv("VISION_ENDPOINT");
String key = System.getenv("VISION_KEY");

ImageAnalysisClient client = new ImageAnalysisClientBuilder()
    .endpoint(endpoint)
    .credential(new KeyCredential(key))
    .buildClient();
```

### Async Client

```java
import com.azure.ai.vision.imageanalysis.ImageAnalysisAsyncClient;

ImageAnalysisAsyncClient asyncClient = new ImageAnalysisClientBuilder()
    .endpoint(endpoint)
    .credential(new KeyCredential(key))
    .buildAsyncClient();
```

### With DefaultAzureCredential

```java
import com.azure.identity.DefaultAzureCredentialBuilder;

ImageAnalysisClient client = new ImageAnalysisClientBuilder()
    .endpoint(endpoint)
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildClient();
```

## Visual Features

| Feature | Description |
|---------|-------------|
| `CAPTION` | Generate human-readable image description |
| `DENSE_CAPTIONS` | Captions for up to 10 regions |
| `READ` | OCR - Extract text from images |
| `TAGS` | Content tags for objects, scenes, actions |
| `OBJECTS` | Detect objects with bounding boxes |
| `SMART_CROPS` | Smart thumbnail regions |
| `PEOPLE` | Detect people with locations |

## Core Patterns

### Generate Caption

```java
import com.azure.ai.vision.imageanalysis.models.*;
import com.azure.core.util.BinaryData;
import java.io.File;
import java.util.Arrays;

// From file
BinaryData imageData = BinaryData.fromFile(new File("image.jpg").toPath());

ImageAnalysisResult result = client.analyze(
    imageData,
    Arrays.asList(VisualFeatures.CAPTION),
    new ImageAnalysisOptions().setGenderNeutralCaption(true));

System.out.printf("Caption: \"%s\" (confidence: %.4f)%n",
    result.getCaption().getText(),
    result.getCaption().getConfidence());
```

### Generate Caption from URL

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    "https://example.com/image.jpg",
    Arrays.asList(VisualFeatures.CAPTION),
    new ImageAnalysisOptions().setGenderNeutralCaption(true));

System.out.printf("Caption: \"%s\"%n", result.getCaption().getText());
```

### Extract Text (OCR)

```java
ImageAnalysisResult result = client.analyze(
    BinaryData.fromFile(new File("document.jpg").toPath()),
    Arrays.asList(VisualFeatures.READ),
    null);

for (DetectedTextBlock block : result.getRead().getBlocks()) {
    for (DetectedTextLine line : block.getLines()) {
        System.out.printf("Line: '%s'%n", line.getText());
        System.out.printf("  Bounding polygon: %s%n", line.getBoundingPolygon());
        
        for (DetectedTextWord word : line.getWords()) {
            System.out.printf("  Word: '%s' (confidence: %.4f)%n",
                word.getText(),
                word.getConfidence());
        }
    }
}
```

### Detect Objects

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.OBJECTS),
    null);

for (DetectedObject obj : result.getObjects()) {
    System.out.printf("Object: %s (confidence: %.4f)%n",
        obj.getTags().get(0).getName(),
        obj.getTags().get(0).getConfidence());
    
    ImageBoundingBox box = obj.getBoundingBox();
    System.out.printf("  Location: x=%d, y=%d, w=%d, h=%d%n",
        box.getX(), box.getY(), box.getWidth(), box.getHeight());
}
```

### Get Tags

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.TAGS),
    null);

for (DetectedTag tag : result.getTags()) {
    System.out.printf("Tag: %s (confidence: %.4f)%n",
        tag.getName(),
        tag.getConfidence());
}
```

### Detect People

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.PEOPLE),
    null);

for (DetectedPerson person : result.getPeople()) {
    ImageBoundingBox box = person.getBoundingBox();
    System.out.printf("Person at x=%d, y=%d (confidence: %.4f)%n",
        box.getX(), box.getY(), person.getConfidence());
}
```

### Smart Cropping

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.SMART_CROPS),
    new ImageAnalysisOptions().setSmartCropsAspectRatios(Arrays.asList(1.0, 1.5)));

for (CropRegion crop : result.getSmartCrops()) {
    System.out.printf("Crop region: aspect=%.2f, x=%d, y=%d, w=%d, h=%d%n",
        crop.getAspectRatio(),
        crop.getBoundingBox().getX(),
        crop.getBoundingBox().getY(),
        crop.getBoundingBox().getWidth(),
        crop.getBoundingBox().getHeight());
}
```

### Dense Captions

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.DENSE_CAPTIONS),
    new ImageAnalysisOptions().setGenderNeutralCaption(true));

for (DenseCaption caption : result.getDenseCaptions()) {
    System.out.printf("Caption: \"%s\" (confidence: %.4f)%n",
        caption.getText(),
        caption.getConfidence());
    System.out.printf("  Region: x=%d, y=%d, w=%d, h=%d%n",
        caption.getBoundingBox().getX(),
        caption.getBoundingBox().getY(),
        caption.getBoundingBox().getWidth(),
        caption.getBoundingBox().getHeight());
}
```

### Multiple Features

```java
ImageAnalysisResult result = client.analyzeFromUrl(
    imageUrl,
    Arrays.asList(
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.READ),
    new ImageAnalysisOptions()
        .setGenderNeutralCaption(true)
        .setLanguage("en"));

// Access all results
System.out.println("Caption: " + result.getCaption().getText());
System.out.println("Tags: " + result.getTags().size());
System.out.println("Objects: " + result.getObjects().size());
System.out.println("Text blocks: " + result.getRead().getBlocks().size());
```

### Async Analysis

```java
asyncClient.analyzeFromUrl(
    imageUrl,
    Arrays.asList(VisualFeatures.CAPTION),
    null)
    .subscribe(
        result -> System.out.println("Caption: " + result.getCaption().getText()),
        error -> System.err.println("Error: " + error.getMessage()),
        () -> System.out.println("Complete")
    );
```

## Error Handling

```java
import com.azure.core.exception.HttpResponseException;

try {
    client.analyzeFromUrl(imageUrl, Arrays.asList(VisualFeatures.CAPTION), null);
} catch (HttpResponseException e) {
    System.out.println("Status: " + e.getResponse().getStatusCode());
    System.out.println("Error: " + e.getMessage());
}
```

## Environment Variables

```bash
VISION_ENDPOINT=https://<resource>.cognitiveservices.azure.com/
VISION_KEY=<your-api-key>
```

## Image Requirements

- Formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, MPO
- Size: < 20 MB
- Dimensions: 50x50 to 16000x16000 pixels

## Regional Availability

Caption and Dense Captions require GPU-supported regions. Check [supported regions](https://learn.microsoft.com/azure/ai-services/computer-vision/concept-describe-images-40) before deployment.

## Trigger Phrases

- "image analysis Java"
- "Azure Vision SDK"
- "image captioning"
- "OCR image text extraction"
- "object detection image"
- "smart crop thumbnail"
- "detect people image"

## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

Related Skills

azure-ai-vision-imageanalysis-py

31392
from sickn33/antigravity-awesome-skills

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.

Image AnalysisClaude

n8n-code-javascript

31392
from sickn33/antigravity-awesome-skills

Write JavaScript code in n8n Code nodes. Use when writing JavaScript in n8n, using $input/$json/$node syntax, making HTTP requests with $helpers, working with dates using DateTime, troubleshooting Code node errors, or choosing between Code node modes.

Programming & DevelopmentClaude

microsoft-azure-webjobs-extensions-authentication-events-dotnet

31392
from sickn33/antigravity-awesome-skills

Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.

Identity Management / Authentication & AuthorizationClaude

javascript-typescript-typescript-scaffold

31392
from sickn33/antigravity-awesome-skills

You are a TypeScript project architecture expert specializing in scaffolding production-ready Node.js and frontend applications. Generate complete project structures with modern tooling (pnpm, Vite, N

Code GenerationClaude

javascript-pro

31392
from sickn33/antigravity-awesome-skills

Master modern JavaScript with ES6+, async patterns, and Node.js APIs. Handles promises, event loops, and browser/Node compatibility.

Programming & DevelopmentClaude

javascript-mastery

31392
from sickn33/antigravity-awesome-skills

33+ essential JavaScript concepts every developer should know, inspired by [33-js-concepts](https://github.com/leonardomso/33-js-concepts).

Programming Language HelpClaude

java-pro

31392
from sickn33/antigravity-awesome-skills

Master Java 21+ with modern features like virtual threads, pattern matching, and Spring Boot 3.x. Expert in the latest Java ecosystem including GraalVM, Project Loom, and cloud-native patterns.

Programming & DevelopmentClaude

hugging-face-vision-trainer

31392
from sickn33/antigravity-awesome-skills

Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation.

Computer VisionClaude

azure-web-pubsub-ts

31392
from sickn33/antigravity-awesome-skills

Real-time messaging with WebSocket connections and pub/sub patterns.

Messaging & CommunicationClaude

azure-storage-queue-ts

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.

Cloud IntegrationClaude

azure-storage-queue-py

31392
from sickn33/antigravity-awesome-skills

Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.

Cloud IntegrationClaude

azure-storage-file-share-ts

31392
from sickn33/antigravity-awesome-skills

Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.

Cloud Storage ManagementClaude