azure-ai-contentunderstanding-py
Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.
Best use case
azure-ai-contentunderstanding-py is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.
Teams using azure-ai-contentunderstanding-py should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/azure-ai-contentunderstanding-py/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How azure-ai-contentunderstanding-py Compares
| Feature / Agent | azure-ai-contentunderstanding-py | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Azure AI Content Understanding SDK for Python
Multimodal AI service that extracts semantic content from documents, video, audio, and image files for RAG and automated workflows.
## Installation
```bash
pip install azure-ai-contentunderstanding
```
## Environment Variables
```bash
CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.cognitiveservices.azure.com/
```
## Authentication
```python
import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.identity import DefaultAzureCredential
endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
credential = DefaultAzureCredential()
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)
```
## Core Workflow
Content Understanding operations are asynchronous long-running operations:
1. **Begin Analysis** — Start the analysis operation with `begin_analyze()` (returns a poller)
2. **Poll for Results** — Poll until analysis completes (SDK handles this with `.result()`)
3. **Process Results** — Extract structured results from `AnalyzeResult.contents`
## Prebuilt Analyzers
| Analyzer | Content Type | Purpose |
|----------|--------------|---------|
| `prebuilt-documentSearch` | Documents | Extract markdown for RAG applications |
| `prebuilt-imageSearch` | Images | Extract content from images |
| `prebuilt-audioSearch` | Audio | Transcribe audio with timing |
| `prebuilt-videoSearch` | Video | Extract frames, transcripts, summaries |
| `prebuilt-invoice` | Documents | Extract invoice fields |
## Analyze Document
```python
import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity import DefaultAzureCredential
endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
client = ContentUnderstandingClient(
endpoint=endpoint,
credential=DefaultAzureCredential()
)
# Analyze document from URL
poller = client.begin_analyze(
analyzer_id="prebuilt-documentSearch",
inputs=[AnalyzeInput(url="https://example.com/document.pdf")]
)
result = poller.result()
# Access markdown content (contents is a list)
content = result.contents[0]
print(content.markdown)
```
## Access Document Content Details
```python
from azure.ai.contentunderstanding.models import MediaContentKind, DocumentContent
content = result.contents[0]
if content.kind == MediaContentKind.DOCUMENT:
document_content: DocumentContent = content # type: ignore
print(document_content.start_page_number)
```
## Analyze Image
```python
from azure.ai.contentunderstanding.models import AnalyzeInput
poller = client.begin_analyze(
analyzer_id="prebuilt-imageSearch",
inputs=[AnalyzeInput(url="https://example.com/image.jpg")]
)
result = poller.result()
content = result.contents[0]
print(content.markdown)
```
## Analyze Video
```python
from azure.ai.contentunderstanding.models import AnalyzeInput
poller = client.begin_analyze(
analyzer_id="prebuilt-videoSearch",
inputs=[AnalyzeInput(url="https://example.com/video.mp4")]
)
result = poller.result()
# Access video content (AudioVisualContent)
content = result.contents[0]
# Get transcript phrases with timing
for phrase in content.transcript_phrases:
print(f"[{phrase.start_time} - {phrase.end_time}]: {phrase.text}")
# Get key frames (for video)
for frame in content.key_frames:
print(f"Frame at {frame.time}: {frame.description}")
```
## Analyze Audio
```python
from azure.ai.contentunderstanding.models import AnalyzeInput
poller = client.begin_analyze(
analyzer_id="prebuilt-audioSearch",
inputs=[AnalyzeInput(url="https://example.com/audio.mp3")]
)
result = poller.result()
# Access audio transcript
content = result.contents[0]
for phrase in content.transcript_phrases:
print(f"[{phrase.start_time}] {phrase.text}")
```
## Custom Analyzers
Create custom analyzers with field schemas for specialized extraction:
```python
# Create custom analyzer
analyzer = client.create_analyzer(
analyzer_id="my-invoice-analyzer",
analyzer={
"description": "Custom invoice analyzer",
"base_analyzer_id": "prebuilt-documentSearch",
"field_schema": {
"fields": {
"vendor_name": {"type": "string"},
"invoice_total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"amount": {"type": "number"}
}
}
}
}
}
}
)
# Use custom analyzer
from azure.ai.contentunderstanding.models import AnalyzeInput
poller = client.begin_analyze(
analyzer_id="my-invoice-analyzer",
inputs=[AnalyzeInput(url="https://example.com/invoice.pdf")]
)
result = poller.result()
# Access extracted fields
print(result.fields["vendor_name"])
print(result.fields["invoice_total"])
```
## Analyzer Management
```python
# List all analyzers
analyzers = client.list_analyzers()
for analyzer in analyzers:
print(f"{analyzer.analyzer_id}: {analyzer.description}")
# Get specific analyzer
analyzer = client.get_analyzer("prebuilt-documentSearch")
# Delete custom analyzer
client.delete_analyzer("my-custom-analyzer")
```
## Async Client
```python
import asyncio
import os
from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity.aio import DefaultAzureCredential
async def analyze_document():
endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
credential = DefaultAzureCredential()
async with ContentUnderstandingClient(
endpoint=endpoint,
credential=credential
) as client:
poller = await client.begin_analyze(
analyzer_id="prebuilt-documentSearch",
inputs=[AnalyzeInput(url="https://example.com/doc.pdf")]
)
result = await poller.result()
content = result.contents[0]
return content.markdown
asyncio.run(analyze_document())
```
## Content Types
| Class | For | Provides |
|-------|-----|----------|
| `DocumentContent` | PDF, images, Office docs | Pages, tables, figures, paragraphs |
| `AudioVisualContent` | Audio, video files | Transcript phrases, timing, key frames |
Both derive from `MediaContent` which provides basic info and markdown representation.
## Model Imports
```python
from azure.ai.contentunderstanding.models import (
AnalyzeInput,
AnalyzeResult,
MediaContentKind,
DocumentContent,
AudioVisualContent,
)
```
## Client Types
| Client | Purpose |
|--------|---------|
| `ContentUnderstandingClient` | Sync client for all operations |
| `ContentUnderstandingClient` (aio) | Async client for all operations |
## Best Practices
1. **Use `begin_analyze` with `AnalyzeInput`** — this is the correct method signature
2. **Access results via `result.contents[0]`** — results are returned as a list
3. **Use prebuilt analyzers** for common scenarios (document/image/audio/video search)
4. **Create custom analyzers** only for domain-specific field extraction
5. **Use async client** for high-throughput scenarios with `azure.identity.aio` credentials
6. **Handle long-running operations** — video/audio analysis can take minutes
7. **Use URL sources** when possible to avoid upload overhead
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.Related Skills
azure-speech-to-text-rest-py
Azure Speech to Text REST API for short audio (Python). Use for simple speech recognition of audio files up to 60 seconds without the Speech SDK.
azure-mgmt-apimanagement-py
Azure API Management SDK for Python. Use for managing APIM services, APIs, products, subscriptions, and policies.
azure-mgmt-apimanagement-dotnet
Azure Resource Manager SDK for API Management in .NET.
azure-mgmt-apicenter-py
Azure API Center Management SDK for Python. Use for managing API inventory, metadata, and governance across your organization.
azure-mgmt-apicenter-dotnet
Azure API Center SDK for .NET. Centralized API inventory management with governance, versioning, and discovery.
azure-communication-callingserver-java
Azure Communication Services CallingServer (legacy) Java SDK. Note - This SDK is deprecated. Use azure-communication-callautomation instead for new projects. Only use this skill when maintaining le...
azure-storage-queue-ts
Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.
azure-storage-queue-py
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.
azure-storage-file-share-ts
Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.
azure-storage-file-share-py
Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.
azure-storage-file-datalake-py
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
azure-storage-blob-ts
Azure Blob Storage JavaScript/TypeScript SDK (@azure/storage-blob) for blob operations. Use for uploading, downloading, listing, and managing blobs and containers.