azure-ai-translation-document-py

Azure AI Document Translation SDK for batch translation of documents with format preservation. Use for translating Word, PDF, Excel, PowerPoint, and other document formats at scale.

31,392 stars
Complexity: easy

About this skill

This skill wraps the Azure AI Document Translation SDK for Python so that AI agents can run high-volume batch translation across many document types. The service preserves the original formatting, layout, and structure of files such as Microsoft Word, PDF, Excel, and PowerPoint. An agent can process large document sets, translate them into multiple target languages, and write the translated versions to Azure Blob Storage. Typical uses include enterprise content localization, multilingual data processing, and automated workflows that need format-preserving document translation.

Best use case

  • Localizing large volumes of corporate documents (e.g., reports, manuals, legal contracts) for international business operations.
  • Translating academic papers or research documents to facilitate global collaboration and information sharing.
  • Enabling AI agents to process user-uploaded documents in various source languages and provide translated versions on demand.
  • Automating the translation of internal company knowledge bases, training materials, or technical documentation.
  • Translating financial statements, regulatory filings, or legal documents for international compliance and review.

Expected result: a document translation job is started and monitored, and a batch of translated documents becomes available in the specified target Azure Blob Storage container. The translated documents reflect the original content and keep its formatting and structure. The skill returns the job status and details so the agent can track progress and retrieve output locations.

Practical example

Example input

{"source_document_urls": ["https://yourstorage.blob.core.windows.net/source-container/quarterly_report_en.docx", "https://yourstorage.blob.core.windows.net/source-container/legal_agreement_en.pdf"], "target_language_codes": ["es", "fr"], "source_language_code": "en", "target_container_url": "https://yourstorage.blob.core.windows.net/translated-documents-output/"}

Example output

{"translation_job_id": "87c4a0e9-b2f5-4e78-a0d3-3b1e2c4f5a6b", "status": "Running", "created_on": "2023-10-27T10:00:00Z", "last_updated_on": "2023-10-27T10:05:00Z", "documents_total": 2, "documents_completed": 0, "documents_failed": 0, "summary_url": "https://yourtranslator.cognitiveservices.azure.com/translator/text/batch/v1.0-preview.1/batches/87c4a0e9-b2f5-4e78-a0d3-3b1e2c4f5a6b/documents"}
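The status fields in the example output can be reduced to a simple progress summary. The following is a minimal sketch that assumes the field names shown in the example output; `summarize_job` is a hypothetical helper, not part of the SDK.

```python
import json

# Status payload copied from the example output (summary_url omitted for brevity).
payload = json.loads("""
{"translation_job_id": "87c4a0e9-b2f5-4e78-a0d3-3b1e2c4f5a6b",
 "status": "Running",
 "documents_total": 2,
 "documents_completed": 0,
 "documents_failed": 0}
""")


def summarize_job(job: dict) -> dict:
    """Derive simple progress figures from a job status payload."""
    total = job["documents_total"]
    finished = job["documents_completed"] + job["documents_failed"]
    return {
        "job_id": job["translation_job_id"],
        "status": job["status"],
        "pending": total - finished,
        "percent_done": round(100 * finished / total, 1) if total else 0.0,
    }


summary = summarize_job(payload)
print(summary)
```

An agent could poll until `pending` reaches zero and then inspect the failed count before reading the output container.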

When to use this skill

  • When the requirement is to translate entire documents, rather than just isolated text snippets.
  • When preserving the original formatting, layout, and structure (e.g., tables, images, charts in Word, PDF, Excel, PowerPoint) is critical.
  • When dealing with a substantial batch of documents that all require translation into one or more target languages.
  • When integrating robust and scalable translation capabilities directly into an AI agent's automated workflow.

When not to use this skill

  • When only small text snippets or phrases need translation (a simpler text translation API would be more efficient).
  • When documents are plain text files and format preservation is not a concern, as a less resource-intensive text translation service might suffice.
  • When real-time, interactive translation of spoken language is required (consider speech-to-text and real-time text translation skills instead).
  • If strict data residency requirements or compliance policies preclude the use of public cloud-based Azure services for data processing.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/azure-ai-translation-document-py/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/azure-ai-translation-document-py/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/azure-ai-translation-document-py/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How azure-ai-translation-document-py Compares

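| Feature / Agent | azure-ai-translation-document-py | Standard Approach |
|-----------------|----------------------------------|-------------------|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |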

Frequently Asked Questions

What does this skill do?

Azure AI Document Translation SDK for batch translation of documents with format preservation. Use for translating Word, PDF, Excel, PowerPoint, and other document formats at scale.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Azure AI Document Translation SDK for Python

Client library for Azure AI Translator document translation service for batch document translation with format preservation.

## Installation

```bash
pip install azure-ai-translation-document
```

## Environment Variables

```bash
AZURE_DOCUMENT_TRANSLATION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
AZURE_DOCUMENT_TRANSLATION_KEY=<your-api-key>  # If using API key

# Storage for source and target documents
AZURE_SOURCE_CONTAINER_URL=https://<storage>.blob.core.windows.net/<container>?<sas>
AZURE_TARGET_CONTAINER_URL=https://<storage>.blob.core.windows.net/<container>?<sas>
```
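A quick check before creating a client can catch missing or malformed settings early. This is a sketch only: it assumes SAS-based container access as in the variables above, and `check_settings` is a hypothetical helper rather than anything the SDK provides. The API key is not checked because it is optional when Entra ID is used.

```python
import os
from urllib.parse import urlsplit

REQUIRED_VARS = (
    "AZURE_DOCUMENT_TRANSLATION_ENDPOINT",
    "AZURE_SOURCE_CONTAINER_URL",
    "AZURE_TARGET_CONTAINER_URL",
)


def check_settings(env) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
            continue
        parts = urlsplit(value)
        if parts.scheme != "https" or not parts.netloc:
            problems.append(f"{name} is not an https URL")
        elif name.endswith("_CONTAINER_URL") and not parts.query:
            # Assumes SAS access: the token travels in the query string.
            problems.append(f"{name} has no SAS query string")
    return problems


for problem in check_settings(os.environ):
    print("config problem:", problem)
```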

## Authentication

### API Key

```python
import os
from azure.ai.translation.document import DocumentTranslationClient
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]

client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
```

### Entra ID (Recommended)

```python
import os

from azure.ai.translation.document import DocumentTranslationClient
from azure.identity import DefaultAzureCredential

client = DocumentTranslationClient(
    endpoint=os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"],
    credential=DefaultAzureCredential()
)
```

## Basic Document Translation

```python
from azure.ai.translation.document import DocumentTranslationInput, TranslationTarget

source_url = os.environ["AZURE_SOURCE_CONTAINER_URL"]
target_url = os.environ["AZURE_TARGET_CONTAINER_URL"]

# Start translation job
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(
                    target_url=target_url,
                    language="es"  # Translate to Spanish
                )
            ]
        )
    ]
)

# Wait for completion
result = poller.result()

print(f"Status: {poller.status()}")
print(f"Documents translated: {poller.details.documents_succeeded_count}")
print(f"Documents failed: {poller.details.documents_failed_count}")
```

## Multiple Target Languages

```python
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(target_url=target_url_es, language="es"),
                TranslationTarget(target_url=target_url_fr, language="fr"),
                TranslationTarget(target_url=target_url_de, language="de")
            ]
        )
    ]
)
```
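The snippet above refers to `target_url_es`, `target_url_fr`, and `target_url_de` without defining them. One possible way to supply them is one environment variable per language. The naming scheme `AZURE_TARGET_CONTAINER_URL_<CODE>` and the helper `target_urls_by_language` are assumptions made for this sketch, not conventions of the SDK.

```python
import os


def target_urls_by_language(languages, env=os.environ,
                            prefix="AZURE_TARGET_CONTAINER_URL_"):
    """Map each language code to the container URL stored in <prefix><CODE>.

    The variable naming scheme is hypothetical; adapt it to your deployment.
    """
    urls = {}
    missing = []
    for code in languages:
        name = prefix + code.upper().replace("-", "_")
        if env.get(name):
            urls[code] = env[name]
        else:
            missing.append(name)
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return urls


# Demo with an in-memory mapping instead of the real environment.
demo_env = {
    "AZURE_TARGET_CONTAINER_URL_ES": "https://<storage>.blob.core.windows.net/es?<sas>",
    "AZURE_TARGET_CONTAINER_URL_FR": "https://<storage>.blob.core.windows.net/fr?<sas>",
}
urls = target_urls_by_language(["es", "fr"], env=demo_env)
print(urls)
```

The resulting mapping can then feed a list such as `[TranslationTarget(target_url=url, language=code) for code, url in urls.items()]`.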

## Translate Single Document

```python
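from azure.ai.translation.document import SingleDocumentTranslationClient
from azure.ai.translation.document.models import DocumentTranslateContent

single_client = SingleDocumentTranslationClient(endpoint, AzureKeyCredential(key))

docx_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

with open("document.docx", "rb") as f:
    document_content = f.read()

# Assumption (1.1.x releases): the request body wraps the file as a
# (file name, content, content type) tuple rather than raw bytes.
result = single_client.translate(
    body=DocumentTranslateContent(
        document=("document.docx", document_content, docx_type)
    ),
    target_language="es",
)

# Save translated document
with open("document_es.docx", "wb") as f:
    f.write(result)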
```

## Check Translation Status

```python
# Get all translation operations
operations = client.list_translation_statuses()

for op in operations:
    print(f"Operation ID: {op.id}")
    print(f"Status: {op.status}")
    print(f"Created: {op.created_on}")
    print(f"Total documents: {op.documents_total_count}")
    print(f"Succeeded: {op.documents_succeeded_count}")
    print(f"Failed: {op.documents_failed_count}")
```

## List Document Statuses

```python
# Get status of individual documents in a job
operation_id = poller.id
document_statuses = client.list_document_statuses(operation_id)

for doc in document_statuses:
    print(f"Document: {doc.source_document_url}")
    print(f"  Status: {doc.status}")
    print(f"  Translated to: {doc.translated_to}")
    if doc.error:
        print(f"  Error: {doc.error.message}")
```
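When a job contains many documents, a tally per status plus a list of failures is often easier to act on than per-document prints. The sketch below relies only on the attributes already used above (`status`, `error`, `source_document_url`); `summarize_documents` is a hypothetical helper, and the sample objects are stand-ins so the code runs without an Azure resource.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_documents(document_statuses):
    """Count documents per status and collect (url, message) for failures."""
    counts = Counter()
    failures = []
    for doc in document_statuses:
        counts[doc.status] += 1
        if doc.error:
            failures.append((doc.source_document_url, doc.error.message))
    return counts, failures


# Stand-in objects; with a real job, pass client.list_document_statuses(...).
sample = [
    SimpleNamespace(source_document_url="https://example.invalid/a.docx",
                    status="Succeeded", error=None),
    SimpleNamespace(source_document_url="https://example.invalid/b.pdf",
                    status="Failed",
                    error=SimpleNamespace(message="hypothetical error text")),
]
counts, failures = summarize_documents(sample)
print(dict(counts))
print(failures)
```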

## Cancel Translation

```python
# Cancel a running translation
client.cancel_translation(operation_id)
```

## Using Glossary

```python
from azure.ai.translation.document import TranslationGlossary

poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(
                    target_url=target_url,
                    language="es",
                    glossaries=[
                        TranslationGlossary(
                            glossary_url="https://<storage>.blob.core.windows.net/glossary/terms.csv?<sas>",
                            file_format="csv"
                        )
                    ]
                )
            ]
        )
    ]
)
```
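The glossary file referenced above has to exist in blob storage before the job starts. The sketch below produces the CSV text, assuming a two-column layout of source term followed by target term with no header row; verify that layout against the service documentation before relying on it. `build_glossary_csv` is a hypothetical helper, and the terms are illustrative.

```python
import csv
import io


def build_glossary_csv(terms):
    """Render (source_term, target_term) pairs as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for source_term, target_term in terms:
        writer.writerow([source_term, target_term])
    return buffer.getvalue()


glossary_text = build_glossary_csv([
    ("purchase order", "orden de compra"),
    ("Contoso Widget", "Contoso Widget"),  # maps a product name to itself
])
print(glossary_text)
```

Using the `csv` module rather than string joins means terms containing commas or quotes are escaped correctly.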

## Supported Document Formats

```python
# Get supported formats
formats = client.get_supported_document_formats()

for fmt in formats:
    print(f"Format: {fmt.file_format}")
    print(f"  Extensions: {fmt.file_extensions}")
    print(f"  Content types: {fmt.content_types}")
```

## Supported Languages

```python
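# Get supported languages
# Assumption: the document translation client does not list languages itself;
# the Text Translation SDK (pip install azure-ai-translation-text) does.
from azure.ai.translation.text import TextTranslationClient

text_client = TextTranslationClient()
languages = text_client.get_supported_languages()

for code, lang in languages.translation.items():
    print(f"Language: {lang.name} ({code})")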
```

## Async Client

```python
import os

from azure.ai.translation.document.aio import DocumentTranslationClient
from azure.identity.aio import DefaultAzureCredential

async def translate_documents():
    endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
    # Close both the credential and the client when the block exits
    async with DefaultAzureCredential() as credential:
        async with DocumentTranslationClient(
            endpoint=endpoint,
            credential=credential
        ) as client:
            poller = await client.begin_translation(inputs=[...])
            result = await poller.result()
```

## Supported Formats

| Category | Formats |
|----------|---------|
| Documents | DOCX, PDF, PPTX, XLSX, HTML, TXT, RTF |
| Structured | CSV, TSV, JSON, XML |
| Localization | XLIFF, XLF, MHTML |
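A local pre-check against this table can filter out files that would be rejected. The sketch below is illustrative: the extension set is derived from the format names in the table by lowercasing them, which is an assumption, and `split_by_support` is a hypothetical helper. The list returned by `get_supported_document_formats()` remains the authoritative source.

```python
from pathlib import PurePosixPath

# Derived from the table above; one extension per listed format name.
SUPPORTED_EXTENSIONS = {
    ".docx", ".pdf", ".pptx", ".xlsx", ".html", ".txt", ".rtf",
    ".csv", ".tsv", ".json", ".xml",
    ".xliff", ".xlf", ".mhtml",
}


def split_by_support(names):
    """Split file or blob names into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for name in names:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


ok, skipped = split_by_support(["a.DOCX", "b.png", "dir/c.pdf", "README"])
print(ok)
print(skipped)
```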

## Storage Requirements

- Source and target containers must be Azure Blob Storage
- Use SAS tokens with appropriate permissions:
  - Source: Read, List
  - Target: Write, List
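The permission requirements above can be checked locally before a job is submitted, because a SAS token carries its granted permissions in the `sp` query parameter (`r` read, `w` write, `l` list). The following sketch assumes SAS-based container URLs; `missing_sas_permissions` is a hypothetical helper, and the URLs in the demo are placeholders.

```python
from urllib.parse import parse_qs, urlsplit

# Required permission letters per container role, as listed above.
REQUIRED = {"source": set("rl"), "target": set("wl")}


def missing_sas_permissions(container_url, role):
    """Return the permission letters the SAS token lacks for the given role."""
    query = parse_qs(urlsplit(container_url).query)
    granted = set(query.get("sp", [""])[0])
    return sorted(REQUIRED[role] - granted)


source_missing = missing_sas_permissions(
    "https://acct.blob.core.windows.net/src?sv=2022-11-02&sp=rl&sig=abc", "source")
target_missing = missing_sas_permissions(
    "https://acct.blob.core.windows.net/dst?sv=2022-11-02&sp=r&sig=abc", "target")
print(source_missing, target_missing)
```

An empty list means the token grants everything the role needs. The check does not validate the signature or the expiry time.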

## Best Practices

1. **Use SAS tokens** with minimal required permissions
2. **Monitor long-running operations** with `poller.status()`
3. **Handle document-level errors** by iterating document statuses
4. **Use glossaries** for domain-specific terminology
5. **Separate target containers** for each language
6. **Use async client** for multiple concurrent jobs
7. **Check supported formats** before submitting documents

## When to Use
Use this skill to carry out the workflows and actions described in the overview.
