azure-storage-file-datalake-py
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
About this skill
This skill gives AI agents access to the Python SDK for Azure Data Lake Storage Gen2 (ADLS Gen2), a scalable storage service built for big data analytics workloads. Agents can create, read, update, and delete files and directories in a hierarchical file system, which supports managing large datasets, feeding big data processing workflows, and automating data orchestration tasks. The SDK wraps the underlying REST calls in a Python client interface and authenticates through Azure Identity.
Best use case
Managing large datasets in Azure Data Lake Storage Gen2 programmatically; automating file and directory operations for big data pipelines; enabling AI agents to access, process, or ingest data stored in a hierarchical cloud storage system; facilitating data synchronization, migration, or archival tasks in ADLS Gen2 for analytics workloads.
Successful execution of specified file and directory operations within Azure Data Lake Storage Gen2, such as uploading a file, downloading data, listing directory contents, creating new directories, or deleting an outdated dataset. The agent will be able to programmatically interact with ADLS Gen2 resources and report on the status of these operations.
Practical example
Example input
Upload the local CSV file located at `/local/path/to/transactions.csv` to the `/financial_data/daily_ingest/` directory within Azure Data Lake Storage Gen2, ensuring overwrite if it exists.
Example output
Successfully uploaded `transactions.csv` to `/financial_data/daily_ingest/transactions.csv` in Azure Data Lake Storage Gen2. ETag: '0x8D9E7A6B5C4D3E2F'.
When to use this skill
- When an AI agent needs to read or write files in Azure Data Lake Storage Gen2
- When an agent needs to create, list, or manage directories for big data analytics
- When integrating an AI agent's capabilities with Azure-based data processing workflows
- When automating data ingestion, extraction, or transformation processes involving ADLS Gen2
When not to use this skill
- For general-purpose object storage where a hierarchical namespace is not a primary requirement (consider Azure Blob Storage instead)
- When working with very small, frequently accessed files, where a cache or database may perform better
- When the data is not stored in Azure Data Lake Storage Gen2 or a similar cloud environment
- When the agent's task does not involve file or directory manipulation in ADLS Gen2
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/azure-storage-file-datalake-py/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How azure-storage-file-datalake-py Compares
| Feature / Agent | azure-storage-file-datalake-py | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |
Frequently Asked Questions
What does this skill do?
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Azure Data Lake Storage Gen2 SDK for Python
Hierarchical file system for big data analytics workloads.
## Installation
```bash
pip install azure-storage-file-datalake azure-identity
```
## Environment Variables
```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```
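The variable above can be read and sanity-checked before a client is built. The following is a minimal sketch, not part of the SDK: `get_account_url` is a hypothetical helper, and the check for `.dfs.` in the host name assumes a public-cloud account that exposes Data Lake on the `dfs` endpoint.

```python
import os
from urllib.parse import urlparse


def get_account_url(env_var: str = "AZURE_STORAGE_ACCOUNT_URL") -> str:
    """Return the Data Lake endpoint from the environment, or raise if unusable.

    Hypothetical helper for illustration; not part of azure-storage-file-datalake.
    """
    url = os.environ.get(env_var, "").strip()
    if not url:
        raise RuntimeError(f"{env_var} is not set")
    host = urlparse(url).hostname or ""
    # Assumption: the account exposes Data Lake on a "<account>.dfs.<suffix>" host.
    if ".dfs." not in host:
        raise ValueError(f"{url!r} does not look like a dfs endpoint")
    # Drop a trailing slash so the value can be passed on unchanged.
    return url.rstrip("/")
```

The returned string can then be passed as `account_url` when constructing `DataLakeServiceClient`.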
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"
service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```
## Client Hierarchy
| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |
## File System Operations
```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")
# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")
# Delete
service_client.delete_file_system("myfilesystem")
# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```
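File systems are containers under the hood, so their names follow the Azure container naming rules: 3 to 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit, with no consecutive hyphens. A small pre-flight check can catch a bad name before any request is sent. This is a sketch; `is_valid_file_system_name` is a hypothetical helper, not an SDK function.

```python
import re

# First and last characters are alphanumeric; a hyphen is only allowed
# when it is immediately followed by another alphanumeric character.
_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){1,61}[a-z0-9]")


def is_valid_file_system_name(name: str) -> bool:
    """Return True if `name` satisfies the container naming rules."""
    return _NAME_RE.fullmatch(name) is not None
```

Calling it before `create_file_system` lets an agent report a naming problem directly instead of surfacing a service error.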
## Directory Operations
```python
file_system_client = service_client.get_file_system_client("myfilesystem")
# Create directory
directory_client = file_system_client.create_directory("mydir")
# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")
# Get directory client
directory_client = file_system_client.get_directory_client("mydir")
# Delete directory
directory_client.delete_directory()
# Rename/move directory
directory_client.rename_directory(new_name="myfilesystem/newname")
```
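Note that `new_name` in the rename call above is prefixed with the file system name, in the form `{filesystem}/{directory}`. The sketch below wraps that convention; `rename_target` and `move_directory` are hypothetical helpers, and the `directory_client` argument is assumed to be a `DataLakeDirectoryClient` (any object with a compatible `rename_directory` method works).

```python
def rename_target(file_system_name: str, new_path: str) -> str:
    """Build the "{filesystem}/{path}" string expected by rename_directory."""
    return f"{file_system_name.strip('/')}/{new_path.strip('/')}"


def move_directory(directory_client, file_system_name: str, new_path: str):
    """Rename or move a directory within (or across) file systems.

    Returns whatever rename_directory returns, which in the SDK is a
    client for the renamed directory.
    """
    return directory_client.rename_directory(
        new_name=rename_target(file_system_name, new_path)
    )
```

For example, `move_directory(directory_client, "myfilesystem", "archive/2024")` would request the target `myfilesystem/archive/2024`.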
## File Operations
### Upload File
```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")
# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)
# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)
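# Append data in chunks (for large files), then commit with flush_data
file_client.create_file()  # Start from an empty file so offsets begin at 0
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit: 12 is the total number of bytes appended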
```
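The append-then-flush pattern generalizes to streams of any size by tracking the running offset. The following is a minimal sketch under stated assumptions: `upload_in_chunks` is a hypothetical helper, the 4 MiB default chunk size is an arbitrary choice, and `file_client` is assumed to be a `DataLakeFileClient` (only `create_file`, `append_data`, and `flush_data` are used).

```python
from typing import BinaryIO


def upload_in_chunks(file_client, stream: BinaryIO,
                     chunk_size: int = 4 * 1024 * 1024) -> int:
    """Upload `stream` in fixed-size chunks and return the bytes written."""
    # Create (or reset) the file so appended offsets start at 0.
    file_client.create_file()
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each append is staged at the current offset; nothing is
        # committed until flush_data is called.
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    if offset:
        # Commit everything appended so far; the argument is the final length.
        file_client.flush_data(offset)
    return offset
```

A typical call would be `upload_in_chunks(file_client, open("big.bin", "rb"))`. For most cases `upload_data` is simpler; the chunked form is useful when the data arrives incrementally.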
### Download File
```python
file_client = file_system_client.get_file_client("path/to/file.txt")
# Download all content
download = file_client.download_file()
content = download.readall()
# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)
# Download range
download = file_client.download_file(offset=0, length=100)
```
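Range downloads can be combined to process a large file piece by piece without holding it all in memory. This sketch assumes `file_client` behaves like a `DataLakeFileClient`; `iter_ranges` and `download_in_ranges` are hypothetical helpers, and the 1 MiB default is arbitrary.

```python
from typing import Iterator, Tuple


def iter_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover `total_size` bytes."""
    for offset in range(0, total_size, chunk_size):
        yield offset, min(chunk_size, total_size - offset)


def download_in_ranges(file_client, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the file's content one range at a time."""
    # The file size comes from the properties call shown elsewhere in this document.
    size = file_client.get_file_properties().size
    for offset, length in iter_ranges(size, chunk_size):
        yield file_client.download_file(offset=offset, length=length).readall()
```

Each yielded chunk can be parsed or written out before the next range is requested.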
### Delete File
```python
file_client.delete_file()
```
## List Contents
```python
# List paths (files and directories)
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")
# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)
# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)
```
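A listing can also be reduced to a quick summary, for instance to report how much data a directory holds. This is a sketch: `summarize_paths` is a hypothetical helper, and it assumes each listed item exposes `is_directory` and `content_length`, as the SDK's path properties objects do.

```python
from typing import Dict, Iterable


def summarize_paths(paths: Iterable) -> Dict[str, int]:
    """Count directories and files and total the file sizes in a listing."""
    directories = files = total_bytes = 0
    for path in paths:
        if path.is_directory:
            directories += 1
        else:
            files += 1
            # content_length may be missing for some entries; treat it as 0.
            total_bytes += path.content_length or 0
    return {"directories": directories, "files": files, "bytes": total_bytes}
```

Usage would look like `summarize_paths(file_system_client.get_paths(path="mydir", recursive=True))`.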
## File/Directory Properties
```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")
# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```
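Each `set_metadata` call replaces the metadata already attached to the file, so adding a single key safely means reading the current values first. A minimal sketch, assuming `file_client` is a `DataLakeFileClient` whose properties expose a `metadata` dict; `merge_metadata` is a hypothetical helper.

```python
from typing import Dict


def merge_metadata(file_client, updates: Dict[str, str]) -> Dict[str, str]:
    """Add or overwrite metadata keys while preserving the existing ones."""
    # Copy the current metadata so the properties object is not mutated.
    current = dict(file_client.get_file_properties().metadata or {})
    current.update(updates)
    file_client.set_metadata(metadata=current)
    return current
```

The read-then-write sequence is not atomic; concurrent writers could still overwrite each other's keys.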
## Access Control (ACL)
```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")
# Set ACL
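directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)
# Update ACL entries on the directory and everything beneath it
# (returns an AccessControlChangeResult with success/failure counters)
result = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
print(f"Directories updated: {result.counters.directories_successful}")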
```
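ACL strings such as `"user:user-id:rwx"` follow the POSIX-style form `[default:]type:id:permissions`, with multiple entries separated by commas. Building them with a small helper avoids typos. This is a sketch: `acl_entry` and `build_acl` are hypothetical helpers, not SDK functions, and the validation covers only the basic shape of an entry.

```python
import re

_PERMS_RE = re.compile(r"[r-][w-][x-]")
_TYPES = {"user", "group", "mask", "other"}


def acl_entry(kind: str, permissions: str, entity_id: str = "",
              default: bool = False) -> str:
    """Format one ACL entry, e.g. acl_entry("user", "rwx", "user-id")."""
    if kind not in _TYPES:
        raise ValueError(f"unknown ACL type: {kind!r}")
    if not _PERMS_RE.fullmatch(permissions):
        raise ValueError(f"permissions must look like 'r-x', got {permissions!r}")
    if entity_id and kind in {"mask", "other"}:
        raise ValueError(f"{kind!r} entries do not take an id")
    entry = f"{kind}:{entity_id}:{permissions}"
    # A "default:" entry on a directory is inherited by newly created children.
    return f"default:{entry}" if default else entry


def build_acl(*entries: str) -> str:
    """Join entries into the comma-separated string the ACL calls accept."""
    return ",".join(entries)
```

For example, `build_acl(acl_entry("user", "rwx", "user-id"), acl_entry("user", "r-x", "user-id", default=True))` produces a string that grants access on the directory and sets a default for new children.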
## Async Client
```python
import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.storage.filedatalake.aio import DataLakeServiceClient


async def datalake_operations():
    # Async credentials hold network sessions, so close them with `async with`
    async with DefaultAzureCredential() as credential:
        async with DataLakeServiceClient(
            account_url="https://<account>.dfs.core.windows.net",
            credential=credential,
        ) as service_client:
            file_system_client = service_client.get_file_system_client("myfilesystem")
            file_client = file_system_client.get_file_client("test.txt")
            await file_client.upload_data(b"async content", overwrite=True)
            download = await file_client.download_file()
            content = await download.readall()
            print(content)


asyncio.run(datalake_operations())
```
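The async client pays off when many operations run concurrently. The sketch below uploads several files at once while capping the number of in-flight requests with a semaphore. `upload_many` is a hypothetical helper, the default limit of 8 is arbitrary, and `file_system_client` is assumed to be the async `FileSystemClient` from `azure.storage.filedatalake.aio`.

```python
import asyncio
from typing import Dict, List


async def upload_many(file_system_client, items: Dict[str, bytes],
                      max_concurrency: int = 8) -> List[str]:
    """Upload {remote_path: data} pairs concurrently; return the paths uploaded."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def upload_one(path: str, data: bytes) -> str:
        # The semaphore keeps at most `max_concurrency` uploads in flight.
        async with semaphore:
            file_client = file_system_client.get_file_client(path)
            await file_client.upload_data(data, overwrite=True)
            return path

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(
        *(upload_one(path, data) for path, data in items.items())
    )
```

Inside the `async with` block shown above, a call might be `await upload_many(file_system_client, {"a.txt": b"1", "b.txt": b"2"})`.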
## Best Practices
1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and inherit to children
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases
## When to Use
Use this skill to carry out the workflows and actions described in the overview.
Related Skills
azure-storage-file-share-ts
Azure File Share JavaScript/TypeScript SDK (@azure/storage-file-share) for SMB file share operations.
azure-storage-file-share-py
Azure Storage File Share SDK for Python. Use for SMB file shares, directories, and file operations in the cloud.
azure-storage-blob-ts
Azure Blob Storage JavaScript/TypeScript SDK (@azure/storage-blob) for blob operations. Use for uploading, downloading, listing, and managing blobs and containers.
azure-storage-blob-rust
Azure Blob Storage SDK for Rust. Use for uploading, downloading, and managing blobs and containers.
azure-storage-blob-py
Azure Blob Storage SDK for Python. Use for uploading, downloading, listing blobs, managing containers, and blob lifecycle.
microsoft-azure-webjobs-extensions-authentication-events-dotnet
Microsoft Entra Authentication Events SDK for .NET. Azure Functions triggers for custom authentication extensions.
filesystem-context
Use for file-based context management, dynamic context discovery, and reducing context window bloat. Offload context to files for just-in-time loading.
file-path-traversal
Identify and exploit file path traversal (directory traversal) vulnerabilities that allow attackers to read arbitrary files on the server, potentially including sensitive configuration files, credentials, and source code.
file-organizer
Reduces clutter: identifies old files you probably don't need anymore.
azure-web-pubsub-ts
Real-time messaging with WebSocket connections and pub/sub patterns.
azure-storage-queue-ts
Azure Queue Storage JavaScript/TypeScript SDK (@azure/storage-queue) for message queue operations. Use for sending, receiving, peeking, and deleting messages in queues.
azure-storage-queue-py
Azure Queue Storage SDK for Python. Use for reliable message queuing, task distribution, and asynchronous processing.