azure-storage-file-datalake-py
Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations. Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
Best use case
azure-storage-file-datalake-py is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using azure-storage-file-datalake-py should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/azure-storage-file-datalake-py/SKILL.md` inside your project
- Restart your AI agent so it auto-discovers the skill
How azure-storage-file-datalake-py Compares
| Feature / Agent | azure-storage-file-datalake-py | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It wraps the Azure Data Lake Storage Gen2 SDK for Python for hierarchical file systems, big data analytics, and file/directory operations. It activates on triggers such as "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", and "hierarchical namespace".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Azure Data Lake Storage Gen2 SDK for Python
Hierarchical file system for big data analytics workloads.
## Installation
```bash
pip install azure-storage-file-datalake azure-identity
```
## Environment Variables
```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```
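A minimal sketch of wiring that variable into client construction (reading it with `os.environ` is an assumption for illustration; it anticipates the Authentication example below):
```python
import os

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Read the account URL set above; raises KeyError if the variable is missing
account_url = os.environ["AZURE_STORAGE_ACCOUNT_URL"]
service_client = DataLakeServiceClient(account_url=account_url, credential=DefaultAzureCredential())
```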
## Authentication
```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"
service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```
## Client Hierarchy
| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |
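A rough sketch of how the four clients chain together (the names `myfilesystem`, `mydir`, and `data.csv` are placeholders, and the account URL is assumed from the Authentication section above):
```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Each level of the hierarchy hands out the client for the level below it
service_client = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)                                                                            # account-level operations
file_system_client = service_client.get_file_system_client("myfilesystem")  # container (file system) operations
directory_client = file_system_client.get_directory_client("mydir")         # directory operations
file_client = directory_client.get_file_client("data.csv")                  # file operations
```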
## File System Operations
```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")
# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")
# Delete
service_client.delete_file_system("myfilesystem")
# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```
## Directory Operations
```python
file_system_client = service_client.get_file_system_client("myfilesystem")
# Create directory
directory_client = file_system_client.create_directory("mydir")
# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")
# Get directory client
directory_client = file_system_client.get_directory_client("mydir")
# Delete directory
directory_client.delete_directory()
# Rename/move directory
directory_client.rename_directory(new_name="myfilesystem/newname")
```
## File Operations
### Upload File
```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")
# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)
# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)
# Append data in chunks (for large files); the file must already exist,
# e.g. created via create_file() or a prior upload_data call
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12) # Commit the data
```
### Download File
```python
file_client = file_system_client.get_file_client("path/to/file.txt")
# Download all content
download = file_client.download_file()
content = download.readall()
# Download to file
with open("downloaded.txt", "wb") as f:
download = file_client.download_file()
download.readinto(f)
# Download range
download = file_client.download_file(offset=0, length=100)
```
### Delete File
```python
file_client.delete_file()
```
## List Contents
```python
# List paths (files and directories) across the whole file system
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")
# List only the direct children of a directory (get_paths is recursive by default)
for path in file_system_client.get_paths(path="mydir", recursive=False):
    print(path.name)
# Recursive listing under a directory
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)
```
## File/Directory Properties
```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")
# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```
## Access Control (ACL)
```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")
# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)
# Update ACL entries recursively; the call returns an AccessControlChangeResult
from azure.storage.filedatalake import AccessControlChangeResult
result: AccessControlChangeResult = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
```
## Async Client
```python
from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential
async def datalake_operations():
    credential = DefaultAzureCredential()
    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")
        await file_client.upload_data(b"async content", overwrite=True)
        download = await file_client.download_file()
        content = await download.readall()
    await credential.close()  # close the async credential's transport

import asyncio
asyncio.run(datalake_operations())
```
## Best Practices
1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads (see the chunked upload sketch after this list)
3. **Set ACLs at directory level** and inherit to children
4. **Use async client** for high-throughput scenarios
5. **Use `get_paths` with `recursive=True`** for full directory listing
6. **Set metadata** for custom file attributes
7. **Consider Blob API** for simple object storage use cases
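As a sketch of practice 2, a chunked upload loop built on `append_data` and `flush_data` (assumes a `file_system_client` from the sections above; the 4 MiB chunk size, the local file name, and the target path are arbitrary illustration choices, not SDK defaults):
```python
CHUNK_SIZE = 4 * 1024 * 1024  # arbitrary 4 MiB chunks

# Create the remote file, append each chunk at its running offset, then commit once
file_client = file_system_client.get_file_client("big/dataset.parquet")
file_client.create_file()

offset = 0
with open("dataset.parquet", "rb") as source:
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)

file_client.flush_data(offset)  # commit all appended bytes in a single call
```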