apache-nifi

Expert guidance for Apache NiFi data integration platform including flow design, processors, controller services, process groups, NiFi Registry integration, and cluster configuration. Use this when working with data flows, processors, or NiFi configuration.

Best use case

apache-nifi is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using apache-nifi should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/apache-nifi/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/design/apache-nifi/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/apache-nifi/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How apache-nifi Compares

| Feature / Agent | apache-nifi | Standard Approach |
|-----------------|-------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# Apache NiFi Expert Skill

You are an expert in Apache NiFi, a powerful data integration and distribution platform.

## Core Concepts

### Architecture
- **FlowFiles**: Data packages flowing through NiFi (content + attributes)
- **Processors**: Building blocks for data transformation and routing
- **Connections**: Queues between processors with backpressure
- **Process Groups**: Containers for organizing flows
- **Controller Services**: Shared services (DB connections, SSL contexts, etc.)
- **Reporting Tasks**: Monitoring and metrics collection

### Key Processors
- **Data Ingestion**: GetFile, ListenHTTP, ConsumeKafka, GetFTP
- **Data Egress**: PutFile, InvokeHTTP, PublishKafka, PutFTP
- **Routing & Transformation**: RouteOnAttribute, UpdateAttribute, JoltTransformJSON
- **Content**: ExtractText, ReplaceText, SplitText, MergeContent
- **Database**: ExecuteSQL, PutSQL, QueryDatabaseTable
- **Scripting**: ExecuteScript (Groovy, Python via Jython, JavaScript)

## Flow Design Best Practices

### Organization
```
Process Group Structure:
├── Input Group (data ingestion)
├── Validation Group (data quality)
├── Transformation Group (ETL)
├── Enrichment Group (lookups)
└── Output Group (data egress)
```

### Performance
- Use backpressure thresholds (object count + data size)
- Enable connection load balancing for clusters
- Set appropriate concurrent tasks per processor
- Use batching for high-volume flows
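
As a rough illustration of how the two backpressure thresholds interact, the sketch below models a connection queue that stops accepting FlowFiles once either limit is reached. The class and method names are hypothetical, and the default values mirror NiFi's stock connection settings (10,000 objects, 1 GB). Real NiFi applies backpressure by no longer scheduling the upstream processor rather than by rejecting individual FlowFiles, so treat this as a simplified model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConnectionQueue:
    """Hypothetical model of a NiFi connection with both backpressure thresholds."""
    max_objects: int = 10_000      # object count threshold
    max_bytes: int = 1024 ** 3     # data size threshold (1 GB)
    _sizes: deque = field(default_factory=deque)
    _queued_bytes: int = 0

    def backpressure_engaged(self) -> bool:
        # Backpressure applies as soon as EITHER threshold is reached.
        return (len(self._sizes) >= self.max_objects
                or self._queued_bytes >= self.max_bytes)

    def offer(self, flowfile_size: int) -> bool:
        """Enqueue a FlowFile of the given size unless backpressure is engaged."""
        if self.backpressure_engaged():
            return False
        self._sizes.append(flowfile_size)
        self._queued_bytes += flowfile_size
        return True

    def poll(self) -> int:
        """Dequeue the oldest FlowFile and return its size in bytes."""
        size = self._sizes.popleft()
        self._queued_bytes -= size
        return size


# Tiny thresholds so the effect is visible: the fourth offer is refused.
queue = ConnectionQueue(max_objects=3, max_bytes=1_000)
accepted = [queue.offer(size) for size in (400, 400, 400, 10)]
print(accepted)  # [True, True, True, False]
```

Note that the third FlowFile is accepted even though it pushes the queue past the size limit: the thresholds are checked before enqueueing, so they act as soft limits in this model.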

### Error Handling
- Always configure failure relationships
- Use funnels to consolidate error handling
- Implement retry logic with penalization
- Log errors (for example with LogAttribute or LogMessage) and use data provenance to trace failed FlowFiles
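
The retry-with-penalization idea can be sketched as a small routing decision. This is a simplified model, loosely based on the behavior of a retry counter attribute; the helper name, the `penalized.until` attribute, and the retry budget are assumptions for illustration (NiFi tracks penalization internally rather than in an attribute). The 30-second value matches NiFi's default penalty duration.

```python
MAX_RETRIES = 3          # assumed retry budget
PENALTY_SECONDS = 30.0   # NiFi's default penalty duration is 30 sec


def handle_failure(attributes: dict, now: float) -> tuple[str, dict]:
    """Decide where a failed FlowFile goes next (hypothetical helper).

    Returns the relationship name and the updated attribute map.
    """
    retries = int(attributes.get("flowfile.retries", "0")) + 1
    updated = {**attributes, "flowfile.retries": str(retries)}
    if retries > MAX_RETRIES:
        # Budget exhausted: hand the FlowFile to the error-handling path.
        return "retries_exceeded", updated
    # Penalize: the FlowFile should not be retried before this timestamp.
    updated["penalized.until"] = str(now + PENALTY_SECONDS)
    return "retry", updated


attrs = {"filename": "orders.csv"}
route = ""
for attempt in range(4):
    route, attrs = handle_failure(attrs, now=1000.0)
    print(attempt + 1, route)
```

The first three failures route to `retry`; the fourth exceeds the budget and routes to `retries_exceeded`, which is where a consolidated error-handling funnel would pick it up.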

## NiFi Registry Integration

### Version Control
```
Connect to Registry:
1. Add a Registry Client in the NiFi UI (Controller Settings)
2. Right-click the Process Group → Version → Start version control
3. Select the Registry, Bucket, and Flow name
4. Commit changes with descriptive messages
```

### Best Practices
- Version control all production flows
- Use meaningful commit messages
- Create separate buckets per environment (dev/test/prod)
- Tag stable releases

## Expression Language

### Common Expressions
```nifi
# Attributes
${filename}                           # Get attribute value
${filename:isEmpty()}                 # Check if empty
${filename:replace('.txt', '.csv')}   # String manipulation

# Date/Time
${now():format('yyyy-MM-dd')}         # Current date
${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')} # Convert format

# Conditional
${attribute:equals('value'):ifElse('yes', 'no')} # If-else logic

# Content
${fileSize}                           # FlowFile content size in bytes
${fileSize:gt(1000000)}               # Size comparison
```
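
For readers more at home in a general-purpose language, the following sketch shows approximate Python equivalents of several of the expressions above. The attribute names and values are made up for illustration, and the comments describe Expression Language behavior only as far as this sketch mirrors it.

```python
from datetime import datetime

# Hypothetical FlowFile attributes; NiFi stores every attribute value as a string.
attributes = {"filename": "report.txt", "created": "2024-03-15", "status": "value"}

# ${filename:isEmpty()} -> true for a missing, empty, or whitespace-only value
filename_is_empty = not attributes.get("filename", "").strip()

# ${filename:replace('.txt', '.csv')} -> literal (non-regex) replacement
csv_name = attributes["filename"].replace(".txt", ".csv")

# ${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')}
created = datetime.strptime(attributes["created"], "%Y-%m-%d")
us_date = created.strftime("%m/%d/%Y")

# ${status:equals('value'):ifElse('yes', 'no')}
answer = "yes" if attributes["status"] == "value" else "no"

print(filename_is_empty, csv_name, us_date, answer)
# False report.csv 03/15/2024 yes
```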

## Configuration Files

### nifi.properties (Key Settings)
```properties
# Web UI
nifi.web.http.host=0.0.0.0
nifi.web.http.port=8080

# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=11443

# State Management
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=localhost:2181

# Repository Settings
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```
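
A quick sanity check of these settings can be scripted before starting a node. The sketch below parses `key=value` lines and flags clustering keys that are left empty. Which keys are treated as required is an assumption of this sketch, not a rule taken from NiFi, and both function names are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def cluster_problems(props: dict[str, str]) -> list[str]:
    """List clustering keys this sketch expects to be filled in on a cluster node."""
    problems = []
    if props.get("nifi.cluster.is.node") == "true":
        for key in ("nifi.cluster.node.address",
                    "nifi.cluster.node.protocol.port",
                    "nifi.zookeeper.connect.string"):
            if not props.get(key):
                problems.append(f"{key} is empty")
    return problems


sample = """
# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=
nifi.zookeeper.connect.string=localhost:2181
"""
print(cluster_problems(parse_properties(sample)))
# ['nifi.cluster.node.protocol.port is empty']
```

To check a real installation, read `conf/nifi.properties` into a string and pass it to `parse_properties`.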

## Docker Deployment

### Single Node
```yaml
services:
  nifi:
    image: apache/nifi:latest
    ports:
      - "8443:8443"
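    environment:
      - SINGLE_USER_CREDENTIALS_USERNAME=admin
      # Must be at least 12 characters; replace this sample value and keep real secrets out of version control
      - SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB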
    volumes:
      - ./nifi/conf:/opt/nifi/nifi-current/conf
      - ./nifi/content_repository:/opt/nifi/nifi-current/content_repository
      - ./nifi/database_repository:/opt/nifi/nifi-current/database_repository
      - ./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
      - ./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository
      - ./nifi/state:/opt/nifi/nifi-current/state
```

### Cluster Configuration
```yaml
services:
  zookeeper:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181

  nifi-1:
    image: apache/nifi:latest
    environment:
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_CLUSTER_NODE_PROTOCOL_PORT=11443
      - NIFI_ZK_CONNECT_STRING=zookeeper:2181
      - NIFI_ELECTION_MAX_WAIT=1 min
    depends_on:
      - zookeeper
```

## Security

### Authentication
- Single-User: Basic username/password (dev/test)
- LDAP: Enterprise directory integration
- OIDC: OAuth2/OpenID Connect (Keycloak)
- Certificate: mTLS with client certificates

### Authorization
- File-based: users.xml and authorizations.xml (configured through authorizers.xml)
- Ranger: Apache Ranger integration
- OIDC groups: Map OIDC groups to NiFi policies

## Monitoring & Troubleshooting

### Key Metrics
- FlowFile counts and sizes in queues
- Processor task durations
- System diagnostics (CPU, memory, disk)
- Bulletin board for errors

### Logs
```
# Main logs
logs/nifi-app.log          # Application log
logs/nifi-user.log         # User actions
logs/nifi-bootstrap.log    # Bootstrap process

# Enable DEBUG logging
conf/logback.xml           # Set the relevant <logger> level to DEBUG
```
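
When the application log gets long, a short script can summarize where the problems are. The sketch below assumes the default logback line layout (timestamp, level, thread in brackets, abbreviated logger name, message) and counts WARN and ERROR entries per logger. The sample lines are invented, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-03-15 10:22:05,456 ERROR [Timer-Driven Process Thread-4] o.a.n.p.standard.InvokeHTTP message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<logger>\S+) (?P<msg>.*)$"
)


def count_problems(lines, levels=("WARN", "ERROR")) -> Counter:
    """Count log entries at the given levels, keyed by (level, logger)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Continuation lines (stack traces) do not match and are skipped.
        if match and match["level"] in levels:
            counts[(match["level"], match["logger"])] += 1
    return counts


sample_lines = [  # invented log lines for illustration
    "2024-03-15 10:22:01,123 INFO [main] o.a.nifi.NiFi Started",
    "2024-03-15 10:22:05,456 ERROR [Timer-Driven Process Thread-4] o.a.n.p.standard.InvokeHTTP Connection refused",
    "\tat java.base/java.net.Socket.connect(Socket.java:633)",
    "2024-03-15 10:22:06,001 ERROR [Timer-Driven Process Thread-2] o.a.n.p.standard.InvokeHTTP Connection refused",
]
for (level, logger), n in count_problems(sample_lines).items():
    print(level, logger, n)
# ERROR o.a.n.p.standard.InvokeHTTP 2
```

To run it against a real log, pass an open file handle, for example `count_problems(open("logs/nifi-app.log"))`.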

### Common Issues
| Issue | Solution |
|-------|----------|
| Out of Memory | Increase heap (`-Xms`/`-Xmx`) in bootstrap.conf |
| Disk full | Lower content archive and provenance retention limits in nifi.properties; free or add disk space |
| Connection timeout | Check network/firewall rules |
| FlowFile stuck | Check backpressure, processor logs |

## PLC4X Integration

### PLC4X Processors
- **GetPLC**: Read data from PLCs (Siemens S7, Modbus, OPC-UA, etc.)
- **PutPLC**: Write data to PLCs

### Example Configuration
```
Connection String: s7://192.168.1.100
Fields: DB1.DBW0:INT,DB1.DBD4:REAL,DB1.DBX8.0:BOOL
Polling Interval: 1 sec
```

## API & CLI

### NiFi REST API
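```bash
# Exchange username/password for a bearer token (NiFi does not accept HTTP Basic auth).
# -k skips certificate verification; use it only with self-signed development certs.
TOKEN=$(curl -sk -X POST https://localhost:8443/nifi-api/access/token \
  --data-urlencode 'username=admin' --data-urlencode 'password=password')

# Get process groups
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/nifi-api/process-groups/root

# Get processor status (set PROCESSOR_ID to the processor's UUID)
curl -sk -H "Authorization: Bearer $TOKEN" "https://localhost:8443/nifi-api/flow/processors/${PROCESSOR_ID}/status"
```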
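
The same calls can be scripted with the Python standard library. With username/password login, NiFi issues a bearer token from `/access/token`, and that token is then sent in the `Authorization` header. This is a minimal sketch under those assumptions; the helper names are hypothetical, the credentials are placeholders, and `main()` needs a running NiFi instance, so it is defined but not called here.

```python
import ssl
import urllib.parse
import urllib.request

BASE_URL = "https://localhost:8443/nifi-api"  # same host and port as the examples above


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build the form-encoded POST that exchanges credentials for a token."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(f"{BASE_URL}/access/token", data=body, method="POST")


def api_request(path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET for a path under /nifi-api."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch(request: urllib.request.Request, verify_tls: bool = True) -> str:
    """Send a request and return the response body as text."""
    context = ssl.create_default_context()
    if not verify_tls:  # only for self-signed development certificates
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()


def main() -> None:
    # Placeholder credentials; replace before running against a live instance.
    token = fetch(token_request("admin", "<password>"), verify_tls=False)
    print(fetch(api_request("process-groups/root", token), verify_tls=False))
```

Keeping request construction separate from sending makes the helpers easy to inspect or unit-test without a NiFi server.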

### NiFi CLI (Toolkit)
```bash
# Export flow
nifi-toolkit/cli.sh nifi export-flow -u http://localhost:8080/nifi -b /tmp/backup.json

# Import flow
nifi-toolkit/cli.sh nifi import-flow -u http://localhost:8080/nifi -i /tmp/backup.json
```

## Resources
- [Apache NiFi Docs](https://nifi.apache.org/docs.html)
- [Expression Language Guide](https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html)
- [Processor Documentation](https://nifi.apache.org/docs/nifi-docs/components/)
