apache-nifi
Expert guidance for Apache NiFi data integration platform including flow design, processors, controller services, process groups, NiFi Registry integration, and cluster configuration. Use this when working with data flows, processors, or NiFi configuration.
Best use case
apache-nifi is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using apache-nifi should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/apache-nifi/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How apache-nifi Compares
| Feature / Agent | apache-nifi | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Apache NiFi Expert Skill
You are an expert in Apache NiFi, a powerful data integration and distribution platform.
## Core Concepts
### Architecture
- **FlowFiles**: Data packages flowing through NiFi (content + attributes)
- **Processors**: Building blocks for data transformation and routing
- **Connections**: Queues between processors with backpressure
- **Process Groups**: Containers for organizing flows
- **Controller Services**: Shared services (DB connections, SSL contexts, etc.)
- **Reporting Tasks**: Monitoring and metrics collection
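To make the FlowFile and routing concepts above concrete, here is a minimal Python sketch. It is a toy model rather than NiFi's API: the `FlowFile` class, the `route_on_attribute` function, and the rule names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy model of a FlowFile: opaque content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def route_on_attribute(flowfile, rules):
    """Return the name of the first relationship whose predicate matches.

    Loosely mirrors how a routing processor picks a relationship and
    falls back to 'unmatched' when no rule applies.
    """
    for relationship, predicate in rules.items():
        if predicate(flowfile.attributes):
            return relationship
    return "unmatched"


# Hypothetical routing rules keyed by relationship name.
RULES = {
    "csv": lambda attrs: attrs.get("filename", "").endswith(".csv"),
    "json": lambda attrs: attrs.get("filename", "").endswith(".json"),
}

orders = FlowFile(b"id,total\n1,9.99\n", {"filename": "orders.csv"})
notes = FlowFile(b"hello", {"filename": "notes.txt"})
```

Routing decisions here look only at attributes, never at content, which is why attribute-based routing stays cheap even for large payloads.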
### Key Processors
- **Data Ingestion**: GetFile, ListenHTTP, ConsumeKafka, GetFTP
- **Data Egress**: PutFile, InvokeHTTP, PublishKafka, PutFTP
- **Routing & Transformation**: RouteOnAttribute, UpdateAttribute, JoltTransformJSON
- **Content**: ExtractText, ReplaceText, SplitText, MergeContent
- **Database**: ExecuteSQL, PutSQL, QueryDatabaseTable
- **Scripting**: ExecuteScript (Groovy, Python via Jython, JavaScript)
## Flow Design Best Practices
### Organization
```
Process Group Structure:
├── Input Group (data ingestion)
├── Validation Group (data quality)
├── Transformation Group (ETL)
├── Enrichment Group (lookups)
└── Output Group (data egress)
```
### Performance
- Use backpressure thresholds (object count + data size)
- Enable connection load balancing for clusters
- Set appropriate concurrent tasks per processor
- Use batching for high-volume flows
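The two backpressure thresholds can be pictured with a small Python model. This is a sketch under stated assumptions, not NiFi code: the `Connection` class is hypothetical, and its defaults assume NiFi's stock thresholds of 10,000 objects and 1 GB per connection.

```python
from collections import deque


class Connection:
    """Toy queue that reports backpressure on object count or total bytes."""

    def __init__(self, max_objects=10_000, max_bytes=1024 ** 3):
        # Defaults assume the stock thresholds of 10,000 objects and 1 GB.
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self._queue = deque()
        self._bytes = 0

    def backpressure_engaged(self):
        """True once either threshold is reached."""
        return (len(self._queue) >= self.max_objects
                or self._bytes >= self.max_bytes)

    def offer(self, payload: bytes) -> bool:
        """Enqueue the payload unless backpressure is engaged."""
        if self.backpressure_engaged():
            return False
        self._queue.append(payload)
        self._bytes += len(payload)
        return True

    def poll(self) -> bytes:
        """Dequeue the oldest payload and release its byte count."""
        payload = self._queue.popleft()
        self._bytes -= len(payload)
        return payload
```

Either threshold alone is enough to engage backpressure, so a queue of a few very large FlowFiles can block upstream work just as a queue of many small ones can.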
### Error Handling
- Always configure failure relationships
- Use funnels to consolidate error handling
- Implement retry logic with penalization
- Log errors (for example with LogAttribute or LogMessage) and use data provenance to trace the affected FlowFiles
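Retry with penalization can be sketched as a small decision function. The code below is illustrative only: the `handle_failure` name and the `retry.count` attribute are assumptions made for this example, and the 30-second default is assumed to match a typical processor penalty duration.

```python
def handle_failure(attributes, max_retries=3, penalty_seconds=30):
    """Decide what to do with a FlowFile that just failed.

    Returns (relationship, updated_attributes, delay_seconds). The
    'retry.count' attribute name is an assumption for this sketch.
    """
    attempts = int(attributes.get("retry.count", "0")) + 1
    updated = {**attributes, "retry.count": str(attempts)}
    if attempts > max_retries:
        # Retries exhausted: hand off to the error-handling path.
        return "failure", updated, 0
    # Penalize so the FlowFile waits before the next attempt.
    return "retry", updated, penalty_seconds
```

Keeping the counter in an attribute means the retry state travels with the FlowFile, so the loop survives restarts and works the same on every cluster node.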
## NiFi Registry Integration
### Version Control
```
Connect to Registry:
1. Add a Registry Client in the NiFi UI (Controller Settings)
2. Right-click Process Group → Version → Start version control
3. Select Registry, Bucket, and Flow name
4. Commit changes with descriptive messages
```
### Best Practices
- Version control all production flows
- Use meaningful commit messages
- Create separate buckets per environment (dev/test/prod)
- Tag stable releases
## Expression Language
### Common Expressions
```nifi
# Attributes
${filename} # Get attribute value
${filename:isEmpty()} # Check if empty
${filename:replace('.txt', '.csv')} # String manipulation
# Date/Time
${now():format('yyyy-MM-dd')} # Current date
${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')} # Convert format
# Conditional
${attribute:equals('value'):ifElse('yes', 'no')} # If-else logic
# Content
${fileSize} # FlowFile size in bytes
${fileSize:gt(1000000)} # Size comparison
```
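For readers more familiar with general-purpose languages, a few of the expressions above can be mirrored in plain Python. This is only an analogy to show what each expression computes; the function names are hypothetical and nothing here runs inside NiFi.

```python
from datetime import datetime


def replace_extension(filename):
    """Equivalent of ${filename:replace('.txt', '.csv')}."""
    return filename.replace(".txt", ".csv")


def convert_date(created):
    """Equivalent of ${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')}."""
    return datetime.strptime(created, "%Y-%m-%d").strftime("%m/%d/%Y")


def if_else(attribute, expected, yes="yes", no="no"):
    """Equivalent of ${attribute:equals('value'):ifElse('yes', 'no')}."""
    return yes if attribute == expected else no
```

Note that the Expression Language chains functions left to right on the attribute value, while the Python versions nest the same steps as ordinary calls.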
## Configuration Files
### nifi.properties (Key Settings)
```properties
# Web UI
nifi.web.http.host=0.0.0.0
nifi.web.http.port=8080
# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=11443
# State Management
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=localhost:2181
# Repository Settings
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```
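A quick sanity check of the clustering keys shown above can be scripted. The sketch below assumes a simple `key=value` file with `#` comments; the function names and the specific checks are illustrative choices, not an official validation tool.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def cluster_problems(props):
    """Return human-readable problems with the clustering settings."""
    problems = []
    if props.get("nifi.cluster.is.node") == "true":
        for key in (
            "nifi.cluster.node.address",
            "nifi.cluster.node.protocol.port",
            "nifi.zookeeper.connect.string",
        ):
            if not props.get(key):
                problems.append(f"{key} is empty")
    return problems


# Hypothetical fragment with the ZooKeeper connect string left blank.
SAMPLE = """
# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=
"""
```

Calling `cluster_problems(parse_properties(SAMPLE))` reports the blank ZooKeeper connect string, the kind of omission that otherwise only surfaces when the node fails to join the cluster.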
## Docker Deployment
### Single Node
```yaml
services:
  nifi:
    image: apache/nifi:latest
    ports:
      - "8443:8443"
    environment:
      - SINGLE_USER_CREDENTIALS_USERNAME=admin
      - SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB
    volumes:
      - ./nifi/conf:/opt/nifi/nifi-current/conf
      - ./nifi/content_repository:/opt/nifi/nifi-current/content_repository
      - ./nifi/database_repository:/opt/nifi/nifi-current/database_repository
      - ./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
      - ./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository
      - ./nifi/state:/opt/nifi/nifi-current/state
```
### Cluster Configuration
```yaml
services:
  zookeeper:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181
  nifi-1:
    image: apache/nifi:latest
    environment:
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_CLUSTER_NODE_PROTOCOL_PORT=11443
      - NIFI_ZK_CONNECT_STRING=zookeeper:2181
      - NIFI_ELECTION_MAX_WAIT=1 min
    depends_on:
      - zookeeper
```
## Security
### Authentication
- Single-User: Basic username/password (dev/test)
- LDAP: Enterprise directory integration
- OIDC: OAuth2/OpenID Connect (for example, Keycloak)
- Certificate: mTLS with client certificates
### Authorization
- File-based: authorizations.xml (access policies) and users.xml (users and groups)
- Ranger: Apache Ranger integration
- OIDC groups: Map OIDC groups to NiFi policies
## Monitoring & Troubleshooting
### Key Metrics
- FlowFile counts and sizes in queues
- Processor task durations
- System diagnostics (CPU, memory, disk)
- Bulletin board for errors
### Logs
```bash
# Main logs
logs/nifi-app.log # Application log
logs/nifi-user.log # User actions
logs/nifi-bootstrap.log # Bootstrap process
# Enable DEBUG logging:
# in conf/logback.xml, set the level of the relevant <logger> element to DEBUG
```
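When an application log grows large, grouping errors by logger name is a quick way to find the noisy component. The sketch below assumes log lines shaped like `<date> <time> LEVEL [thread] logger message`; the sample lines are invented for illustration and are not real NiFi output.

```python
import re
from collections import Counter

# Assumes lines look like: "<date> <time> LEVEL [thread] logger message".
LINE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<logger>\S+) "
)


def count_by_logger(lines, level="ERROR"):
    """Count log lines at the given level, grouped by logger name."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") == level:
            counts[match.group("logger")] += 1
    return counts


# Hypothetical sample lines, not real NiFi output.
SAMPLE = [
    "2024-01-01 10:00:00,000 INFO [main] o.a.nifi.NiFi Started",
    "2024-01-01 10:00:01,000 ERROR [Timer-Driven Process Thread-1] "
    "o.a.n.p.standard.InvokeHTTP Request failed",
    "2024-01-01 10:00:02,000 ERROR [Timer-Driven Process Thread-2] "
    "o.a.n.p.standard.InvokeHTTP Request failed",
    "    at java.base/java.lang.Thread.run(Thread.java:840)",
]
```

Stack-trace continuation lines do not match the pattern and are skipped, so each error is counted once. If `conf/logback.xml` uses a different pattern, the regular expression needs to be adjusted to match.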
### Common Issues
| Issue | Solution |
|-------|----------|
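| Out of Memory | Increase heap (`-Xms`/`-Xmx` in `java.arg.2`/`java.arg.3`) in bootstrap.conf |
| Disk full | Lower content archive retention (`nifi.content.repository.archive.max.retention.period`, `nifi.content.repository.archive.max.usage.percentage`) and provenance storage limits (`nifi.provenance.repository.max.storage.size`) in nifi.properties; avoid deleting repository files by hand |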
| Connection timeout | Check network/firewall rules |
| FlowFile stuck | Check backpressure, processor logs |
## PLC4X Integration
### PLC4X Processors
- **GetPLC**: Read data from PLCs (Siemens S7, Modbus, OPC-UA, etc.)
- **PutPLC**: Write data to PLCs
### Example Configuration
```
Connection String: s7://192.168.1.100
Fields: DB1.DBW0:INT,DB1.DBD4:REAL,DB1.DBX8.0:BOOL
Polling Interval: 1 sec
```
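The comma-separated `Fields` value above pairs each address with a data type. A small helper can split it for validation or logging; this sketch assumes exactly the `address:TYPE` layout shown in the example, and `parse_fields` is a hypothetical name, not part of PLC4X.

```python
def parse_fields(spec):
    """Split 'addr:TYPE,addr:TYPE' into a list of (address, type) pairs."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so dots in the address are untouched.
        address, _, data_type = item.rpartition(":")
        pairs.append((address, data_type))
    return pairs
```

For the example above, `parse_fields("DB1.DBW0:INT,DB1.DBD4:REAL,DB1.DBX8.0:BOOL")` yields three pairs, one per PLC address.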
## API & CLI
### NiFi REST API
```bash
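# Request a bearer token; username/password logins are exchanged for a token
# rather than sent as HTTP basic auth (-k skips TLS verification for a self-signed cert)
TOKEN=$(curl -sk -X POST https://localhost:8443/nifi-api/access/token \
  -d 'username=admin' -d 'password=password')
# Get the root process group
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/nifi-api/process-groups/root
# Get processor status (served under the flow resource)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/nifi-api/flow/processors/{id}/status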
```
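The same calls can be prepared from Python with only the standard library. This sketch assumes username/password login, where credentials are posted to `/access/token` and the returned token is sent as a bearer header; the helper names are hypothetical, and the requests are only built here, not sent.

```python
import urllib.parse
import urllib.request

# Same host and port as the curl examples.
BASE_URL = "https://localhost:8443/nifi-api"


def token_request(username, password):
    """Build the POST that exchanges credentials for a bearer token."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        f"{BASE_URL}/access/token", data=body.encode("utf-8"), method="POST"
    )


def api_request(path, token):
    """Build an authenticated GET for a path under /nifi-api."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send a request: urllib.request.urlopen(request, context=ssl_context),
# where ssl_context trusts the NiFi server certificate.
```

Separating request construction from sending keeps the helpers easy to test and leaves TLS trust decisions to the caller.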
### NiFi CLI (Toolkit)
```bash
# Export flow
nifi-toolkit/cli.sh nifi export-flow -u http://localhost:8080/nifi -b /tmp/backup.json
# Import flow
nifi-toolkit/cli.sh nifi import-flow -u http://localhost:8080/nifi -i /tmp/backup.json
```
## Resources
- [Apache NiFi Docs](https://nifi.apache.org/docs.html)
- [Expression Language Guide](https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html)
- [Processor Documentation](https://nifi.apache.org/docs/nifi-docs/components/)