Argentine Invoice Processing System
Complete invoice processing system for Argentine utility bills with OCR, classification, and automated organization
Best use case
Argentine Invoice Processing System is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using Argentine Invoice Processing System can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/argentine-invoice-processing-system/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Argentine Invoice Processing System
## Overview
This skill enables automated processing, classification, and organization of Argentine utility invoices (electricity, water, gas, municipal services, etc.). The system extracts text from PDFs and images using OCR, identifies service providers, extracts dates, and organizes files into a structured year/month hierarchy with standardized naming.
## What This Skill Does
- **OCR Processing**: Extract text from PDFs and images (JPG, PNG) using Tesseract
- **Service Classification**: Identify service providers (AYSA, Edenor, Metrogas, ARBA, etc.)
- **Date Extraction**: Parse due dates from Spanish-language invoices
- **File Naming**: Apply consistent naming convention: `{service}_{YYYY-MM-DD}_{payment}.ext`
- **Organization**: Create year/month folder structure (e.g., `2025/03_marzo/`)
- **Parallel Processing**: Handle multiple files concurrently
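
The naming convention listed above can be illustrated with a short sketch. The project itself is a .NET application, so this Python version is only an illustration, not the project's `FilenameGenerator`. The function name `build_filename` and the lowercasing of each part are assumptions.

```python
from datetime import date


def build_filename(service: str, due_date: date, payment: str, extension: str) -> str:
    """Build '{service}_{YYYY-MM-DD}_{payment}.ext'.

    Lowercasing every part is an assumption made for this sketch.
    """
    ext = extension.lower().lstrip(".")
    # date.isoformat() yields the YYYY-MM-DD form used by the convention.
    return f"{service.lower()}_{due_date.isoformat()}_{payment.lower()}.{ext}"


print(build_filename("aysa", date(2025, 3, 21), "santander", ".pdf"))
# aysa_2025-03-21_santander.pdf
```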
## When to Use This Skill
Use this skill when you need to:
- Process and organize utility invoices
- Extract dates from Spanish-language documents
- Classify Argentine service providers
- Troubleshoot OCR issues with Spanish text
- Deploy or maintain the invoice processing system
- Add new service providers or rules
- Test invoice processing functionality
## Quick Start
### Basic Usage
```bash
# Run with default configuration
dotnet run --project FileContentRenamer/FileContentRenamer.csproj
# Process specific directory
dotnet run --project FileContentRenamer -- /path/to/invoices
```
### Key Configuration
```json
{
  "AppConfig": {
    "BasePath": "/Users/santibraida/Downloads/__comprobantes/servicios",
    "FileExtensions": [".pdf", ".jpg", ".png"],
    "TesseractLanguage": "spa+eng",
    "MaxDegreeOfParallelism": 4
  }
}
```
## Detailed Documentation
This skill is organized into specialized sub-skills for different aspects of the system:
### 1. **Invoice Processing** ([invoice-processing.md](invoice-processing.md))
Core workflow and service provider directory. **Start here** for system overview.
**Covers**:
- Complete processing workflow
- Service provider catalog (AYSA, Edenor, Metrogas, etc.)
- File naming conventions
- Folder organization structure
- Configuration reference
**When to use**: Understanding the overall system, adding new providers, configuring the app
### 2. **OCR Troubleshooting** ([ocr-troubleshooting.md](ocr-troubleshooting.md))
Diagnose and fix OCR issues with Tesseract.
**Covers**:
- Common OCR problems and solutions
- Spanish character recognition
- Image quality requirements
- Performance optimization
- Tesseract configuration
**When to use**: OCR producing garbled text, missing content, or slow performance
### 3. **Service Identification** ([service-identification.md](service-identification.md))
Detailed patterns for each service provider.
**Covers**:
- Provider-specific keywords and identifiers
- Invoice characteristics and formats
- Validation rules and amount ranges
- Edge cases and conflicts
- Seasonal patterns
**When to use**: Adding new providers, debugging misclassification, understanding provider specifics
### 4. **Date Extraction** ([date-extraction.md](date-extraction.md))
Comprehensive date parsing patterns and validation.
**Covers**:
- All regex patterns with priority order
- Argentine date format handling
- OCR error correction for dates
- Multiple date scenarios
- Validation and edge cases
**When to use**: Date extraction issues, adding new date patterns, understanding parsing logic
### 5. **Testing Procedures** ([testing-procedures.md](testing-procedures.md))
Complete testing guide from unit tests to production validation.
**Covers**:
- Unit, integration, and E2E testing
- Test data management
- Manual testing procedures
- Performance and regression testing
- CI/CD integration
**When to use**: Writing tests, validating changes, testing new invoice types, quality assurance
### 6. **Deployment Guide** ([deployment-guide.md](deployment-guide.md))
Production deployment and maintenance handbook.
**Covers**:
- Installation and setup
- Production configuration
- Monitoring and health checks
- Backup and recovery
- Troubleshooting common issues
- Security considerations
**When to use**: Deploying to production, setting up scheduled runs, maintenance, troubleshooting
## Common Workflows
### Adding a New Service Provider
1. Collect 3-5 sample invoices
2. Read [service-identification.md](service-identification.md) for pattern analysis
3. Update `appsettings.json` with new rule
4. Test with samples (see [testing-procedures.md](testing-procedures.md))
5. Validate results and adjust keywords
### Troubleshooting OCR Issues
1. Check [ocr-troubleshooting.md](ocr-troubleshooting.md) for your specific issue
2. Verify Tesseract configuration and language packs
3. Test image quality requirements
4. Apply preprocessing if needed
5. Check logs for detailed error messages
### Fixing Date Extraction
1. Review [date-extraction.md](date-extraction.md) for pattern priority
2. Check if date format is supported
3. Apply OCR error correction patterns
4. Add new regex pattern if needed
5. Validate with unit tests
### Deploying to Production
1. Follow [deployment-guide.md](deployment-guide.md) installation steps
2. Configure production settings
3. Set up monitoring and logging
4. Test with sample invoices
5. Schedule automated runs (cron/launchd)
## Architecture Overview
```text
FileContentRenamer/
├── Program.cs                      # Entry point, configuration loading
├── Configuration/
│   └── ServiceConfiguration.cs     # Dependency injection setup
├── Models/
│   ├── AppConfig.cs                # Configuration model
│   └── NamingRule.cs               # Service provider rules
└── Services/
    ├── FileService.cs              # Main orchestrator
    ├── PdfProcessor.cs             # PDF text extraction
    ├── ImageProcessor.cs           # OCR with Tesseract
    ├── TextProcessor.cs            # Plain text handling
    ├── DateExtractor.cs            # Date parsing
    ├── FilenameGenerator.cs        # Naming rules application
    ├── DirectoryOrganizer.cs       # Folder structure creation
    └── FileValidator.cs            # File validation
```
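
One way to read the layout above is that the orchestrator picks a processor by file extension. The Python sketch below illustrates that idea only. The extension-to-processor mapping is an assumption inferred from the comments in the tree (including `.jpeg` and `.txt`), and `pick_processor` is a hypothetical name, not a method of the real `FileService`.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping, inferred from the comments in the architecture tree.
PROCESSOR_BY_EXTENSION = {
    ".pdf": "PdfProcessor",
    ".jpg": "ImageProcessor",
    ".jpeg": "ImageProcessor",
    ".png": "ImageProcessor",
    ".txt": "TextProcessor",
}


def pick_processor(filename: str) -> Optional[str]:
    """Return the name of the processor for a file, or None if unsupported."""
    extension = PurePosixPath(filename).suffix.lower()
    return PROCESSOR_BY_EXTENSION.get(extension)


print(pick_processor("factura_marzo.PDF"))  # PdfProcessor
print(pick_processor("notas.docx"))         # None
```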
## Supported Service Providers
| Provider | Type | Code |
| ----------------------- | ------------------ | --------------------- |
| AYSA | Water & Sanitation | `aysa` |
| Edenor | Electricity | `edenor` |
| Metrogas | Natural Gas | `metrogas` |
| Municipality of Quilmes | Municipal Taxes | `municipal_quilmes` |
| ARBA Inmobiliario | Property Tax | `arba_inmobiliario` |
| ARBA Automotor | Vehicle Tax | `arba_automotor` |
| Personal/Flow | Mobile/Internet | `personal` |
| Quilmes High School | School Tuition | `high_school_cuota` |
| Quilmes High School | School Lunch | `high_school_comedor` |
| Gloria | Domestic Service | `gloria` |
See [service-identification.md](service-identification.md) for detailed information on each provider.
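
As a rough illustration of keyword-based classification with a priority order, the Python sketch below matches OCR text against an ordered rule list. The provider codes come from the table above, but the keyword lists, the "all keywords must match" policy, and the accent stripping are assumptions made for this example. The project's real rules live in `appsettings.json` and may work differently.

```python
import unicodedata
from typing import Optional

# Ordered by priority: more specific rules come first.
# Keywords are hypothetical examples, not the project's real rules.
RULES = [
    ("arba_automotor", ["arba", "automotor"]),
    ("arba_inmobiliario", ["arba", "inmobiliario"]),
    ("aysa", ["aysa"]),
    ("edenor", ["edenor"]),
    ("metrogas", ["metrogas"]),
]


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Liquidación' matches 'liquidacion'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def classify(text: str) -> Optional[str]:
    """Return the first provider code whose keywords all appear in the text."""
    haystack = normalize(text)
    for code, keywords in RULES:
        if all(keyword in haystack for keyword in keywords):
            return code
    return None


print(classify("ARBA - Impuesto Inmobiliario - Cuota 2"))  # arba_inmobiliario
```

Placing the two ARBA rules ahead of the single-keyword rules shows why rule order matters when providers share a keyword.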
## Key Features
### Intelligent Date Extraction
Handles multiple date formats, tried in this priority order:
1. Due date (abbreviated): `Vto.:DD/MM/YYYY`
2. Due date (full): `vencimiento DD/MM/YYYY`
3. Spanish format: `DD de MONTH de YYYY`
4. Generic: `DD/MM/YYYY`
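
The priority order above can be sketched as an ordered list of regular expressions in which the first valid match wins. This is a minimal Python illustration, not the project's `DateExtractor`: the exact patterns, the tolerance for optional punctuation and spaces, and the `extract_date` name are assumptions.

```python
import re
from datetime import date
from typing import Optional

MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "setiembre": 9,
    "octubre": 10, "noviembre": 11, "diciembre": 12,
}

NUMERIC = r"(\d{1,2})/(\d{1,2})/(\d{4})"

# Highest priority first, mirroring the numbered list above.
PATTERNS = [
    re.compile(r"vto\.?\s*:?\s*" + NUMERIC, re.IGNORECASE),        # Vto.:DD/MM/YYYY
    re.compile(r"vencimiento\s*:?\s*" + NUMERIC, re.IGNORECASE),   # vencimiento DD/MM/YYYY
    re.compile(r"(\d{1,2})\s+de\s+([a-záéíóú]+)\s+de\s+(\d{4})", re.IGNORECASE),
    re.compile(NUMERIC),                                           # DD/MM/YYYY
]


def extract_date(text: str) -> Optional[date]:
    """Return the first valid date found by the highest-priority pattern."""
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            day, month, year = match.groups()
            month_number = int(month) if month.isdigit() else MONTHS.get(month.lower())
            if month_number is None:
                continue  # Unknown month name, try the next match.
            try:
                return date(int(year), month_number, int(day))
            except ValueError:
                continue  # Impossible date such as 31/02, try the next match.
    return None


print(extract_date("Emisión 01/03/2025 Vto.:21/03/2025"))  # 2025-03-21
```

Because the due-date patterns are tried first, the issue date `01/03/2025` in the example is ignored even though it appears earlier in the text.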
### OCR with Error Correction
- Automatic correction of common OCR mistakes (O→0, I→1, S→5)
- Support for Spanish characters (ñ, á, é, í, ó, ú)
- Multi-language support (Spanish + English)
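
The character substitutions listed above (O→0, I→1, S→5) are only safe inside tokens that are expected to be numeric. The Python sketch below applies them to date-shaped tokens only. Restricting the fix to `DD/MM/YYYY`-like tokens is an assumption for this example, and `fix_date_tokens` is a hypothetical name.

```python
import re

# The three substitutions named in the list above.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "I": "1", "S": "5"})

# A date-shaped token whose positions may hold digits or the confusable letters.
DATE_LIKE = re.compile(r"\b[0-9OIS]{1,2}/[0-9OIS]{1,2}/[0-9OIS]{4}\b")


def fix_date_tokens(text: str) -> str:
    """Replace confusable letters with digits, but only inside date-like tokens."""
    return DATE_LIKE.sub(lambda match: match.group(0).translate(OCR_DIGIT_FIXES), text)


print(fix_date_tokens("Vto.: 2I/O3/2O25 - EDENOR S.A."))
# Vto.: 21/03/2025 - EDENOR S.A.
```

Provider names such as `EDENOR S.A.` are left untouched because they do not match the date shape.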
### Flexible Organization
Files organized into year/month structure:
```text
servicios/
└── 2025/
    ├── 03_marzo/
    │   └── aysa_2025-03-21_santander.pdf
    └── 08_agosto/
        └── gloria_2025-08-08_mercadopago.jpeg
```
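
The folder for a given invoice follows directly from its date. A minimal Python sketch of that mapping is shown here. `destination_folder` is a hypothetical helper, not the project's `DirectoryOrganizer`.

```python
from datetime import date
from pathlib import PurePosixPath

SPANISH_MONTHS = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def destination_folder(base: str, invoice_date: date) -> PurePosixPath:
    """Return '<base>/<YYYY>/<MM>_<spanish month name>' for the given date."""
    month_folder = f"{invoice_date.month:02d}_{SPANISH_MONTHS[invoice_date.month - 1]}"
    return PurePosixPath(base) / str(invoice_date.year) / month_folder


print(destination_folder("servicios", date(2025, 3, 21)))
# servicios/2025/03_marzo
```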
### Parallel Processing
Configurable parallelism for faster processing of large batches while maintaining file safety with proper locking.
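
The combination of bounded parallelism and locking can be sketched in Python as below. This is an illustration under stated assumptions, not the project's implementation: the numeric-suffix policy for name collisions, the in-memory set of claimed names, and the function names are all hypothetical.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_DEGREE_OF_PARALLELISM = 4  # Mirrors the setting of the same name.

_claim_lock = threading.Lock()
_claimed_names = set()


def claim_unique_name(desired: str) -> str:
    """Reserve a destination name, adding _1, _2, ... if it is already taken.

    The lock makes the check-and-reserve step atomic across worker threads.
    """
    stem, extension = os.path.splitext(desired)
    with _claim_lock:
        candidate, counter = desired, 1
        while candidate in _claimed_names:
            candidate = f"{stem}_{counter}{extension}"
            counter += 1
        _claimed_names.add(candidate)
        return candidate


def process_batch(desired_names):
    """Claim names for a batch using a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=MAX_DEGREE_OF_PARALLELISM) as pool:
        return list(pool.map(claim_unique_name, desired_names))
```

Without the lock, two workers could both see that a name is free and then write to the same destination.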
## Configuration Reference
### Key Settings
| Setting | Description | Default |
| ---------------------------- | ---------------------- | -------------------------- |
| `BasePath` | Root directory to scan | `.` |
| `FileExtensions` | File types to process | `[".pdf", ".jpg", ".png"]` |
| `IncludeSubdirectories` | Scan subdirectories | `true` |
| `TesseractLanguage` | OCR languages | `"spa+eng"` |
| `MaxDegreeOfParallelism` | Concurrent files | `4` |
| `ForceReprocessAlreadyNamed` | Reprocess named files | `false` |
See [invoice-processing.md](invoice-processing.md) for complete configuration documentation.
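
To show how the defaults in the table combine with a partial `AppConfig` section, here is a small Python sketch. The .NET application uses its own configuration binding, so `load_app_config` and the parallelism check are assumptions made for illustration.

```python
import json

# Defaults copied from the Key Settings table.
DEFAULTS = {
    "BasePath": ".",
    "FileExtensions": [".pdf", ".jpg", ".png"],
    "IncludeSubdirectories": True,
    "TesseractLanguage": "spa+eng",
    "MaxDegreeOfParallelism": 4,
    "ForceReprocessAlreadyNamed": False,
}


def load_app_config(raw_json: str) -> dict:
    """Overlay the 'AppConfig' section of a JSON document on the defaults."""
    overrides = json.loads(raw_json).get("AppConfig", {})
    config = {**DEFAULTS, **overrides}
    if config["MaxDegreeOfParallelism"] < 1:
        raise ValueError("MaxDegreeOfParallelism must be at least 1")
    return config


config = load_app_config('{"AppConfig": {"BasePath": "/path/to/invoices", "MaxDegreeOfParallelism": 2}}')
print(config["BasePath"], config["TesseractLanguage"], config["MaxDegreeOfParallelism"])
# /path/to/invoices spa+eng 2
```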
## Logging
Logs are written to:
- Console: Real-time processing status
- File: `logs/app{YYYYMMDD}.log` (daily rotation)
### Log Levels
- **Information**: Normal processing flow
- **Warning**: Skipped files, no content found
- **Error**: Processing failures, OCR errors
- **Debug**: Detailed extraction and matching info
## Performance
Typical performance (4-core system):
- **PDF**: ~1-2 seconds per file
- **Image (OCR)**: ~3-5 seconds per file
- **Large batch** (100 files): ~5-8 minutes
See [deployment-guide.md](deployment-guide.md) for optimization tips.
## Troubleshooting Quick Reference
| Issue | See | Quick Fix |
| ------------------------ | ------------------------------------------------------ | ----------------------------------------------- |
| Garbled OCR text | [ocr-troubleshooting.md](ocr-troubleshooting.md) | Check image quality, verify `spa` language pack |
| Wrong service identified | [service-identification.md](service-identification.md) | Add more keywords, check priority |
| Date not found | [date-extraction.md](date-extraction.md) | Verify date format, check OCR quality |
| Files in wrong folder | [invoice-processing.md](invoice-processing.md) | Check `BasePath` configuration |
| High memory usage | [deployment-guide.md](deployment-guide.md) | Reduce `MaxDegreeOfParallelism` |
## Testing
Run tests:
```bash
dotnet test FileContentRenamer.Tests/
```
See [testing-procedures.md](testing-procedures.md) for the comprehensive testing guide.
## Updates and Maintenance
- **Weekly**: Review logs for errors
- **Monthly**: Update service provider rules if needed
- **Quarterly**: Review and archive old invoices
- **Yearly**: Update dependencies and .NET runtime
See [deployment-guide.md](deployment-guide.md) for maintenance procedures.
## Version History
- **v1.0.0** (2025-11-06): Initial skill documentation
  - Complete invoice processing system
  - Support for 10 service providers
  - OCR with Tesseract
  - Automated organization
## Support
- **Repository**: <https://github.com/santibraida/comprobantes>
- **Issues**: Create a GitHub issue for bugs or feature requests
- **Documentation**: See `.skills/` directory
## Related Technologies
- **.NET 9.0**: Application framework
- **Tesseract OCR**: Text extraction from images
- **Serilog**: Structured logging
- **xUnit**: Unit testing framework
- **ImageMagick**: Image preprocessing (optional)
## Best Practices
1. **Always backup** before processing
2. **Test with samples** before bulk processing
3. **Monitor logs** for errors and warnings
4. **Keep configuration** in version control (except production secrets)
5. **Update skills documentation** when adding features
## Getting Help
1. Check the relevant detailed skill file for your issue
2. Review logs for error messages
3. Search existing GitHub issues
4. Create new issue with:
   - Sample invoice (redacted)
   - Log excerpt
   - Configuration used
   - Expected vs actual behavior
---
**Start with [invoice-processing.md](invoice-processing.md) for the complete system overview, then dive into specific skill files as needed.**