text-processor

Text processing pipeline: an agent team collaborates to perform preprocessing, classification, entity/keyword extraction, sentiment analysis, summarization, structured data conversion, and report generation on bulk text. Use this skill for requests like 'analyze this text', 'text processing', 'classify documents', 'run sentiment analysis', 'extract keywords', 'named entity recognition', 'NER', 'text summarization', 'review analysis', 'survey text analysis', 'comment analysis', and other general text NLP tasks. Note: speech recognition (STT), machine translation, chatbot dialogue management, and LLM fine-tuning are outside the scope of this skill.

495 stars

Best use case

text-processor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using text-processor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/text-processor/SKILL.md --create-dirs "https://raw.githubusercontent.com/revfactory/harness-100/main/en/33-text-processor/.claude/skills/text-processor/skill.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/text-processor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How text-processor Compares

| Feature / Agent | text-processor | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It runs a text processing pipeline in which an agent team collaborates on preprocessing, classification, entity/keyword extraction, sentiment analysis, summarization, structured data conversion, and report generation for bulk text. Speech recognition (STT), machine translation, chatbot dialogue management, and LLM fine-tuning are outside its scope.

Where can I find the source code?

The source code is on GitHub in the revfactory/harness-100 repository, under en/33-text-processor.

SKILL.md Source

# Text Processor — Full Text Processing Pipeline

An agent team collaborates to perform bulk text preprocessing, classification, extraction, sentiment analysis, structuring, and report generation.

## Execution Mode

**Agent Team** — Five agents communicate directly via SendMessage and perform cross-validation.

## Agent Composition

| Agent | File | Role | Type |
|-------|------|------|------|
| preprocessor | `.claude/agents/preprocessor.md` | Text preprocessing, noise removal | general-purpose |
| classifier | `.claude/agents/classifier.md` | Topic/intent classification, tagging | general-purpose |
| extractor | `.claude/agents/extractor.md` | Entity, keyword, relation, summary extraction | general-purpose |
| sentiment-analyzer | `.claude/agents/sentiment-analyzer.md` | Sentiment/emotion/opinion analysis | general-purpose |
| report-writer | `.claude/agents/report-writer.md` | Final report, quality assurance | general-purpose |

## Workflow

### Phase 1: Preparation (performed directly by the orchestrator)

1. Extract the following from user input:
    - **Text source**: File path, format, document count, language
    - **Analysis objective**: What the user wants to learn (classification, sentiment, keywords, etc.)
    - **Domain information** (optional): Industry, text type (reviews/news/social media/documents)
    - **Classification taxonomy** (optional): User-defined classification categories
2. Create the `_workspace/` directory and the `_workspace/structured_data/` subdirectory
3. Organize the input and save it to `_workspace/00_input.md`
4. If pre-existing files are available, copy them to `_workspace/` and skip the corresponding phase
5. **Determine the execution mode** based on the scope of the request
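
Steps 2 and 3 above can be sketched in Python as follows. This is only an illustration of the directory layout: the `prepare_workspace` helper, the request field values, and the Markdown layout of `00_input.md` are assumptions, not something the skill prescribes.

```python
import tempfile
from pathlib import Path


def prepare_workspace(root: Path, request: dict) -> Path:
    """Create _workspace/ and structured_data/, then save the organized input."""
    workspace = root / "_workspace"
    # parents=True also creates _workspace/ itself; exist_ok makes reruns safe.
    (workspace / "structured_data").mkdir(parents=True, exist_ok=True)

    # Hypothetical layout for 00_input.md: one bullet per extracted field.
    lines = ["# Input", ""]
    for key, value in request.items():
        lines.append(f"- **{key}**: {value}")
    input_file = workspace / "00_input.md"
    input_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return input_file


# Hypothetical request, mirroring the fields listed in step 1.
example_request = {
    "Text source": "reviews.csv (CSV, 1,000 documents, English)",
    "Analysis objective": "sentiment",
    "Domain information": "reviews",
}
demo_root = Path(tempfile.mkdtemp())  # temporary directory, for the demo only
saved = prepare_workspace(demo_root, example_request)
print(saved.read_text(encoding="utf-8"))
```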

### Phase 2: Team Assembly and Execution

| Order | Task | Owner | Dependencies | Deliverable |
|-------|------|-------|-------------|-------------|
| 1 | Preprocessing | preprocessor | None | `01_preprocessing_result.md` |
| 2a | Classification | classifier | Task 1 | `02_classification_result.md` |
| 2b | Extraction | extractor | Task 1 | `03_extraction_result.md` |
| 3 | Sentiment analysis | sentiment-analyzer | Tasks 1, 2a, 2b | `04_sentiment_result.md` |
| 4 | Report | report-writer | Tasks 2a, 2b, 3 | `05_final_report.md` |

Tasks 2a (classification) and 2b (extraction) run **in parallel**. Sentiment analysis leverages classification and extraction results to improve aspect-level analysis accuracy.
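
The dependency column of the table can be read as a small directed graph. As a minimal sketch (not how the agent team is actually scheduled), Python's standard `graphlib` module derives the same ordering and shows that tasks 2a and 2b become ready together; the `execution_batches` helper is an illustrative name.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on, copied from the Phase 2 table.
DEPENDENCIES = {
    "1": set(),
    "2a": {"1"},
    "2b": {"1"},
    "3": {"1", "2a", "2b"},
    "4": {"2a", "2b", "3"},
}


def execution_batches(dependencies):
    """Group tasks into batches; tasks in the same batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(DEPENDENCIES)
print(batches)  # [['1'], ['2a', '2b'], ['3'], ['4']]
```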

**Inter-agent communication flow:**
- preprocessor completes → passes cleaned text and metadata to classifier, extractor, and sentiment-analyzer
- classifier completes → passes topic classification results to extractor (for topic-specific extraction) and sentiment-analyzer
- extractor completes → passes entity lists to sentiment-analyzer (for entity-level sentiment analysis)
- sentiment-analyzer completes → passes results to report-writer
- report-writer cross-validates all deliverables; requests corrections from the relevant agent if discrepancies are found (up to 2 rounds)

### Phase 3: Integration and Final Deliverables

1. Verify all files in `_workspace/` and the `structured_data/` directory
2. Confirm that all required corrections have been incorporated into the report
3. Present the final summary to the user

## Execution Modes by Request Scope

| User Request Pattern | Execution Mode | Agents Deployed |
|---------------------|---------------|----------------|
| "Analyze this text", "full pipeline" | **Full pipeline** | All 5 agents |
| "Just classify", "categorize" | **Classification mode** | preprocessor + classifier |
| "Sentiment analysis only", "review sentiment" | **Sentiment mode** | preprocessor + sentiment-analyzer |
| "Extract keywords", "named entity recognition" | **Extraction mode** | preprocessor + extractor |
| "Summarize", "text summary" | **Summary mode** | preprocessor + extractor (summary function) |
| "Write a report" (existing analyses available) | **Report mode** | report-writer only |

**Reusing existing files**: If the user provides pre-processed text or existing classification results, copy those files to the appropriate location in `_workspace/` and skip the corresponding agent.
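
The mode table can be approximated with a simple lookup. In the skill itself the orchestrator judges the request scope; the keyword rules below are a hypothetical stand-in for that judgment, and `select_mode` is an illustrative name. Only the mode-to-agent mapping is taken from the table.

```python
# Mode -> agents deployed, copied from the table above.
MODE_AGENTS = {
    "full pipeline": ["preprocessor", "classifier", "extractor",
                      "sentiment-analyzer", "report-writer"],
    "classification": ["preprocessor", "classifier"],
    "sentiment": ["preprocessor", "sentiment-analyzer"],
    "extraction": ["preprocessor", "extractor"],
    "summary": ["preprocessor", "extractor"],
    "report": ["report-writer"],
}

# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES = [
    (("full pipeline",), "full pipeline"),
    (("classify", "categorize"), "classification"),
    (("sentiment",), "sentiment"),
    (("keyword", "named entity"), "extraction"),
    (("summar",), "summary"),
    (("report",), "report"),
]


def select_mode(request: str) -> str:
    """Return the execution mode for a request; default to the full pipeline."""
    text = request.lower()
    for keywords, mode in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return mode
    return "full pipeline"


for prompt in ("Analyze this text", "Just classify", "Sentiment analysis only"):
    mode = select_mode(prompt)
    print(f"{prompt!r} -> {mode}: {MODE_AGENTS[mode]}")
```

Substring matching like this is brittle with free-form requests, so treat it as a way to read the table rather than a routing implementation.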

## Data Transfer Protocol

| Strategy | Method | Purpose |
|----------|--------|---------|
| File-based | `_workspace/` directory | Markdown deliverables |
| Structured data | `_workspace/structured_data/` | JSON/CSV data for programmatic use |
| Message-based | SendMessage | Key information transfer, correction requests |
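
For the structured-data channel, a minimal sketch of writing the same records as both JSON and CSV is shown here. The file names (`results.json`, `results.csv`) and the record fields are hypothetical; the skill only specifies the `_workspace/structured_data/` location and the JSON/CSV formats.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical per-document results; field names are illustrative only.
records = [
    {"doc_id": 1, "label": "complaint", "sentiment": "negative"},
    {"doc_id": 2, "label": "praise", "sentiment": "positive"},
]

# A temporary root is used so the demo does not touch a real workspace.
structured_dir = Path(tempfile.mkdtemp()) / "_workspace" / "structured_data"
structured_dir.mkdir(parents=True)

# JSON keeps native types (doc_id stays an integer).
json_path = structured_dir / "results.json"
json_path.write_text(json.dumps(records, indent=2, ensure_ascii=False),
                     encoding="utf-8")

# CSV flattens every value to text; newline="" is what the csv module expects.
csv_path = structured_dir / "results.csv"
with csv_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
```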

## Error Handling

| Error Type | Strategy |
|-----------|----------|
| Encoding errors | Auto-detect with chardet → force UTF-8 conversion, log losses |
| Large text volumes (>100K documents) | Batch processing; analyze a sample first, then apply to full dataset |
| Mixed languages | Separate into language-specific segments and process individually |
| NER domain mismatch | Supplement with pattern-based extraction; propose custom dictionary creation |
| Agent failure | Retry once; if still failing, proceed without that deliverable |
| Report discrepancy found | Request correction from the relevant agent (up to 2 rounds) |
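
The "Agent failure" row (retry once, then proceed without the deliverable) can be sketched as below. The `run_with_retry` helper and the callables standing in for agents are hypothetical; real agents are invoked by the orchestrator, not as Python functions.

```python
from typing import Callable, List, Optional


def run_with_retry(agent_name: str, task: Callable[[], str],
                   log: List[str]) -> Optional[str]:
    """Run a task, retry once on failure, and return None if both attempts fail."""
    for attempt in (1, 2):  # the first attempt plus a single retry
        try:
            return task()
        except Exception as exc:
            log.append(f"{agent_name} attempt {attempt} failed: {exc}")
    log.append(f"{agent_name}: proceeding without deliverable")
    return None


calls = {"count": 0}


def flaky_extractor() -> str:
    """Stand-in agent that fails on its first call and succeeds on the second."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("temporary failure")
    return "03_extraction_result.md"


def broken_classifier() -> str:
    """Stand-in agent that always fails."""
    raise RuntimeError("unavailable")


log: List[str] = []
recovered = run_with_retry("extractor", flaky_extractor, log)
skipped = run_with_retry("classifier", broken_classifier, log)
print(recovered, skipped)
print("\n".join(log))
```

Returning `None` rather than raising lets the caller continue with the remaining deliverables, which matches the "proceed without that deliverable" strategy in the table.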

## Test Scenarios

### Normal Flow
**Prompt**: "Analyze 1,000 customer reviews and extract product-level satisfaction and complaints"
**Expected result**:
- Preprocessing: Normalize review text, remove duplicates, compute statistics
- Classification: Classify by product category and review intent (praise/complaint/inquiry/suggestion)
- Extraction: Product names, feature names, key keywords, per-review summaries
- Sentiment: Overall sentiment distribution, product- and feature-level sentiment (ABSA), complaint patterns
- Report: Product satisfaction rankings, top 5 complaints, improvement recommendations

### Existing File Reuse Flow
**Prompt**: "I already have preprocessed text data; just run sentiment analysis" + preprocessed file attached
**Expected result**:
- Copy existing preprocessing results to `_workspace/01_preprocessing_result.md`
- Sentiment mode: Skip preprocessor, deploy only sentiment-analyzer
- Do not deploy classifier, extractor, or report-writer

### Error Flow
**Prompt**: "Analyze the comments in this CSV file" (mixed languages, many emojis, short text)
**Expected result**:
- preprocessor separates text by language, determines emoji handling strategy
- classifier flags reduced confidence for short text classification
- sentiment-analyzer leverages emoji sentiment information
- report-writer documents the analytical limitations of multilingual and short text in the report


## Agent Extension Skills

| Skill | Path | Enhanced Agent | Role |
|-------|------|---------------|------|
| nlp-preprocessing-toolkit | `.claude/skills/nlp-preprocessing-toolkit/skill.md` | preprocessor, extractor | Tokenization, morphological analysis, embedding selection, vectorization |
| sentiment-lexicon-builder | `.claude/skills/sentiment-lexicon-builder/skill.md` | sentiment-analyzer | Sentiment lexicon construction, ABSA, negation/intensity correction, emoji mapping |

Related Skills

text-analytics-methods

495
from revfactory/harness-100

Text analytics methodology. Referenced by topic-classifier and trend-detector agents when extracting topics and deriving trends from unstructured text. Used for 'topic classification', 'keyword analysis', 'text mining' requests. Note: NLP model training and large-scale data processing pipeline development are out of scope.

ddd-context-mapping

495
from revfactory/harness-100

Detailed methodology for DDD (Domain-Driven Design) bounded context identification, context map creation, and event storming execution. Use this skill for 'bounded context', 'DDD', 'domain modeling', 'event storming', 'context map', 'aggregate design', 'ubiquitous language', and other domain analysis tasks. Enhances the domain analysis capabilities of domain-analyst and service-architect. Note: infrastructure deployment and code implementation are outside the scope of this skill.

sustainability-audit

495
from revfactory/harness-100

Full audit pipeline for ESG/sustainability where an agent team collaborates to generate environmental, social, and governance assessments along with an integrated report and improvement plan. Use this skill for requests such as 'run an ESG audit', 'write a sustainability report', 'ESG assessment', 'carbon emissions calculation', 'ESG rating diagnosis', 'governance review', 'social responsibility assessment', 'GRI report', 'TCFD disclosure', 'ESG improvement plan', and other ESG/sustainability tasks. Also supports assessment of specific pillars (E/S/G) only or improving existing reports. However, actual on-site audit execution, third-party verification certificate issuance, ESG rating agency score changes, and carbon credit trading are outside the scope of this skill.

materiality-assessment

495
from revfactory/harness-100

ESG materiality assessment matrix. Referenced by the esg-reporter and improvement-planner agents when evaluating ESG issue materiality and setting priorities. Use for 'materiality assessment', 'importance analysis', or 'Materiality Matrix' requests. Stakeholder surveys and external certification are out of scope.

ghg-protocol

495
from revfactory/harness-100

GHG Protocol detailed guide. Referenced by the environmental-analyst agent when calculating and reporting greenhouse gas emissions. Use for 'GHG Protocol', 'carbon emissions', 'Scope 1/2/3', or 'carbon footprint' requests. Carbon credit trading and CDM project execution are out of scope.

citation-standards

495
from revfactory/harness-100

Academic citation and reference standards guide. Referenced by the paper-writer and submission-preparer agents when composing citations and references. Use for 'citation format', 'APA', or 'references' requests. Original paper retrieval and professional database access are out of scope.

academic-paper

495
from revfactory/harness-100

Full research pipeline for academic paper writing where an agent team collaborates to generate research design, experiment protocols, analysis, manuscript writing, and submission preparation. Use this skill for requests such as 'write an academic paper', 'research paper writing', 'help me write a paper', 'design a study', 'run statistical analysis', 'prepare journal submission', 'manuscript writing', 'research methodology design', 'hypothesis testing', 'academic writing', and other academic research paper tasks. Also supports analysis, rewriting, and submission preparation when existing data or drafts are available. However, actual data collection execution, official IRB submission, journal system login and upload, and running actual statistical software are outside the scope of this skill.

product-copy-formulas

495
from revfactory/harness-100

Product copy formula library. Referenced by the detail-page-writer and marketing-manager agents when writing purchase-driving copy. Use for 'product copy', 'marketing copy', or 'ad copy' requests. Ad placement and design mockup creation are out of scope.

ecommerce-launcher

495
from revfactory/harness-100

Full launch pipeline for e-commerce products where an agent team collaborates to generate product planning, detail pages, pricing strategy, marketing, and CS setup all at once. Use this skill for requests such as 'launch an e-commerce product', 'prepare a product launch', 'register a product on Naver Smart Store', 'launch on Coupang', 'create a detail page', 'develop a pricing strategy', 'create a marketing plan', 'launch prep', 'product planning brief', 'e-commerce CS manual', and other e-commerce product launch tasks. Also supports supplementing pricing/marketing/CS even when existing briefs or detail pages are provided. However, actual platform API integration (automated product registration), payment system development, logistics system integration, and real-time order management are outside the scope of this skill.

conversion-optimization

495
from revfactory/harness-100

Purchase conversion optimization framework. Referenced by the detail-page-writer and pricing-strategist agents when designing detail pages and pricing with a conversion focus. Use for 'conversion rate optimization', 'CRO', or 'purchase psychology' requests. A/B testing tool setup and funnel automation are out of scope.

real-estate-analyst

495
from revfactory/harness-100

Real estate investment analysis pipeline. An agent team collaborates to produce market research, location analysis, profitability analysis, risk assessment, and investment reports. Use this skill for requests such as 'analyze this real estate', 'apartment investment analysis', 'studio apartment yield', 'real estate market research', 'location analysis', 'real estate investment report', 'buy vs lease', 'reconstruction investment analysis', 'commercial property yield analysis', and other general real estate investment analysis tasks. Actual purchase contracts, brokerage services, interior design, and property management are outside the scope of this skill.

location-scoring

495
from revfactory/harness-100

Location scoring scorecard. Referenced by the location-analyst agent for systematic real estate location evaluation. Use for requests involving 'location analysis', 'location assessment', or 'commercial area analysis'. On-site inspections and surveying are out of scope.