Lexer Generator

Expert skill for generating and hand-writing lexers using DFA-based, table-driven, and recursive approaches

509 stars

Best use case

Lexer Generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Lexer Generator can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/lexer-generator/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/programming-languages/skills/lexer-generator/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/lexer-generator/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill

How Lexer Generator Compares

| Feature / Agent | Lexer Generator | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Expert skill for generating and hand-writing lexers using DFA-based, table-driven, and recursive approaches

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Lexer Generator Skill

## Overview

Expert skill for generating and hand-writing lexers using various approaches including DFA-based lexers, table-driven lexers, and hand-written recursive lexers.

## Capabilities

- Generate lexer from regular expression specifications
- Implement maximal munch tokenization
- Handle Unicode character classes and normalization
- Implement efficient keyword recognition (tries, perfect hashing)
- Support incremental/resumable lexing for IDE integration
- Generate lexer tables and state machines
- Handle lexer modes and contexts (e.g., string interpolation)
- Implement error recovery with skip-to-next strategies
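
The table and state-machine capabilities above can be illustrated with a minimal sketch of a table-driven DFA lexer. Everything named here (the token kinds, the `char_class` helper, the transition table) is a hypothetical example chosen for illustration, not output produced by the skill. The sketch tracks the last accepting state seen, which is one common way to get maximal munch from a DFA.

```python
# Hypothetical token kinds and DFA; none of these names come from the skill itself.
INT, FLOAT, IDENT = "INT", "FLOAT", "IDENT"

def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch == ".":
        return "dot"
    return "other"

# TRANSITIONS[state][char_class] -> next state; a missing entry means "no move".
TRANSITIONS = {
    0: {"digit": 1, "letter": 4},   # start state
    1: {"digit": 1, "dot": 2},      # integer digits
    2: {"digit": 3},                # saw "123.", needs at least one more digit
    3: {"digit": 3},                # fractional digits
    4: {"letter": 4, "digit": 4},   # identifier characters
}
ACCEPTING = {1: INT, 3: FLOAT, 4: IDENT}

def next_token(text, pos):
    """Return (kind, lexeme, end) for the longest match starting at pos, or None."""
    state, i = 0, pos
    last_accept = None
    while i < len(text):
        state = TRANSITIONS[state].get(char_class(text[i]))
        if state is None:
            break
        i += 1
        if state in ACCEPTING:
            # Remember the most recent accepting position so we can back up to it.
            last_accept = (ACCEPTING[state], i)
    if last_accept is None:
        return None
    kind, end = last_accept
    return kind, text[pos:end], end

def tokenize(text):
    """Tokenize text, skipping whitespace and raising on unknown characters."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = next_token(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme, pos = match
        tokens.append((kind, lexeme))
    return tokens
```

With this table, `tokenize("x1 42 3.14")` yields an `IDENT`, an `INT`, and a `FLOAT`, and `next_token("12.x", 0)` backs up to the integer `12` because the DFA never reaches an accepting state after the dot.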

## Target Processes

- lexer-implementation.js
- language-grammar-design.js
- lsp-server-implementation.js
- repl-development.js

## Dependencies

- Flex-like generators
- RE2/Hyperscan libraries

## Usage Guidelines

1. **Token Definition**: Start by defining the complete set of tokens with their regex patterns
2. **Maximal Munch**: Always implement maximal munch so that the longest matching token wins at ambiguous token boundaries (for example, `==` rather than two `=` tokens)
3. **Unicode Support**: Consider Unicode normalization forms and character classes from the start
4. **Error Recovery**: Implement skip-to-next-valid strategies for robust error handling
5. **Performance**: Use table-driven lexers for large token sets and hand-written lexers for small, simple ones
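
As a rough illustration of guidelines 1, 2, and 4, the sketch below defines tokens as regex patterns, picks the longest match with a priority tie-break, and recovers from unknown input by skipping to the next position where a token can start. The token names, patterns, and the `longest_match` helper are assumptions made for this example. Python's `re` is used only for brevity; it is not one of the listed dependencies.

```python
import re

# Hypothetical token specification: (name, pattern, priority); lower priority wins ties.
# Python alternation is ordered, not longest-first, so "==" is listed before "=".
TOKEN_SPECS = [
    ("KEYWORD", r"if|else|while", 0),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*", 1),
    ("NUMBER", r"[0-9]+", 1),
    ("OP", r"==|=|\+|-", 1),
    ("WS", r"\s+", 2),
]
COMPILED = [(name, re.compile(pattern), prio) for name, pattern, prio in TOKEN_SPECS]

def longest_match(text, pos):
    """Return (length, -priority, name) for the best token at pos, or None."""
    best = None
    for name, regex, prio in COMPILED:
        m = regex.match(text, pos)
        if m and m.end() > pos:
            # Longer matches win; on equal length the lower priority number wins.
            candidate = (m.end() - pos, -prio, name)
            if best is None or candidate > best:
                best = candidate
    return best

def tokenize(text):
    """Return (tokens, errors); errors hold (offset, skipped_text) pairs."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best = longest_match(text, pos)
        if best is None:
            # Skip-to-next recovery: advance until some token can start again.
            start = pos
            pos += 1
            while pos < len(text) and longest_match(text, pos) is None:
                pos += 1
            errors.append((start, text[start:pos]))
            continue
        length, _, name = best
        if name != "WS":
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens, errors
```

For the input `"if iffy == 42 $$ x"`, `if` is classified as `KEYWORD` by priority, `iffy` as `IDENT` by length, and the `$$` run is reported as a single error at offset 14 while lexing continues with `x`.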

## Output Schema

```json
{
  "type": "object",
  "properties": {
    "tokens": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "pattern": { "type": "string" },
          "priority": { "type": "integer" }
        }
      }
    },
    "lexerType": {
      "type": "string",
      "enum": ["dfa", "table-driven", "hand-written"]
    },
    "generatedFiles": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```
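
To make the schema concrete, here is a small sketch of an output document that fits it, together with a hand-rolled shape check. The token entries and file names are invented for illustration, and `check_output` is a hypothetical helper that covers only the constraints visible in the schema above; it is not a general JSON Schema validator.

```python
import json

# Hypothetical example output; names, patterns, and file names are illustrative only.
example_output = json.loads("""
{
  "tokens": [
    {"name": "IDENT", "pattern": "[A-Za-z_][A-Za-z0-9_]*", "priority": 1},
    {"name": "NUMBER", "pattern": "[0-9]+", "priority": 1}
  ],
  "lexerType": "table-driven",
  "generatedFiles": ["lexer.py", "lexer_tables.py"]
}
""")

LEXER_TYPES = ("dfa", "table-driven", "hand-written")

def check_output(doc):
    """Return a list of problems; an empty list means the shape matches the schema."""
    problems = []
    tokens = doc.get("tokens", [])
    if not isinstance(tokens, list):
        problems.append("tokens must be an array")
        tokens = []
    for i, tok in enumerate(tokens):
        if not isinstance(tok, dict):
            problems.append(f"tokens[{i}] must be an object")
            continue
        for key, expected in (("name", str), ("pattern", str), ("priority", int)):
            if key not in tok:
                continue  # the schema declares no required properties
            value = tok[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, expected) or isinstance(value, bool):
                problems.append(f"tokens[{i}].{key} must be of type {expected.__name__}")
    if "lexerType" in doc and doc["lexerType"] not in LEXER_TYPES:
        problems.append("lexerType must be one of: " + ", ".join(LEXER_TYPES))
    files = doc.get("generatedFiles", [])
    if not isinstance(files, list) or not all(isinstance(f, str) for f in files):
        problems.append("generatedFiles must be an array of strings")
    return problems
```

Calling `check_output(example_output)` returns an empty list, while a document with `"lexerType": "nfa"` is reported as a problem because that value is outside the enum.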

Related Skills

All from a5c-ai/babysitter (509 stars):

  • color-palette-generator: Generate accessible color palettes with WCAG compliance
  • tracing-schema-generator: Generate distributed tracing schemas for OpenTelemetry with Jaeger/Zipkin integration
  • metrics-schema-generator: Generate metrics schemas for Prometheus, OpenTelemetry, and Grafana dashboards
  • log-schema-generator: Generate structured logging schemas with correlation ID patterns and ELK/Splunk integration
  • load-test-generator: Generate load test scripts for k6, Locust, and Gatling from OpenAPI specs
  • graphql-schema-generator: Generate GraphQL schemas from data models with resolver stubs and federation support
  • docs-site-generator: Generate documentation sites using Docusaurus, MkDocs, or VuePress
  • dependency-graph-generator: Generate module dependency graphs with circular dependency detection and coupling metrics
  • dashboard-generator: Generate monitoring dashboards for Grafana and DataDog with alert integration
  • c4-diagram-generator: Specialized skill for generating C4 model architecture diagrams. Supports Structurizr DSL, PlantUML, and Mermaid formats with multi-level abstraction (Context, Container, Component, Code).
  • adr-generator: Specialized skill for generating and managing Architecture Decision Records (ADRs). Supports Nygard, MADR, and custom templates with auto-numbering, linking, and status management.
  • typespec-sdk-generator: Microsoft TypeSpec-based API and SDK generation