vector-text-fixer

Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction.

53 stars

byaipoch

View on GitHub Installation ↓

Best use case

vector-text-fixer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction.

Teams using vector-text-fixer should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/vector-text-fixer/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Other/vector-text-fixer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/vector-text-fixer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How vector-text-fixer Compares

Feature / Agent	vector-text-fixer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# Vector Text Fixer

Fixes garbled text in PDF/SVG vector graphics caused by font embedding problems, encoding errors, or missing font substitution. Outputs repaired files or editable JSON for AI tool import.

## Quick Check

```bash
python -m py_compile scripts/main.py
```

## Audit-Ready Commands

```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --input document.pdf --output fixed.pdf
python scripts/main.py --input diagram.svg --output fixed.svg
```

## When to Use

- Fix garbled/box characters in PDF files caused by font embedding issues
- Repair SVG text encoding errors before editing in Illustrator or Inkscape
- Batch-process a folder of PDF/SVG files with garbled text
- Export a text map JSON for manual correction in AI editors

## Workflow

1. Confirm input file path (PDF or SVG) or batch folder, and desired output path.
2. Validate that the request involves PDF/SVG garbled text repair; stop early if not.
3. Run `scripts/main.py --input <file> --output <file>` or `--batch <folder>`.
4. Return a structured result separating repaired blocks, skipped blocks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the Fallback Template below.

## Fallback Template

If `scripts/main.py` fails or required fields are missing, respond with:

```
FALLBACK REPORT
───────────────────────────────────────
Objective        : <repair goal>
Inputs Available : <file path or batch folder provided>
Missing Inputs   : <list exactly what is missing>
  Note: --input requires a valid PDF or SVG file path, not a text string.
        For batch mode use --batch <folder_path> instead.
Partial Result   : <any blocks repaired safely>
Blocked Steps    : <what could not be completed and why>
Next Steps       : <minimum info needed to complete>
───────────────────────────────────────
```

## Stress-Case Output Checklist

For complex multi-constraint requests, always include these sections explicitly:

- **Assumptions**: repair level default (standard), encoding auto-detected
- **Constraints**: encrypted PDFs require password unlock first; scanned PDFs need OCR first
- **Risks**: severely damaged files may not be fully repairable; rare fonts may not map correctly
- **Unresolved Items**: blocks with confidence < 0.3 flagged for manual review

## Supported Scenarios

**PDF Garbled Text:**
- Box/question mark issues from font embedding problems
- Garbled text from encoding conversion errors
- Missing font substitution characters
- Multi-language mixed encoding issues

**SVG Garbled Text:**
- Text entity encoding errors
- Special character escaping issues
- Invalid font reference display abnormalities
- XML encoding declaration errors

## CLI Usage

```bash
# Fix single PDF
python scripts/main.py --input document.pdf --output fixed.pdf

# Fix single SVG
python scripts/main.py --input diagram.svg --output fixed.svg

# Batch process folder
python scripts/main.py --batch ./input_folder --output ./output_folder

# Interactive repair
python scripts/main.py --input doc.pdf --interactive

# Export editable JSON
python scripts/main.py --input doc.pdf --export-json editable.json

# Specify repair level
python scripts/main.py --input doc.pdf --output fixed.pdf --repair-level aggressive
```

## Parameters

| Parameter | Required | Description | Default |
|---|---|---|---|
| `--input` | Yes* | Input PDF or SVG file path | — |
| `--batch` | Yes* | Batch input folder path | — |
| `--output` | Yes | Output file or folder path | — |
| `--repair-level` | No | `minimal` / `standard` / `aggressive` | `standard` |
| `--interactive` | No | Enable interactive repair mode | False |
| `--export-json` | No | Export editable JSON format | — |
| `--encoding` | No | Source file encoding (default: auto-detect) | auto |

*At least one of `--input` or `--batch` is required.

## Repair Levels

- **Minimal**: Only obvious errors (replacement characters, null bytes); maximum original integrity
- **Standard**: Common encoding issues + smart font replacement; balanced repair rate and accuracy
- **Aggressive**: Full text re-encoding + OCR-assisted recognition; for severely garbled documents

## Output Format (JSON Export)

```json
{
  "file_type": "pdf",
  "pages": [{
    "page_num": 1,
    "text_blocks": [{
      "id": "tb_001",
      "bbox": [100, 200, 300, 220],
      "original_text": "?????",
      "detected_encoding": "UTF-8",
      "confidence": 0.3,
      "suggested_fix": "Sample Text"
    }]
  }],
  "repair_summary": {
    "total_blocks": 15,
    "fixed_blocks": 12,
    "skipped_blocks": 3
  }
}
```

## Input Validation

This skill accepts: PDF (.pdf) or SVG (.svg) file paths, or a folder path for batch processing, where the files contain garbled or unreadable text caused by font/encoding issues.

If the request does not involve PDF/SVG garbled text repair — for example, asking to convert file formats, edit PDF content directly, perform OCR on scanned images, or process non-vector files — do not proceed. Instead respond:

> "`vector-text-fixer` is designed to fix garbled text in PDF/SVG vector graphics caused by font encoding issues. Your request appears to be outside this scope. Please provide a valid PDF or SVG file path, or use a more appropriate tool."

## Error Handling

- If `--input` receives a text string instead of a file path, report the error and request a valid file path.
- If the file is encrypted, report that password unlock is required before processing.
- If the task goes outside documented scope, stop instead of guessing.
- If `scripts/main.py` fails, use the Fallback Template above.
- Do not fabricate repaired text content or execution outcomes.

## Output Requirements

Every final response must include:

1. **Objective** — file(s) repaired and repair level used
2. **Inputs Received** — file path, repair level, encoding settings
3. **Assumptions** — defaults applied (repair level, encoding detection)
4. **Result** — output file path, blocks fixed vs skipped
5. **Risks and Limits** — confidence thresholds, manual review blocks
6. **Next Checks** — review low-confidence blocks manually before use

## Limitations

- Encrypted PDFs require password unlock before processing
- Severely damaged vector files may not be fully repairable
- Some rare fonts may not map correctly
- Scanned PDFs require OCR recognition first

## Dependencies

```
pdfplumber >= 0.10.0
PyMuPDF >= 1.23.0
cairosvg >= 2.7.0
beautifulsoup4 >= 4.12.0
fonttools >= 4.40.0
chardet >= 5.0.0
Pillow >= 10.0.0
```

Related Skills

text-to-technical-roadmap

from aipoch/medical-research-skills

Converts research text into a Mermaid technical roadmap flowchart. Use when the user provides research proposals, experiment designs, or scientific text and asks for a roadmap or flowchart.

text-format-organizer

from aipoch/medical-research-skills

A local text formatting organizer for biomedical/academic writing; use it when you need to clean whitespace/line endings while preserving Markdown structures or when normalizing .docx/.md/.txt before submission or proofreading.

fulltext-fetcher

from aipoch/medical-research-skills

Fetch and save the original HTML of scientific literature webpages when given a URL, DOI, or PubMed PMID (triggered when you need archival-grade page HTML for downstream parsing or review).

unstructured-medical-text-miner

from aipoch/medical-research-skills

Mine unstructured clinical text from MIMIC-IV to extract diagnostic logic.

meta-screening-fulltext

from aipoch/medical-research-skills

Screen full-text papers against inclusion/exclusion criteria, with optional PubMed metadata check using PMID. Use when the user needs to evaluate a paper for a meta-analysis.

skill-auditor

from aipoch/medical-research-skills

A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers:"design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

research-grants

from aipoch/medical-research-skills

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.

protocol-standardization

from aipoch/medical-research-skills

Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".

prospero-registration-helper

from aipoch/medical-research-skills

Assists researchers in generating PROSPERO registration content for meta-analyses from a title and optional protocol. Use when the user wants to draft a PROSPERO registration form.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for:"non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".