vector-text-fixer
Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction.
Best use case
vector-text-fixer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using vector-text-fixer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/vector-text-fixer/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How vector-text-fixer Compares
| Feature / Agent | vector-text-fixer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
The source code is on GitHub at https://github.com/aipoch/medical-research-skills.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# Vector Text Fixer
Fixes garbled text in PDF/SVG vector graphics caused by font embedding problems, encoding errors, or missing font substitution. Outputs repaired files or editable JSON for AI tool import.
## Quick Check
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --input document.pdf --output fixed.pdf
python scripts/main.py --input diagram.svg --output fixed.svg
```
## When to Use
- Fix garbled/box characters in PDF files caused by font embedding issues
- Repair SVG text encoding errors before editing in Illustrator or Inkscape
- Batch-process a folder of PDF/SVG files with garbled text
- Export a text map JSON for manual correction in AI editors
## Workflow
1. Confirm input file path (PDF or SVG) or batch folder, and desired output path.
2. Validate that the request involves PDF/SVG garbled text repair; stop early if not.
3. Run `scripts/main.py --input <file> --output <file>` or `--batch <folder>`.
4. Return a structured result separating repaired blocks, skipped blocks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the Fallback Template below.
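Steps 3 and 5 can be sketched as a small Python wrapper. This is only an illustration of how an agent might call the documented CLI: the helper names `build_command` and `run_repair` are hypothetical, and the sketch assumes one source flag (`--input` or `--batch`) is passed per run.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(output_path: str,
                  input_path: Optional[str] = None,
                  batch_dir: Optional[str] = None) -> List[str]:
    """Assemble the step 3 call using only the documented flags."""
    # Assumption: one source per run, either a single file or a batch folder.
    if (input_path is None) == (batch_dir is None):
        raise ValueError("provide exactly one of input_path or batch_dir")
    source = ["--input", input_path] if input_path else ["--batch", batch_dir]
    return [sys.executable, "scripts/main.py", *source, "--output", output_path]


def run_repair(cmd: List[str]) -> dict:
    """Run the command and report failure instead of guessing at results."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        return {"ok": False, "stdout": "", "stderr": str(exc)}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


if __name__ == "__main__":
    result = run_repair(build_command("fixed.pdf", input_path="document.pdf"))
    if not result["ok"]:
        # Step 5: switch to the Fallback Template rather than inventing output.
        print("Repair failed; use the fallback report.", result["stderr"])
```

A non-zero exit code or a launch error both end up as `ok: False`, which is the signal to switch to the fallback report.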
## Fallback Template
If `scripts/main.py` fails or required fields are missing, respond with:
```
FALLBACK REPORT
───────────────────────────────────────
Objective        : <repair goal>
Inputs Available : <file path or batch folder provided>
Missing Inputs   : <list exactly what is missing>
                   Note: --input requires a valid PDF or SVG file path, not a text string.
                   For batch mode use --batch <folder_path> instead.
Partial Result   : <any blocks repaired safely>
Blocked Steps    : <what could not be completed and why>
Next Steps       : <minimum info needed to complete>
───────────────────────────────────────
```
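If the report is assembled programmatically, a small formatter keeps the field order fixed. The sketch below is one possible way to do it; `render_fallback_report` is a hypothetical helper, and the rule width and label padding are choices made for this example.

```python
RULE = "─" * 39  # rule width chosen for illustration


def render_fallback_report(objective: str, inputs_available: str, missing_inputs: str,
                           partial_result: str, blocked_steps: str, next_steps: str) -> str:
    """Fill the six template fields in their documented order."""
    fields = [
        ("Objective", objective),
        ("Inputs Available", inputs_available),
        ("Missing Inputs", missing_inputs),
        ("Partial Result", partial_result),
        ("Blocked Steps", blocked_steps),
        ("Next Steps", next_steps),
    ]
    lines = ["FALLBACK REPORT", RULE]
    # Pad labels to the longest one ("Inputs Available") so the colons line up.
    lines.extend(f"{label:<16} : {value}" for label, value in fields)
    lines.append(RULE)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_fallback_report(
        objective="repair fig.pdf",
        inputs_available="fig.pdf",
        missing_inputs="--output path",
        partial_result="none",
        blocked_steps="scripts/main.py not run: no output path",
        next_steps="provide an output file path",
    ))
```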
## Stress-Case Output Checklist
For complex multi-constraint requests, always include these sections explicitly:
- **Assumptions**: repair level default (standard), encoding auto-detected
- **Constraints**: encrypted PDFs require password unlock first; scanned PDFs need OCR first
- **Risks**: severely damaged files may not be fully repairable; rare fonts may not map correctly
- **Unresolved Items**: blocks with confidence < 0.3 flagged for manual review
## Supported Scenarios
**PDF Garbled Text:**
- Box/question mark issues from font embedding problems
- Garbled text from encoding conversion errors
- Missing font substitution characters
- Multi-language mixed encoding issues
**SVG Garbled Text:**
- Text entity encoding errors
- Special character escaping issues
- Invalid font reference display abnormalities
- XML encoding declaration errors
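As an illustration of the "encoding conversion errors" category, the sketch below reproduces one common failure, UTF-8 bytes decoded as Latin-1, and reverses it. It is not taken from `scripts/main.py`; `undo_latin1_mojibake` is a hypothetical helper, and real files may be damaged in ways this round trip cannot undo.

```python
def undo_latin1_mojibake(text: str) -> str:
    """Reverse UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or not damaged at all): leave the text untouched.
        return text


if __name__ == "__main__":
    garbled = "café".encode("utf-8").decode("latin-1")
    print(garbled)                        # cafÃ©
    print(undo_latin1_mojibake(garbled))  # café
    print(undo_latin1_mojibake("naïve"))  # naïve (already correct, returned unchanged)
```

Returning the input unchanged when the round trip fails keeps the helper from damaging text that was already correct.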
## CLI Usage
```bash
# Fix single PDF
python scripts/main.py --input document.pdf --output fixed.pdf
# Fix single SVG
python scripts/main.py --input diagram.svg --output fixed.svg
# Batch process folder
python scripts/main.py --batch ./input_folder --output ./output_folder
# Interactive repair
python scripts/main.py --input doc.pdf --interactive
# Export editable JSON
python scripts/main.py --input doc.pdf --export-json editable.json
# Specify repair level
python scripts/main.py --input doc.pdf --output fixed.pdf --repair-level aggressive
```
## Parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
| `--input` | Yes* | Input PDF or SVG file path | — |
| `--batch` | Yes* | Batch input folder path | — |
| `--output` | Yes | Output file or folder path | — |
| `--repair-level` | No | `minimal` / `standard` / `aggressive` | `standard` |
| `--interactive` | No | Enable interactive repair mode | False |
| `--export-json` | No | Export editable JSON format | — |
| `--encoding` | No | Source file encoding (default: auto-detect) | auto |
*At least one of `--input` or `--batch` is required.
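For readers who want to see how the table maps onto code, here is a minimal `argparse` sketch of the same flags. It is an assumption about shape, not the parser in `scripts/main.py`: it treats `--input` and `--batch` as mutually exclusive, and it leaves `--output` optional because the usage examples omit it for `--interactive` and `--export-json`. The real script may enforce these differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented parameters and defaults."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Fix garbled text in PDF/SVG vector graphics.",
    )
    # Assumption: exactly one source flag per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="Input PDF or SVG file path")
    source.add_argument("--batch", help="Batch input folder path")
    parser.add_argument("--output", help="Output file or folder path")
    parser.add_argument("--repair-level",
                        choices=["minimal", "standard", "aggressive"],
                        default="standard")
    parser.add_argument("--interactive", action="store_true",
                        help="Enable interactive repair mode")
    parser.add_argument("--export-json", metavar="PATH",
                        help="Export editable JSON format")
    parser.add_argument("--encoding", default="auto",
                        help="Source file encoding")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--input", "doc.pdf", "--output", "fixed.pdf"])
    print(args.repair_level, args.encoding, args.interactive)
```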
## Repair Levels
- **Minimal**: Only obvious errors (replacement characters, null bytes); maximum original integrity
- **Standard**: Common encoding issues + smart font replacement; balanced repair rate and accuracy
- **Aggressive**: Full text re-encoding + OCR-assisted recognition; for severely garbled documents
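The "obvious errors" named for the minimal level can be spotted with plain string operations. The sketch below only shows what such a check might look like; how `scripts/main.py` handles these characters is not documented here, and both helper names are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted when a decoder cannot map a byte sequence
NULL_CHAR = "\x00"


def count_obvious_errors(text: str) -> dict:
    """Count the two error markers named for the minimal repair level."""
    return {
        "replacement_chars": text.count(REPLACEMENT_CHAR),
        "null_bytes": text.count(NULL_CHAR),
    }


def strip_null_bytes(text: str) -> str:
    """Drop null characters; replacement characters stay, since the original is unknown."""
    return text.replace(NULL_CHAR, "")


if __name__ == "__main__":
    damaged = "Fig\x00ure \ufffd\ufffd"
    print(count_obvious_errors(damaged))     # {'replacement_chars': 2, 'null_bytes': 1}
    print(repr(strip_null_bytes(damaged)))
```

Null characters can be removed safely, but a replacement character only marks where information was lost, so it needs a mapping or manual correction rather than deletion.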
## Output Format (JSON Export)
```json
{
  "file_type": "pdf",
  "pages": [{
    "page_num": 1,
    "text_blocks": [{
      "id": "tb_001",
      "bbox": [100, 200, 300, 220],
      "original_text": "?????",
      "detected_encoding": "UTF-8",
      "confidence": 0.3,
      "suggested_fix": "Sample Text"
    }]
  }],
  "repair_summary": {
    "total_blocks": 15,
    "fixed_blocks": 12,
    "skipped_blocks": 3
  }
}
```
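A file exported in this shape can be scanned for blocks that need manual review. The sketch below assumes the field names shown above and uses the `confidence < 0.3` rule from the checklist. `blocks_needing_review` is a hypothetical helper and the sample data is made up.

```python
from typing import Iterator, Tuple


def blocks_needing_review(export: dict, threshold: float = 0.3) -> Iterator[Tuple[int, dict]]:
    """Yield (page_num, block) for every block whose confidence is below threshold."""
    for page in export.get("pages", []):
        for block in page.get("text_blocks", []):
            # A block with no confidence value is treated as 0.0, so it is flagged.
            if block.get("confidence", 0.0) < threshold:
                yield page.get("page_num"), block


# Made-up sample in the documented shape; ids, text, and scores are illustrative only.
sample_export = {
    "file_type": "pdf",
    "pages": [{
        "page_num": 1,
        "text_blocks": [
            {"id": "tb_001", "original_text": "?????", "confidence": 0.3,
             "suggested_fix": "Sample Text"},
            {"id": "tb_002", "original_text": "\ufffd\ufffd", "confidence": 0.1,
             "suggested_fix": ""},
        ],
    }],
}

flagged = [(page_num, block["id"]) for page_num, block in blocks_needing_review(sample_export)]
print(flagged)  # [(1, 'tb_002')]
```

With a strict `<` comparison, a block at exactly 0.3 is not flagged. For a real export, load the file with `json.load` and pass the resulting dict to the same function.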
## Input Validation
This skill accepts: PDF (.pdf) or SVG (.svg) file paths, or a folder path for batch processing, where the files contain garbled or unreadable text caused by font/encoding issues.
If the request does not involve PDF/SVG garbled text repair — for example, asking to convert file formats, edit PDF content directly, perform OCR on scanned images, or process non-vector files — do not proceed. Instead respond:
> "`vector-text-fixer` is designed to fix garbled text in PDF/SVG vector graphics caused by font encoding issues. Your request appears to be outside this scope. Please provide a valid PDF or SVG file path, or use a more appropriate tool."
## Error Handling
- If `--input` receives a text string instead of a file path, report the error and request a valid file path.
- If the file is encrypted, report that password unlock is required before processing.
- If the task goes outside documented scope, stop instead of guessing.
- If `scripts/main.py` fails, use the Fallback Template above.
- Do not fabricate repaired text content or execution outcomes.
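The first rule, together with the accepted inputs listed under Input Validation, could be checked before the script is launched. This is a minimal sketch under that assumption; `classify_input` and its return labels are hypothetical and not part of the CLI.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".svg"}


def classify_input(raw: str) -> str:
    """Return "batch" for a folder, "file" for an existing PDF/SVG, else "rejected"."""
    path = Path(raw)
    if path.is_dir():
        return "batch"
    if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
        return "file"
    # Free text, missing files, and unsupported extensions all end up here.
    return "rejected"


if __name__ == "__main__":
    for candidate in (".", "document.pdf", "please fix the garbled title"):
        print(candidate, "->", classify_input(candidate))
```

A "rejected" result is the point at which to report the error and ask for a valid file path instead of running the script.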
## Output Requirements
Every final response must include:
1. **Objective** — file(s) repaired and repair level used
2. **Inputs Received** — file path, repair level, encoding settings
3. **Assumptions** — defaults applied (repair level, encoding detection)
4. **Result** — output file path, blocks fixed vs skipped
5. **Risks and Limits** — confidence thresholds, manual review blocks
6. **Next Checks** — review low-confidence blocks manually before use
## Limitations
- Encrypted PDFs require password unlock before processing
- Severely damaged vector files may not be fully repairable
- Some rare fonts may not map correctly
- Scanned PDFs require OCR recognition first
## Dependencies
```
pdfplumber >= 0.10.0
PyMuPDF >= 1.23.0
cairosvg >= 2.7.0
beautifulsoup4 >= 4.12.0
fonttools >= 4.40.0
chardet >= 5.0.0
Pillow >= 10.0.0
```