numpy-string-ops

Vectorized string manipulation using the char module and modern string alternatives, including cleaning and search operations. Triggers: string operations, numpy.char, text cleaning, substring search.

16 stars

Best use case

numpy-string-ops is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using numpy-string-ops should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/numpy-string-ops/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/numpy-string-ops/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/numpy-string-ops/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How numpy-string-ops Compares

| Feature / Agent | numpy-string-ops | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Vectorized string manipulation using the char module and modern string alternatives, including cleaning and search operations. Triggers: string operations, numpy.char, text cleaning, substring search.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Overview
NumPy's `char` submodule provides vectorized versions of standard Python string operations. It allows for efficient processing of arrays containing `str_` or `bytes_` types. The module is now considered legacy: NumPy 2.0 introduced the `numpy.strings` module as its replacement.
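
As a minimal sketch of what "vectorized" means here, the snippet below applies one call to a whole array of `str_` values and to a whole array of `bytes_` values. The sample arrays and variable names are illustrative assumptions, not part of the skill's scripts.

```python
import numpy as np

# Illustrative sample data (hypothetical, not from the skill's scripts).
words = np.array(["alpha", "beta", "gamma"])      # Unicode array (numpy.str_)
raw = np.array([b"alpha", b"beta", b"gamma"])     # byte-string array (numpy.bytes_)

# One call transforms every element; no explicit Python loop is needed.
upper_words = np.char.upper(words)
upper_raw = np.char.upper(raw)

print(upper_words.dtype.kind, upper_words)  # kind "U" for str_ data
print(upper_raw.dtype.kind, upper_raw)      # kind "S" for bytes_ data
```

The result keeps the input's string kind, so `bytes_` input yields `bytes_` output.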

## When to Use
- Cleaning large text datasets (e.g., stripping whitespace, normalization).
- Performing batch substring searches across thousands of records.
- Concatenating columns of text data using broadcasting.
- Converting character casing for entire datasets simultaneously.

## Decision Tree
1. Starting new development?
   - Use `numpy.strings` if available; `numpy.char` is legacy.
2. Comparing strings with potential trailing spaces?
   - `numpy.char` comparison functions (e.g. `np.char.equal`) strip trailing whitespace before comparing; the standard NumPy operators such as `==` do not.
3. Concatenating a constant prefix to an array of names?
   - Use `np.char.add(prefix, name_array)`.
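
The first and third branches of the decision tree can be sketched together: prefer `numpy.strings` when the installed NumPy provides it, fall back to `numpy.char` otherwise, then prepend a constant prefix. The `str_ops` alias, the `"user_"` prefix, and the sample names are assumptions made for illustration.

```python
import numpy as np

# numpy.strings exists from NumPy 2.0 onward; on older releases fall back
# to the legacy numpy.char module. `str_ops` is a hypothetical alias.
str_ops = getattr(np, "strings", np.char)

names = np.array(["ada", "grace", "linus"])  # illustrative data

# The scalar prefix is broadcast against every element of `names`.
prefixed = str_ops.add("user_", names)
print(prefixed)
```

Both modules expose `add` with the same call shape, which is why a single alias is enough for this case. Other functions may differ between the two modules, so check each one before relying on the alias.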

## Workflows
1. **Batch String Concatenation**
   - Create two arrays of strings, A and B.
   - Use `np.char.add(A, B)` to join them element-wise.
   - Standard broadcasting applies, so one operand can be a single string while the other is an array of any shape.

2. **Cleaning Text Datasets**
   - Identify an array of messy text.
   - Apply `np.char.strip(arr)` to remove leading and trailing whitespace.
   - Use `np.char.lower(arr)` to normalize casing across the entire dataset.

3. **Finding Substrings in Arrays**
   - Use `np.char.find(text_array, 'target_word')`.
   - `find` returns the lowest index of the match, or `-1` when the substring is absent, so identify elements with non-negative indices (where the word was found).
   - Filter the original array using boolean indexing based on the search result.
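
The three workflows above can be sketched in one short script. All sample arrays and variable names are hypothetical; only the `np.char` calls come from the workflows themselves.

```python
import numpy as np

# 1. Batch string concatenation (element-wise, then with broadcasting).
first = np.array(["Ada", "Grace"])
last = np.array([" Lovelace", " Hopper"])
full = np.char.add(first, last)

grid = np.array([["a", "b"], ["c", "d"]])
tagged = np.char.add("id_", grid)  # scalar prefix broadcast over a 2-D array

# 2. Cleaning: strip surrounding whitespace, then normalize casing.
messy = np.array(["  Hello ", "WORLD\n", " NumPy  "])
clean = np.char.lower(np.char.strip(messy))

# 3. Substring search: find() gives the match index, or -1 when absent.
logs = np.array(["disk error", "all good", "network error"])
positions = np.char.find(logs, "error")
matches = logs[positions >= 0]  # boolean indexing keeps only the hits

print(full, tagged, clean, positions, matches, sep="\n")
```

Chaining `strip` and `lower` creates an intermediate array at each step, which is usually acceptable for cleaning passes but worth keeping in mind for very large inputs.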

## Non-Obvious Insights
- **Legacy Status:** The `char` module is considered legacy; future-proof code should look towards the `numpy.strings` alternative.
- **Implicit Stripping:** Unlike NumPy's standard `==` operator, the `char` module comparison functions (e.g. `np.char.equal`) strip trailing whitespace before evaluating equality.
- **Vectorization Reality:** While these operations are vectorized, string manipulation is inherently less performant than numeric math because strings have variable lengths and require more complex memory management.
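
The implicit-stripping behavior is easiest to see side by side. This is a small sketch with made-up sample strings, comparing NumPy's standard `==` with `np.char.equal` on values that differ only in trailing spaces.

```python
import numpy as np

# Illustrative data: each pair differs only by trailing spaces.
padded = np.array(["abc  ", "xyz"])
plain = np.array(["abc", "xyz "])

# Standard NumPy comparison treats trailing spaces as significant.
standard = padded == plain

# The char-module comparison strips trailing whitespace first.
char_eq = np.char.equal(padded, plain)

print(standard)  # [False False]
print(char_eq)   # [ True  True]
```

When trailing whitespace carries meaning in the data, the standard operator is the safer choice.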

## Evidence
- "Unlike the standard numpy comparison operators, the ones in the char module strip trailing whitespace characters before performing the comparison." [Source](https://numpy.org/doc/stable/reference/routines.char.html)
- "The numpy.char module provides a set of vectorized string operations for arrays of type numpy.str_ or numpy.bytes_." [Source](https://numpy.org/doc/stable/reference/routines.char.html)

## Scripts
- `scripts/numpy-string-ops_tool.py`: Routines for batch text cleaning and search.
- `scripts/numpy-string-ops_tool.js`: Simulated string concatenation logic.

## Dependencies
- `numpy` (Python)

## References
- [references/README.md](references/README.md)

Related Skills

bgo

10
from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

partner-revenue-desk

16
from diegosouzapw/awesome-omni-skill

Operating model for tracking, attributing, and accelerating partner-sourced revenue.

parallel-data-enrichment

16
from diegosouzapw/awesome-omni-skill

Structured company and entity data enrichment using Parallel AI Task API with core/base processors. Returns typed JSON output. No binary install — requires PARALLEL_API_KEY in .env.local.

parallel-agents

16
from diegosouzapw/awesome-omni-skill

Multi-agent orchestration patterns. Use when multiple independent tasks can run with different domain expertise or when comprehensive analysis requires multiple perspectives.

paper-writing-assistant

16
from diegosouzapw/awesome-omni-skill

Assist in drafting research papers and meeting notes, enforcing academic rigor and formatting.

pandas-data-manipulation-rules

16
from diegosouzapw/awesome-omni-skill

Focuses on pandas-specific rules for data manipulation, including method chaining, data selection using loc/iloc, and groupby operations.

pagent

16
from diegosouzapw/awesome-omni-skill

Guide for using pagent - a PRD-to-code orchestration tool. Use when users ask how to use pagent, run agents, create PRDs, or transform requirements into code.

page-annotator

16
from diegosouzapw/awesome-omni-skill

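AI-driven web page annotation tool that supports highlighting elements and adding text annotations. Smart deduplication, automatic scrolling, and collision detection. Compatible with strict-CSP sites such as GitHub. Use cases: (1) marking page elements for explanation (2) adding text annotations and comments (3) code review and design review (4) teaching demos and user onboarding (5) bug reports and issue flagging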

package-json-modification-protection

16
from diegosouzapw/awesome-omni-skill

Protects lines with the specific 'Do not touch this line Cursor' comment within package.json.

orchestrator

16
from diegosouzapw/awesome-omni-skill

Multi-agent orchestrator that delegates all work to specialized subagents. Enforces parallelism, tracks progress, and coordinates agent teams for complex tasks.

orchestrator-conductor

16
from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "orchestrate agents", "run /orchestrate", "manage parallel agents", "coordinate multiple agents", "decompose this task", or needs patterns for multi-agent workflows with parallel execution and task decomposition.

orchestration

16
from diegosouzapw/awesome-omni-skill

MANDATORY - Your default operating system. Adaptive workflow that routes simple tasks to direct execution and complex tasks to PRD iterations with agent swarms. Auto-creates skills when patterns emerge.