transcript-search

Intelligent semantic search over voice memo and video transcript DuckDB databases. Use when searching transcripts for topics, colors, tabs, concepts, or any content. NEVER dump full transcript text — use sentence-level extraction with context windows.

Best use case

transcript-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using transcript-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/transcript-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/plurigrid/asi/main/plugins/asi/skills/transcript-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/transcript-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How transcript-search Compares

| Feature / Agent | transcript-search | Standard Approach |
|-----------------|-------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source code is on GitHub in the plurigrid/asi repository, at plugins/asi/skills/transcript-search/SKILL.md.

SKILL.md Source

# transcript-search

> Search transcripts intelligently without loading entire texts into context.

**Trit**: 0 (ERGODIC - coordination/retrieval)

## CRITICAL RULE

**NEVER** run `SELECT text FROM transcripts` or load full transcript bodies into context.
Always use sentence-level extraction with `regexp_extract_all` or `string_split` + filtering.

## Known Databases

| Path | Schema | Content |
|------|--------|---------|
| `~/worlds/a/all_transcripts.duckdb` | `transcripts(id, source, source_path, audio_path, timestamp, text, duration_seconds, session_id)` | 174 voice memos + whisper transcripts |
| `~/worlds/a/audio_transcript.duckdb` | `recordings`, `segments`, `speakers`, `words` | Speaker-diarized audio with GF(3) |
| `~/worlds/a/aqua_transcriptions.duckdb` | varies | Aqua Voice transcriptions |
| `~/.topos/duckdb-atlas/audio_transcript.duckdb` | same as above | Atlas copy |

## Search Patterns

### 1. Sentence-Level Context Extraction (PRIMARY)

Extract sentences matching keywords with surrounding context:

```sql
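-- Find sentences about a topic: up to 250 chars on each side of KEYWORD,
-- never crossing a period. The match is case-sensitive; prefix the pattern
-- with (?i) to ignore case, as the lower(...) LIKE patterns below do.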
SELECT id, source, timestamp, trim(chunk) as context
FROM (
  SELECT id, source, timestamp,
    unnest(regexp_extract_all(text, '[^.]{0,250}KEYWORD[^.]{0,250}', 0)) as chunk
  FROM transcripts
)
WHERE length(trim(chunk)) > 15
ORDER BY id;
```
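To see what the context-window pattern above actually returns, here is a minimal Python sketch that applies the same regular expression to an invented sample string. It is an illustration only, not part of the skill: `context_windows` and the sample text are hypothetical, and the sketch matches case-insensitively, which the SQL above does not do by default.

```python
import re


def context_windows(text: str, keyword: str, width: int = 250) -> list[str]:
    """Return chunks around `keyword`: up to `width` chars per side, never crossing a period."""
    pattern = re.compile(
        r"[^.]{0,%d}%s[^.]{0,%d}" % (width, re.escape(keyword), width),
        re.IGNORECASE,  # assumption: ignore case, unlike the SQL pattern as written
    )
    # Mirror the SQL filter: drop chunks of 15 characters or fewer after trimming.
    return [m.strip() for m in pattern.findall(text) if len(m.strip()) > 15]


# Invented sample text, not taken from any real transcript.
sample = (
    "We set up the tiles yesterday. Green is the Emacs tab in the top left. "
    "Lunch was fine. The red tab means something needs attention."
)

chunks = context_windows(sample, "tab")
for chunk in chunks:
    print(chunk)
```

Because `[^.]` excludes periods, each chunk is at most one sentence long, so sentences that do not contain the keyword never reach the caller.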

### 2. Multi-Keyword Intersection

Find sentences where multiple concepts co-occur:

```sql
-- Sentences mentioning BOTH term1 AND term2
WITH sentences AS (
  SELECT id, source, unnest(string_split(text, '.')) as sentence
  FROM transcripts
)
SELECT id, source, trim(sentence) as sentence
FROM sentences
WHERE lower(sentence) LIKE '%term1%'
  AND lower(sentence) LIKE '%term2%'
  AND length(trim(sentence)) > 20;
```
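The intersection logic can be sketched in plain Python as well. This is a hedged mirror of the query above, assuming the same naive split on `.`; `sentences_with_all` and the sample text are hypothetical names and data introduced for illustration.

```python
def sentences_with_all(text: str, terms: list[str], min_len: int = 20) -> list[str]:
    """Split on '.', keep sentences that contain every term (case-insensitive)."""
    hits = []
    for sentence in text.split("."):
        lowered = sentence.lower()
        trimmed = sentence.strip()
        # Same filters as the SQL: all terms present, trimmed length above min_len.
        if len(trimmed) > min_len and all(t.lower() in lowered for t in terms):
            hits.append(trimmed)
    return hits


# Invented sample text, not taken from any real transcript.
sample = "Green is the Emacs tab. The color of each tab is fixed. Red is a color."

both = sentences_with_all(sample, ["color", "tab"])
print(both)
```

Only the middle sentence survives: the first lacks "color", and the last lacks "tab" and is also too short.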

### 3. Quick Count Before Deep Dive

Always count first to avoid surprise data dumps:

```sql
-- How many transcripts mention X?
SELECT COUNT(*) as hits,
       array_agg(id ORDER BY id) as transcript_ids
FROM transcripts
WHERE lower(text) LIKE '%keyword%';
```

### 4. Temporal Search

```sql
-- Recent transcripts mentioning X
SELECT id, source, timestamp, left(text, 200) as preview
FROM transcripts
WHERE lower(text) LIKE '%keyword%'
  AND timestamp > NOW() - INTERVAL '7 days'
ORDER BY timestamp DESC;
```

### 5. Co-occurrence Matrix

```sql
-- Which transcripts mention color AND (tab OR tile)?
-- Note: '%tab%' is a substring match, so it also hits words like "table".
SELECT id, source, timestamp
FROM transcripts
WHERE lower(text) LIKE '%color%'
  AND (lower(text) LIKE '%tab%' OR lower(text) LIKE '%tile%')
ORDER BY id;
```

## Workflow

1. **Count first**: How many transcripts match? Get IDs.
2. **Extract sentences**: Use regex context windows, NOT full text.
3. **Narrow**: Add more keywords to intersect.
4. **Report**: Show relevant sentences with transcript ID + timestamp.
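
The four steps can be sketched end to end over in-memory rows. This is a rough Python illustration under stated assumptions, not the skill's implementation: `Transcript`, `search`, and the sample rows are hypothetical, and a real run would issue the DuckDB queries shown earlier instead.

```python
import re
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical stand-in for one row of the transcripts table."""
    id: int
    timestamp: str
    text: str


def search(rows, keyword, narrow=None, width=250):
    # 1. Count first: which transcripts mention the keyword at all?
    ids = [r.id for r in rows if keyword.lower() in r.text.lower()]
    # 2. Extract sentence-level windows instead of returning full text.
    pattern = re.compile(
        r"[^.]{0,%d}%s[^.]{0,%d}" % (width, re.escape(keyword), width),
        re.IGNORECASE,
    )
    report = []
    for r in rows:
        if r.id not in ids:
            continue
        for chunk in pattern.findall(r.text):
            chunk = chunk.strip()
            # 3. Narrow: optionally require a second keyword in the same window.
            if narrow and narrow.lower() not in chunk.lower():
                continue
            # 4. Report: the sentence with its transcript ID and timestamp.
            report.append((r.id, r.timestamp, chunk))
    return ids, report


# Invented sample rows; IDs and timestamps are made up for the example.
rows = [
    Transcript(1, "2024-01-01T10:00", "We talked about lunch. Green is the Emacs tab."),
    Transcript(2, "2024-01-02T09:30", "Nothing about layouts here."),
    Transcript(3, "2024-01-03T14:15", "The red tab is an alert. Every tab has a color."),
]

ids, report = search(rows, "tab", narrow="color")
print(len(ids), "matching transcripts:", ids)
for tid, ts, sentence in report:
    print(f"#{tid} [{ts}] {sentence}")
```

The count in step 1 is cheap and tells you how much output to expect before any text is extracted, which is the point of counting first.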

## Known Color-Tab Mappings (from transcript #149)

From voice memo session #149, the color system for tabs/tiles:

- **Green** = Emacs / conventional flow / "zero" baseline / bridging
- **Blue** = secondary workspace
- **Red** = active/alert state
- **Orange** = Barton's aesthetic (shirt, rollers; transcripts #168, #171)
- Colors map to **styles/environments** in tiled terminal sessions
- "Any color, any style, any tab, associated rows" — colors ARE the tab identifiers

Key quotes:

- *"And so what colors? Can you talk about color a little bit? Green is for what?"* → Green was Emacs.
- *"Currently green and red, there's 4 tiles"* → tiled terminal layout.

## Anti-Patterns

| ❌ Bad | ✅ Good |
|--------|---------|
| `SELECT text FROM transcripts WHERE ...` | `SELECT id, trim(chunk) FROM (regexp_extract_all(...))` |
| `SELECT * FROM transcripts` | `SELECT id, source, timestamp, left(text, 200) as preview` |
| Loading 174 full transcripts | Count → filter IDs → extract sentences |
| Grepping raw text blobs | DuckDB regex with context windows |

## Related Skills

| Skill | Relationship |
|-------|-------------|
| `yt-playlist-acset` | Creates transcript DuckDBs from YouTube playlists |
| `live-recording` | Captures voice memos via whisper-cpp |
| `duckdb-ies` | Interactome analytics over transcripts |
| `duck-agent` | DuckDB file discovery |
| `beeper` | Transcripts were shared to Barton via Beeper |
