audio-transcription
Transcribe audio and video files into structured notes. Activate this skill when users want to transcribe recordings, meetings, podcasts, voice memos, or any audio/video content in their vault.
Best use case
audio-transcription is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using audio-transcription should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/audio-transcription/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How audio-transcription Compares
| Feature / Agent | audio-transcription | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Transcribe audio and video files into structured notes. Activate this skill when users want to transcribe recordings, meetings, podcasts, voice memos, or any audio/video content in their vault.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Audio & Video Transcription

Transcribe audio and video files from the vault into structured Obsidian notes using `read_file` to send binary media directly to the model.

## Supported Formats

Audio: `.wav`, `.mp3`, `.aac`, `.flac`, `.webm` (audio-only)

Video: `.mp4`, `.mpeg`, `.mov`, `.flv`, `.mpg`, `.webm`, `.wmv`, `.3gp`

**Size limit:** 20 MB per file (Gemini inline data limit).

## How to Transcribe

1. Use `read_file` with the path to the audio/video file; the binary data is sent directly to the model for processing.
2. Listen to/watch the content and produce a transcription.
3. Use `write_file` to save the transcription as a markdown note.

## Transcription Format

Structure transcriptions as follows:

```markdown
---
tags:
  - transcription
source: "[[original-file.mp3]]"
date: YYYY-MM-DD
duration: "MM:SS" (estimate if possible)
---

# Transcription: [Title]

## Summary

Brief 2-3 sentence summary of the content.

## Transcript

[00:00] Speaker 1: Opening remarks...

[00:45] Speaker 2: Response...

[01:30] Speaker 1: Follow-up...
```

### Guidelines

- **Timestamps**: Include approximate timestamps in `[MM:SS]` format at natural breaks (new speakers, topic changes, pauses).
- **Speaker identification**: Label distinct speakers as "Speaker 1", "Speaker 2", etc. If names are mentioned, use them after first identification.
- **Filler words**: Omit excessive filler words (um, uh, like) unless they carry meaning.
- **Inaudible sections**: Mark unclear audio as `[inaudible]` or `[unclear]`.
- **Non-speech sounds**: Note significant sounds like `[laughter]`, `[applause]`, `[music]`.
- **Summary**: Always include a brief summary at the top for quick reference.
- **Frontmatter**: Link back to the source file using a wikilink.

## Tips

- For long recordings, let the user know the transcription may be partial due to the 20 MB size limit. Suggest splitting large files with an external tool.
- If the user asks to "transcribe the recording in this note", use `read_file` on the current note first to find embedded audio/video links (e.g., `![[recording.mp3]]`), then `read_file` on the linked file.
- For meeting notes, suggest adding attendees and action items sections after the transcript.
- For podcasts or interviews, suggest adding a "Key Topics" section with timestamps.
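
The format list, the 20 MB limit, the `[MM:SS]` timestamp convention, and the embedded-link tip in the skill text above could be checked mechanically before any file is read. The following is a minimal Python sketch of those checks, not part of the skill itself. The helper names (`is_supported`, `fits_inline_limit`, `find_embedded_media`, `format_timestamp`) are hypothetical, the embed pattern is a simplified assumption about Obsidian's `![[...]]` syntax, and treating "20 MB" as 20 MiB is also an assumption, since the source does not say which is meant.

```python
import re
from pathlib import Path

# Extensions copied from the "Supported Formats" list in the skill text.
AUDIO_EXTS = {".wav", ".mp3", ".aac", ".flac", ".webm"}
VIDEO_EXTS = {".mp4", ".mpeg", ".mov", ".flv", ".mpg", ".webm", ".wmv", ".3gp"}

# Assumption: "20 MB" is read as 20 MiB; the skill text does not specify.
MAX_BYTES = 20 * 1024 * 1024

# Simplified pattern for Obsidian embeds such as ![[recording.mp3]] or
# ![[clip.mp4|alias]]; captures only the link target before any "|" or "#".
EMBED_RE = re.compile(r"!\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the skill's format list."""
    return Path(path).suffix.lower() in AUDIO_EXTS | VIDEO_EXTS


def fits_inline_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is within the per-file limit."""
    return size_bytes <= MAX_BYTES


def find_embedded_media(note_text: str) -> list[str]:
    """List embedded link targets in a note that point at supported media."""
    targets = (m.group(1).strip() for m in EMBED_RE.finditer(note_text))
    return [t for t in targets if is_supported(t)]


def format_timestamp(seconds: int) -> str:
    """Render an offset in seconds using the [MM:SS] transcript convention."""
    minutes, secs = divmod(seconds, 60)
    return f"[{minutes:02d}:{secs:02d}]"
```

In use, `find_embedded_media` would run on the text of the current note to pick out candidates such as `recording.mp3` while skipping embedded images, and `fits_inline_limit` would decide whether to warn the user that the file should be split with an external tool first.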
Related Skills
vault-semantic-search
Search vault notes by meaning using semantic search (RAG). Activate this skill when users want to find notes by concept or topic rather than exact keywords, or when keyword search tools return poor results.
recall-sessions
Search past agent conversations to recall prior discussions, decisions, and context. Activate this skill when users ask about previous conversations, want to resume past work, or reference earlier decisions.
obsidian-properties
Work with Obsidian note properties (frontmatter). Activate this skill when users want to add, modify, or organize properties, understand property types, format YAML frontmatter, or use properties with templates, search, or Bases.
obsidian-bases
Create and configure Obsidian Bases — database-like views of notes. Activate this skill when users want to create bases, write filters, formulas, or set up table/cards/list/map views.
image-generation
Generate images from text descriptions and save them to the vault. Activate this skill when users want to create illustrations, diagrams, visual content, or any AI-generated images.
gemini-scribe-help
Answer questions about Gemini Scribe plugin features, settings, and usage. Activate this skill when users ask how to use the plugin, configure settings, or troubleshoot issues.
deep-research
Conduct comprehensive, multi-source research and generate cited reports. Activate this skill when users want in-depth research on a topic, need synthesis across web and vault sources, or want a structured research report saved to their vault.
ui-ux-guidelines
UI/UX best practices for obsidian-gemini plugin development. Covers modal sizing, text overflow, message formatting, collapsible UI, animations, icons, file chips, session state, CSS containment, and theme compatibility. Use this skill when building or modifying UI components.
release-process
Full release workflow for obsidian-gemini: update release notes, run checks, bump version with npm, create a GitHub release, and verify. Use this skill when preparing a new plugin release.
obsidian-plugin-development
Build, modify, and debug Obsidian plugins using the TypeScript API. Use this skill when working with Obsidian plugin source code, the obsidian npm package, plugin UI (views, modals, settings, commands, ribbons), vault file operations, editor manipulation, workspace management, metadata cache, events, markdown rendering, or the Obsidian CLI. Covers plugin lifecycle, best practices, common patterns, and the full TypeScript API surface.
obsidian-cli
Use the Obsidian CLI to debug, inspect, and test Obsidian plugins during development. Covers plugin reloading, console inspection, runtime evaluation, and common debugging recipes for the gemini-scribe plugin.
gemini-api-dev
Use this skill when building applications with Gemini models, Gemini API, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or needing current model specifications. Covers SDK usage (google-genai for Python, @google/genai for JavaScript/TypeScript, com.google.genai:google-genai for Java, google.golang.org/genai for Go), model selection, and API capabilities.