knowledge-base-search

Search and locate relevant content within a local knowledge base (files, indices, or PageIndex). Use when you need verifiable citations (file + page/paragraph) to support answers from local sources.

by aipoch

Best use case

knowledge-base-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using knowledge-base-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/knowledge-base-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Other/knowledge-base-search/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/knowledge-base-search/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How knowledge-base-search Compares

Feature / Agent	knowledge-base-search	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Search and locate relevant content within a local knowledge base (files, indices, or PageIndex). Use when you need verifiable citations (file + page/paragraph) to support answers from local sources.

Where can I find the source code?

The source code is on GitHub in the aipoch/medical-research-skills repository: https://github.com/aipoch/medical-research-skills

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# Knowledge Base Search

## When to Use
- You need to find specific facts, definitions, or procedures from a local knowledge base and return the exact source location.
- You must provide traceable citations (file path + page/paragraph/section) for audit, compliance, or review.
- You need to verify the original wording of a claim in the source document (quote-level validation).
- You want to compare how multiple local documents discuss the same topic and identify differences.
- You need to assemble supporting snippets for a report, FAQ, or internal knowledge response using only local materials.

## Key Features
- Supports multiple retrieval approaches: direct file search, index-based search, and PageIndex-style location mapping.
- Query strategy guidance: keyword splitting, synonym expansion, and optional filters (time range, file type, tags).
- Relevance-oriented result ranking and filtering to keep the most supportive evidence first.
- Outputs verifiable hit snippets with precise citation locations (file + page/paragraph/section when available).
- Enforces local-only boundaries: searches only within authorized directories and does not modify source content.

## Dependencies
- `glob` (>= 10.0.0): file path pattern matching
- `grep` (>= 3.11): in-file text searching
- Local knowledge base index files (one or more of: filename index, content index, vector index, PageIndex mapping)
- `assets/hit_list_template.csv`: standardized hit list output template
- Optional reference: `references/guide.md` (output formats, checklists, inspection points)

## Example Usage
The following example demonstrates an end-to-end local search workflow and produces a CSV hit list compatible with `assets/hit_list_template.csv`.

### Inputs
- Knowledge base root: `./kb/`
- Query: `How do we rotate API keys?`
- Filters: file types `md,pdf`, time range `2024-01-01..2026-12-31`

### Steps
1. **Confirm index and scope**
   - Ensure the search scope is limited to authorized paths (e.g., `./kb/`).
   - Identify available indices:
     - filename/content index (fast keyword search)
     - vector index (semantic retrieval)
     - PageIndex mapping (page/paragraph location resolution)

2. **Build the query**
   - Keywords: `rotate`, `API key`, `key rotation`
   - Synonyms/variants: `credential rotation`, `token rotation`, `regenerate key`
   - Filters:
     - file type: `*.md`, `*.pdf`
     - time range: `2024-01-01..2026-12-31` (if metadata exists)

3. **Execute search (local-only)**
   - Path discovery (example):
     - `glob("./kb/**/*.md")`
     - `glob("./kb/**/*.pdf")`
   - Content search (example):
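     - `grep -RInEi --include="*.md" "API key|key rotation|rotate" ./kb/`
     - Note: `-I` skips binary files, so PDFs are typically not matched by `grep`; extract their text first or use a content index/PageIndex for them.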

4. **Filter and rank results**
   - Keep hits that directly answer the question (procedure, policy, steps, constraints).
   - Rank by:
     - term proximity (e.g., “rotate” near “API key”)
     - section relevance (e.g., “Security”, “Credentials”, “Operations”)
     - coverage (hits that include prerequisites + steps + verification)

5. **Output citations and hit list**
   - For each hit, output:
     - `file_path`
     - `location` (page number for PDFs; heading/paragraph index for Markdown; PageIndex if available)
     - `snippet` (verbatim excerpt supporting the conclusion)
     - `relevance_score` (ranking score between 0 and 1, as in the example output)
     - `notes` (why it is relevant; any assumptions)
   - Save as `hit_list.csv` using `assets/hit_list_template.csv` columns.
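
Steps 3 to 5 can be sketched in Python for the Markdown case. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `locate_hits` and `search_markdown` are hypothetical names, the term list is copied from the example query, and every line starting with `#` is treated as a heading (so `#` comments inside fenced code would be misread). PDFs are not handled here. Requires Python 3.9 or later.

```python
import re
from pathlib import Path

# Query terms from the example above (assumption: matched case-insensitively).
TERMS = ["API key", "key rotation", "rotate", "credential rotation"]
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def locate_hits(text: str, file_path: str) -> list[dict]:
    """Return one hit per matching line, located by heading + paragraph index."""
    hits = []
    heading = "(no heading)"
    para_index = 0        # paragraphs seen so far under the current heading
    in_paragraph = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # A new heading resets the paragraph counter.
            heading = stripped.lstrip("#").strip()
            para_index = 0
            in_paragraph = False
            continue
        if not stripped:
            # A blank line ends the current paragraph.
            in_paragraph = False
            continue
        if not in_paragraph:
            para_index += 1
            in_paragraph = True
        if PATTERN.search(stripped):
            hits.append({
                "file_path": file_path,
                "location": f"heading '{heading}' ¶{para_index}",
                "snippet": stripped,  # kept verbatim for verification
            })
    return hits


def search_markdown(root: str) -> list[dict]:
    """Scan every *.md file under root (read-only) and collect located hits."""
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend(locate_hits(text, path.as_posix()))
    return hits
```

Calling `search_markdown("./kb")` would return a list of dictionaries with the `file_path`, `location`, and `snippet` fields described in step 5; scoring and notes are left to a later stage.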

### Example Output (CSV rows)
```csv
file_path,location,snippet,relevance_score,notes
kb/security/credential_policy.pdf,page 12,"API keys must be rotated every 90 days... Rotation requires...",0.92,"Direct policy + rotation interval + procedure reference."
kb/runbooks/api_key_rotation.md,section 'Procedure' ¶3,"To rotate an API key: (1) create a new key... (2) update services... (3) revoke old key...",0.89,"Step-by-step operational runbook."
kb/audit/controls.md,heading 'Key Management' ¶2,"Evidence of rotation includes change tickets and key revocation logs...",0.81,"Provides verification/evidence requirements."
```
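
Rows like the ones above could be produced with Python's standard `csv` module, which takes care of quoting snippets that contain commas or quotes. This is a sketch only: `write_hit_list` is a hypothetical helper, and the column list is assumed to match the header shown in the example rather than read from `assets/hit_list_template.csv`.

```python
import csv
import io

# Assumption: same columns as the example CSV header above.
FIELDS = ["file_path", "location", "snippet", "relevance_score", "notes"]


def write_hit_list(hits, stream):
    """Write hits to an open text stream, highest relevance_score first."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for hit in sorted(hits, key=lambda h: h["relevance_score"], reverse=True):
        writer.writerow(hit)


if __name__ == "__main__":
    example_hits = [
        {
            "file_path": "kb/audit/controls.md",
            "location": "heading 'Key Management' ¶2",
            "snippet": "Evidence of rotation includes change tickets and key revocation logs...",
            "relevance_score": 0.81,
            "notes": "Provides verification/evidence requirements.",
        },
        {
            "file_path": "kb/security/credential_policy.pdf",
            "location": "page 12",
            "snippet": "API keys must be rotated every 90 days... Rotation requires...",
            "relevance_score": 0.92,
            "notes": "Direct policy + rotation interval + procedure reference.",
        },
    ]
    buffer = io.StringIO()
    write_hit_list(example_hits, buffer)
    print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, open it with `open("hit_list.csv", "w", newline="", encoding="utf-8")` and pass the file object as `stream`.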

## Implementation Details
### Retrieval Workflow
1. **Index confirmation**
   - Determine knowledge base root paths and last update time (if available).
   - Detect which indices exist:
     - filename index: quick narrowing by file names
     - content index: inverted index / grep-like scanning
     - vector index: semantic similarity retrieval
     - PageIndex: mapping from document offsets to page/paragraph identifiers

2. **Query strategy**
   - Tokenize the question into:
     - core entities (e.g., “API key”)
     - actions (e.g., “rotate”, “revoke”, “regenerate”)
     - constraints (e.g., “every 90 days”, “approval required”)
   - Expand with synonyms and variants.
   - Apply filters when metadata exists:
     - time range
     - file type
     - tags/collections

3. **Result filtering and ranking**
   - Remove low-signal hits (navigation, boilerplate, unrelated mentions).
   - Rank by a weighted score (example):
     - **Keyword match** (exact phrase > partial): 0.45
     - **Proximity** (terms close together): 0.20
     - **Section importance** (titles like “Procedure/Policy”): 0.20
     - **Coverage** (answers include steps + constraints + verification): 0.15
   - Keep the original text snippet verbatim for verification.

4. **Citation and location resolution**
   - Markdown/text:
     - use heading + paragraph index (or line range) as the primary locator
   - PDF:
     - use page number; optionally include bounding text around the hit
   - PageIndex (if present):
     - map internal offsets to stable `page/paragraph` identifiers
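
The example weights in step 3 sum to 1.0, so they can be combined as a plain weighted sum. The sketch below assumes each signal has already been normalized to the range 0 to 1; how those signals are computed is not specified in this document, and the names `WEIGHTS`, `relevance_score`, and `rank_hits` are illustrative.

```python
# Example weights from the ranking step above.
WEIGHTS = {
    "keyword": 0.45,    # exact phrase > partial match
    "proximity": 0.20,  # terms close together
    "section": 0.20,    # titles like "Procedure" or "Policy"
    "coverage": 0.15,   # steps + constraints + verification
}


def relevance_score(signals: dict) -> float:
    """Combine per-signal values in [0, 1] into one score in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name, 0.0)  # a missing signal counts as 0
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return round(total, 2)


def rank_hits(hits: list) -> list:
    """Return hits sorted by descending score; each hit carries a 'signals' dict."""
    return sorted(hits, key=lambda h: relevance_score(h["signals"]), reverse=True)
```

For instance, a hit with an exact phrase match (`keyword` 1.0), moderate proximity (0.5), a "Procedure" heading (`section` 1.0), and no coverage signal scores 0.45 + 0.10 + 0.20 = 0.75.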

### Constraints and Limitations
- Search only within user-authorized local directories.
- Do not modify source documents.
- Do not execute scripts or arbitrary code.
- Do not access network resources or external APIs.
- If indices are missing/corrupted, fall back to direct file scanning; if scanning is not possible, report the limitation and required remediation (re-indexing).
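
Two of these constraints lend themselves to a small guard: rejecting paths outside the authorized roots, and falling back from index lookup to direct scanning. The following is a hedged sketch, not the skill's implementation: `is_within_scope` and `choose_strategy` are hypothetical names, the index is assumed to be a single file, and corruption detection is not covered. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional


def is_within_scope(candidate: str, authorized_roots: list) -> bool:
    """True if candidate resolves to a location inside any authorized root."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    for root in authorized_roots:
        if resolved.is_relative_to(Path(root).resolve()):
            return True
    return False


def choose_strategy(kb_root: str, index_file: Optional[str] = None) -> str:
    """Return 'index' if an index file exists, else 'scan'; raise if neither works."""
    if index_file and Path(index_file).is_file():
        return "index"
    if Path(kb_root).is_dir():
        return "scan"  # fall back to direct file scanning
    raise FileNotFoundError(
        f"Cannot scan {kb_root!r}: check the path or re-index the knowledge base."
    )
```

Resolving both sides before comparing means a request such as `kb/../secrets.txt` is rejected even though its string form starts with `kb/`.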

Related Skills

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

research-grants

from aipoch/medical-research-skills

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for: "non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".

network-tox-docking-research-planner

from aipoch/medical-research-skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for: "network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings: "网络毒理学研究设计"、"毒物机制论文"、"靶点预测+PPI+对接"、"环境污染物与疾病机制". Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

clinic-research-design

from aipoch/medical-research-skills

Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

basic-research-design

from aipoch/medical-research-skills

A biomedical research topic designer that generates progressive experimental subtitles and detailed research outlines based on a given subject. Use when the user wants to design a research proposal, outline experiments for a topic, or structure a biomedical study.

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

search-pubmed

from aipoch/medical-research-skills

An intelligent tool for precision medical literature search using PubMed's E-utilities API.

meta-search-builder

from aipoch/medical-research-skills

Medical literature search strategy generator. Given a user's natural-language description (e.g., meta-analysis topic, PICOS elements, research question), automatically extract medical entities (disease, intervention, population, outcomes) and generate professional search queries for seven major databases (PubMed, Cochrane, Embase, Web of Science, CNKI, Wanfang, VIP). Useful for developing search strategies for systematic reviews and meta-analyses.

market-research-report-generator

from aipoch/medical-research-skills

Generates professional market research reports by analyzing business intent, decision levels, and conducting multi-source data retrieval (Web, PubMed, Clinical Trials).

file-search

from aipoch/medical-research-skills

Perform fast file name and content searches with ripgrep (rg); use it when you need to locate files by glob/regex, find keywords across directories, or replace common find/grep workflows.