Claude, Cursor, Codex, GitHub Copilot | AI Research & Knowledge Management

bdistill-knowledge-extraction

Extract structured domain knowledge from AI models in-session or from local open-source models via Ollama. No API key needed.

31,392 stars

by sickn33

Complexity: medium

About this skill

The `bdistill-knowledge-extraction` skill lets an AI agent extract domain-specific knowledge and store it in structured form. It works on model responses from two sources: closed models answering in the current session (so no separate API key is needed), or open-source models hosted locally through Ollama. The skill organizes the raw responses into quality-scored knowledge entries, so ordinary subscription sessions accumulate into a growing, proprietary knowledge base. The structured, scored output can serve as reference data, as specialized training material, or as a proprietary data asset that the agent can draw on in later sessions.

Best use case

- Building proprietary domain-specific knowledge bases
- Generating structured datasets for machine learning training
- Creating internal reference materials from AI interactions
- Extracting specific entities, relationships, and facts from large volumes of AI-generated text
- Developing specialized 'data moats' to enhance AI capabilities without external dependencies
- Supporting AI research by systematically capturing model insights

A collection of structured, quality-scored knowledge artifacts (e.g., JSON, YAML, or database entries) representing domain-specific information extracted from AI models, contributing to a growing and verifiable knowledge base; Reduced reliance on external APIs for specific data extraction.

Practical example

Example input

```json
{
  "skill": "bdistill-knowledge-extraction",
  "command": "extract_knowledge",
  "parameters": {
    "topic": "quantum computing applications in medicine",
    "focus_areas": ["drug discovery", "diagnostic imaging", "personalized medicine"],
    "quality_threshold": 0.8,
    "output_format": "json"
  }
}
```

Example output

```json
{
  "extracted_knowledge": [
    {
      "concept": "Quantum Simulation for Drug Discovery",
      "description": "Quantum computers can simulate molecular interactions at an atomic level, accelerating the discovery of new drugs by predicting compound behavior more accurately than classical methods.",
      "related_terms": ["molecular modeling", "drug design", "protein folding"],
      "quality_score": 0.92
    },
    {
      "concept": "Quantum Sensors in Diagnostic Imaging",
      "description": "Highly sensitive quantum sensors could lead to more precise and earlier disease detection through advanced imaging techniques, such as improved MRI or novel detection of biomarkers.",
      "related_terms": ["MRI enhancement", "biomarker detection", "medical diagnostics"],
      "quality_score": 0.88
    }
  ],
  "source_model": "in-session_claude",
  "timestamp": "2024-07-30T10:00:00Z"
}
```
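A minimal sketch of how a consumer might apply the `quality_threshold` idea to a response shaped like the example output above. The helper name `filter_by_quality` is hypothetical and is not part of bdistill; the descriptions are shortened for brevity.

```python
import json

# Response shaped like the example output above (descriptions omitted).
response_text = """
{
  "extracted_knowledge": [
    {"concept": "Quantum Simulation for Drug Discovery", "quality_score": 0.92},
    {"concept": "Quantum Sensors in Diagnostic Imaging", "quality_score": 0.88}
  ],
  "source_model": "in-session_claude",
  "timestamp": "2024-07-30T10:00:00Z"
}
"""


def filter_by_quality(response: dict, threshold: float) -> list[dict]:
    """Hypothetical helper: keep entries at or above the threshold, best first."""
    entries = response.get("extracted_knowledge", [])
    kept = [e for e in entries if e.get("quality_score", 0.0) >= threshold]
    return sorted(kept, key=lambda e: e["quality_score"], reverse=True)


response = json.loads(response_text)
for entry in filter_by_quality(response, threshold=0.9):
    print(f'{entry["quality_score"]:.2f}  {entry["concept"]}')
```

With a threshold of 0.9 only the first entry survives; with the 0.8 threshold from the example input, both do.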

When to use this skill

- When you need to systematically capture and organize information discussed or generated by an AI model
- When you want to use open-source LLMs locally to extract knowledge without incurring API costs
- When the goal is a structured, verifiable knowledge base built from AI interactions rather than free-form text
- When you aim to develop unique domain expertise by compounding AI-generated insights
- When data privacy or cost-efficiency is a critical concern for knowledge extraction

When not to use this skill

- When simple, unstructured summaries or direct answers from an AI model are sufficient
- When the primary need is real-time access to dynamic, external data sources that are not part of the AI's internal knowledge
- When the required domain knowledge is readily available through conventional search or existing databases and does not need to be extracted from a model
- When the overhead of structuring and quality-scoring is not justified by the use case

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bdistill-knowledge-extraction/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/bdistill-knowledge-extraction/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it at .claude/skills/bdistill-knowledge-extraction/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How bdistill-knowledge-extraction Compares

| Feature / Agent | bdistill-knowledge-extraction | Standard Approach |
| --- | --- | --- |
| Platform Support | Claude, Cursor, Codex, GitHub Copilot | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |

Frequently Asked Questions

What does this skill do?

Extract structured domain knowledge from AI models in-session or from local open-source models via Ollama. No API key needed.

Which AI agents support this skill?

This skill is designed for Claude, Cursor, Codex, and GitHub Copilot.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

The source code is on GitHub, in the sickn33/antigravity-awesome-skills repository.

SKILL.md Source

# Knowledge Extraction

Extract structured, quality-scored domain knowledge from any AI model — in-session from closed models (no API key) or locally from open-source models via Ollama.

## Overview

bdistill turns your AI subscription sessions into a compounding knowledge base. The agent answers targeted domain questions, bdistill structures and quality-scores the responses, and the output accumulates into a searchable, exportable reference dataset.

Adversarial mode challenges the agent's claims — forcing evidence, corrections, and acknowledged limitations — producing validated knowledge entries.

## When to Use This Skill

- Use when you need structured reference data on any domain (medical, legal, finance, cybersecurity)
- Use when building lookup tables, Q&A datasets, or research corpora
- Use when generating training data for traditional ML models (regression, classification — NOT competing LLMs)
- Use when you want cross-model comparison on domain knowledge

## How It Works

### Step 1: Install

```bash
pip install bdistill
claude mcp add bdistill -- bdistill-mcp   # Claude Code
```

### Step 2: Extract knowledge in-session

```
/distill medical cardiology                    # Preset domain
/distill --custom kubernetes docker helm       # Custom terms
/distill --adversarial medical                 # With adversarial validation
```

### Step 3: Search, export, compound

```bash
bdistill kb list                               # Show all domains
bdistill kb search "atrial fibrillation"       # Keyword search
bdistill kb export -d medical -f csv           # Export as spreadsheet
bdistill kb export -d medical -f markdown      # Readable knowledge document
```

## Output Format

Structured reference JSONL, not an LLM training format. Each record occupies one line in the file; it is pretty-printed here for readability:

```json
{
  "question": "What causes myocardial infarction?",
  "answer": "Myocardial infarction results from acute coronary artery occlusion...",
  "domain": "medical",
  "category": "cardiology",
  "tags": ["mechanistic", "evidence-based"],
  "quality_score": 0.73,
  "confidence": 1.08,
  "validated": true,
  "source_model": "Claude Sonnet 4"
}
```
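A minimal sketch of reading such a JSONL file and keeping only the best validated answer per question, which is one way a knowledge base could compound across sessions. The function names and the sample lines are hypothetical; only the field names (`question`, `quality_score`, `validated`) come from the record above, and this is not bdistill's own logic.

```python
import json

# Hypothetical JSONL content: one JSON record per line, fields as in the record above.
jsonl_text = "\n".join([
    json.dumps({"question": "What causes myocardial infarction?",
                "answer": "unvalidated draft", "quality_score": 0.90, "validated": False}),
    json.dumps({"question": "What causes myocardial infarction?",
                "answer": "earlier answer", "quality_score": 0.61, "validated": True}),
    json.dumps({"question": "What causes myocardial infarction?",
                "answer": "Myocardial infarction results from acute coronary artery occlusion...",
                "quality_score": 0.73, "validated": True}),
])


def load_records(text: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def best_validated(records: list[dict]) -> list[dict]:
    """Keep the highest-scoring validated record for each distinct question."""
    best: dict[str, dict] = {}
    for rec in records:
        if not rec.get("validated"):
            continue  # skip entries that did not pass validation
        key = rec["question"].strip().lower()
        if key not in best or rec["quality_score"] > best[key]["quality_score"]:
            best[key] = rec
    return list(best.values())


records = load_records(jsonl_text)
kept = best_validated(records)
for rec in kept:
    print(rec["quality_score"], rec["question"])
```

Normalizing the question text before comparison is a deliberate simplification; near-duplicate questions with different wording would need fuzzier matching.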

## Tabular ML Data Generation

Generate structured training data for traditional ML models:

```
/schema sepsis | hr:float, bp:float, temp:float, wbc:float | risk:category[low,moderate,high,critical]
```

Exports as CSV ready for pandas/sklearn. Each row tracks source_model for cross-model analysis.
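To make the `/schema` line concrete, here is a minimal sketch that parses it into a dataset name, feature columns, and a target column. The grammar is inferred from the single example above, and `Column`, `parse_column`, and `parse_schema` are hypothetical names; bdistill's real parser may differ.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    categories: list[str] = field(default_factory=list)


def parse_column(spec: str) -> Column:
    """Parse 'name:type' or 'name:type[a,b,c]' (syntax assumed from the example)."""
    match = re.fullmatch(r"\s*(\w+)\s*:\s*(\w+)(?:\[(.*)\])?\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized column spec: {spec!r}")
    name, dtype, cats = match.groups()
    categories = [c.strip() for c in cats.split(",")] if cats else []
    return Column(name, dtype, categories)


def parse_schema(line: str) -> tuple[str, list[Column], Column]:
    """Split '/schema name | features | target' into its three parts."""
    body = line.removeprefix("/schema").strip()
    name, features, target = (part.strip() for part in body.split("|"))
    # Split on commas that are not inside a [...] category list.
    feature_specs = re.split(r",(?![^\[]*\])", features)
    return name, [parse_column(f) for f in feature_specs], parse_column(target)


schema_line = ("/schema sepsis | hr:float, bp:float, temp:float, wbc:float "
               "| risk:category[low,moderate,high,critical]")
name, features, target = parse_schema(schema_line)
print(name, [c.name for c in features], target.name, target.categories)
```

Here the four `float` columns become the features and `risk` becomes a four-class target, which matches the classification use case the section describes.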

## Local Model Extraction (Ollama)

For open-source models running locally:

```bash
# Install Ollama from https://ollama.com
ollama serve                                   # Start the Ollama server (skip if it is already running)
ollama pull qwen3:4b                           # Download the model

bdistill extract --domain medical --model qwen3:4b
```
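Before running an extraction, it can help to confirm that the model is actually present in the local Ollama instance. The sketch below queries Ollama's `/api/tags` endpoint on its default address, `http://localhost:11434`; the helper names are hypothetical and not part of bdistill.

```python
import json
import urllib.error
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local address


def model_names(tags_payload: dict) -> set[str]:
    """Extract model names from an /api/tags response body."""
    return {m["name"] for m in tags_payload.get("models", [])}


def is_model_available(model: str, url: str = OLLAMA_TAGS_URL,
                       timeout: float = 2.0) -> bool:
    """Return True if the local server is reachable and lists the model."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # server not running, unreachable, or returned non-JSON
    return model in model_names(payload)


if __name__ == "__main__":
    if is_model_available("qwen3:4b"):
        print("qwen3:4b is ready")
    else:
        print("Start `ollama serve` and run `ollama pull qwen3:4b` first")
```

The check uses only the standard library, so it adds no dependencies beyond Python itself.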

## Security & Safety Notes

- In-session extraction uses your existing subscription — no additional API keys
- Local extraction runs entirely on your machine via Ollama
- No data is sent to external services beyond the AI provider that already serves your in-session conversation
- Output is reference data, not LLM training format

## Related Skills

- `@bdistill-behavioral-xray` - X-ray a model's behavioral patterns
