kol-discovery

Find Key Opinion Leaders (KOLs) in a given domain by combining web research with LinkedIn post search. Given a company/idea and target domain, generates authority keywords, searches LinkedIn posts to find prolific authors with high engagement, and merges with web-researched influencers. Use when someone wants to "find influencers in X space" or "who are the KOLs for Y industry."

381 stars

bygooseworks-ai

View on GitHub Installation ↓

Best use case

kol-discovery is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using kol-discovery should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/kol-discovery/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/kol-discovery/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/kol-discovery/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How kol-discovery Compares

Feature / Agent	kol-discovery	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

SKILL.md Source

# KOL Discovery

Find Key Opinion Leaders in any domain by searching LinkedIn posts for prolific, high-engagement authors and merging with web-researched influencers.

**Core principle:** Search for **authority/thought-leadership keywords**, not pain-language. We want people who shape conversation in the space — conference speakers, newsletter writers, podcast hosts, and prolific LinkedIn posters.

## Phase 0: Intake

Ask the user these questions:

### Domain & Audience

1. What does your company/product do? What space are you in?
2. What specific domain or topic are the KOLs you want to find expert in?
3. Who is your target audience? (The people the KOLs influence)
4. Any KOLs you already know about? (LinkedIn URLs — these become the baseline)
5. Anyone to EXCLUDE? (Competitors, your own team, irrelevant voices)

## Phase 1: Generate Domain Keywords

Based on intake, generate 15-25 topic/authority keywords. These are NOT pain-language — they're the terms thought leaders use when sharing expertise:

- **Industry terms** — "freight tech", "supply chain innovation"
- **Thought leadership signals** — "lessons learned in logistics", "future of dispatch"
- **Conference/event terms** — "supply chain summit keynote"
- **Content creator signals** — "newsletter freight", "podcast logistics"

Also generate:
- **KOL title keywords** — titles that signal thought leadership (vp, founder, analyst, editor, host)
- **Vendor exclusion keywords** — titles to filter out (software engineer, recruiter, saas)
- **Domain relevance keywords** — core industry terms for relevance scoring

**Present keywords to user for approval before running.**

Save config in the client workspace:
```bash
clients/{client-name}/configs/kol-discovery.json
```

Config JSON structure:
```json
{
  "client_name": "example",
  "domain_keywords": ["\"freight tech\" thought leadership", "supply chain innovation"],
  "exclusion_patterns": ["hiring.*position", "we.re recruiting"],
  "kol_title_keywords": ["vp", "founder", "analyst", "editor", "host"],
  "vendor_exclude_keywords": ["software engineer", "saas", "recruiter"],
  "domain_relevance_keywords": ["freight", "logistics", "supply chain"],
  "country_filter": "",
  "max_posts_per_keyword": 50,
  "min_posts": 2,
  "min_total_engagement": 50,
  "top_n_kols": 50
}
```

## Phase 2: Run KOL Discovery Pipeline

```bash
python3 skills/kol-discovery/scripts/kol_discovery.py \
  --config clients/{client-name}/configs/kol-discovery.json \
  --output-dir clients/{client-name}/leads \
  [--test] [--web-kols clients/{client-name}/configs/kol-web-kols.json] [--yes]
```

**Flags:**
- `--config` (required) — path to client config JSON
- `--output-dir` — directory for output CSV (default: current working directory)
- `--test` — limit to 5 keywords (validation run)
- `--web-kols` — path to web-researched KOL JSON (agent generates this)
- `--yes` — skip cost confirmation prompts
- `--max-runs` — override Apify run limit

**What the script does:**

1. **Keyword search** — `harvestapi/linkedin-post-search` for each domain keyword
2. **Author aggregation** — Group posts by author, compute engagement metrics
3. **Scoring** — Composite KOL score: engagement volume (log-scaled) + consistency (post count) + quality (avg engagement) + relevance (keyword breadth) + web research bonus
4. **Merge** — Combine post-data KOLs with web-researched KOLs, flag overlaps
5. **Export** — Ranked CSV

**Cost estimate:** ~$0.10 per keyword. Full run with 20 keywords: ~$2-3.

**Always run with `--test` first.**

## Phase 2b: Web Research (Agent-Driven)

Before or alongside the script, do web research to find known KOLs:
- Search for "top [industry] influencers on LinkedIn"
- Find conference speakers, newsletter authors, podcast hosts
- Check industry publications for frequent contributors

Save as JSON in the client workspace:
```bash
clients/{client-name}/configs/kol-web-kols.json
```

```json
[
  {
    "name": "Jane Doe",
    "linkedin_url": "https://www.linkedin.com/in/janedoe/",
    "source": "FreightWaves conference speaker 2025",
    "notes": "Hosts weekly logistics podcast"
  }
]
```

Pass to script via `--web-kols`.

## Phase 3: Review & Refine

Present results:
- **Top 20 KOLs** — rank, name, headline, KOL score, total engagement, top post
- **Source breakdown** — how many from post-data vs web-research vs both
- **Keyword performance** — which keywords surfaced the most KOLs

Common adjustments:
- **Too many irrelevant authors** — refine domain keywords, add exclusion patterns
- **Missing known KOLs** — add more keyword variants, expand web research
- **Too few results** — lower `min_posts` or `min_total_engagement` thresholds

## Phase 4: Output

CSV exported to `clients/{client-name}/leads/{client-name}-kols-{date}.csv`:

| Column | Description |
|--------|-------------|
| Rank | Overall rank by KOL Score |
| Name | Full name |
| LinkedIn URL | Profile link |
| Headline | From LinkedIn |
| KOL Score | Composite score |
| Total Posts | Posts found in search |
| Total Reactions | Sum of reactions across posts |
| Total Comments | Sum of comments across posts |
| Avg Engagement | Average reactions+comments per post |
| Top Post URL | Highest engagement post |
| Top Post Preview | First 100 chars of top post |
| Source | post-data / web-research / both |

## Tools Required

- **Apify API token** — set as `APIFY_API_TOKEN` in `.env`
- **Apify actors used:**
  - `harvestapi/linkedin-post-search` (keyword search)

## Example Usage

**Trigger phrases:**
- "Find KOLs in the freight/logistics space"
- "Who are the influencers in [industry]?"
- "Discover thought leaders for [domain]"
- "Run KOL discovery for [client]"

**With existing config:**
```bash
python3 skills/kol-discovery/scripts/kol_discovery.py \
  --config clients/example/configs/kol-discovery.json \
  --output-dir clients/example/leads --yes
```

Related Skills

lead-discovery

381

from gooseworks-ai/goose-skills

Orchestrator that runs first for lead generation requests. Gathers business context via website analysis or questions, identifies competitors, builds ICP, and routes to signal skills with pre-filled inputs.

linkedin-influencer-discovery

381

from gooseworks-ai/goose-skills

Discover top LinkedIn influencers and voices by topic, industry, follower count, and country. Use when you need to find the top 100 voices in a space, build influencer lists for outreach, or identify thought leaders on LinkedIn.

customer-discovery

381

from gooseworks-ai/goose-skills

Discover all customers of a given company by scanning websites, case studies, review sites, press, social media, job postings, and more. Use when you need competitive intelligence on who a company sells to.

signal-detection-pipeline

381

from gooseworks-ai/goose-skills

Detect buying signals from multiple sources, qualify leads, and generate outreach context

seo-content-engine

381

from gooseworks-ai/goose-skills

Build and run an SEO content engine: audit current state, identify gaps, build keyword architecture, generate content calendar, draft content.

outbound-prospecting-engine

381

from gooseworks-ai/goose-skills

End-to-end outbound prospecting: detect intent signals, research companies, find decision-maker contacts, personalize messaging, launch campaign.

event-prospecting-pipeline

381

from gooseworks-ai/goose-skills

Find attendees at conferences/events, research their companies, qualify against ICP, and launch outreach

competitor-monitoring-system

381

from gooseworks-ai/goose-skills

Set up and run ongoing competitive intelligence monitoring for a client. Tracks competitor content, ads, reviews, social, and product moves.

client-packet-engine

381

from gooseworks-ai/goose-skills

Batch client packet generator. Takes company names/URLs, runs intelligence + strategy generation, presents strategies for human selection, executes selected strategies in pitch-packet mode (no live campaigns or paid enrichment), and packages into local delivery packets.

client-package-notion

381

from gooseworks-ai/goose-skills

Package all work done for a client into a shareable Notion page with subpages and Google Sheets. Reads the client's folder (strategies, campaigns, content, leads, notes) and builds a structured Notion workspace the client can browse. Lead list CSVs are uploaded to Google Sheets and linked from the Notion pages. Use when you want to deliver work to a client in a polished, navigable format.

client-package-local

381

from gooseworks-ai/goose-skills

Package all work done for a client into a local filesystem delivery package with .md files and Google Sheets. Reads the client's folder (strategies, campaigns, content, leads, notes) and builds a structured directory with dated deliverables. Lead lists are uploaded to Google Sheets and linked from the markdown files. Use when you want to deliver work to a client in a polished, navigable format without requiring Notion.

client-onboarding

381

from gooseworks-ai/goose-skills

Full client onboarding: intelligence gathering, synthesis into Client Intelligence Package, and growth strategy generation. Phases 1-3 of the Client Launch Playbook.