customer-discovery

Discover all customers of a given company by scanning websites, case studies, review sites, press, social media, job postings, and more. Use when you need competitive intelligence on who a company sells to.

381 stars

by gooseworks-ai


Best use case

customer-discovery is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using customer-discovery can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/customer-discovery/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/customer-discovery/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/customer-discovery/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How customer-discovery Compares

| Feature / Agent | customer-discovery | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

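What does this skill do?

It discovers the customers of a given company by scanning public sources such as websites, case studies, review sites, press, social media, and job postings, and produces a deduplicated report with confidence scoring.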

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

SKILL.md Source

# Customer Discovery

Find all customers of a company by scanning multiple public data sources. Produces a deduplicated report with confidence scoring.

## Quick Start

```
Find all customers of Datadog
```

```
Who are Notion's customers? Use deep mode.
```

## Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| Company name | Yes | — | The company to research |
| Website URL | No | Auto-detected | The company's website URL |
| Depth | No | standard | `quick`, `standard`, or `deep` |

## Procedure

### Step 1: Gather Inputs

Ask the user for:
1. **Company name** (required)
2. **Company website URL** (optional — if not provided, WebSearch for it)
3. **Depth tier** — present these options, default to Standard:
   - **Quick** (~2-3 min): Website logos, case studies, G2 reviews, press search
   - **Standard** (~5-8 min): Quick + blog posts, Wayback Machine, LinkedIn, Twitter, Reddit, HN, job postings, YouTube
   - **Deep** (~10-15 min): Standard + SEC filings, podcasts, GitHub, integration directories, BuiltWith, Crunchbase
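
Because each tier builds on the previous one, the set of sources for a run can be derived from the chosen depth. The following is a minimal Python sketch of that idea; the source identifiers are hypothetical labels chosen for illustration and are not defined by the skill.

```python
# Hypothetical source identifiers; the skill only names these sources in prose.
QUICK = ["website_logos", "case_studies", "g2_reviews", "press_search"]
STANDARD_EXTRA = [
    "blog_posts", "wayback_logos", "linkedin", "twitter",
    "reddit_hn", "job_postings", "youtube",
]
DEEP_EXTRA = [
    "sec_filings", "podcasts", "github",
    "integration_directories", "builtwith", "crunchbase",
]

# Each tier is cumulative: standard includes quick, deep includes standard.
TIERS = {
    "quick": QUICK,
    "standard": QUICK + STANDARD_EXTRA,
    "deep": QUICK + STANDARD_EXTRA + DEEP_EXTRA,
}


def sources_for(depth: str = "standard") -> list[str]:
    """Return the cumulative source list for a depth tier (default: standard)."""
    key = depth.strip().lower()
    if key not in TIERS:
        raise ValueError(f"unknown depth {depth!r}; expected quick, standard, or deep")
    return list(TIERS[key])
```

Calling `sources_for("deep")` yields all 17 sources in the order they are numbered in Step 3.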

### Step 2: Create Output Directory

```bash
mkdir -p customer-discovery-[company-slug]
```
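
The skill does not define how `[company-slug]` is derived. One plausible convention, shown here as an assumption rather than the skill's actual rule, is to lowercase the name and replace runs of non-alphanumeric characters with hyphens. The helper names are hypothetical.

```python
import re
from pathlib import Path


def company_slug(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def output_dir(name: str, base: Path = Path(".")) -> Path:
    """Create (if needed) and return the customer-discovery-<slug> directory."""
    path = base / f"customer-discovery-{company_slug(name)}"
    path.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return path
```

With this convention, `company_slug("Datadog")` gives `datadog`, so the output directory would be `customer-discovery-datadog`.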

### Step 3: Run Sources for Selected Tier

Collect all results into a running list. For each customer found, record:
- **name**: Company name
- **confidence**: high / medium / low
- **source_type**: e.g., "logo_wall", "case_study", "g2_review", "press", "job_posting"
- **evidence_url**: URL where the evidence was found
- **notes**: Brief description of the evidence
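
The running list could be modeled as a list of small records with the five fields above. This is an illustrative sketch only; the `CustomerRecord` class and the placeholder URL are assumptions, not part of the skill.

```python
from dataclasses import dataclass, asdict

CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass
class CustomerRecord:
    name: str
    confidence: str
    source_type: str
    evidence_url: str
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject confidence values outside the three levels the skill uses.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")


results: list[CustomerRecord] = []
results.append(
    CustomerRecord(
        name="Shopify",
        confidence="high",
        source_type="case_study",
        evidence_url="https://example.com/customers/shopify",  # placeholder URL
        notes="Published case study",
    )
)
```

`asdict(results[0])` converts a record to a plain dictionary, which is convenient for the deduplication and CSV steps later on.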

#### Quick Sources

**1. Website logo wall**

Run the scrape_website_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_website_logos.py \
  --url "[company-url]" --output json
```

Parse the JSON output and add each result to the customer list.

**2. Case studies page**

Use WebFetch on the company's case studies page (try `/case-studies`, `/customers`, `/resources/case-studies`). Extract customer names from page headings and content.
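
The candidate paths can be expanded into full URLs before fetching. A small sketch using the standard library's `urljoin`; the function name is hypothetical, and defaulting to `https://` when no scheme is given is an assumption.

```python
from urllib.parse import urljoin

# Paths listed in the step above.
CASE_STUDY_PATHS = ["/case-studies", "/customers", "/resources/case-studies"]


def candidate_case_study_urls(company_url: str) -> list[str]:
    """Join each candidate path onto the site root."""
    # Assume https when the caller passes a bare domain.
    base = company_url if "://" in company_url else f"https://{company_url}"
    return [urljoin(base, path) for path in CASE_STUDY_PATHS]
```

Because each path starts with `/`, `urljoin` resolves it against the site root even when the input URL points at a subpage.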

**3. G2/Capterra reviews**

First, WebSearch for the company's G2 page: `site:g2.com "[company]"`.

If the `review-scraper` skill is available, pass that page's URL to it to find reviewer companies:
```bash
python3 skills/capabilities/review-scraper/scripts/scrape_reviews.py \
  --platform g2 --url "[g2-product-url]" --max-reviews 50 --output json
```

Extract reviewer company names from the review author info in the output.

**4. Web search for press**

WebSearch these queries and extract customer mentions from results:
- `"[company]" customer OR "case study" OR partnership`
- `"[company]" "we use" OR "switched to" OR "chose"`
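
The search queries throughout this step use `[company]` as a placeholder. A minimal sketch of filling the press templates in, assuming plain string substitution is sufficient; `fill_query` is a hypothetical helper.

```python
# The two press query templates from the step above.
PRESS_QUERIES = [
    '"[company]" customer OR "case study" OR partnership',
    '"[company]" "we use" OR "switched to" OR "chose"',
]


def fill_query(template: str, company: str) -> str:
    """Substitute the company name for the [company] placeholder."""
    return template.replace("[company]", company)


queries = [fill_query(t, "Datadog") for t in PRESS_QUERIES]
```

The same helper applies to the other templates in this step that use the `[company]` placeholder.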

#### Standard Sources (in addition to Quick)

**5. Company blog posts**

WebSearch: `site:[company-domain] customer OR "case study" OR partnership OR "customer story"`

**6. Wayback Machine logos**

Run the scrape_wayback_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_wayback_logos.py \
  --url "[company-url]" --output json
```

Logos marked `still_present: false` are especially interesting: they were once on the site and have since been removed, which suggests a former customer.
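
Filtering for removed logos might look like the sketch below. The output schema is an assumption: only the `still_present` field is mentioned in this document, so the array-of-objects shape and the `name` key are hypothetical and should be checked against the script's real output.

```python
import json


def removed_logos(raw_json: str) -> list[str]:
    """Return names of logos flagged still_present: false.

    Assumes a JSON array of objects with "name" and "still_present" keys.
    """
    entries = json.loads(raw_json)
    return [e["name"] for e in entries if e.get("still_present") is False]


# Made-up sample data in the assumed shape.
sample = '[{"name": "Acme", "still_present": false}, {"name": "Globex", "still_present": true}]'
print(removed_logos(sample))  # ['Acme']
```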

**7. Founder/exec LinkedIn posts**

WebSearch: `site:linkedin.com "[company]" customer OR "excited to announce" OR "welcome"`

**8. Twitter/X mentions**

WebSearch: `(site:twitter.com OR site:x.com) "[company]" "we use" OR "just switched to" OR "loving"`

**9. Reddit/HN mentions**

WebSearch these queries:
- `site:reddit.com "we use [company]" OR "[company] customer"`
- `site:news.ycombinator.com "[company]" customer OR user`

**10. Job postings**

WebSearch: `"experience with [company]" site:linkedin.com/jobs OR site:greenhouse.io OR site:lever.co`

Companies requiring experience with the product are likely customers.

**11. YouTube testimonials**

WebSearch: `site:youtube.com "[company]" customer OR testimonial OR review`

#### Deep Sources (in addition to Standard)

**12. SEC filings**

WebSearch: `site:sec.gov "[company]"` — Look for mentions in 10-K and 10-Q filings.

**13. Podcast transcripts**

WebSearch: `"[company]" podcast customer OR transcript OR interview`

**14. GitHub usage signals**

WebSearch: `site:github.com "[company-package-name]"` and look for the package in dependency files such as package.json or requirements.txt.

**15. Integration directories**

WebFetch marketplace pages where the company lists integrations:
- Salesforce AppExchange
- Zapier integrations page
- Slack App Directory
- Any marketplace relevant to the company

**16. BuiltWith detection**

Run the search_builtwith.py script:

```bash
python3 skills/capabilities/customer-discovery/scripts/search_builtwith.py \
  --technology "[company-slug]" --max-results 50 --output json
```

**17. Crunchbase**

WebSearch: `site:crunchbase.com "[company]" customers OR partners`

### Step 4: Deduplicate Results

Merge results by company name using fuzzy matching:
- Normalize: lowercase, strip suffixes (Inc, Corp, LLC, Ltd, Co., GmbH)
- Treat "Acme Inc" = "Acme" = "ACME Corp" = "acme.com" as the same company
- When merging, keep the highest confidence level and all evidence URLs
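
The merge rules above can be sketched as follows. This is a simplified illustration, not the skill's implementation: it matches on exact equality after normalization rather than true fuzzy matching, and the function names and record keys are assumptions. A similarity check such as `difflib.SequenceMatcher` could be layered on top for near-matches.

```python
import re

SUFFIXES = {"inc", "corp", "llc", "ltd", "co", "gmbh"}
RANK = {"high": 3, "medium": 2, "low": 1}


def normalize(name: str) -> str:
    """Lowercase, drop a trailing .com, strip punctuation and legal suffixes."""
    key = name.strip().lower()
    key = re.sub(r"\.com$", "", key)
    tokens = re.findall(r"[a-z0-9]+", key)
    # Keep at least one token so a name like "Inc" is not emptied out.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merge(records: list[dict]) -> list[dict]:
    """Merge records sharing a normalized name; keep best confidence, all URLs."""
    merged: dict[str, dict] = {}
    for rec in records:
        key = normalize(rec["name"])
        entry = merged.setdefault(
            key,
            {"name": rec["name"], "confidence": rec["confidence"], "evidence_urls": []},
        )
        if RANK[rec["confidence"]] > RANK[entry["confidence"]]:
            entry["confidence"] = rec["confidence"]
        if rec["evidence_url"] not in entry["evidence_urls"]:
            entry["evidence_urls"].append(rec["evidence_url"])
    return list(merged.values())
```

With this normalization, "Acme Inc", "Acme", "ACME Corp", and "acme.com" all reduce to the key `acme` and end up in a single merged entry.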

### Step 5: Assign Confidence

Apply these rules:

**High confidence:**
- Logo on current website (from scrape_website_logos.py with confidence "high")
- Published case study or customer story
- Direct quote or testimonial on the company's site
- Official partnership page listing

**Medium confidence:**
- G2/Capterra review (reviewer's company)
- Press article mentioning customer relationship
- Job posting requiring experience with the product
- YouTube testimonial or video review
- Logo found only in Wayback Machine (was on site, now removed)

**Low confidence:**
- Single social media mention (tweet, Reddit post)
- Indirect reference ("heard good things about X")
- BuiltWith detection only (technology on site doesn't mean they're a paying customer)
- HN discussion mention
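
These rules amount to a lookup from evidence type to confidence level. A minimal sketch follows; apart from the `source_type` examples given in Step 3, the keys are hypothetical labels, and defaulting unknown types to `low` is an assumption.

```python
# Keys other than logo_wall, case_study, g2_review, press, and job_posting
# are hypothetical labels; the tiers follow the rules listed above.
CONFIDENCE_BY_SOURCE = {
    "logo_wall": "high",
    "case_study": "high",
    "testimonial": "high",
    "partnership_page": "high",
    "g2_review": "medium",
    "press": "medium",
    "job_posting": "medium",
    "youtube": "medium",
    "wayback_logo": "medium",
    "social_mention": "low",
    "indirect_reference": "low",
    "builtwith": "low",
    "hn_mention": "low",
}


def assign_confidence(source_type: str) -> str:
    """Look up the confidence tier; unknown source types default to low."""
    return CONFIDENCE_BY_SOURCE.get(source_type, "low")
```

When a customer has evidence from several sources, taking the highest level across them is consistent with the merge rule in Step 4.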

### Step 6: Generate Report

Create two output files:

**`customer-discovery-[company-slug]/report.md`:**

```markdown
# Customer Discovery: [Company Name]

**Date:** YYYY-MM-DD
**Depth:** quick | standard | deep
**Total customers found:** N

## High Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| Shopify | Case study | [link] |
| ... | ... | ... |

## Medium Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Low Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Sources Scanned

- Website logo wall: [url] — N customers found
- G2 reviews: N reviews analyzed — N companies identified
- Wayback Machine: N snapshots checked — N logos found (N removed)
- Web search: N queries — N mentions
- ...

## Methodology

This report was generated using the customer-discovery skill, which scans
public data sources to identify companies that use [Company Name]. Confidence
levels reflect the strength and directness of the evidence found.
```
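
Each confidence section of the template could be rendered from the merged list along these lines. The function name and record keys are assumptions, and rendering the evidence as a markdown link is one possible reading of the `[link]` placeholder.

```python
def render_section(level: str, rows: list[dict]) -> str:
    """Render one confidence section as a markdown heading plus table."""
    lines = [
        f"## {level.capitalize()} Confidence ({len(rows)})",
        "",
        "| Customer | Source | Evidence |",
        "|----------|--------|----------|",
    ]
    for row in rows:
        lines.append(
            f"| {row['name']} | {row['source_type']} | [link]({row['evidence_url']}) |"
        )
    return "\n".join(lines)
```

Calling it once each for `high`, `medium`, and `low` and joining the results with blank lines produces the body of the report.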

**`customer-discovery-[company-slug]/customers.csv`:**

CSV with columns: `company_name,confidence,source_type,evidence_url,notes`

Write the CSV using a code block or Python script.
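
One way to write the file with Python's standard `csv` module is sketched below. The function name is hypothetical; the column order follows the header given above. `csv.DictWriter` quotes fields that contain commas or quotes, which matters for the free-text `notes` column.

```python
import csv

COLUMNS = ["company_name", "confidence", "source_type", "evidence_url", "notes"]


def write_customers_csv(path: str, rows: list[dict]) -> None:
    """Write rows to a CSV file with the header the skill specifies."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Each row is a dictionary keyed by the column names, so records that store the company under `name` need that key renamed to `company_name` first.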

## Scripts Reference

| Script | Purpose | Key flags |
|--------|---------|-----------|
| `scrape_website_logos.py` | Extract logos from current website | `--url`, `--output json\|summary` |
| `scrape_wayback_logos.py` | Find historical logos via Wayback Machine | `--url`, `--paths`, `--output json\|summary` |
| `search_builtwith.py` | BuiltWith technology detection (deep mode) | `--technology`, `--max-results`, `--output json\|summary` |

All scripts require `requests`: `pip3 install requests`

External skill scripts (use if available):
- `skills/capabilities/review-scraper/scripts/scrape_reviews.py` — G2/Capterra/Trustpilot reviews (requires Apify token)
- `skills/capabilities/linkedin-post-research/scripts/search_posts.py` — LinkedIn post search (requires Crustdata API key)

## Cost

- **Quick / Standard:** Free (uses WebSearch + free APIs like Wayback Machine CDX)
- **Deep:** Mostly free. BuiltWith paid API is optional (`--api-key` flag); free scraping is used by default.
- External skills (review-scraper, linkedin-post-research) may require paid API tokens.

Related Skills

lead-discovery

381

from gooseworks-ai/goose-skills

Orchestrator that runs first for lead generation requests. Gathers business context via website analysis or questions, identifies competitors, builds ICP, and routes to signal skills with pre-filled inputs.

voice-of-customer-synthesizer

381

from gooseworks-ai/goose-skills

Aggregate customer feedback from multiple sources — support tickets, NPS comments, Slack messages, G2 reviews, call transcripts, survey responses — into a unified VoC report with theme clustering, sentiment analysis, trend detection, and actionable recommendations for product, marketing, and CS teams. Chains review-scraper for public review data.

customer-win-back-sequencer

381

from gooseworks-ai/goose-skills

For churned accounts, research what has changed since they left — new funding, team growth, competitor dissatisfaction, product updates that address their pain — then assess re-engagement potential and generate a personalized win-back email sequence with timing recommendations. Chains web research and LinkedIn monitoring with email sequence generation.

customer-story-builder

381

from gooseworks-ai/goose-skills

Take raw customer inputs — interview transcripts, survey responses, Slack quotes, support tickets, review excerpts — and generate a structured case study draft with problem/solution/result narrative, pull-quotes, metric callouts, and multi-format outputs (full case study, one-pager, social proof snippet, sales deck slide). Pure reasoning skill. Use when a product marketing team has customer signal but no time to write the story.

linkedin-influencer-discovery

381

from gooseworks-ai/goose-skills

Discover top LinkedIn influencers and voices by topic, industry, follower count, and country. Use when you need to find the top 100 voices in a space, build influencer lists for outreach, or identify thought leaders on LinkedIn.

kol-discovery

381

from gooseworks-ai/goose-skills

Find Key Opinion Leaders (KOLs) in a given domain by combining web research with LinkedIn post search. Given a company/idea and target domain, generates authority keywords, searches LinkedIn posts to find prolific authors with high engagement, and merges with web-researched influencers. Use when someone wants to "find influencers in X space" or "who are the KOLs for Y industry."

signal-detection-pipeline

381

from gooseworks-ai/goose-skills

Detect buying signals from multiple sources, qualify leads, and generate outreach context

seo-content-engine

381

from gooseworks-ai/goose-skills

Build and run an SEO content engine: audit current state, identify gaps, build keyword architecture, generate content calendar, draft content.

outbound-prospecting-engine

381

from gooseworks-ai/goose-skills

End-to-end outbound prospecting: detect intent signals, research companies, find decision-maker contacts, personalize messaging, launch campaign.

event-prospecting-pipeline

381

from gooseworks-ai/goose-skills

Find attendees at conferences/events, research their companies, qualify against ICP, and launch outreach

competitor-monitoring-system

381

from gooseworks-ai/goose-skills

Set up and run ongoing competitive intelligence monitoring for a client. Tracks competitor content, ads, reviews, social, and product moves.

client-packet-engine

381

from gooseworks-ai/goose-skills

Batch client packet generator. Takes company names/URLs, runs intelligence + strategy generation, presents strategies for human selection, executes selected strategies in pitch-packet mode (no live campaigns or paid enrichment), and packages into local delivery packets.