customer-discovery

Discover all customers of a given company by scanning websites, case studies, review sites, press, social media, job postings, and more. Use when you need competitive intelligence on who a company sells to.

381 stars

Best use case

customer-discovery is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using customer-discovery should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/customer-discovery/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/customer-discovery/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/customer-discovery/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How customer-discovery Compares

| Feature / Agent | customer-discovery | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Discover all customers of a given company by scanning websites, case studies, review sites, press, social media, job postings, and more. Use when you need competitive intelligence on who a company sells to.

Where can I find the source code?

The source code is on GitHub in the gooseworks-ai/goose-skills repository, under skills/capabilities/customer-discovery.

SKILL.md Source

# Customer Discovery

Find all customers of a company by scanning multiple public data sources. Produces a deduplicated report with confidence scoring.

## Quick Start

```
Find all customers of Datadog
```

```
Who are Notion's customers? Use deep mode.
```

## Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| Company name | Yes | — | The company to research |
| Website URL | No | Auto-detected | The company's website URL |
| Depth | No | standard | `quick`, `standard`, or `deep` |

## Procedure

### Step 1: Gather Inputs

Ask the user for:
1. **Company name** (required)
2. **Company website URL** (optional — if not provided, WebSearch for it)
3. **Depth tier** — present these options, default to Standard:
   - **Quick** (~2-3 min): Website logos, case studies, G2 reviews, press search
   - **Standard** (~5-8 min): Quick + blog posts, Wayback Machine, LinkedIn, Twitter, Reddit, HN, job postings, YouTube
   - **Deep** (~10-15 min): Standard + SEC filings, podcasts, GitHub, integration directories, BuiltWith, Crunchbase
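
Because each tier includes everything in the tier below it, the list of sources for a run can be resolved cumulatively. The following is a minimal Python sketch of that idea. The source identifiers in `TIER_SOURCES` and the `sources_for` helper are hypothetical names chosen for illustration; the skill itself does not define them.

```python
# Hypothetical source identifiers, grouped by the tier that introduces them.
TIER_ORDER = ["quick", "standard", "deep"]
TIER_SOURCES = {
    "quick": ["website_logos", "case_studies", "g2_reviews", "press_search"],
    "standard": ["blog_posts", "wayback_logos", "linkedin", "twitter",
                 "reddit_hn", "job_postings", "youtube"],
    "deep": ["sec_filings", "podcasts", "github", "integration_directories",
             "builtwith", "crunchbase"],
}


def sources_for(depth: str = "standard") -> list[str]:
    """Return every source to run for a depth tier, including lower tiers."""
    depth = depth.lower()
    if depth not in TIER_ORDER:
        raise ValueError(f"unknown depth: {depth!r}")
    selected = []
    # Walk the tiers in order and stop after the requested one.
    for tier in TIER_ORDER[: TIER_ORDER.index(depth) + 1]:
        selected.extend(TIER_SOURCES[tier])
    return selected
```

Calling `sources_for()` with no argument applies the Standard default described above.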

### Step 2: Create Output Directory

```bash
mkdir -p customer-discovery-[company-slug]
```
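
The skill does not say how `[company-slug]` is derived from the company name. One plausible convention, shown here as an assumption rather than the skill's actual rule, is to lowercase the name and replace runs of non-alphanumeric characters with hyphens. `company_slug` and `output_dir` are hypothetical helper names.

```python
import re


def company_slug(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # Fall back to a placeholder if nothing usable remains (assumed behaviour).
    return slug or "company"


def output_dir(name: str) -> str:
    """Build the output directory name used in Step 2."""
    return f"customer-discovery-{company_slug(name)}"
```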

### Step 3: Run Sources for Selected Tier

Collect all results into a running list. For each customer found, record:
- **name**: Company name
- **confidence**: high / medium / low
- **source_type**: e.g., "logo_wall", "case_study", "g2_review", "press", "job_posting"
- **evidence_url**: URL where the evidence was found
- **notes**: Brief description of the evidence
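
The record above could be held in a small data structure so every source emits the same shape. This is a sketch only: the field names come from the list above, while the `CustomerRecord` class name and the validation step are assumptions added for illustration.

```python
from dataclasses import asdict, dataclass

CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass
class CustomerRecord:
    name: str          # customer company name
    confidence: str    # one of CONFIDENCE_LEVELS
    source_type: str   # e.g. "logo_wall", "case_study", "g2_review"
    evidence_url: str  # URL where the evidence was found
    notes: str = ""    # brief description of the evidence

    def __post_init__(self) -> None:
        # Reject confidence values outside the three documented levels.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"invalid confidence: {self.confidence!r}")


# Illustrative record; the URL is a placeholder, not real evidence.
example = CustomerRecord(
    name="Shopify",
    confidence="high",
    source_type="case_study",
    evidence_url="https://example.com/case-studies/shopify",
    notes="Published case study",
)
```

`asdict(example)` turns the record into a plain dictionary, which is convenient for the JSON and CSV handling in later steps.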

#### Quick Sources

**1. Website logo wall**

Run the scrape_website_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_website_logos.py \
  --url "[company-url]" --output json
```

Parse the JSON output and add each result to the customer list.

**2. Case studies page**

Use WebFetch on the company's case studies page (try `/case-studies`, `/customers`, `/resources/case-studies`). Extract customer names from page headings and content.
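
The candidate URLs to try can be built from the company's base URL. A minimal sketch, assuming the three paths listed above and a hypothetical helper name `candidate_case_study_urls`:

```python
from urllib.parse import urljoin

# Paths taken from the step above; extend the list if a site uses other paths.
CASE_STUDY_PATHS = ["/case-studies", "/customers", "/resources/case-studies"]


def candidate_case_study_urls(base_url: str) -> list[str]:
    """Join each candidate path onto the site's root."""
    return [urljoin(base_url, path) for path in CASE_STUDY_PATHS]
```

Because the paths are absolute, `urljoin` resolves them against the site root even if the supplied URL points at a subpage.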

**3. G2/Capterra reviews**

First, WebSearch for the company's G2 page: `site:g2.com "[company]"`.

Then, if the `review-scraper` skill is available, pass the G2 product URL to it to find reviewer companies:
```bash
python3 skills/capabilities/review-scraper/scripts/scrape_reviews.py \
  --platform g2 --url "[g2-product-url]" --max-reviews 50 --output json
```

Extract reviewer company names from the review author info.

**4. Web search for press**

WebSearch these queries and extract customer mentions from results:
- `"[company]" customer OR "case study" OR partnership`
- `"[company]" "we use" OR "switched to" OR "chose"`
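
The `[company]` placeholder in these queries can be filled in programmatically so the same templates are reused across runs. A small sketch, with `PRESS_QUERY_TEMPLATES` and `press_queries` as hypothetical names:

```python
# Templates mirror the two queries above, with [company] as a format field.
PRESS_QUERY_TEMPLATES = [
    '"{company}" customer OR "case study" OR partnership',
    '"{company}" "we use" OR "switched to" OR "chose"',
]


def press_queries(company: str) -> list[str]:
    """Fill the company name into each press-search template."""
    return [template.format(company=company) for template in PRESS_QUERY_TEMPLATES]
```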

#### Standard Sources (in addition to Quick)

**5. Company blog posts**

WebSearch: `site:[company-domain] customer OR "case study" OR partnership OR "customer story"`

**6. Wayback Machine logos**

Run the scrape_wayback_logos.py script:
```bash
python3 skills/capabilities/customer-discovery/scripts/scrape_wayback_logos.py \
  --url "[company-url]" --output json
```

Logos marked `still_present: false` are especially interesting because they may indicate former customers.
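
Separating removed logos from current ones might look like the sketch below. It assumes the script prints a top-level JSON array of objects, each carrying the `still_present` flag mentioned above plus a `name` field; that output shape is a guess and should be checked against the script's real output. `split_wayback_logos` is a hypothetical helper.

```python
import json


def split_wayback_logos(raw_json: str) -> tuple[list[dict], list[dict]]:
    """Split logo entries into (still on site, removed from site)."""
    logos = json.loads(raw_json)  # assumed: a JSON array of logo objects
    # Entries without the flag are treated as still present (assumption).
    current = [logo for logo in logos if logo.get("still_present", True)]
    removed = [logo for logo in logos if logo.get("still_present") is False]
    return current, removed


# Illustrative input only; not real script output.
sample = '[{"name": "Acme", "still_present": true}, {"name": "Globex", "still_present": false}]'
current, removed = split_wayback_logos(sample)
```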

**7. Founder/exec LinkedIn posts**

WebSearch: `site:linkedin.com "[company]" customer OR "excited to announce" OR "welcome"`

**8. Twitter/X mentions**

WebSearch: `"[company]" "we use" OR "just switched to" OR "loving" site:twitter.com OR site:x.com`

**9. Reddit/HN mentions**

WebSearch these queries:
- `site:reddit.com "we use [company]" OR "[company] customer"`
- `site:news.ycombinator.com "[company]" customer OR user`

**10. Job postings**

WebSearch: `"experience with [company]" site:linkedin.com/jobs OR site:greenhouse.io OR site:lever.co`

Companies requiring experience with the product are likely customers.

**11. YouTube testimonials**

WebSearch: `site:youtube.com "[company]" customer OR testimonial OR review`

#### Deep Sources (in addition to Standard)

**12. SEC filings**

WebSearch: `site:sec.gov "[company]"` — Look for mentions in 10-K and 10-Q filings.

**13. Podcast transcripts**

WebSearch: `"[company]" podcast customer OR transcript OR interview`

**14. GitHub usage signals**

WebSearch: `site:github.com "[company-package-name]"`. Look for the package in dependency files such as package.json and requirements.txt.

**15. Integration directories**

WebFetch marketplace pages where the company lists integrations:
- Salesforce AppExchange
- Zapier integrations page
- Slack App Directory
- Any marketplace relevant to the company

**16. BuiltWith detection**

```bash
python3 skills/capabilities/customer-discovery/scripts/search_builtwith.py \
  --technology "[company-slug]" --max-results 50 --output json
```

**17. Crunchbase**

WebSearch: `site:crunchbase.com "[company]" customers OR partners`

### Step 4: Deduplicate Results

Merge results by company name using fuzzy matching:
- Normalize: lowercase, strip suffixes (Inc, Corp, LLC, Ltd, Co., GmbH)
- Treat "Acme Inc" = "Acme" = "ACME Corp" = "acme.com" as the same company
- When merging, keep the highest confidence level and all evidence URLs
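
The rules above can be sketched in Python as follows. Note that this sketch matches on an exact normalized key rather than doing true fuzzy matching; a similarity ratio (for example from `difflib.SequenceMatcher`) could be layered on top. The list of domain endings, the helper names, and the dictionary-based record shape are assumptions for illustration.

```python
import re

SUFFIXES = {"inc", "corp", "llc", "ltd", "co", "gmbh"}
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def normalize_name(name: str) -> str:
    """Lowercase, drop a trailing domain ending, and strip legal suffixes."""
    key = name.lower().strip()
    # Treat domain-style names such as "acme.com" as "acme" (assumed TLD list).
    key = re.sub(r"\.(com|io|ai|co|net|org)$", "", key)
    tokens = re.findall(r"[a-z0-9]+", key)
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merge_records(records: list[dict]) -> list[dict]:
    """Merge records sharing a normalized name; keep best confidence and all URLs."""
    merged: dict[str, dict] = {}
    for rec in records:
        key = normalize_name(rec["name"])
        entry = merged.get(key)
        if entry is None:
            merged[key] = {
                "name": rec["name"],
                "confidence": rec["confidence"],
                "evidence_urls": [rec["evidence_url"]],
            }
            continue
        if CONFIDENCE_RANK[rec["confidence"]] > CONFIDENCE_RANK[entry["confidence"]]:
            entry["confidence"] = rec["confidence"]
        if rec["evidence_url"] not in entry["evidence_urls"]:
            entry["evidence_urls"].append(rec["evidence_url"])
    return list(merged.values())
```

With this normalization, "Acme Inc", "Acme", "ACME Corp", and "acme.com" all reduce to the key `acme` and merge into one entry.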

### Step 5: Assign Confidence

Apply these rules:

**High confidence:**
- Logo on current website (from scrape_website_logos.py with confidence "high")
- Published case study or customer story
- Direct quote or testimonial on the company's site
- Official partnership page listing

**Medium confidence:**
- G2/Capterra review (reviewer's company)
- Press article mentioning customer relationship
- Job posting requiring experience with the product
- YouTube testimonial or video review
- Logo found only in Wayback Machine (was on site, now removed)

**Low confidence:**
- Single social media mention (tweet, Reddit post)
- Indirect reference ("heard good things about X")
- BuiltWith detection only (technology on site doesn't mean they're a paying customer)
- HN discussion mention
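
One way to apply these rules consistently is a lookup table keyed by `source_type`. The sketch below is a simplification: only `logo_wall`, `case_study`, `g2_review`, `press`, and `job_posting` appear as example source types in Step 3, so the remaining keys are hypothetical labels, and defaulting unknown types to low confidence is an assumption. It also ignores the per-logo confidence that `scrape_website_logos.py` reports.

```python
# Maps a source_type label to the confidence level from the rules above.
SOURCE_CONFIDENCE = {
    # High: first-party evidence on the company's own site
    "logo_wall": "high",
    "case_study": "high",
    "testimonial": "high",
    "partnership_page": "high",
    # Medium: credible third-party or historical evidence
    "g2_review": "medium",
    "press": "medium",
    "job_posting": "medium",
    "youtube": "medium",
    "wayback_logo": "medium",
    # Low: weak or indirect signals
    "social_mention": "low",
    "indirect_reference": "low",
    "builtwith": "low",
    "hn_mention": "low",
}


def assign_confidence(source_type: str) -> str:
    """Look up the confidence level; unknown source types default to low."""
    return SOURCE_CONFIDENCE.get(source_type, "low")
```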

### Step 6: Generate Report

Create two output files:

**`customer-discovery-[company-slug]/report.md`:**

```markdown
# Customer Discovery: [Company Name]

**Date:** YYYY-MM-DD
**Depth:** quick | standard | deep
**Total customers found:** N

## High Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| Shopify | Case study | [link] |
| ... | ... | ... |

## Medium Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Low Confidence (N)

| Customer | Source | Evidence |
|----------|--------|----------|
| ... | ... | ... |

## Sources Scanned

- Website logo wall: [url] — N customers found
- G2 reviews: N reviews analyzed — N companies identified
- Wayback Machine: N snapshots checked — N logos found (N removed)
- Web search: N queries — N mentions
- ...

## Methodology

This report was generated using the customer-discovery skill, which scans
public data sources to identify companies that use [Company Name]. Confidence
levels reflect the strength and directness of the evidence found.
```

**`customer-discovery-[company-slug]/customers.csv`:**

CSV with columns: `company_name,confidence,source_type,evidence_url,notes`. The `name` field recorded in Step 3 maps to the `company_name` column.

Write the CSV using a code block or Python script.
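
If a Python script is used, the standard `csv` module handles quoting of commas and quotes inside the notes. This is a minimal sketch, assuming records are dictionaries with the Step 3 field names; `write_customers_csv` is a hypothetical helper.

```python
import csv
import io

CSV_COLUMNS = ["company_name", "confidence", "source_type", "evidence_url", "notes"]


def write_customers_csv(records: list[dict], fileobj) -> None:
    """Write customer records to an open text file in the documented column order."""
    writer = csv.DictWriter(fileobj, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "company_name": rec["name"],  # Step 3 'name' maps to this column
            "confidence": rec["confidence"],
            "source_type": rec["source_type"],
            "evidence_url": rec["evidence_url"],
            "notes": rec.get("notes", ""),
        })


# Illustrative data written to an in-memory buffer; the URL is a placeholder.
buffer = io.StringIO()
write_customers_csv(
    [{"name": "Shopify", "confidence": "high", "source_type": "case_study",
      "evidence_url": "https://example.com/shopify", "notes": "Case study, with quote"}],
    buffer,
)
```

When writing to disk, open the file with `newline=""`, as the `csv` module documentation recommends, for example `open("customer-discovery-[company-slug]/customers.csv", "w", newline="")`.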

## Scripts Reference

| Script | Purpose | Key flags |
|--------|---------|-----------|
| `scrape_website_logos.py` | Extract logos from current website | `--url`, `--output json\|summary` |
| `scrape_wayback_logos.py` | Find historical logos via Wayback Machine | `--url`, `--paths`, `--output json\|summary` |
| `search_builtwith.py` | BuiltWith technology detection (deep mode) | `--technology`, `--max-results`, `--output json\|summary` |

All scripts require `requests`: `pip3 install requests`

External skill scripts (use if available):
- `skills/capabilities/review-scraper/scripts/scrape_reviews.py` — G2/Capterra/Trustpilot reviews (requires Apify token)
- `skills/capabilities/linkedin-post-research/scripts/search_posts.py` — LinkedIn post search (requires Crustdata API key)

## Cost

- **Quick / Standard:** Free (uses WebSearch + free APIs like Wayback Machine CDX)
- **Deep:** Mostly free. BuiltWith paid API is optional (`--api-key` flag); free scraping is used by default.
- External skills (review-scraper, linkedin-post-research) may require paid API tokens.

Related Skills

lead-discovery

381
from gooseworks-ai/goose-skills

Orchestrator that runs first for lead generation requests. Gathers business context via website analysis or questions, identifies competitors, builds ICP, and routes to signal skills with pre-filled inputs.

voice-of-customer-synthesizer

381
from gooseworks-ai/goose-skills

Aggregate customer feedback from multiple sources — support tickets, NPS comments, Slack messages, G2 reviews, call transcripts, survey responses — into a unified VoC report with theme clustering, sentiment analysis, trend detection, and actionable recommendations for product, marketing, and CS teams. Chains review-scraper for public review data.

customer-win-back-sequencer

381
from gooseworks-ai/goose-skills

For churned accounts, research what has changed since they left — new funding, team growth, competitor dissatisfaction, product updates that address their pain — then assess re-engagement potential and generate a personalized win-back email sequence with timing recommendations. Chains web research and LinkedIn monitoring with email sequence generation.

customer-story-builder

381
from gooseworks-ai/goose-skills

Take raw customer inputs — interview transcripts, survey responses, Slack quotes, support tickets, review excerpts — and generate a structured case study draft with problem/solution/result narrative, pull-quotes, metric callouts, and multi-format outputs (full case study, one-pager, social proof snippet, sales deck slide). Pure reasoning skill. Use when a product marketing team has customer signal but no time to write the story.

linkedin-influencer-discovery

381
from gooseworks-ai/goose-skills

Discover top LinkedIn influencers and voices by topic, industry, follower count, and country. Use when you need to find the top 100 voices in a space, build influencer lists for outreach, or identify thought leaders on LinkedIn.

kol-discovery

381
from gooseworks-ai/goose-skills

Find Key Opinion Leaders (KOLs) in a given domain by combining web research with LinkedIn post search. Given a company/idea and target domain, generates authority keywords, searches LinkedIn posts to find prolific authors with high engagement, and merges with web-researched influencers. Use when someone wants to "find influencers in X space" or "who are the KOLs for Y industry."

signal-detection-pipeline

381
from gooseworks-ai/goose-skills

Detect buying signals from multiple sources, qualify leads, and generate outreach context.

seo-content-engine

381
from gooseworks-ai/goose-skills

Build and run an SEO content engine: audit current state, identify gaps, build keyword architecture, generate content calendar, draft content.

outbound-prospecting-engine

381
from gooseworks-ai/goose-skills

End-to-end outbound prospecting: detect intent signals, research companies, find decision-maker contacts, personalize messaging, launch campaign.

event-prospecting-pipeline

381
from gooseworks-ai/goose-skills

Find attendees at conferences/events, research their companies, qualify against ICP, and launch outreach.

competitor-monitoring-system

381
from gooseworks-ai/goose-skills

Set up and run ongoing competitive intelligence monitoring for a client. Tracks competitor content, ads, reviews, social, and product moves.

client-packet-engine

381
from gooseworks-ai/goose-skills

Batch client packet generator. Takes company names/URLs, runs intelligence + strategy generation, presents strategies for human selection, executes selected strategies in pitch-packet mode (no live campaigns or paid enrichment), and packages into local delivery packets.