conference-speaker-scraper
Extract speaker names, titles, companies, and bios from conference websites. Supports direct HTML scraping and Apify web scraper fallback for JS-heavy sites. Use for pre-event research and outreach targeting.
Best use case
conference-speaker-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using conference-speaker-scraper should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/conference-speaker-scraper/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How conference-speaker-scraper Compares
| Feature / Agent | conference-speaker-scraper | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Extract speaker names, titles, companies, and bios from conference websites. Supports direct HTML scraping and Apify web scraper fallback for JS-heavy sites. Use for pre-event research and outreach targeting.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Conference Speaker Scraper
Extract speaker names, titles, companies, and bios from conference website /speakers pages. Supports direct HTML scraping with multiple extraction strategies, plus Apify fallback for JS-heavy sites.
## Quick Start
No API key needed for direct scraping mode.
```bash
# Scrape speakers from a conference page
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
--url "https://example.com/speakers"
# Use Apify for JS-heavy sites
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
--url "https://example.com/speakers" --mode apify
# Custom conference name (otherwise inferred from URL)
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
--url "https://example.com/speakers" --conference "Sage Future 2026"
# Output formats
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output json # default
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output csv
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output summary
```
## How It Works
### Direct Mode (default)
Fetches the page HTML and tries multiple extraction strategies in order, using whichever returns the most results:
1. **Strategy A -- CSS class hints:** Looks for speaker cards with class names containing "speaker", "presenter", "faculty", "panelist", "team-member"
2. **Strategy B -- Heading + paragraph patterns:** Looks for repeated `<h2>`/`<h3>` + `<p>` structures
3. **Strategy C -- JSON-LD structured data:** Checks for `<script type="application/ld+json">` with speaker data
4. **Strategy D -- Platform embeds:** Detects Sched.com/Sessionize patterns used by many conferences
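As an illustration of Strategy C, a minimal JSON-LD extractor can be sketched as below. This is not the actual `scrape_speakers.py` implementation; it assumes speaker data nested under the schema.org `Event` type's `performer` key, which is one common place conference sites put it.

```python
import json
import re

# Matches <script type="application/ld+json"> blocks in raw HTML
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def extract_jsonld_speakers(html: str) -> list[dict]:
    """Collect schema.org Person entries listed as Event performers."""
    speakers = []
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip it
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            for person in item.get("performer", []):
                if person.get("@type") == "Person":
                    speakers.append({
                        "name": person.get("name", ""),
                        "title": person.get("jobTitle", ""),
                        "company": (person.get("worksFor") or {}).get("name", ""),
                    })
    return speakers
```

Because structured data is machine-authored, this strategy tends to be the most reliable when it applies, which is why it is worth checking even though few conference sites emit it.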
### Apify Mode
Uses `apify/cheerio-scraper` actor with a custom page function that targets common speaker card selectors. Standard POST/poll/GET dataset pattern.
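The POST/poll/GET dataset pattern can be sketched against the Apify REST API as follows. This is a simplified sketch, not the skill's actual code: error handling is minimal and the custom page function input is omitted.

```python
import json
import time
import urllib.request

API = "https://api.apify.com/v2"

def run_url(actor_id: str, token: str) -> str:
    # Actor ids use "~" in place of "/" in Apify REST API paths
    return f"{API}/acts/{actor_id.replace('/', '~')}/runs?token={token}"

def run_apify_scrape(actor_id: str, token: str, run_input: dict,
                     timeout: int = 300) -> list[dict]:
    # 1. POST: start the actor run with the given input
    req = urllib.request.Request(
        run_url(actor_id, token),
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
    )
    run = json.load(urllib.request.urlopen(req))["data"]
    # 2. Poll the run status until it reaches a terminal state or we time out
    deadline = time.time() + timeout
    while run["status"] not in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        if time.time() > deadline:
            raise TimeoutError("Apify run did not finish in time")
        time.sleep(5)
        status_url = f"{API}/actor-runs/{run['id']}?token={token}"
        run = json.load(urllib.request.urlopen(status_url))["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Apify run ended with status {run['status']}")
    # 3. GET: fetch the scraped items from the run's default dataset
    items_url = f"{API}/datasets/{run['defaultDatasetId']}/items?token={token}"
    return json.load(urllib.request.urlopen(items_url))
```

For this skill the actor would be `apify/cheerio-scraper` and `timeout` would come from the `--timeout` flag.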
## CLI Reference
| Flag | Default | Description |
|------|---------|-------------|
| `--url` | *required* | Conference speakers page URL |
| `--conference` | inferred | Conference name (otherwise inferred from URL domain) |
| `--mode` | direct | `direct` (HTML scraping) or `apify` (Apify cheerio scraper) |
| `--output` | json | Output format: `json`, `csv`, or `summary` |
| `--token` | env var | Apify token (only needed for apify mode) |
| `--timeout` | 300 | Max seconds for Apify run |
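The "inferred from URL domain" default for `--conference` can be pictured with a small helper like the one below. This is a hypothetical sketch; the script's real heuristic may differ.

```python
from urllib.parse import urlparse

def infer_conference_name(url: str) -> str:
    """Guess a conference name from the URL's domain stem,
    e.g. https://sagefuture2026.com/speakers -> "Sagefuture2026"."""
    host = urlparse(url).hostname or ""
    stem = host.removeprefix("www.").split(".")[0]
    return stem.capitalize()
```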
## Output Schema
```json
{
"name": "Jane Smith",
"title": "VP of Finance",
"company": "Acme Corp",
"bio": "Jane leads the finance transformation at...",
"linkedin_url": "https://linkedin.com/in/janesmith",
"image_url": "https://...",
"conference": "Sage Future 2026",
"source_url": "https://sagefuture2026.com/speakers"
}
```
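Records with this flat schema convert directly to the `--output csv` format. A minimal conversion sketch, assuming standard-library `csv.DictWriter` semantics rather than the script's actual code:

```python
import csv
import io

# Column order mirrors the output schema above
FIELDS = ["name", "title", "company", "bio", "linkedin_url",
          "image_url", "conference", "source_url"]

def speakers_to_csv(speakers: list[dict]) -> str:
    """Render speaker records as CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for s in speakers:
        writer.writerow({f: s.get(f, "") for f in FIELDS})
    return buf.getvalue()
```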
## Cost
- **Direct mode:** Free (no API, no tokens)
- **Apify mode:** Uses `apify/cheerio-scraper` -- minimal Apify credits
## Testing Notes
HTML scraping is inherently fragile across conference sites. The multi-strategy approach maximizes coverage, but JS-heavy sites will require Apify mode. When direct scraping returns 0 results, try `--mode apify`.
Related Skills
twitter-scraper
Search and scrape Twitter/X posts using Apify. Use when you need to find tweets, track brand mentions, monitor competitors on Twitter, or analyze Twitter discussions. Uses Twitter native search syntax (since:/until:) for reliable date filtering.
review-scraper
Scrape product reviews from G2, Capterra, and Trustpilot using Apify. Single script with platform dispatch. Use when you need to monitor competitor reviews, track product sentiment, or gather customer feedback from review sites.
reddit-scraper
Scrape and search Reddit posts using Apify. Use when you need to find Reddit discussions, track competitor mentions, monitor product feedback, discover pain points, or analyze subreddit content. Supports keyword filtering, time-based searches, and subreddit-specific queries.
meta-ad-scraper
Scrape competitor ads from Meta's Ad Library (Facebook, Instagram, Messenger, Threads, WhatsApp). Search by company name, Facebook Page URL, or keyword. Returns ad creatives, spend estimates, reach, impressions, and campaign details. Use for competitive ad research, messaging analysis, and creative inspiration.
blog-scraper
Scrape blog posts via RSS feeds (free, no API key) with Apify fallback for JS-heavy sites. Use when you need to monitor competitor blogs, track industry content, or aggregate blog posts by keyword.
web-archive-scraper
Search the Wayback Machine for archived versions of websites. Extract cached pages, customer lists, testimonials, and partner directories from sites that have changed or gone offline. Uses the free CDX API — no API key needed.
review-site-scraper
Scrape product reviews from G2, Capterra, and Trustpilot using Apify. Single script with platform dispatch. Use when you need to monitor competitor reviews, track product sentiment, or gather customer feedback from review sites.
product-hunt-scraper
Scrape Product Hunt trending products using Apify. Use when you need to discover new product launches, track competitors on Product Hunt, or monitor the startup ecosystem for relevant launches.
orthogonal-linkedin-scraper
Get LinkedIn profiles, company pages, and posts.
linkedin-profile-post-scraper
Scrape recent posts from LinkedIn profiles using Apify. Use when you need to monitor what specific people are posting on LinkedIn, track founder/exec activity, or gather LinkedIn content for competitive intelligence.
linkedin-job-scraper
Scrapes LinkedIn job postings using the JobSpy library (python-jobspy). Use this skill whenever the user wants to find jobs on LinkedIn, search for open roles, pull job listings, build a job pipeline, source job targets for GTM research, or monitor hiring signals. Even if the user just says "find me some jobs" or "what roles is [company] hiring for", use this skill. It runs a local Python script that outputs a CSV of job postings with title, company, location, salary, job type, description, and direct URLs.
job-scraper
Search for job postings across LinkedIn and Indeed. Use when users want to find open roles, monitor hiring signals, identify companies hiring for specific positions, or research competitor hiring activity. Returns job title, company, location, salary, description, seniority level, and direct apply URLs. No login or cookies required.