github-repo-signals

Extract and score leads from GitHub repositories by analyzing stars, forks, issues, PRs, comments, and contributions. Produces unified multi-repo CSV with deduplicated user profiles. No paid API credits required.

380 stars

Best use case

github-repo-signals is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using github-repo-signals should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/github-repo-signals/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/packs/lead-gen-devtools/github-repo-signals/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/github-repo-signals/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How github-repo-signals Compares

Feature / Agent           github-repo-signals   Standard Approach
Platform Support          Not specified         Limited / Varies
Context Awareness         High                  Baseline
Installation Complexity   Unknown               N/A

Frequently Asked Questions

What does this skill do?

Extract and score leads from GitHub repositories by analyzing stars, forks, issues, PRs, comments, and contributions. Produces unified multi-repo CSV with deduplicated user profiles. No paid API credits required.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# GitHub Repository Signals

Extract high-intent leads from one or more GitHub repositories by analyzing every type of user interaction. This skill uses only free GitHub API data — no enrichment credits are spent.

## When to Use

- User wants to find leads from open-source GitHub repositories
- User wants to identify people who interact with competitor or category repos
- User wants cross-repo interaction analysis to find high-intent prospects
- User asks for GitHub-based lead generation without paid enrichment
- User says their ICP, target audience, or buyers are developers, engineers, or technical people who are active on GitHub
- User describes prospects who use open-source tools, contribute to open source, or build with specific technologies — and those technologies have public GitHub repos
- User wants to find leads in a technical space (e.g., "real-time communication", "AI agents", "infrastructure") where the community congregates around GitHub repositories

**Note:** If the user describes their ICP as GitHub-active but hasn't identified specific repositories yet, this skill still applies. In that case, ask the user which repositories their ICP is likely to interact with, or help them identify relevant repos based on the technology/space they describe.

## Prerequisites

- `gh` CLI authenticated (`gh auth status` to verify)
- Python 3.9+ with `PyYAML` installed
- Working directory: the project root containing this skill

## Inputs to Collect from User

Before running, ask the user for:

1. **Repositories** (required): One or more GitHub repository URLs or `owner/repo` strings
2. **User limit** (required): How many top users to include in the output. Explain that more users = longer runtime due to GitHub profile fetching (~5,000 profiles/hour). Suggest 500 as a good starting point for testing.

## Execution Steps

### Step 1: Verify Environment

```bash
gh auth status
```

### Step 2: Run the Tool

```bash
python3 ${CLAUDE_SKILL_DIR}/scripts/gh_repo_signals.py \
    --repos "owner1/repo1,owner2/repo2" \
    --limit <USER_LIMIT> \
    --output ${CLAUDE_SKILL_DIR}/../.tmp/repo_signals.csv
```

Replace the repos and limit with user-provided values.

The tool will:
1. **Extract** all interaction types per repo (stars, forks, contributors, issues, PRs, comments, watchers, commit emails)
2. **Filter out** bots and org members automatically (fetches org member lists and detects org email domains)
3. **Score** each user by interaction depth using these weights:
   - Issue opener: 5 points
   - PR author: 5 points
   - Contributor: 4 points
   - Issue commenter: 3 points
   - Forker: 3 points
   - Watcher: 2 points
   - Stargazer: 1 point
4. **Rank** users by (repos_interacted desc, total_score desc) — multi-repo users surface first
5. **Fetch** GitHub profiles for the top N users (name, email, company, location, blog, twitter, bio, followers)
6. **Export** two CSV files: `_users.csv` and `_interactions.csv`
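The scoring and ranking rules above can be sketched in a few lines. This is a minimal illustration only; the function and variable names are hypothetical, not the tool's actual internals:

```python
# Interaction weights as documented above (hypothetical helper, not the tool's code)
WEIGHTS = {
    "issue_opener": 5, "pr_author": 5, "contributor": 4,
    "issue_commenter": 3, "forker": 3, "watcher": 2, "stargazer": 1,
}

def repo_score(interaction_types):
    """Weighted score for one user's interactions with a single repo."""
    return sum(WEIGHTS[t] for t in interaction_types)

def rank_users(users):
    """Rank by repos interacted (desc), then total score (desc)."""
    return sorted(users, key=lambda u: (-u["repos_interacted"], -u["total_score"]))

alice = {"login": "alice", "repos_interacted": 2, "total_score": 8}
bob = {"login": "bob", "repos_interacted": 1, "total_score": 14}
# Multi-repo users surface first even with a lower total score
print([u["login"] for u in rank_users([bob, alice])])  # -> ['alice', 'bob']
```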

### Step 3: Review Output

The tool produces two CSV files:

**`repo_signals_users.csv`** — One row per person, deduplicated across all repos
| Column | Description |
|--------|-------------|
| username | GitHub login |
| name | Display name |
| email | Public GitHub email |
| commit_email | Email from git commits (if different from public) |
| company | Company from GitHub profile |
| location | Location from GitHub profile |
| blog | Website/blog URL |
| twitter | Twitter/X handle |
| bio | GitHub bio |
| followers | Follower count |
| public_repos | Number of public repos |
| total_repos_interacted | Number of input repos this user interacted with |
| interaction_score | Weighted score across all repos |

**`repo_signals_interactions.csv`** — One row per user x repo combination
| Column | Description |
|--------|-------------|
| username | GitHub login |
| repository | Which repo this row is about |
| is_contributor | YES/NO |
| is_stargazer | YES/NO |
| is_forker | YES/NO |
| is_watcher | YES/NO |
| is_issue_opener | YES/NO |
| is_pr_author | YES/NO |
| is_issue_commenter | YES/NO |
| contribution_count | Number of commits (0 if not contributor) |
| starred_at | Date starred (if applicable) |
| forked_at | Date forked (if applicable) |
| repo_score | Interaction score for this specific repo |
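The interactions CSV can be consumed with the standard library alone. A small sketch, using an in-memory two-row sample in the documented column shape (usernames and values are made up):

```python
import csv
import io

# Hypothetical two-row sample in the documented _interactions.csv shape
sample = io.StringIO(
    "username,repository,is_issue_opener,is_stargazer,repo_score\n"
    "alice,owner1/repo1,YES,NO,8\n"
    "bob,owner1/repo1,NO,YES,1\n"
)
rows = list(csv.DictReader(sample))

# Issue openers are the highest-intent segment
high_intent = [r["username"] for r in rows if r["is_issue_opener"] == "YES"]
print(high_intent)  # -> ['alice']
```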

## Phase 3: Analyze & Recommend

Once the CSV files are generated, **do not stop**. Immediately proceed to analyze the data and brief the user.

### Step 5: Collect Company Context

Check if you already know the user's company and intent from prior conversation. If not, ask:

> "Before I analyze these results, I need to understand who you're finding leads for:
> 1. **What does your company/product do?** (one-liner is fine)
> 2. **Who is your ideal customer?** (role, company size, industry, tech stack — whatever is relevant)
> 3. **What's the goal for these leads?** (outbound sales, partnership, hiring, community building, etc.)"

Do NOT proceed to analysis until you have this context. It directly shapes the recommendations.

### Step 6: Analyze the Data

Read the generated CSV files and compute the following analysis. Present it to the user as a structured briefing.

**6a. Overall Stats**
- Total users in the sheet
- Score distribution (how many at 15+, 10-14, below 10)
- Email coverage: how many have any email (public or commit)
- Company coverage: how many have a company listed
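The overall-stats computation is straightforward once the user rows are parsed. A sketch with made-up example scores and emails (the bucket boundaries come from the list above):

```python
# Hypothetical example values standing in for parsed interaction_score / email columns
scores = [22, 15, 12, 9, 4, 1]
emails = ["a@x.com", "", "b@y.io", "", "", "c@z.dev"]

buckets = {
    "15+": sum(s >= 15 for s in scores),
    "10-14": sum(10 <= s < 15 for s in scores),
    "<10": sum(s < 10 for s in scores),
}
email_coverage = sum(bool(e) for e in emails) / len(emails)

print(buckets)                    # -> {'15+': 2, '10-14': 1, '<10': 3}
print(f"{email_coverage:.0%}")    # -> 50%
```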

**6b. Multi-Repo Users (if multiple repos were scanned)**
- How many users interacted with 2+ repos
- List the top 10 multi-repo users with their names, companies, and which repos they touched
- This is the highest-signal segment — call it out explicitly

**6c. Top Companies**
- Extract all company names from the Users sheet
- Group users by company (normalize company names — strip @, leading/trailing whitespace, lowercase comparison)
- List the top 15 companies by number of engaged users
- For each, note how many users, their average score, and which interaction types are most common
- Flag companies with 3+ engaged users as "organizational adoption signals"
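The company normalization and grouping described above can be sketched with `collections.Counter` (company names here are fictional examples):

```python
from collections import Counter

def normalize_company(name):
    """Strip @, trim whitespace, lowercase -- the normalization described above."""
    return name.strip().lstrip("@").strip().lower()

companies = ["@Acme ", "acme", "ACME", "Globex", "globex", "Initech"]
counts = Counter(normalize_company(c) for c in companies)

# Companies with 3+ engaged users are "organizational adoption signals"
signals = [c for c, n in counts.items() if n >= 3]
print(signals)  # -> ['acme']
```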

**6d. Interaction Patterns**
- How many users are issue openers (highest intent)
- How many are PR authors (deep practitioners)
- How many are stargazer-only (lowest signal)
- Any notable patterns (e.g., a burst of recent stars, many forkers from one company)

**6e. Data Gaps**
- What percentage lack email — this determines enrichment priority
- What percentage lack company — affects ability to do company-level targeting
- How many have a blog/website or twitter that could help with manual research

### Step 7: Recommend Next Steps

Based on the analysis AND the user's company context/intent, recommend specific next steps. Tailor recommendations to what the data actually shows — do not give generic advice.

**Framework for recommendations:**

1. **If multi-repo users exist (2+ repos):**
   - These are the #1 priority segment. Recommend enriching them first.
   - Estimate credit cost: N users x cost per enrichment call.

2. **If company clusters exist (3+ users from same company):**
   - Recommend company-level enrichment via SixtyFour `/enrich-company`
   - Then use `/enrich-lead` to find the decision-maker at those companies (not the developer who starred — the person who signs off on purchases)
   - This is the "find the buyer, not the user" play

3. **If high email coverage (>40%):**
   - Can start outreach directly for users with emails
   - Recommend SixtyFour `/qa-agent` to qualify them against ICP before reaching out
   - Suggest segmenting by interaction type for personalized outreach (issue openers get a different message than stargazers)

4. **If low email coverage (<40%):**
   - Recommend SixtyFour `/find-email` for the top-scored users first
   - Estimate cost: N users x $0.05 (professional) or $0.20 (personal)
   - Suggest starting with a small batch (50-100) to validate quality before scaling

5. **If the user's goal is outbound sales:**
   - Prioritize: company clusters -> multi-repo users -> issue openers -> PR authors -> forkers -> stargazers
   - Recommend enriching companies first, then finding decision-makers
   - Suggest personalization angles based on interaction type (e.g., "I noticed your team has been active in the [repo] community...")

6. **If the user's goal is community/partnerships:**
   - Prioritize: PR authors -> contributors -> issue commenters who help others
   - These are potential advocates, not just buyers

7. **Always include a cost estimate:**
   - Break down what each enrichment step would cost
   - Suggest a phased approach: start small, validate, then scale

**Format the recommendation as a clear action plan with numbered steps, estimated costs, and expected outcomes.**
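The cost estimates in the framework above are simple multiplication. A worked example using the per-email rates quoted in point 4 and a small validation batch:

```python
# Phased /find-email cost estimate using the per-email rates quoted above
batch = 100                   # start with a small validation batch before scaling
professional = batch * 0.05   # professional email rate
personal = batch * 0.20       # personal email rate
print(f"${professional:.2f} professional / ${personal:.2f} personal")
# -> $5.00 professional / $20.00 personal
```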

### Step 8: Ask for Go-Ahead

After presenting the analysis and recommendations, ask:

> "Would you like me to proceed with any of these steps? I can start with [recommended first action] — it would cost approximately [estimate] and take [time estimate]."

Wait for user confirmation before spending any credits or running enrichment tools.

## Output Interpretation Reference

- **total_repos_interacted > 1**: High-intent signal — user engages with multiple repos in the same category
- **interaction_score >= 15**: Deep engagement — multiple interaction types
- **is_issue_opener = YES**: Active user with real use case and pain points
- **is_pr_author = YES (non-org member)**: Technical practitioner invested in the ecosystem
- **is_forker = YES**: Taking code to build something — stronger than starring
- **is_stargazer only**: Lowest signal — casual interest

## Rate Limits & Runtime Estimates

- GitHub API: 5,000 requests/hour for authenticated users
- Each repo extraction uses ~500-2,000 API calls depending on repo size
- Profile fetching: 1 API call per user
- **Estimate for 1 repo, 500 users**: ~15-30 minutes
- **Estimate for 3 repos, 500 users**: ~45-90 minutes
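A back-of-envelope check of those estimates against the rate limit, assuming a mid-range repo costs ~1,250 API calls (a value picked from the 500-2,000 range above):

```python
# Back-of-envelope runtime check against the 5,000 requests/hour limit
repos, calls_per_repo, profile_fetches = 3, 1250, 500
total_calls = repos * calls_per_repo + profile_fetches  # 4,250 calls
minutes = total_calls / 5000 * 60
print(round(minutes))  # -> 51, inside the 45-90 minute estimate above
```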

Related Skills

event-signals

380
from gooseworks-ai/goose-skills

Extract leads from conferences, meetups, hackathons, and podcasts by analyzing speaker lists, sponsor lists, hackathon entries, and podcast guests. Discovers events via Sessionize, Confs.tech, Meetup, Luma, ListenNotes, and Devpost. Looks back 90 days and forward 180 days.

competitor-signals


Extract leads from competitor product activity — Product Hunt commenters/upvoters, HN posts about competitors, case studies, testimonials, tech press, and switching signals. Detects people actively switching from competitors as highest-priority leads.

community-signals


Extract leads from developer forums (Hacker News, Reddit) by detecting intent signals — alternative seeking, competitor pain, scaling challenges, DIY solutions, and migration intent. Scores users by intent strength and cross-platform presence.

competitor-monitoring-system


Set up and run ongoing competitive intelligence monitoring for a client. Tracks competitor content, ads, reviews, social, and product moves.

client-packet-engine


Batch client packet generator. Takes company names/URLs, runs intelligence + strategy generation, presents strategies for human selection, executes selected strategies in pitch-packet mode (no live campaigns or paid enrichment), and packages into local delivery packets.

client-package-notion


Package all work done for a client into a shareable Notion page with subpages and Google Sheets. Reads the client's folder (strategies, campaigns, content, leads, notes) and builds a structured Notion workspace the client can browse. Lead list CSVs are uploaded to Google Sheets and linked from the Notion pages. Use when you want to deliver work to a client in a polished, navigable format.

client-package-local


Package all work done for a client into a local filesystem delivery package with .md files and Google Sheets. Reads the client's folder (strategies, campaigns, content, leads, notes) and builds a structured directory with dated deliverables. Lead lists are uploaded to Google Sheets and linked from the markdown files. Use when you want to deliver work to a client in a polished, navigable format without requiring Notion.

client-onboarding


Full client onboarding: intelligence gathering, synthesis into Client Intelligence Package, and growth strategy generation. Phases 1-3 of the Client Launch Playbook.

lead-discovery


Orchestrator that runs first for lead generation requests. Gathers business context via website analysis or questions, identifies competitors, builds ICP, and routes to signal skills with pre-filled inputs.

serp-feature-sniper


Analyze SERP features per keyword (featured snippets, PAA, video carousels, knowledge panels, image packs) and produce optimized content structures to win them. Identifies which features are winnable, who currently holds them, and exactly how to format your content to steal them.

search-ad-keyword-architect


Deep keyword research for paid search. Analyzes competitor SEO keywords, review language, Reddit/community terminology, and existing site content to build a keyword architecture: branded vs non-branded, funnel stage mapping, match type recommendations, and estimated competition tiers. Use before building a Google Ads campaign or to audit an existing one.

sales-performance-review


Periodic sales performance review composite. Aggregates ALL sales initiatives taken in a given period — outbound campaigns, inbound efforts, events, partnerships, content, referrals — and measures the impact of each on pipeline and revenue. Produces a team-presentable report covering initiative-level performance, cross-initiative comparisons, pipeline attribution, what's working, what's not, and where to invest next. Tool-agnostic — pulls data from any combination of CRM, outreach tools, and tracking systems.