data-sourcing

Optimize provider selection, routing, and credit usage across 150+ enrichment sources for company/contact intelligence.


by Microck


Best use case

data-sourcing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using data-sourcing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/data-sourcing/SKILL.md --create-dirs "https://raw.githubusercontent.com/Microck/ordinary-claude-skills/main/skills_all/data-sourcing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/data-sourcing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How data-sourcing Compares

Feature / Agent	data-sourcing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Optimize provider selection, routing, and credit usage across 150+ enrichment sources for company/contact intelligence.

Where can I find the source code?

The source is in the Microck/ordinary-claude-skills repository on GitHub, at skills_all/data-sourcing/SKILL.md.

SKILL.md Source

# Data Sourcing & Provider Optimization Skill

## When to Use

- Selecting provider stacks for email, phone, company, or intent enrichment
- Building or tuning waterfall sequences to improve success rates
- Auditing credit consumption or provider performance
- Designing enrichment logic for GTM ops, RevOps, or data engineering teams

## Framework

You are an expert at selecting and optimizing data providers from 150+ available options to maximize data quality while minimizing credit costs. Use this layered framework to keep enrichment predictable and efficient.

### Core Principles

1. **Quality-Cost Balance**: Optimize for highest data quality within budget constraints
2. **Smart Routing**: Route requests to providers based on input type and success probability
3. **Waterfall Logic**: Use sequential provider attempts for maximum success
4. **Caching Strategy**: Leverage cached data to reduce redundant API calls
5. **Bulk Optimization**: Process similar requests together for volume discounts

### Provider Selection Matrix

#### For Email Discovery

**Best Input Scenarios:**
- **Have LinkedIn URL**: ContactOut → RocketReach → Apollo
- **Have Name + Company**: Apollo → Hunter → RocketReach → FindyMail
- **Have Domain Only**: Hunter → Apollo → Clearbit
- **Have Email (need validation)**: ZeroBounce → NeverBounce → Debounce
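The input-based routing above can be sketched as a lookup from the strongest input a record has to a provider order. This is a minimal illustration, not the skill's own implementation: the record field names (`email`, `linkedin_url`, `name`, `company`, `domain`) and the priority among inputs (existing email first, then LinkedIn URL, then name + company, then domain) are assumptions.

```python
# Provider orders copied from the "Best Input Scenarios" list above.
EMAIL_ROUTES = {
    "validation": ["ZeroBounce", "NeverBounce", "Debounce"],
    "linkedin": ["ContactOut", "RocketReach", "Apollo"],
    "name_company": ["Apollo", "Hunter", "RocketReach", "FindyMail"],
    "domain": ["Hunter", "Apollo", "Clearbit"],
}


def route_email_request(record: dict) -> list[str]:
    """Return the provider order for the strongest input a record has.

    Field names and input priority are assumptions for this sketch.
    """
    if record.get("email"):
        return EMAIL_ROUTES["validation"]  # already have an email: just verify it
    if record.get("linkedin_url"):
        return EMAIL_ROUTES["linkedin"]
    if record.get("name") and record.get("company"):
        return EMAIL_ROUTES["name_company"]
    if record.get("domain"):
        return EMAIL_ROUTES["domain"]
    return []  # not enough input to route; request more data first


print(route_email_request({"name": "Ada Lovelace", "company": "Acme"}))
# ['Apollo', 'Hunter', 'RocketReach', 'FindyMail']
```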

**Quality Tiers:**
- **Premium** (90%+ success): ZoomInfo, BetterContact waterfall
- **Standard** (75%+ success): Apollo, Hunter, RocketReach
- **Budget** (60%+ success): Snov.io, Prospeo, ContactOut

#### For Company Intelligence

**Data Type Priority:**
- **Basic Firmographics**: Clearbit (fastest) → Ocean.io → Apollo
- **Financial Data**: Crunchbase → PitchBook → Dealroom
- **Technology Stack**: BuiltWith → HG Insights → Clearbit
- **Intent Signals**: B2D AI → ZoomInfo Intent → 6sense
- **News & Social**: Google News → Social platforms → Owler

**Industry Specialization:**
- **Startups**: Crunchbase, Dealroom, AngelList
- **Enterprise**: ZoomInfo, D&B, HG Insights
- **E-commerce**: Store Leads, BuiltWith, Shopify data
- **Healthcare**: Definitive Healthcare + compliance providers
- **Financial Services**: PitchBook, S&P Capital IQ

### Credit Optimization Strategies

#### Cost Tiers
```
Tier 0 (Free): Native operations, cached data, manual inputs
Tier 1 (0.5 credits): Validation, verification, basic lookups
Tier 2 (1-2 credits): Standard enrichments (Apollo, Hunter, Clearbit)
Tier 3 (2-3 credits): Premium data (ZoomInfo, technographics, intent)
Tier 4 (3-5 credits): Enterprise intelligence (PitchBook, custom AI)
Tier 5 (5-10 credits): Specialized services (video generation, deep AI research)
```
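To budget a job against these tiers, multiply the expected number of lookups in each tier by that tier's credit range. A minimal sketch, assuming each lookup falls into exactly one tier; `TIER_CREDITS` and `estimate_credits` are illustrative names, and actual provider pricing may differ from these ranges.

```python
# Credit ranges per tier, copied from the table above (Tier 0 is free).
TIER_CREDITS = {
    0: (0.0, 0.0),
    1: (0.5, 0.5),
    2: (1.0, 2.0),
    3: (2.0, 3.0),
    4: (3.0, 5.0),
    5: (5.0, 10.0),
}


def estimate_credits(lookups_per_tier: dict[int, int]) -> tuple[float, float]:
    """Return the (low, high) credit estimate for a planned enrichment job."""
    low = high = 0.0
    for tier, count in lookups_per_tier.items():
        tier_low, tier_high = TIER_CREDITS[tier]
        low += tier_low * count
        high += tier_high * count
    return low, high


# Example: 1,000 validations (Tier 1) plus 200 standard enrichments (Tier 2).
print(estimate_credits({1: 1000, 2: 200}))  # (700.0, 900.0)
```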

#### Optimization Tactics

**1. Cache Everything**
- Email: 30-day cache
- Company: 90-day cache
- Intent: 7-day cache
- Static data: Indefinite cache
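One way to apply these TTLs is a cache keyed by data type and record key that also counts hits and misses, so cache effectiveness can be monitored over time. The sketch below is in-memory only and assumes a `static` data type for the indefinite case; a production setup would more likely sit on a shared store. The class and method names are hypothetical.

```python
import time
from typing import Any, Callable, Optional

DAY = 24 * 60 * 60
# TTLs from the list above; None means "cache indefinitely".
TTL_SECONDS = {
    "email": 30 * DAY,
    "company": 90 * DAY,
    "intent": 7 * DAY,
    "static": None,
}


class EnrichmentCache:
    """In-memory cache with a per-data-type TTL and hit/miss counters."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._store: dict[tuple[str, str], tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, data_type: str, key: str) -> Optional[Any]:
        entry = self._store.get((data_type, key))
        ttl = TTL_SECONDS[data_type]
        if entry is not None:
            stored_at, value = entry
            if ttl is None or self._clock() - stored_at < ttl:
                self.hits += 1
                return value
            del self._store[(data_type, key)]  # expired: drop it
        self.misses += 1
        return None

    def set(self, data_type: str, key: str, value: Any) -> None:
        self._store[(data_type, key)] = (self._clock(), value)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Injecting a fake `clock` makes expiry testable without waiting days, and `hit_rate()` is the figure to alert on when cache effectiveness drops.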

**2. Batch Processing**
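```python
# Choose a processing mode by batch size to capture volume discounts.
# The returned mode names are placeholders for your own pipeline entry points.
def choose_processing_mode(record_count: int) -> str:
    if record_count > 1000:
        return "apollo_bulk"  # bulk API: 10-30% discount
    if record_count > 100:
        return "parallel"
    return "standard"
```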

**3. Smart Waterfalls**
```python
waterfall_sequence = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5, "stop_if_success": True},
    {"provider": "hunter", "credits": 1.2, "stop_if_success": True},
    {"provider": "bettercontact", "credits": 3, "stop_if_success": True},
    {"provider": "ai_research", "credits": 5, "last_resort": True}
]
```
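A small runner for a sequence shaped like `waterfall_sequence` can try each step in order and stop at the first result. This is a minimal sketch under stated assumptions: each provider is a hypothetical callable that returns a result dict or `None`, credits are charged per attempt (some providers bill only on a match, which would lower the real cost), and a step ends the waterfall on success unless it sets `stop_if_success` to `False`.

```python
from typing import Callable, Optional

# Hypothetical provider client: takes a record, returns a result dict or None.
ProviderFn = Callable[[dict], Optional[dict]]


def run_waterfall(record: dict, sequence: list[dict],
                  providers: dict[str, ProviderFn]) -> tuple[Optional[dict], float]:
    """Try each step in order; return (result, credits_spent)."""
    spent = 0.0
    best = None
    for step in sequence:
        fn = providers.get(step["provider"])
        if fn is None:
            continue  # no client configured for this step
        spent += step["credits"]  # assumption: billed per attempt
        result = fn(record)
        if result:
            best = result
            # Stop by default; a cache hit also ends the run this way.
            if step.get("stop_if_success", True):
                return result, spent
    return best, spent


sequence = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5, "stop_if_success": True},
    {"provider": "hunter", "credits": 1.2, "stop_if_success": True},
]
fake_providers = {
    "cache": lambda r: None,                        # cache miss
    "apollo": lambda r: None,                       # no match
    "hunter": lambda r: {"email": "ada@acme.com"},  # match
}
# Returns Hunter's result after paying for the Apollo and Hunter attempts.
print(run_waterfall({"name": "Ada", "domain": "acme.com"}, sequence, fake_providers))
```

The `last_resort` flag needs no special handling in this sketch because that step is already last in the list.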

### Provider-Specific Optimizations

#### Apollo.io
- **Strengths**: US B2B, LinkedIn data, phone numbers
- **Weaknesses**: International coverage, personal emails
- **Tips**: Use bulk API for 10%+ discount, batch similar companies

#### ZoomInfo
- **Strengths**: Enterprise data, org charts, intent signals
- **Weaknesses**: Expensive, SMB coverage
- **Tips**: Reserve for high-value accounts, negotiate enterprise deals

#### Hunter
- **Strengths**: Domain searches, email patterns, API reliability
- **Weaknesses**: Phone numbers, detailed contact info
- **Tips**: Best for initial domain exploration, use pattern detection

#### Clearbit
- **Strengths**: Real-time API, company data, speed
- **Weaknesses**: Email discovery rates, phone numbers
- **Tips**: Great for instant enrichment, combine with others for contacts

#### BuiltWith
- **Strengths**: Technology detection, historical data, e-commerce
- **Weaknesses**: Contact information, company financials
- **Tips**: Filter accounts by technology before enrichment

### Waterfall Strategies

#### Maximum Success Waterfall
```yaml
Priority: Success rate over cost
Sequence:
  - BetterContact (aggregates 10+ sources)
  - ZoomInfo (if enterprise)
  - Apollo + Hunter + RocketReach
  - AI web research
Expected Success: 95%+
Average Cost: 8-12 credits
```

#### Balanced Waterfall
```yaml
Priority: Good success with reasonable cost
Sequence:
  - Apollo.io
  - Hunter (if domain match)
  - RocketReach (if name match)
  - Stop or continue based on confidence
Expected Success: 80%
Average Cost: 3-5 credits
```

#### Budget Waterfall
```yaml
Priority: Minimize cost
Sequence:
  - Cache check
  - Hunter (domain only)
  - Free sources (Google, LinkedIn public)
  - Stop at first result
Expected Success: 60%
Average Cost: 1-2 credits
```
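The Expected Success and Average Cost figures for a waterfall depend on how often each step resolves a record. If step i costs c_i credits and resolves a fraction p_i of the records that reach it, a record reaches step i with probability (1 - p_1)(1 - p_2)...(1 - p_(i-1)). The expected cost is the sum of c_i times that probability, and overall success is one minus the product of all (1 - p_i). The sketch below assumes per-attempt billing and independent step outcomes. Real misses tend to be correlated (a contact absent from one database is often absent from others), so treat the result as an optimistic estimate. The rates in the example are made up for illustration.

```python
def waterfall_stats(steps: list[tuple[float, float]]) -> tuple[float, float]:
    """steps: (credits, success_probability) per step, in waterfall order.

    Returns (overall_success_rate, expected_credits_per_record).
    """
    reach = 1.0          # probability a record reaches the current step
    expected_cost = 0.0
    for credits, p_success in steps:
        expected_cost += reach * credits  # paid only by records that get this far
        reach *= 1.0 - p_success          # fraction still unresolved afterwards
    return 1.0 - reach, expected_cost


# Hypothetical per-step (credits, success rate) pairs, not measured figures.
example = [(1.5, 0.60), (1.2, 0.30), (2.0, 0.20)]
success, cost = waterfall_stats(example)
print(f"success={success:.1%}, expected credits={cost:.2f}")
```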

### Quality Scoring Framework

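```python
from datetime import datetime, timezone


def get_data_age(data):
    """Days since the record was last updated.

    Assumes data["last_updated"] is a timezone-aware datetime; records
    without one are treated as stale.
    """
    last_updated = data.get("last_updated")
    if last_updated is None:
        return float("inf")
    return (datetime.now(timezone.utc) - last_updated).days


def calculate_data_quality_score(data, sources):
    """Score a record from 0 to 100."""
    score = 0

    # Multi-source validation (up to 30 points: 10 per source, needs 2+)
    if len(sources) > 1:
        score += min(len(sources) * 10, 30)

    # Data completeness (30 points, split evenly across required fields)
    required_fields = ["email", "phone", "title", "company"]
    filled = sum(1 for field in required_fields if data.get(field))
    score += 30 * filled / len(required_fields)

    # Verification status (20 points)
    if data.get("email_verified"):
        score += 10
    if data.get("phone_verified"):
        score += 10

    # Recency (20 points)
    days_old = get_data_age(data)
    if days_old < 30:
        score += 20
    elif days_old < 90:
        score += 10

    return score
```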

### Industry-Specific Provider Selection

#### SaaS/Technology
- Primary: Apollo, Clearbit, BuiltWith
- Secondary: ZoomInfo, HG Insights
- Intent: G2, TrustRadius, 6sense

#### Financial Services
- Primary: PitchBook, ZoomInfo
- Compliance: LexisNexis, D&B
- News: Bloomberg, Reuters

#### Healthcare
- Primary: Definitive Healthcare
- Compliance: NPPES, state boards
- Standard: ZoomInfo with healthcare filters

#### E-commerce
- Primary: Store Leads, BuiltWith
- Platform-specific: Shopify, Amazon seller data
- Standard: Clearbit with e-commerce signals

### Troubleshooting Common Issues

#### Low Email Discovery Rate
- Check email patterns with Hunter
- Try personal email providers
- Use AI research for executives
- Consider LinkedIn outreach instead
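For checking email patterns, once a domain's address pattern is known, candidate addresses can be generated locally and sent to a Tier 1 verification provider instead of paying for another discovery lookup. A minimal sketch: the `{first}.{last}`-style placeholder syntax and the list of common patterns are assumptions for illustration, not Hunter's documented output format.

```python
import unicodedata
from typing import Optional

# Common corporate address patterns (illustrative, not exhaustive).
COMMON_PATTERNS = ["{first}.{last}", "{first}", "{f}{last}",
                   "{first}{last}", "{first}_{last}", "{last}"]


def _normalize(part: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumerics: 'José' -> 'jose'."""
    decomposed = unicodedata.normalize("NFKD", part)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def candidate_emails(first: str, last: str, domain: str,
                     known_pattern: Optional[str] = None) -> list[str]:
    """Build addresses to verify, trying a known domain pattern first."""
    first_n, last_n = _normalize(first), _normalize(last)
    values = {"first": first_n, "last": last_n,
              "f": first_n[:1], "l": last_n[:1]}
    patterns = list(COMMON_PATTERNS)
    if known_pattern:
        patterns = [known_pattern] + [p for p in patterns if p != known_pattern]
    seen: set[str] = set()
    candidates = []
    for pattern in patterns:
        local = pattern.format(**values)
        if local and local not in seen:  # skip empty or duplicate local parts
            seen.add(local)
            candidates.append(f"{local}@{domain}")
    return candidates


print(candidate_emails("José", "García", "acme.com", known_pattern="{f}{last}")[:3])
# ['jgarcia@acme.com', 'jose.garcia@acme.com', 'jose@acme.com']
```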

#### High Credit Usage
- Audit waterfall sequences
- Increase cache TTL
- Negotiate volume deals
- Use native operations first
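Auditing a waterfall mostly means finding the steps that spend many credits per successful record. Assuming each provider attempt is logged as a dict with `provider`, `credits`, and `success` keys (a hypothetical log shape), a per-provider summary could look like this:

```python
from collections import defaultdict


def audit_providers(log: list[dict]) -> dict[str, dict[str, float]]:
    """Summarize an enrichment log into per-provider success rate and cost per hit."""
    totals = defaultdict(lambda: {"attempts": 0, "hits": 0, "credits": 0.0})
    for entry in log:
        t = totals[entry["provider"]]
        t["attempts"] += 1
        t["hits"] += 1 if entry["success"] else 0
        t["credits"] += entry["credits"]
    report = {}
    for provider, t in totals.items():
        report[provider] = {
            "success_rate": t["hits"] / t["attempts"],
            # Credits spent per successful record; inf if the provider never hit.
            "credits_per_hit": t["credits"] / t["hits"] if t["hits"] else float("inf"),
        }
    return report
```

A late waterfall step with a high `credits_per_hit` is the first candidate to reorder or drop.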

#### Poor Data Quality
- Add verification steps
- Cross-reference multiple sources
- Set minimum confidence thresholds
- Implement human review for critical data
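Cross-referencing sources and applying a minimum confidence threshold can be combined: accept a value only when enough providers agree on it, and send everything else to human review. A minimal sketch for email addresses, assuming provider results have already been collected into a dict; the default of two agreeing sources is an illustrative threshold, and the lowercasing step suits emails but not every field.

```python
from collections import Counter
from typing import Optional


def consensus_value(values_by_source: dict[str, Optional[str]],
                    min_agreeing_sources: int = 2) -> Optional[str]:
    """Return the value most sources agree on, or None if agreement is too weak.

    values_by_source maps a provider name to the value it returned (or None).
    """
    normalized = [v.strip().lower() for v in values_by_source.values() if v]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    # None signals "below threshold": route the record to human review.
    return value if count >= min_agreeing_sources else None
```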

### Advanced Techniques

#### Hybrid Enrichment
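```python
# Combine AI and traditional providers.
# clearbit_lookup, ai_generate_description, zoominfo_enrich, and
# is_enterprise_account are placeholders for your own provider wrappers.
def hybrid_enrichment(company):
    # Fast, cheap base data (treat a miss as an empty record)
    base = clearbit_lookup(company) or {}

    # AI fills gaps the base provider left
    if not base.get("description"):
        base["description"] = ai_generate_description(company)

    # Premium data only for high-value accounts; its fields win on conflict
    if is_enterprise_account(base):
        base.update(zoominfo_enrich(company))

    return base
```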

#### Progressive Enrichment
```python
# Enrich in stages based on engagement.
# basic_/standard_/comprehensive_enrichment are placeholders for your own tiers.
def progressive_enrichment(lead):
    # Stage 1: Basic (on import)
    if lead.stage == "new":
        return basic_enrichment(lead)  # 1-2 credits

    # Stage 2: Engaged (opened email)
    if lead.stage == "engaged":
        return standard_enrichment(lead)  # 3-5 credits

    # Stage 3: Qualified (booked meeting)
    if lead.stage == "qualified":
        return comprehensive_enrichment(lead)  # 10+ credits

    # Any other stage: spend nothing and return the lead unchanged
    return lead
```

## Templates
- **Provider Cheat Sheet**: See `references/provider_cheat_sheet.md` for provider selection.
- **Cost Calculator**: See `scripts/cost_calculator.py` for estimating credit usage.
- **Integration Code Templates**:
```javascript
// JavaScript/Node.js template.
// checkCache, saveToCache, callProvider, and aiResearch are placeholders
// for your own cache and provider clients.
const enrichContact = async (name, company) => {
  // Check cache first
  const cached = await checkCache(name, company);
  if (cached) return cached;

  // Try providers in sequence, stopping at the first email found
  const providers = ['apollo', 'hunter', 'rocketreach'];

  for (const provider of providers) {
    try {
      const result = await callProvider(provider, { name, company });
      if (result?.email) {
        // Store under the same key checkCache reads
        await saveToCache(name, company, result);
        return result;
      }
    } catch (error) {
      console.warn(`${provider} failed, trying next...`, error);
    }
  }

  // Fallback to AI research
  return await aiResearch(name, company);
};
```

---

## Tips

- **Pre-build waterfalls per motion** so GTM teams can call a single orchestration command rather than juggling providers.
- **Instrument cache hit rates**; alert RevOps when cache effectiveness drops below target to avoid a spike in credit usage.
- **Rotate premium providers** each quarter to negotiate better volume discounts and diversify coverage to reduce gaps.
- **Pair enrichment with QA hooks** (e.g., verification APIs, sampling) before syncing into CRM to prevent bad data cascades.

---

*Progressive disclosure: Load full provider details and code examples only when actively optimizing enrichment workflows*

Related Skills

All of the following are from Microck/ordinary-claude-skills.

Validate with Database: Connect to a live PostgreSQL database to validate schema assumptions, compare pg_dump vs pgschema output, and query system catalogs interactively.

rsc-data-optimizer: Optimize Next.js App Router data fetching by converting slow client-side fetching to fast server-side fetching using React Server Components (RSC). Use when: the user reports slow initial page load with loading spinners; the page uses useEffect + useState for data fetching; a StoreContext/useStore pattern causes waterfall fetching; SEO needs improving (content not in initial HTML); or "use client" pages should be converted to Server Components. Triggers: "slow loading", "optimize fetching", "SSR data", "RSC optimization", "remove loading spinner", "server-side fetch", "convert to server component", "data fetch lambat", "loading lama".

relational-database-mcp-cloudbase: This is the required documentation for agents operating on the CloudBase Relational Database. It lists the only four supported tools for running SQL and managing security rules. Read the full content to understand why you must NOT use standard Application SDKs and how to safely execute INSERT, UPDATE, or DELETE operations without corrupting production data.

moai-domain-database: Database specialist covering PostgreSQL, MongoDB, Redis, and advanced data patterns for modern applications.

databases: Work with MongoDB (document database, BSON documents, aggregation pipelines, Atlas cloud) and PostgreSQL (relational database, SQL queries, psql CLI, pgAdmin). Use when designing database schemas, writing queries and aggregations, optimizing indexes for performance, performing database migrations, configuring replication and sharding, implementing backup and restore strategies, managing database users and permissions, analyzing query performance, or administering production databases.

database-testing: Database schema validation, data integrity testing, migration testing, transaction isolation, and query performance. Use when testing data persistence, ensuring referential integrity, or validating database migrations.

database-migration: Execute database migrations across ORMs and platforms with zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.

Database Implementation: Database schema design, migrations, query optimization with SQL, Exposed ORM, Flyway. Use for database, migration, schema, sql, flyway tags. Provides migration patterns, validation commands, rollback strategies.

data-viz-plots: Create publication-quality plots and visualizations using matplotlib and seaborn. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).

data-transform: Transform, clean, reshape, and preprocess data using pandas and numpy. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).

data-model-creation: Optional advanced tool for complex data modeling. For simple table creation, use relational-database-tool directly with SQL statements.

data-migration: Plan and execute database migrations, data transformations, and system migrations safely with rollback strategies and data integrity validation. Use when migrating databases, transforming data schemas, moving between database systems, implementing versioned migrations, handling data transformations, ensuring data integrity, or planning zero-downtime migrations.