databricks-ci-integration

Configure Databricks CI/CD integration with GitHub Actions and Asset Bundles. Use when setting up automated testing, configuring CI pipelines, or integrating Databricks deployments into your build process. Trigger with phrases like "databricks CI", "databricks GitHub Actions", "databricks automated tests", "CI databricks", "databricks pipeline".

25 stars

Best use case

databricks-ci-integration is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using databricks-ci-integration should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/databricks-ci-integration/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/databricks-ci-integration/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/databricks-ci-integration/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How databricks-ci-integration Compares

| Feature / Agent | databricks-ci-integration | Standard Approach |
|-----------------|---------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Configure Databricks CI/CD integration with GitHub Actions and Asset Bundles. Use when setting up automated testing, configuring CI pipelines, or integrating Databricks deployments into your build process. Trigger with phrases like "databricks CI", "databricks GitHub Actions", "databricks automated tests", "CI databricks", "databricks pipeline".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Databricks CI Integration

## Overview
Automate Databricks deployments with Databricks Asset Bundles (DABs) and GitHub Actions. Covers bundle validation, unit testing PySpark transforms locally, deploying to staging on pull requests and to production on merge, and integration testing against live workspaces. Uses the `databricks/setup-cli` action and OAuth M2M for secure CI authentication.

## Prerequisites
- Databricks workspace with service principal (OAuth M2M)
- Asset Bundle (`databricks.yml`) configured
- GitHub repo with Actions enabled
- GitHub environment secrets: `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET`
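
If you are starting from scratch, a minimal `databricks.yml` defining the `staging` and `prod` targets used below might look like the sketch here. The bundle name and workspace hosts are placeholders, and `mode: production` may additionally require `run_as` or explicit permissions depending on your workspace setup.

```yaml
# databricks.yml: minimal sketch; replace the name and hosts with your own
bundle:
  name: my_etl_project

targets:
  staging:
    mode: development
    workspace:
      host: https://adb-staging.example.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://adb-prod.example.cloud.databricks.com
      root_path: /Shared/.bundle/${bundle.name}/prod
```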

## Instructions

### Step 1: GitHub Actions — Validate and Test on PR
```yaml
# .github/workflows/databricks-ci.yml
name: Databricks CI

on:
  pull_request:
    paths: ['src/**', 'resources/**', 'databricks.yml', 'tests/**']

jobs:
  validate-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install pytest pyspark delta-spark databricks-sdk
          pip install -e .  # If using pyproject.toml

      - name: Run unit tests (local Spark, no cluster needed)
        run: pytest tests/unit/ -v --tb=short

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Validate bundle
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
        run: databricks bundle validate -t staging

  deploy-staging:
    needs: validate-and-test
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main

      - name: Deploy to staging
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
        run: databricks bundle deploy -t staging

      - name: Run integration tests on staging
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
          WAREHOUSE_ID: ${{ secrets.DATABRICKS_WAREHOUSE_ID }}
        run: |
          databricks bundle run integration_tests -t staging
          # Verify output tables via the SQL Statement Execution API
          databricks api post /api/2.0/sql/statements --json '{
            "warehouse_id": "'"$WAREHOUSE_ID"'",
            "statement": "SELECT COUNT(*) AS rows FROM staging_catalog.silver.orders WHERE date >= current_date() - 1"
          }'
```
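
The `integration_tests` task invoked above must exist as a job resource in the bundle. A hypothetical sketch is shown below; the notebook path, cluster settings, and file name are assumptions, not part of the original setup.

```yaml
# resources/integration_tests.yml: hypothetical job resource (sketch)
resources:
  jobs:
    integration_tests:
      name: integration-tests-${bundle.target}
      tasks:
        - task_key: run_tests
          notebook_task:
            notebook_path: ../tests/integration/run_tests.ipynb
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
```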

### Step 2: Unit Tests for PySpark Transforms
```python
# tests/unit/test_transformations.py
import pytest
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[*]").appName("tests").getOrCreate()

def test_silver_dedup(spark):
    """Test deduplication logic in silver layer."""
    from src.pipelines.silver import dedup_orders

    data = [
        ("order-1", "user-a", 10.0),
        ("order-1", "user-a", 10.0),  # duplicate
        ("order-2", "user-b", 20.0),
    ]
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("user_id", StringType()),
        StructField("amount", DoubleType()),
    ])
    df = spark.createDataFrame(data, schema)
    result = dedup_orders(df)

    assert result.count() == 2
    assert set(r.order_id for r in result.collect()) == {"order-1", "order-2"}

def test_gold_aggregation(spark):
    """Test daily aggregation in gold layer."""
    from src.pipelines.gold import aggregate_daily_revenue
    # ... test with sample data
```
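
These tests assume a `dedup_orders` transform in `src/pipelines/silver.py`. A minimal sketch consistent with the test follows; the module layout and column names come from the test itself, and the implementation is illustrative.

```python
# src/pipelines/silver.py: illustrative implementation matching the unit test
from pyspark.sql import DataFrame

def dedup_orders(df: DataFrame) -> DataFrame:
    """Drop exact duplicate orders, keeping one row per order_id."""
    return df.dropDuplicates(["order_id"])
```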

### Step 3: Deploy to Production on Merge
```yaml
# .github/workflows/databricks-deploy.yml
name: Databricks Deploy

on:
  push:
    branches: [main]
    paths: ['src/**', 'resources/**', 'databricks.yml']

jobs:
  deploy-production:
    runs-on: ubuntu-latest
    environment: production  # Requires approval if configured
    concurrency:
      group: databricks-prod-deploy
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main

      - name: Validate production bundle
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID_PROD }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET_PROD }}
        run: databricks bundle validate -t prod

      - name: Deploy to production
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID_PROD }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET_PROD }}
        run: |
          databricks bundle deploy -t prod
          echo "## Deployment Summary" >> $GITHUB_STEP_SUMMARY
          databricks bundle summary -t prod >> $GITHUB_STEP_SUMMARY

      - name: Trigger smoke test
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID_PROD }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET_PROD }}
        run: databricks bundle run prod_etl_pipeline -t prod --no-wait
```

### Step 4: OIDC Authentication (Keyless CI)
Eliminate long-lived secrets by using GitHub OIDC federation with Databricks.

```yaml
# In GitHub Actions — no client_secret needed
jobs:
  deploy:
    permissions:
      id-token: write  # Required for OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main

      - name: Deploy with OIDC
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
          # No DATABRICKS_CLIENT_SECRET — uses GitHub OIDC token
          DATABRICKS_AUTH_TYPE: github-oidc
        run: databricks bundle deploy -t prod
```
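
For OIDC to work, the service principal needs a federation policy that trusts GitHub's token issuer. The sketch below follows the workload identity federation docs; the command form, org/repo, and environment name are assumptions to verify against your account setup.

```bash
# Account-level federation policy trusting GitHub Actions OIDC tokens (sketch)
databricks account service-principal-federation-policy create <SERVICE_PRINCIPAL_ID> --json '{
  "oidc_policy": {
    "issuer": "https://token.actions.githubusercontent.com",
    "audiences": ["https://github.com/my-org"],
    "subject": "repo:my-org/my-repo:environment:production"
  }
}'
```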

## Output
- CI workflow validating bundles and running unit tests on every PR
- Staging deployment with integration tests before merge
- Production deployment on merge to main with approval gate
- Concurrency control preventing parallel deployments

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Bundle validation fails | Invalid YAML or missing variables | Run `databricks bundle validate` locally first |
| Auth error in CI | Client secret expired | Regenerate OAuth secret or switch to OIDC |
| Integration test timeout | Cluster cold start | Use instance pools or increase timeout |
| Deploy conflict | Concurrent CI runs | Use `concurrency` group in GitHub Actions |
| PySpark import error | Missing `pyspark` in CI | Add to `pip install` step |

## Examples

### Local Validation Before Push
```bash
# Validate the bundle and run unit tests before pushing
databricks bundle validate -t staging
pytest tests/unit/ -v
```
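
To make this routine, one option is a git pre-push hook that runs the same checks. This is a sketch; the target name should match your bundle.

```bash
#!/usr/bin/env bash
# .git/hooks/pre-push: block the push if validation or unit tests fail (sketch)
set -euo pipefail

databricks bundle validate -t staging
pytest tests/unit/ -q
```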

### Branch-Based Development Targets
```yaml
# databricks.yml — auto-name resources per developer
targets:
  dev:
    default: true
    mode: development
    # In dev mode, resources auto-prefixed with [dev username]
    workspace:
      root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev
```
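
With this target in place, each developer can deploy and run an isolated copy of the pipeline; `my_job` below is a placeholder for a job defined in your bundle.

```bash
# Deploys under your user path with resources prefixed "[dev <username>]"
databricks bundle deploy -t dev
databricks bundle run my_job -t dev
```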

## Resources
- [CI/CD with Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/ci-cd-bundles)
- [databricks/setup-cli Action](https://github.com/databricks/setup-cli)
- [OAuth M2M](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m)
- [Bundle Configuration](https://docs.databricks.com/aws/en/dev-tools/bundles/settings)

## Next Steps
For Asset Bundle deployment details, see `databricks-deploy-integration`.

Related Skills

zapier-integration-helper

25
from ComeOnOliver/skillshub

Zapier Integration Helper - Auto-activating skill for Business Automation. Triggers on: "zapier integration helper". Part of the Business Automation skill category.

integration-test-setup

25
from ComeOnOliver/skillshub

Integration Test Setup - Auto-activating skill for Test Automation. Triggers on: "integration test setup". Part of the Test Automation skill category.

running-integration-tests

25
from ComeOnOliver/skillshub

This skill enables Claude to run and manage integration test suites. It automates environment setup, database seeding, service orchestration, and cleanup. Use this skill when the user asks to "run integration tests", "execute integration tests", or any command that implies running integration tests for a project, including specifying particular test suites or options like code coverage. It is triggered by phrases such as "/run-integration", "/rit", or requests mentioning "integration tests". The plugin handles database creation, migrations, seeding, and dependent service management.

integration-test-generator

25
from ComeOnOliver/skillshub

Integration Test Generator - Auto-activating skill for API Integration. Triggers on: "integration test generator". Part of the API Integration skill category.

fathom-ci-integration

25
from ComeOnOliver/skillshub

Test Fathom integrations in CI/CD pipelines. Trigger with phrases like "fathom CI", "fathom github actions", "test fathom pipeline".

exa-deploy-integration

25
from ComeOnOliver/skillshub

Deploy Exa integrations to Vercel, Docker, and Cloud Run platforms. Use when deploying Exa-powered applications to production, configuring platform-specific secrets, or building search API endpoints. Trigger with phrases like "deploy exa", "exa Vercel", "exa production deploy", "exa Cloud Run", "exa Docker".

exa-ci-integration

25
from ComeOnOliver/skillshub

Configure Exa CI/CD integration with GitHub Actions and automated testing. Use when setting up automated testing for Exa integrations, configuring CI pipelines, or adding Exa health checks to builds. Trigger with phrases like "exa CI", "exa GitHub Actions", "exa automated tests", "CI exa", "exa pipeline".

evernote-deploy-integration

25
from ComeOnOliver/skillshub

Deploy Evernote integrations to production environments. Use when deploying to cloud platforms, configuring production, or setting up deployment pipelines. Trigger with phrases like "deploy evernote", "evernote production deploy", "release evernote", "evernote cloud deployment".

evernote-ci-integration

25
from ComeOnOliver/skillshub

Configure CI/CD pipelines for Evernote integrations. Use when setting up automated testing, continuous integration, or deployment pipelines for Evernote projects. Trigger with phrases like "evernote ci", "evernote github actions", "evernote pipeline", "automate evernote tests".

elevenlabs-deploy-integration

25
from ComeOnOliver/skillshub

Deploy ElevenLabs TTS applications to Vercel, Fly.io, and Cloud Run. Use when deploying ElevenLabs-powered apps to production, configuring platform-specific secrets, or setting up serverless TTS. Trigger: "deploy elevenlabs", "elevenlabs Vercel", "elevenlabs Cloud Run", "elevenlabs Fly.io", "elevenlabs serverless", "host TTS API".

elevenlabs-ci-integration

25
from ComeOnOliver/skillshub

Configure CI/CD pipelines for ElevenLabs with mocked unit tests and gated integration tests. Use when setting up GitHub Actions for TTS projects, configuring CI test strategies, or automating ElevenLabs integration validation. Trigger: "elevenlabs CI", "elevenlabs GitHub Actions", "elevenlabs automated tests", "CI elevenlabs", "elevenlabs pipeline".

documenso-deploy-integration

25
from ComeOnOliver/skillshub

Deploy Documenso integrations across different platforms and environments. Use when deploying to cloud platforms, containerizing applications, or setting up infrastructure for Documenso integrations. Trigger with phrases like "deploy documenso", "documenso docker", "documenso kubernetes", "documenso cloud deployment".