survey-data-processing

Clean, recode, and prepare survey response data for analysis

191 stars

Best use case

survey-data-processing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Clean, recode, and prepare survey response data for analysis

Teams using survey-data-processing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/survey-data-processing/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/analysis/wrangling/survey-data-processing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/survey-data-processing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How survey-data-processing Compares

Feature / Agent	survey-data-processing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Clean, recode, and prepare survey response data for analysis

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Survey Data Processing

A skill for cleaning, recoding, and preparing survey response data for statistical analysis. Covers handling common survey data issues such as incomplete responses, attention check failures, reverse-coded items, scale construction, open-ended response coding, and export to analysis-ready formats compatible with SPSS, Stata, and R.

## Survey Data Quality Assessment

### Initial Inspection Workflow

Survey data from platforms like Qualtrics, SurveyMonkey, REDCap, and Google Forms each have their own export formats and quirks. The first step is always standardization.

```python
import pandas as pd
import numpy as np

def assess_survey_quality(df, duration_col="duration_seconds",
                          min_duration=60):
    """
    Generate a survey data quality report.

    Checks:
    - Completion rates per question
    - Response duration (speeders and slow responders)
    - Straight-line responding patterns
    - Attention check failures
    """
    report = {}

    # Overall completion
    total_respondents = len(df)
    complete = df.dropna(thresh=int(len(df.columns) * 0.8))
    report["total_responses"] = total_respondents
    report["substantially_complete"] = len(complete)
    report["completion_rate"] = f"{len(complete)/total_respondents*100:.1f}%"

    # Duration analysis
    if duration_col in df.columns:
        durations = df[duration_col].dropna()
        report["median_duration_seconds"] = durations.median()
        report["speeders"] = (durations < min_duration).sum()
        report["speeder_pct"] = f"{(durations < min_duration).mean()*100:.1f}%"

    # Missing data per question
    missing_by_col = df.isna().sum().sort_values(ascending=False)
    report["most_skipped_questions"] = missing_by_col.head(10).to_dict()

    return report
```

### Identifying Low-Quality Responses

```python
def detect_straightlining(df, likert_columns, threshold=0.9):
    """
    Detect respondents who select the same answer for nearly
    all Likert-scale questions (straight-line responding).

    A respondent is flagged if the proportion of their most
    common response exceeds the threshold.
    """
    flagged = []
    for idx, row in df[likert_columns].iterrows():
        responses = row.dropna()
        if len(responses) == 0:
            continue
        most_common_pct = responses.value_counts().iloc[0] / len(responses)
        if most_common_pct >= threshold:
            flagged.append(idx)

    return flagged


def check_attention_items(df, attention_checks):
    """
    Validate attention check (trap) questions.

    Args:
        attention_checks: dict of {column_name: correct_answer}
        Example: {"q15_attention": 4, "q32_trap": "strongly agree"}
    """
    failed = pd.Series(False, index=df.index)
    for col, correct in attention_checks.items():
        failed = failed | (df[col] != correct)

    return df.index[failed].tolist()
```

## Recoding and Transformation

### Reverse Coding

Many validated psychological scales include reverse-coded items to detect acquiescence bias. These must be recoded before computing scale scores.

```python
def reverse_code(df, columns, scale_max, scale_min=1):
    """
    Reverse-code specified columns for Likert-type scales.

    Formula: reversed = (scale_max + scale_min) - original

    Example for a 1-5 scale:
      1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
    """
    df_recoded = df.copy()
    for col in columns:
        df_recoded[col] = (scale_max + scale_min) - df[col]
    return df_recoded


# Example usage with a Big Five personality scale
reverse_items = {
    "extraversion": ["ext_2", "ext_4", "ext_6"],
    "neuroticism": ["neur_1", "neur_3", "neur_5"],
    "agreeableness": ["agree_3", "agree_5"],
}

# For a 1-7 Likert scale:
for construct, items in reverse_items.items():
    df = reverse_code(df, items, scale_max=7, scale_min=1)
```

### Scale Construction

```python
def compute_scale_scores(df, scale_definitions, method="mean"):
    """
    Compute composite scale scores from individual items.

    Args:
        scale_definitions: dict mapping scale name to list of columns
        method: "mean" or "sum"

    Returns:
        DataFrame with new scale score columns
    """
    for scale_name, items in scale_definitions.items():
        if method == "mean":
            df[scale_name] = df[items].mean(axis=1)
        elif method == "sum":
            df[scale_name] = df[items].sum(axis=1)

        # Also compute Cronbach's alpha for reliability
        alpha = cronbachs_alpha(df[items])
        print(f"{scale_name}: alpha = {alpha:.3f} "
              f"(n_items = {len(items)})")

    return df


def cronbachs_alpha(item_df):
    """
    Compute Cronbach's alpha for internal consistency reliability.
    Values above 0.70 are generally considered acceptable.
    """
    item_df = item_df.dropna()
    n_items = item_df.shape[1]
    if n_items < 2:
        return np.nan

    item_variances = item_df.var(axis=0, ddof=1)
    total_variance = item_df.sum(axis=1).var(ddof=1)

    alpha = (n_items / (n_items - 1)) * (
        1 - item_variances.sum() / total_variance
    )
    return alpha
```

## Open-Ended Response Processing

### Coding Qualitative Responses

```python
def code_open_responses(df, text_column, codebook):
    """
    Apply a predefined codebook to open-ended responses using
    keyword matching. For research-quality coding, this should
    be supplemented with manual coding by trained raters.

    Args:
        codebook: dict mapping code names to keyword lists
        Example: {
            "financial_concern": ["money", "cost", "expensive", "afford"],
            "time_constraint": ["time", "busy", "schedule", "hours"],
            "quality_issue": ["quality", "broken", "defect", "poor"],
        }
    """
    for code_name, keywords in codebook.items():
        pattern = "|".join(keywords)
        df[f"code_{code_name}"] = (
            df[text_column]
            .str.lower()
            .str.contains(pattern, na=False)
            .astype(int)
        )

    return df
```

### Inter-Rater Reliability

```
When multiple coders classify open-ended responses:

Cohen's Kappa (2 raters):
  - < 0.20: poor agreement
  - 0.21-0.40: fair
  - 0.41-0.60: moderate
  - 0.61-0.80: substantial
  - 0.81-1.00: almost perfect

Fleiss' Kappa (3+ raters):
  - Same interpretation scale as Cohen's
  - Use when more than two raters code the same responses

Process:
  1. Develop codebook with definitions and examples
  2. Train coders on 10-20 practice responses
  3. Code 20% of responses independently (overlap set)
  4. Calculate inter-rater reliability on the overlap set
  5. If kappa < 0.70, discuss disagreements and refine codebook
  6. Repeat until acceptable reliability is achieved
  7. Divide remaining responses among coders
```

## Data Reshaping for Analysis

### Wide to Long Format

Survey data is typically exported in wide format (one row per respondent, one column per question). Many analyses require long format.

```python
def reshape_repeated_measures(df, id_col, time_points,
                              measure_prefix):
    """
    Reshape repeated-measures survey data from wide to long.

    Example: columns q1_pre, q1_post -> long format with
    time column ("pre", "post") and value column.
    """
    value_vars = [f"{measure_prefix}_{t}" for t in time_points]

    long_df = pd.melt(
        df,
        id_vars=[id_col],
        value_vars=value_vars,
        var_name="time_point",
        value_name=measure_prefix
    )

    # Clean time_point column
    long_df["time_point"] = (
        long_df["time_point"]
        .str.replace(f"{measure_prefix}_", "")
    )

    return long_df
```

## Export for Statistical Software

```
Export formats by software:

SPSS (.sav):
  - Use pyreadstat: pyreadstat.write_sav(df, "output.sav")
  - Include variable labels and value labels
  - Set measurement level (nominal, ordinal, scale)

Stata (.dta):
  - Use pandas: df.to_stata("output.dta")
  - Include variable labels via write_stata with labels dict

R (.csv with codebook):
  - Export CSV plus a separate codebook document
  - Or use pyreadstat to write .rds format
  - Include factor level definitions

General best practices:
  - Include a unique respondent ID column
  - Use numeric codes for categorical variables (with labels)
  - Document all recoding in a companion codebook
  - Save both raw and processed versions
  - Include a timestamp column for data versioning
```

Proper survey data processing is essential for valid statistical inference. Decisions made during cleaning and recoding directly affect research conclusions, making transparent documentation of every step a methodological requirement rather than a convenience.

Related Skills

json-data-visualizer

191

from wentorai/research-plugins

Guide to JSON Crack for visualizing complex JSON data structures

datagen-research-guide

191

from wentorai/research-plugins

AI-driven multi-agent research assistant for end-to-end studies

data-collection-automation

191

from wentorai/research-plugins

Automate survey deployment, data collection, and pipeline management

database-comparison-guide

191

from wentorai/research-plugins

Compare major academic databases and when to use each for research

wikidata-api-guide

191

from wentorai/research-plugins

Query Wikidata SPARQL for scholarly metadata, authors, and entities

datacite-api

191

from wentorai/research-plugins

Resolve dataset DOIs and query research data metadata via DataCite

crossref-event-data-api

191

from wentorai/research-plugins

Track scholarly mentions across the web via Crossref Event Data

metadata-skills

191

from wentorai/research-plugins

24 metadata & bibliometrics skills. Trigger: DOI resolution, citation metrics, author disambiguation, bibliometrics. Design: metadata APIs and bibliometric analysis tools for scholarly records.

dataverse-api

191

from wentorai/research-plugins

Deposit and discover research datasets via Harvard Dataverse API

survey-research-guide

191

from wentorai/research-plugins

Design, deploy, and analyze surveys for social science and organizational res...

ipums-microdata-api

191

from wentorai/research-plugins

Access harmonized census and survey microdata via the IPUMS API

astrophysics-data-guide

191

from wentorai/research-plugins

Astronomical data processing with Astropy, FITS files, and sky surveys