streamline-analyst-guide

End-to-end data analysis AI agent with Streamlit UI

191 stars

Best use case

streamline-analyst-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using streamline-analyst-guide should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/streamline-analyst-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/streamline-analyst-guide/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How streamline-analyst-guide Compares

Feature / Agent	streamline-analyst-guide	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

End-to-end data analysis AI agent with Streamlit UI

Where can I find the source code?

The skill's SKILL.md is hosted on GitHub at the URL shown in the Installation command, and the Streamline-Analyst repository is linked under References in the SKILL.md source.

SKILL.md Source

# Streamline Analyst Guide

## Overview

Streamline Analyst is an end-to-end data analysis AI agent with a Streamlit web interface. Upload a dataset and describe your analysis goal in natural language; the agent then handles data cleaning, EDA, feature engineering, model training, evaluation, and report generation. An interactive UI lets you review each step and adjust its parameters.

## Installation

```bash
git clone https://github.com/Wilson-ZheLin/Streamline-Analyst.git
cd Streamline-Analyst
pip install -r requirements.txt
streamlit run app.py
```

## Workflow

```
Upload Dataset (CSV, Excel, Parquet)
         ↓
   Data Profiling
   ├── Column types and distributions
   ├── Missing value analysis
   ├── Correlation matrix
   └── Outlier detection
         ↓
   Data Cleaning (interactive)
   ├── Handle missing values
   ├── Remove/fix outliers
   ├── Type conversions
   └── Feature encoding
         ↓
   EDA (automated + custom)
   ├── Univariate analysis
   ├── Bivariate relationships
   ├── Statistical tests
   └── Custom visualizations
         ↓
   Modeling (if applicable)
   ├── Train/test split
   ├── Model selection + training
   ├── Hyperparameter tuning
   └── Evaluation metrics
         ↓
   Report Generation
```
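To make the Data Profiling stage concrete, here is a minimal sketch of column type detection and missing-value counting. It is not Streamline Analyst's implementation: the function names `infer_type` and `profile_csv` and the sample data are hypothetical. It uses only the Python standard library and assumes dates are in ISO format.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Classify a column from its non-missing string values (illustrative rules)."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        pass
    try:
        for v in present:
            datetime.fromisoformat(v)  # assumes ISO 8601 dates such as 2023-01-15
        return "datetime"
    except ValueError:
        return "categorical"


def profile_csv(text):
    """Return {column: {"type": ..., "missing": ...}} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        # Treat empty cells (and short rows, which DictReader fills with None) as missing.
        values = [(row[col] or "").strip() for row in rows]
        report[col] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
        }
    return report


# Hypothetical three-row dataset with one missing age and one missing plan.
sample = """age,signup_date,plan
34,2023-01-15,basic
,2023-02-01,pro
29,2023-03-10,
"""
profile = profile_csv(sample)
for column, summary in profile.items():
    print(column, summary)
```

A real profiler would also need to handle locale-specific number formats, mixed-type columns, and non-ISO dates, which this sketch ignores.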

## Features

```python
# Streamline Analyst provides:

# 1. Smart data profiling
# - Auto-detect column types (numeric, categorical, datetime)
# - Distribution analysis per column
# - Missing value patterns (MCAR, MAR, MNAR hints)
# - Correlation analysis with significance

# 2. Interactive cleaning
# - Imputation strategies (mean, median, mode, KNN, model)
# - Outlier handling (IQR, Z-score, isolation forest)
# - Encoding (one-hot, label, target, ordinal)
# - Scaling (standard, minmax, robust)

# 3. Automated EDA
# - Distribution plots (histogram, KDE, box, violin)
# - Relationship plots (scatter, pair, heatmap)
# - Time series decomposition
# - Statistical tests (t-test, ANOVA, chi-square, Mann-Whitney)

# 4. Model pipeline
# - Classification: LR, RF, GBM, SVM, MLP
# - Regression: LR, RF, GBM, SVR, ElasticNet
# - Cross-validation with confidence intervals
# - Feature importance visualization
# - SHAP explanations

# 5. Report
# - HTML report with all plots and findings
# - Downloadable cleaned dataset
# - Model artifacts (pickle)
```
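As an illustration of the IQR outlier handling listed under interactive cleaning, the sketch below flags values outside the usual 1.5 × IQR fences. The helper names `iqr_bounds` and `split_outliers`, the multiplier default, and the sample readings are assumptions made for this example. The tool's own logic may differ.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Partition values into (inliers, outliers), preserving input order."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 95]
inliers, outliers = split_outliers(readings)
print("fences:", iqr_bounds(readings))
print("outliers:", outliers)
```

Quartile conventions vary between libraries, so the fences from `method="inclusive"` may differ slightly from those computed by other tools on the same data.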

## Natural Language Interface

```markdown
### Example Prompts
- "Show me the distribution of all numeric columns"
- "Is there a significant difference in income between genders?"
- "Build a classifier to predict churn using all features"
- "What are the top 5 most important features for prediction?"
- "Clean the data: fill missing values and remove outliers"
- "Generate a summary report of this dataset"
```
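A prompt such as "Is there a significant difference in income between genders?" maps onto a two-sample test. The sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. It is an assumption that the agent would choose this test, and the `welch_t` function and the income figures are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical incomes (in thousands) for two groups.
group_a = [52, 48, 55, 60, 50]
group_b = [45, 47, 43, 49, 46]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Turning the statistic into a p-value requires the t distribution's CDF, which the standard library does not provide. With SciPy installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs the same test and returns the p-value.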

## Use Cases

1. **Quick EDA**: Rapid exploration of unfamiliar datasets
2. **Data cleaning**: Interactive preprocessing with AI guidance
3. **Baseline models**: Quick ML prototyping without coding
4. **Report generation**: Automated analysis reports
5. **Teaching**: Interactive data science demonstrations

## References

- [Streamline-Analyst GitHub](https://github.com/Wilson-ZheLin/Streamline-Analyst)
- [Streamlit](https://streamlit.io/)