python-ml-predictive

Use when adding forecasting, classification, regression, or anomaly detection to a SaaS feature — demand/sales/cash-flow forecasting, churn and risk scoring, anomaly detection — with scikit-learn, Prophet, and statsmodels. Covers data prep, model serving, monitoring, and explainability.


by peterbamuhigire



Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/python-ml-predictive/SKILL.md --create-dirs "https://raw.githubusercontent.com/peterbamuhigire/skills-web-dev/main/skills/languages/python-ml-predictive/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/python-ml-predictive/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


SKILL.md Source

# Python ML & Predictive Analytics
Acknowledgement: Shared by Peter Bamuhigire, techguypeter.com, +256 784 464178.

<!-- dual-compat-start -->
## Use When

- Use when adding forecasting, classification, regression, or anomaly detection to a SaaS feature — demand/sales/cash-flow forecasting, churn and risk scoring, anomaly detection — with scikit-learn, Prophet, and statsmodels. Covers data prep, model serving, monitoring, and explainability.
- The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

## Do Not Use When

- The task is unrelated to `python-ml-predictive` or would be better handled by a more specific companion skill.
- The request only needs a trivial answer and none of this skill's constraints or references materially help.

## Required Inputs

- Gather relevant project context, constraints, and the concrete problem to solve; load `references` only as needed.
- Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

## Workflow

- Read this `SKILL.md` first, then load only the referenced deep-dive files that are necessary for the task.
- Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
- Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

## Quality Standards

- Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
- Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
- Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

## Anti-Patterns

- Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
- Loading every reference file by default instead of using progressive disclosure.

## Outputs

- A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
- Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
- References used, companion skills, or follow-up actions when they materially improve execution.

## Evidence Produced

| Category | Artifact | Format | Example |
|----------|----------|--------|---------|
| Correctness | Model evaluation report | Markdown doc covering train/test split, baseline comparison, and per-segment metrics | `docs/python/ml-eval-2026-04-16.md` |
| Operability | Model deployment runbook | Markdown doc per `skill-composition-standards/references/runbook-template.md` covering deploy, drift detection, and re-train procedure | `docs/python/ml-runbook.md` |

## References

- Use the `references/` directory for deep detail after reading the core workflow below.
<!-- dual-compat-end -->

Real statistical and machine-learning models for SaaS features: forecasting, classification, regression, anomaly detection. This skill complements the LLM-based `ai-*` skills: use ML/stats when the problem is numeric, data-rich, and requires explainable, stable outputs.

**Prerequisites:** Load `python-modern-standards` and `python-saas-integration` before this skill. Load `python-data-analytics` for feature engineering.

## When this skill applies

- Demand / sales / cash-flow forecasting (time-series).
- Churn prediction, credit / risk scoring, fraud flagging (classification).
- Price optimization, quantity regression (regression).
- Anomaly detection on SaaS metrics (transaction volume, latency, error rates).
- Any feature where a PHP developer would write a rule that needs tuning from data.

## When ML vs LLM vs rules (decision rule)

```text
Output is a number or label derived from many numeric features  -> ML
Output is text, summary, classification of language, extraction -> LLM
Output can be stated in < 5 unambiguous rules                    -> Rules (PHP)
Hybrid (e.g., LLM extracts features -> ML classifies)            -> both, pipeline
```

Rules beat ML whenever you can enumerate them — faster, explainable, testable. Reach for ML when the signal is in the data, the rules are fuzzy, and the cost of a wrong answer is modest.
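
The decision rule above can be sketched as a small function. This is only an illustration: the `Problem` fields, the `choose_approach` name, and the fallback to rules when nothing matches are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """Hypothetical description of a feature request (field names are illustrative)."""
    output_is_text: bool                 # summary, extraction, classification of language
    many_numeric_features: bool          # output is a number or label driven by numeric data
    rule_count: Optional[int] = None     # unambiguous rules that fully specify the output, if known
    needs_text_extraction: bool = False  # unstructured input must become features first


def choose_approach(p: Problem) -> str:
    # Rules win whenever the logic fits in fewer than 5 unambiguous statements.
    if p.rule_count is not None and p.rule_count < 5:
        return "rules"
    # Hybrid: an LLM extracts features, then an ML model classifies or scores.
    if p.many_numeric_features and p.needs_text_extraction:
        return "llm -> ml pipeline"
    if p.output_is_text:
        return "llm"
    if p.many_numeric_features:
        return "ml"
    # Assumption: when nothing matches, start with the simplest option.
    return "rules"


print(choose_approach(Problem(output_is_text=False, many_numeric_features=True)))
```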

See `references/when-ml-vs-llm-vs-rules.md`.

## Core stack

- **scikit-learn** — classification, regression, clustering, model pipelines, metrics. Default for non-timeseries.
- **statsmodels** — ARIMA / ETS / SARIMA, rigorous statistical output.
- **Prophet** (or NeuralProphet) — seasonal + holiday-aware forecasts with minimal tuning.
- **XGBoost / LightGBM** — when trees beat linear models, which is often.
- **PyOD** — anomaly detection algorithms (isolation forest, LOF, ECOD).
- **SHAP** — explainability.
- **joblib** — model serialization.
- **numpy / pandas** — pre-/post-processing.

Avoid starting with deep learning. Tabular SaaS problems are almost always best solved with gradient-boosted trees or linear models. Deep learning earns its keep only for unstructured data (text, images, audio) — for which we usually call LLMs or pre-trained models instead of training from scratch.

## Data preparation discipline

Leakage is the silent killer. Prevent it with process, not vigilance.

**Splits:**
- Random split for IID tabular → `train_test_split(..., stratify=y)`.
- Time-series → **time-based split** only. Training data must precede validation, which must precede test. No shuffling.
- Grouped data (per customer, per tenant) → `GroupKFold`. Never let the same customer appear in train and test.
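
A minimal sketch of the time-based and grouped splits, assuming NumPy arrays and toy data. `time_based_split` and its 70/15/15 fractions are hypothetical choices for illustration; `GroupKFold` is the scikit-learn splitter named above.

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def time_based_split(timestamps, train_frac=0.7, val_frac=0.15):
    """Return (train, val, test) index arrays in chronological order, with no shuffling."""
    order = np.argsort(np.asarray(timestamps), kind="stable")
    n_train = int(len(order) * train_frac)
    n_val = int(len(order) * val_frac)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


# Toy data: 20 daily rows belonging to 5 customers (values are made up).
days = np.arange(20)
customer_ids = np.repeat(np.arange(5), 4)
X = np.random.default_rng(0).normal(size=(20, 3))
y = np.tile([0, 1], 10)

train_idx, val_idx, test_idx = time_based_split(days)

# Grouped CV: every customer lands entirely in train or entirely in test.
folds = list(GroupKFold(n_splits=5).split(X, y, groups=customer_ids))
shared = [set(customer_ids[tr]) & set(customer_ids[te]) for tr, te in folds]
print(shared)  # each entry is an empty set
```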

**Leakage sources to eliminate:**
- Target-derived features (e.g., "days since last invoice" when predicting next invoice).
- Future data in features (aggregations that include the label row).
- Preprocessing on full data before split (fit scalers/encoders on train only).
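
The "aggregation that includes the label row" leak is easy to show with pandas. The sketch below uses a made-up invoice table (column names are illustrative) and compares a leaky per-customer average with one that only sees strictly earlier rows.

```python
import pandas as pd

# Hypothetical invoice history; column names are illustrative.
df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b"],
    "invoice_date": pd.to_datetime(
        ["2026-01-05", "2026-02-05", "2026-03-05", "2026-01-10", "2026-02-10"]
    ),
    "amount": [10.0, 20.0, 30.0, 5.0, 7.0],
}).sort_values(["customer_id", "invoice_date"])

by_customer = df.groupby("customer_id")["amount"]

# Leaky: the expanding mean includes the current row, i.e. the value being predicted.
df["avg_amount_leaky"] = by_customer.transform(lambda s: s.expanding().mean())

# Safe: shift by one row first, so each feature only sees strictly earlier invoices.
df["avg_amount_prior"] = by_customer.transform(lambda s: s.shift(1).expanding().mean())

print(df[["customer_id", "amount", "avg_amount_leaky", "avg_amount_prior"]])
```

The first invoice of each customer gets `NaN` for the safe feature, which is the honest answer: there is no history yet.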

Always wrap preprocessing in a **Pipeline** so fit-on-train/apply-on-test is automatic:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier

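# numeric_cols / categorical_cols: lists of column names, defined by the caller.
# min_frequency requires scikit-learn >= 1.1; it groups rare categories into one column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore", min_frequency=10), categorical_cols),
])

model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier(random_state=42))])
model.fit(X_train, y_train)   # scaler and encoder are fitted on the training rows only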
```

See `references/data-prep.md`.

## Time-series forecasting

**Prophet** — use when you have daily or weekly data with yearly / weekly seasonality and holidays matter. Minimal tuning, business-friendly confidence intervals.

```python
from prophet import Prophet

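# Prophet expects a frame with columns `ds` (date) and `y` (value).
df_fc = df.rename(columns={"date": "ds", "sales": "y"})
# kenya_holidays: a DataFrame with `holiday` and `ds` columns, built elsewhere.
m = Prophet(yearly_seasonality=True, weekly_seasonality=True, holidays=kenya_holidays)
m.fit(df_fc)
future = m.make_future_dataframe(periods=90, freq="D")   # history plus 90 future days
forecast = m.predict(future)     # yhat, yhat_lower, yhat_upper (80% interval by default)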
```

**statsmodels ARIMA / SARIMA / ETS** — use when you need rigor (AIC, residual diagnostics, significance), simple series without holidays, or when Prophet is overkill.

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

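# seasonal_order=(P, D, Q, s); s=12 assumes monthly data with a yearly cycle.
model = SARIMAX(y_train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
forecast = res.get_forecast(steps=12)
mean = forecast.predicted_mean
ci = forecast.conf_int()   # 95% interval by default (alpha=0.05)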
```

**Always produce intervals**, not just point forecasts. Users need to see uncertainty. See `references/forecasting-prophet.md` and `references/forecasting-statsmodels.md`.
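
Intervals are only useful if they are roughly honest. One simple check, sketched here under the assumption that you hold out recent actuals, is empirical coverage: the share of actual values that fall inside the forecast band (for example `yhat_lower`/`yhat_upper` from Prophet, or the two columns of `conf_int()` from statsmodels). The function name and the numbers are illustrative.

```python
import numpy as np


def interval_coverage(actual, lower, upper) -> float:
    """Fraction of actual values that fall inside [lower, upper]."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    return float(np.mean((actual >= lower) & (actual <= upper)))


# Made-up held-out actuals and interval bounds.
coverage = interval_coverage(
    actual=[10, 12, 15, 9],
    lower=[8, 11, 10, 10],
    upper=[12, 13, 14, 12],
)
print(coverage)  # 0.5: two of the four actuals fall inside their interval
```

Coverage far below the nominal level suggests the intervals are too narrow to show to users as-is.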

## Classification & regression (tabular)

Default model order to try:

1. Logistic regression (classification) / Ridge (regression) — fast, interpretable baseline.
2. Gradient Boosting (sklearn / XGBoost / LightGBM) — usually the winner on tabular.
3. Random Forest — second try if GB is unstable.
4. Neural network — only if you have > 100K rows, heavy feature engineering fails, and you have time.

**Hyperparameter tuning:** `RandomizedSearchCV` with a small budget first, then refine. Don't grid-search over 10,000 combinations.
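
A small-budget random search might look like the sketch below. The synthetic data, the parameter ranges, and `n_iter=5` are placeholder assumptions chosen so the example runs quickly, not recommended values.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a real training set.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions={
        "learning_rate": loguniform(0.01, 0.3),  # sampled on a log scale
        "n_estimators": randint(20, 80),
        "max_depth": randint(2, 5),
    },
    n_iter=5,          # small budget first; widen or refine the ranges afterwards
    cv=3,
    scoring="roc_auc",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```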

**Metrics — never just accuracy:**
- Binary classification, balanced: ROC-AUC, F1.
- Binary classification, imbalanced (churn, fraud): precision@k, recall@k, precision-recall AUC.
- Multiclass: macro-F1, log-loss.
- Regression: MAE, RMSE, R². MAPE only when no zeros in target.
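
For reference, a NumPy sketch of precision@k, recall@k, and a MAPE that refuses zero targets. These helper names are illustrative; they are not scikit-learn functions.

```python
import numpy as np


def _top_k(scores, k):
    # Indices of the k highest scores (stable for ties).
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]


def precision_at_k(y_true, scores, k) -> float:
    """Share of the k highest-scored rows that are true positives."""
    return float(np.asarray(y_true)[_top_k(scores, k)].mean())


def recall_at_k(y_true, scores, k) -> float:
    """Share of all positives that appear among the k highest-scored rows."""
    y_true = np.asarray(y_true)
    n_pos = y_true.sum()
    if n_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    return float(y_true[_top_k(scores, k)].sum() / n_pos)


def mape(y_true, y_pred) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when the target contains zeros; use MAE or RMSE")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(precision_at_k(labels, scores, k=2), recall_at_k(labels, scores, k=2))  # 0.5 0.5
```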

See `references/classification-regression-sklearn.md` and `references/evaluation-metrics.md`.

## Anomaly detection

Three tiers in order of complexity:

1. **Threshold on a metric** (simplest): rolling mean ± 3 × rolling std. Catches big, obvious anomalies and is cheap to run.
2. **Statistical distribution**: Modified Z-score (robust to outliers), ESD test, STL decomposition residuals.
3. **Model-based**: **IsolationForest** (sklearn) or **ECOD** (PyOD). Best for multi-feature anomalies.

```python
from sklearn.ensemble import IsolationForest
iso = IsolationForest(contamination=0.01, random_state=42).fit(X_train)
scores = -iso.score_samples(X_new)   # higher = more anomalous
```

Calibrate thresholds on a known-clean period. Re-calibrate monthly. See `references/anomaly-detection.md`.
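
The first two tiers need nothing beyond NumPy. The sketch below is one possible reading of them: a trailing-window threshold that excludes the current point, and a modified Z-score using the median and MAD with the commonly used 0.6745 scaling and 3.5 cut-off. The window size, cut-offs, and series are illustrative assumptions.

```python
import numpy as np


def rolling_threshold_flags(values, window=7, n_sigmas=3.0):
    """Flag points further than n_sigmas rolling stds from the rolling mean of the prior window."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    for i in range(window, len(values)):
        past = values[i - window:i]          # trailing window, excludes the current point
        flags[i] = abs(values[i] - past.mean()) > n_sigmas * past.std(ddof=1)
    return flags


def modified_z_scores(values):
    """Median/MAD-based Z-scores, robust to the outliers they are meant to find."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros_like(values)         # degenerate series: no spread to measure against
    return 0.6745 * (values - median) / mad


# Made-up daily transaction counts with one spike.
series = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
tier1 = np.flatnonzero(rolling_threshold_flags(series)).tolist()
tier2 = np.flatnonzero(np.abs(modified_z_scores(series)) > 3.5).tolist()
print(tier1, tier2)  # [8] [8]
```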

## Model serving

Lightweight by default. No MLflow/Kubeflow unless you're genuinely running dozens of models.

**Serialization:** `joblib.dump(model, path)`. Pin exact sklearn/xgboost versions in `pyproject.toml`: scikit-learn does not guarantee that a model pickled under one version will load, or behave the same, under another.

**Loading:** eager-load at sidecar startup or worker startup. Keep the model in memory for the life of the process. Never load per request.

```python
# src/service_name/ml/churn.py
from joblib import load
from pathlib import Path

MODEL_PATH = Path(__file__).parent / "artifacts" / "churn_v3.joblib"
_model = load(MODEL_PATH)     # loads once at import

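def score(features: dict) -> float:
    # build_feature_frame is assumed to be a project helper (not shown) that turns
    # the dict into a one-row frame with the columns the model was trained on.
    X = build_feature_frame(features)
    return float(_model.predict_proba(X)[0, 1])   # probability of the positive class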
```

**Versioning:** filename includes semver or date (`churn_v3.joblib`, `demand_20260301.joblib`). Current version is a symlink. Rollback = flip symlink + restart.
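
The symlink flip can be done atomically on POSIX systems. This is a sketch under assumptions: the `current.joblib` link name and the `point_current_at` helper are hypothetical, and the serving code would load the link rather than a versioned filename.

```python
import os
from pathlib import Path


def point_current_at(artifacts_dir: Path, version_file: str,
                     link_name: str = "current.joblib") -> Path:
    """Atomically repoint the `current` symlink at a versioned artifact in the same directory."""
    target = artifacts_dir / version_file
    if not target.exists():
        raise FileNotFoundError(target)
    link = artifacts_dir / link_name
    tmp = artifacts_dir / (link_name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()                      # clean up a leftover from an interrupted flip
    tmp.symlink_to(version_file)          # relative link, so the directory can be moved
    os.replace(tmp, link)                 # rename is atomic on POSIX: no window without a link
    return link


# Deploy:   point_current_at(Path("artifacts"), "churn_v3.joblib")
# Rollback: point_current_at(Path("artifacts"), "churn_v2.joblib"), then restart the process
```

The restart is still needed because the model is loaded once at startup and kept in memory.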

**A/B tests / shadow mode:** run new model alongside old on same inputs, log both predictions, compare offline.

See `references/model-serving.md`.

## Monitoring & drift

Models degrade silently. Detect it.

**Feature drift:** distributions of inputs shift over time. Monitor `mean`, `std`, `quantiles` of each input feature vs. a reference window. Use population stability index (PSI) or Kolmogorov-Smirnov statistic.

**Prediction drift:** distribution of predictions shifts. Cheap proxy for feature drift.

**Performance drift:** requires labels (arriving later). Log predictions, match to truth when it arrives, compute rolling metric.

**Alert when:** PSI > 0.2 for any feature, or monthly metric deviates > N% from training baseline.
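
PSI is simple enough to compute directly. The sketch below bins by quantiles of the reference window and treats the outer bins as open-ended; the bin count, the `eps` floor, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6) -> float:
    """PSI = sum over bins of (cur% - ref%) * ln(cur% / ref%), with bins from reference quantiles."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    inner_edges = edges[1:-1]             # outer bins are open-ended, so out-of-range values still count
    n_buckets = len(inner_edges) + 1
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference, side="right"), minlength=n_buckets)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current, side="right"), minlength=n_buckets)
    ref_frac = np.clip(ref_counts / len(reference), eps, None)   # floor avoids log(0)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)    # e.g. the training window for one feature
same = rng.normal(0.0, 1.0, 5000)         # new data from the same distribution
shifted = rng.normal(1.0, 1.0, 5000)      # new data whose mean moved by one std

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(round(psi_same, 4), round(psi_shifted, 4))
```

With these inputs the unshifted sample stays well under the 0.2 alert level and the shifted one is far above it.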

**Retraining triggers:** calendar (monthly/quarterly), drift alert, or drop in business KPI.

See `references/monitoring-and-drift.md`.

## Explainability

Users and regulators ask "why?" Have an answer.

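**Global:** feature importance from the model (`feature_importances_` for tree models, coefficients for linear models). When the model is a `Pipeline`, read these from the fitted final step, e.g. `model.named_steps["clf"]`.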

**Local (per prediction):** SHAP values. Log top-3 contributing features per prediction for high-stakes scores (credit, risk, fraud).

```python
import shap
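# TreeExplainer expects the fitted tree model itself. If `model` is a Pipeline,
# pass its final estimator and features already transformed by the earlier steps.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_sample)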
```
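
To log the top-3 contributors for one prediction, it is enough to rank a single row of SHAP values by absolute size. The helper name, feature names, and values below are made up for illustration; in practice the row would come from the explainer output for that prediction.

```python
import numpy as np


def top_contributors(shap_row, feature_names, k=3):
    """Return the k features with the largest absolute SHAP value, keeping the sign."""
    shap_row = np.asarray(shap_row, dtype=float)
    order = np.argsort(-np.abs(shap_row), kind="stable")[:k]
    return [(feature_names[i], float(shap_row[i])) for i in order]


# Hypothetical SHAP values for one scored customer.
names = ["tenure_months", "late_payments", "plan_tier", "seats", "support_tickets"]
row = [0.02, -0.31, 0.12, 0.0, 0.25]
print(top_contributors(row, names))
# [('late_payments', -0.31), ('support_tickets', 0.25), ('plan_tier', 0.12)]
```

Keeping the sign matters: it tells the reader whether the feature pushed the score up or down.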

Confidence intervals on predictions matter as much as the prediction itself. Regressors: use quantile regressors or bootstrap. Classifiers: calibrate probabilities with `CalibratedClassifierCV`.
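
A sketch of probability calibration with `CalibratedClassifierCV`, on synthetic imbalanced data. The dataset, the isotonic method, and `cv=3` are illustrative assumptions; the Brier score is used here as one way to check how well the probabilities match outcomes.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a churn dataset with roughly 10% positives.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Each CV fold fits the classifier on one part and the calibrator on the held-out part.
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=42), method="isotonic", cv=3
)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)[:, 1]
brier = brier_score_loss(y_test, proba)   # lower is better; 0 is perfect
print(round(brier, 4))
```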

See `references/explainability.md`.

## Integration patterns

**Sidecar (sync scoring):** small, fast models. `POST /score` → features in → prediction + confidence out. Latency budget < 200ms.

**Worker (batch scoring):** nightly scoring of all customers for churn; bulk forecast for all SKUs. Writes results to MySQL where PHP reads them.

**Training:** worker job, runs on a schedule. Output: new model artifact + metrics report. Never train in a sidecar.

## Pitfalls specific to SaaS

- **Multi-tenant models:** one global model vs. per-tenant models? Start global + tenant as a feature. Go per-tenant only with evidence.
- **Cold start:** new tenants have no history. Fall back to global model or rules. Plan this from day one.
- **Label leakage via user actions:** if users can act on a prediction (e.g., flagging fraud), make sure those actions don't become features for the next model version without an offset.
- **Class imbalance:** 2% churn rate = baseline accuracy of 98% with a constant "no" predictor. Use `class_weight="balanced"`, or downsample, or use threshold tuning.
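- **Currency / units:** check whether the model predicts log(amount) or amount, and in which currency and unit. If it was trained on a log-transformed target, invert the transform before storing or displaying the value; otherwise the numbers are nonsense.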
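
The class-imbalance point can be made concrete with a few lines of NumPy. The first part reproduces the 98% accuracy of a constant "no" predictor; the second is a hypothetical threshold-tuning helper that picks the score cut-off needed to reach a target recall on validation data.

```python
import numpy as np

# 2 churners out of 100 customers, scored by a constant "no churn" predictor.
y_true = np.array([1] * 2 + [0] * 98)
always_no = np.zeros_like(y_true)
accuracy = float((always_no == y_true).mean())      # 0.98
recall = float(always_no[y_true == 1].mean())       # 0.0: no churner is caught


def threshold_for_recall(labels, scores, target_recall=0.8) -> float:
    """Highest score cut-off that still flags at least `target_recall` of the positives."""
    labels = np.asarray(labels)
    pos_scores = np.sort(np.asarray(scores, dtype=float)[labels == 1])[::-1]
    if len(pos_scores) == 0:
        raise ValueError("no positive examples to tune against")
    n_needed = int(np.ceil(target_recall * len(pos_scores)))
    return float(pos_scores[max(n_needed, 1) - 1])


# Made-up validation labels and model scores.
val_labels = np.array([0, 0, 1, 1, 1, 0, 1, 1])
val_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])
cutoff = threshold_for_recall(val_labels, val_scores, target_recall=0.8)
achieved = float((val_scores >= cutoff)[val_labels == 1].mean())
print(accuracy, recall, cutoff, achieved)
```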

## Anti-patterns

- Training on the full dataset "to get better results." You can't measure the result.
- Deploying a model with no baseline (constant predictor, simple rule). You won't know if it's actually helping.
- Serving with `model.predict(X)` inside the request handler without timeout or concurrency limits.
- Storing model artifacts in git. Keep them in a gitignored `models/` directory and publish them to an artifact store such as S3.
- Retraining nightly "just because." Retrain on a trigger.
- Ignoring calibration on probability outputs shown to users. A "40% churn risk" is meaningless unless calibrated.
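
The "no baseline" anti-pattern is cheap to avoid. A minimal sketch for regression, assuming MAE as the metric and the training median as the constant predictor (both are illustrative choices, as is the helper name):

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))


def beats_constant_baseline(y_train, y_test, model_pred, min_improvement=0.0):
    """Compare model MAE on the test set with a predictor that always returns the training median."""
    baseline_pred = np.full(len(y_test), np.median(y_train))
    baseline_mae = mae(y_test, baseline_pred)
    model_mae = mae(y_test, model_pred)
    return model_mae < baseline_mae * (1.0 - min_improvement), model_mae, baseline_mae


# Made-up numbers: training targets, held-out targets, and model predictions.
ok, model_mae, baseline_mae = beats_constant_baseline(
    y_train=[10, 12, 11, 13, 12],
    y_test=[11, 14, 12],
    model_pred=[11.5, 13.0, 12.5],
)
print(ok, round(model_mae, 3), round(baseline_mae, 3))
```

If the model cannot clear this bar on held-out data, shipping the constant or a simple rule is the better choice.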

## References

- `references/when-ml-vs-llm-vs-rules.md`
- `references/data-prep.md`
- `references/forecasting-prophet.md`
- `references/forecasting-statsmodels.md`
- `references/classification-regression-sklearn.md`
- `references/anomaly-detection.md`
- `references/evaluation-metrics.md`
- `references/model-serving.md`
- `references/monitoring-and-drift.md`
- `references/explainability.md`

## See also

- `ai-predictive-analytics` — LLM-based prediction (use when features are unstructured text).
- `ai-evaluation` — for LLM output quality; ML evaluation is different (this skill).
- `saas-business-metrics` — to decide which outcome to model.
