senior-data-scientist
World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.
Best use case
senior-data-scientist is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using senior-data-scientist should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/senior-data-scientist/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
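The manual steps above amount to creating the skill folder and copying the downloaded file into it. A minimal Python sketch of that copy step follows; the function name `install_skill` and its parameters are hypothetical helpers for illustration, not part of the skill, and a plain `mkdir` plus `cp` in a shell would do the same job.

```python
import shutil
from pathlib import Path


def install_skill(downloaded_file, project_root, skill_name="senior-data-scientist"):
    """Copy a downloaded SKILL.md into <project_root>/.claude/skills/<skill_name>/.

    Hypothetical helper: only the target path comes from the instructions above.
    """
    source = Path(downloaded_file)
    if not source.is_file():
        raise FileNotFoundError(f"No such file: {source}")

    # Build .claude/skills/<skill_name>/ and create any missing folders.
    target_dir = Path(project_root) / ".claude" / "skills" / skill_name
    target_dir.mkdir(parents=True, exist_ok=True)

    # The agent looks for a file named exactly SKILL.md in that folder.
    target = target_dir / "SKILL.md"
    shutil.copyfile(source, target)
    return target
```

Calling `install_skill(Path.home() / "Downloads" / "SKILL.md", ".")` from the project root would place the file, assuming the download landed in that folder. The agent still needs a restart afterwards.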
How senior-data-scientist Compares
| Feature / Agent | senior-data-scientist | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Senior Data Scientist

World-class senior data scientist skill for production-grade AI/ML/Data systems.

## Quick Start

### Main Capabilities

```bash
# Core Tool 1
python scripts/experiment_designer.py --input data/ --output results/

# Core Tool 2
python scripts/feature_engineering_pipeline.py --target project/ --analyze

# Core Tool 3
python scripts/model_evaluation_suite.py --config config.yaml --deploy
```

## Core Expertise

This skill covers world-class capabilities in:

- Advanced production patterns and architectures
- Scalable system design and implementation
- Performance optimization at scale
- MLOps and DataOps best practices
- Real-time processing and inference
- Distributed computing frameworks
- Model deployment and monitoring
- Security and compliance
- Cost optimization
- Team leadership and mentoring

## Tech Stack

**Languages:** Python, SQL, R, Scala, Go
**ML Frameworks:** PyTorch, TensorFlow, Scikit-learn, XGBoost
**Data Tools:** Spark, Airflow, dbt, Kafka, Databricks
**LLM Frameworks:** LangChain, LlamaIndex, DSPy
**Deployment:** Docker, Kubernetes, AWS/GCP/Azure
**Monitoring:** MLflow, Weights & Biases, Prometheus
**Databases:** PostgreSQL, BigQuery, Snowflake, Pinecone

## Reference Documentation

### 1. Statistical Methods Advanced

Comprehensive guide available in `references/statistical_methods_advanced.md` covering:

- Advanced patterns and best practices
- Production implementation strategies
- Performance optimization techniques
- Scalability considerations
- Security and compliance
- Real-world case studies

### 2. Experiment Design Frameworks

Complete workflow documentation in `references/experiment_design_frameworks.md` including:

- Step-by-step processes
- Architecture design patterns
- Tool integration guides
- Performance tuning strategies
- Troubleshooting procedures

### 3. Feature Engineering Patterns

Technical reference guide in `references/feature_engineering_patterns.md` with:

- System design principles
- Implementation examples
- Configuration best practices
- Deployment strategies
- Monitoring and observability

## Production Patterns

### Pattern 1: Scalable Data Processing

Enterprise-scale data processing with distributed computing:

- Horizontal scaling architecture
- Fault-tolerant design
- Real-time and batch processing
- Data quality validation
- Performance monitoring

### Pattern 2: ML Model Deployment

Production ML system with high availability:

- Model serving with low latency
- A/B testing infrastructure
- Feature store integration
- Model monitoring and drift detection
- Automated retraining pipelines

### Pattern 3: Real-Time Inference

High-throughput inference system:

- Batching and caching strategies
- Load balancing
- Auto-scaling
- Latency optimization
- Cost optimization

## Best Practices

### Development

- Test-driven development
- Code reviews and pair programming
- Documentation as code
- Version control everything
- Continuous integration

### Production

- Monitor everything critical
- Automate deployments
- Feature flags for releases
- Canary deployments
- Comprehensive logging

### Team Leadership

- Mentor junior engineers
- Drive technical decisions
- Establish coding standards
- Foster learning culture
- Cross-functional collaboration

## Performance Targets

**Latency:**

- P50: < 50ms
- P95: < 100ms
- P99: < 200ms

**Throughput:**

- Requests/second: > 1000
- Concurrent users: > 10,000

**Availability:**

- Uptime: 99.9%
- Error rate: < 0.1%

## Security & Compliance

- Authentication & authorization
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Regular security audits
- Vulnerability management

## Common Commands

```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
```

## Resources

- Advanced Patterns: `references/statistical_methods_advanced.md`
- Implementation Guide: `references/experiment_design_frameworks.md`
- Technical Reference: `references/feature_engineering_patterns.md`
- Automation Scripts: `scripts/` directory

## Senior-Level Responsibilities

As a world-class senior professional:

1. **Technical Leadership**
   - Drive architectural decisions
   - Mentor team members
   - Establish best practices
   - Ensure code quality

2. **Strategic Thinking**
   - Align with business goals
   - Evaluate trade-offs
   - Plan for scale
   - Manage technical debt

3. **Collaboration**
   - Work across teams
   - Communicate effectively
   - Build consensus
   - Share knowledge

4. **Innovation**
   - Stay current with research
   - Experiment with new approaches
   - Contribute to community
   - Drive continuous improvement

5. **Production Excellence**
   - Ensure high availability
   - Monitor proactively
   - Optimize performance
   - Respond to incidents
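
The Performance Targets section of the SKILL.md above gives latency and error-rate thresholds but does not say how to check them. The sketch below is one possible check, not the skill's own tooling. It assumes latencies are collected as a list of milliseconds and uses the nearest-rank percentile definition, which the source does not specify. The names `LATENCY_TARGETS_MS`, `percentile`, `check_latency`, and `check_error_rate` are hypothetical.

```python
# Thresholds copied from the Performance Targets section (milliseconds).
LATENCY_TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples, pct):
    """Nearest-rank percentile (an assumed definition): the smallest sample
    such that at least pct percent of samples are at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_latency(samples_ms):
    """Return {pct: (observed_ms, target_ms, passed)} for each latency target."""
    report = {}
    for pct, target in LATENCY_TARGETS_MS.items():
        observed = percentile(samples_ms, pct)
        # Targets are strict upper bounds ("< 50ms"), so equality fails.
        report[pct] = (observed, target, observed < target)
    return report


def check_error_rate(errors, total, max_rate=0.001):
    """True when the error fraction is strictly below the 0.1% target."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total < max_rate
```

For example, `check_latency([10.0] * 98 + [150.0, 250.0])` reports a P99 of 150 ms, which passes the 200 ms target even though one request took 250 ms. Throughput and uptime need request timestamps and are left out of this sketch.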
Related Skills
senior-security
Security engineering toolkit for threat modeling, vulnerability analysis, secure architecture, and penetration testing. Includes STRIDE analysis, OWASP guidance, cryptography patterns, and security scanning tools.
senior-secops
Comprehensive SecOps skill for application security, vulnerability management, compliance, and secure development practices. Includes security scanning, vulnerability assessment, compliance checking, and security automation. Use when implementing security controls, conducting security audits, responding to vulnerabilities, or ensuring compliance requirements.
senior-qa
This skill should be used when the user asks to "generate tests", "write unit tests", "analyze test coverage", "scaffold E2E tests", "set up Playwright", "configure Jest", "implement testing patterns", or "improve test quality". Use for React/Next.js testing with Jest, React Testing Library, and Playwright.
senior-prompt-engineer
This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.
senior-ml-engineer
ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization.
senior-fullstack
Fullstack development toolkit with project scaffolding for Next.js/FastAPI/MERN/Django stacks and code quality analysis. Use when scaffolding new projects, analyzing codebase quality, or implementing fullstack architecture patterns.
senior-devops
Comprehensive DevOps skill for CI/CD, infrastructure automation, containerization, and cloud platforms (AWS, GCP, Azure). Includes pipeline setup, infrastructure as code, deployment automation, and monitoring. Use when setting up pipelines, deploying applications, managing infrastructure, implementing monitoring, or optimizing deployment processes.
senior-data-engineer
Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
senior-computer-vision
Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.
senior-backend
This skill should be used when the user asks to "design REST APIs", "optimize database queries", "implement authentication", "build microservices", "review backend code", "set up GraphQL", "handle database migrations", or "load test APIs". Use for Node.js/Express/Fastify development, PostgreSQL optimization, API security, and backend architecture patterns.
senior-architect
This skill should be used when the user asks to "design system architecture", "evaluate microservices vs monolith", "create architecture diagrams", "analyze dependencies", "choose a database", "plan for scalability", "make technical decisions", or "review system design". Use for architecture decision records (ADRs), tech stack evaluation, system design reviews, dependency analysis, and generating architecture diagrams in Mermaid, PlantUML, or ASCII format.
senior-frontend
Frontend development skill for React, Next.js, TypeScript, and Tailwind CSS applications. Use when building React components, optimizing Next.js performance, analyzing bundle sizes, scaffolding frontend projects, implementing accessibility, or reviewing frontend code quality.