responsible-ai-guide

Resources for trustworthy, fair, and ethical AI research

191 stars

Best use case

responsible-ai-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using responsible-ai-guide should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/responsible-ai-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/ai-ml/responsible-ai-guide/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/responsible-ai-guide/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How responsible-ai-guide Compares

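| Feature / Agent | responsible-ai-guide | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |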

Frequently Asked Questions

What does this skill do?

Resources for trustworthy, fair, and ethical AI research

Where can I find the source code?

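The source lives in the wentorai/research-plugins repository on GitHub, under skills/domains/ai-ml/responsible-ai-guide/ (the same SKILL.md that the installation command above downloads).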

SKILL.md Source

# Responsible AI Guide

## Overview

A comprehensive collection of resources for building trustworthy, fair, and ethical AI systems. Covers fairness metrics, bias detection and mitigation, explainability methods, privacy-preserving techniques, robustness testing, and governance frameworks. Essential reading for researchers working on AI safety and alignment, or deploying models in high-stakes domains.

## Topic Taxonomy

```
Responsible AI
├── Fairness
│   ├── Bias detection (data, model, outcome)
│   ├── Fairness metrics (demographic parity, equalized odds)
│   ├── Bias mitigation (pre/in/post-processing)
│   └── Intersectional fairness
├── Explainability
│   ├── Feature attribution (SHAP, LIME, Integrated Gradients)
│   ├── Concept-based (TCAV, concept bottleneck)
│   ├── Counterfactual explanations
│   └── Mechanistic interpretability
├── Privacy
│   ├── Differential privacy
│   ├── Federated learning
│   ├── Membership inference attacks
│   └── Machine unlearning
├── Robustness
│   ├── Adversarial attacks/defenses
│   ├── Distribution shift
│   ├── Uncertainty quantification
│   └── Out-of-distribution detection
├── Safety & Alignment
│   ├── RLHF and preference learning
│   ├── Constitutional AI
│   ├── Red teaming
│   └── Guardrails and filters
└── Governance
    ├── Model cards
    ├── Datasheets for datasets
    ├── AI impact assessments
    └── Regulatory compliance (EU AI Act)
```
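The taxonomy lists differential privacy without showing what it does mechanically. Below is a minimal sketch of the textbook Laplace mechanism in plain Python, not taken from Opacus or any other tool listed here. The function names `laplace_noise` and `private_count` and the toy `ages` data are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Adding or removing one record changes a count by at most 1
    (L1 sensitivity = 1), so noise with scale sensitivity / epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: how many people in a toy dataset are over 40?
rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 67, 38, 45]
print(private_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng))
```

Smaller `epsilon` means stronger privacy and larger noise, and every additional query on the same data spends more of the overall privacy budget.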

## Key Tools

| Tool | Category | Purpose |
|------|----------|---------|
| **Fairlearn** | Fairness | Bias assessment + mitigation |
| **AI Fairness 360** | Fairness | Fairness metrics + bias mitigation algorithms (from IBM) |
| **SHAP** | Explainability | Shapley value explanations |
| **Captum** | Explainability | PyTorch interpretability |
| **Opacus** | Privacy | Differential privacy for PyTorch |
| **ART** | Robustness | Adversarial Robustness Toolbox: attacks, defenses, robustness evaluation |
| **Alibi** | Explainability | ML model explanations |
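SHAP's explanations are built on Shapley values. For intuition, here is a brute-force sketch that computes exact Shapley values for one prediction. It assumes a simplified value function in which features outside a coalition are replaced by a single baseline point; the SHAP library uses approximations and background datasets, so its numbers can differ. `shapley_values` is a hypothetical helper, not part of the SHAP API.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list:
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition take their `baseline` value. Cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Model output when only features in `coalition` keep their real values.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear model f(z) = 2*z0 + 3*z1 - z2, explained against an all-zero baseline.
linear = lambda z: 2 * z[0] + 3 * z[1] - z[2]
print(shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For a linear model each value equals w_i * (x_i - b_i), here about 2, 6 and -3 up to floating-point rounding, and the values always sum to f(x) - f(baseline). The exponential enumeration is why practical tools approximate.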

## Fairness Assessment

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

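# Assumes y_test (true labels), y_pred (model predictions) and demographics
# (one sensitive-feature value per row, e.g. a pandas Series) already exist.
# recall_score as used here expects binary labels with 1 as the positive class.
# Assess fairness across demographic groups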
metrics = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "recall": recall_score,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=demographics,
)

print("Overall:")
print(metrics.overall)
print("\nBy group:")
print(metrics.by_group)
print("\nDifference (max - min):")
print(metrics.difference())
```
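`MetricFrame` reports any metric per group. To make two common group-fairness criteria concrete, this sketch computes them by hand for binary 0/1 labels: demographic parity difference (largest minus smallest selection rate across groups) and equalized odds difference (the larger of the true-positive-rate gap and the false-positive-rate gap). fairlearn provides `demographic_parity_difference` and `equalized_odds_difference` in `fairlearn.metrics`; the helpers below (`group_rates`, `dp_difference`, `eo_difference`) are hypothetical plain-Python equivalents for illustration only.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Selection rate, true positive rate and false positive rate per group.

    Labels and predictions must be 0/1; `groups` holds one hashable
    sensitive-feature value per row.
    """
    counts = defaultdict(lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["sel"] += p          # predicted positive
        if t == 1:
            c["pos"] += 1
            c["tp"] += p       # true positive
        else:
            c["neg"] += 1
            c["fp"] += p       # false positive
    rates = {}
    for g, c in counts.items():
        if c["pos"] == 0 or c["neg"] == 0:
            raise ValueError(f"group {g!r} needs both positive and negative labels")
        rates[g] = {
            "selection_rate": c["sel"] / c["n"],
            "tpr": c["tp"] / c["pos"],
            "fpr": c["fp"] / c["neg"],
        }
    return rates


def _gap(rates, key):
    # Largest minus smallest value of one rate across groups.
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


def dp_difference(y_true, y_pred, groups):
    return _gap(group_rates(y_true, y_pred, groups), "selection_rate")


def eo_difference(y_true, y_pred, groups):
    rates = group_rates(y_true, y_pred, groups)
    return max(_gap(rates, "tpr"), _gap(rates, "fpr"))


# Toy data: two groups, four rows each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
print(dp_difference(y_true, y_pred, groups))  # 0.25
print(eo_difference(y_true, y_pred, groups))  # 0.5
```

In the toy data, group "b" is selected half as often as group "a" and misses one of its two true positives, so the equalized odds gap (0.5) is larger than the demographic parity gap (0.25).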

## Reading Roadmap

```markdown
### Foundations
1. "Fairness and Machine Learning" (Barocas, Hardt, Narayanan)
2. "Datasheets for Datasets" (Gebru et al., 2021)
3. "Model Cards for Model Reporting" (Mitchell et al., 2019)

### Fairness
4. "On Fairness and Calibration" (Pleiss et al., 2017)
5. "Fairness Through Awareness" (Dwork et al., 2012)

### Explainability
6. "A Unified Approach to Interpreting Model Predictions" (SHAP, Lundberg & Lee, 2017)
7. "Why Should I Trust You?" (LIME, Ribeiro et al., 2016)

### Safety
8. "Constitutional AI" (Bai et al., 2022)
9. "Red Teaming Language Models with Language Models" (Perez et al., 2022)
10. "Scaling Monosemanticity" (Anthropic, 2024)
```

## Use Cases

1. **Bias auditing**: Check models for demographic biases
2. **Compliance**: EU AI Act and regulatory requirements
3. **Model documentation**: Model cards and impact assessments
4. **Research ethics**: Ethical considerations for AI research
5. **Course material**: Teach responsible AI principles
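For the model-documentation use case, a model card can start as a fixed set of sections. This sketch uses the section headings proposed in "Model Cards for Model Reporting" (Mitchell et al., 2019); the `ModelCard` class, its Markdown rendering, and the example field values are hypothetical and not part of any tool listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    # One field per section heading from Mitchell et al. (2019).
    model_details: str = ""
    intended_use: str = ""
    factors: str = ""
    metrics: str = ""
    evaluation_data: str = ""
    training_data: str = ""
    quantitative_analyses: str = ""
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def to_markdown(self, title: str) -> str:
        """Render every section as a Markdown heading; empty ones become TODO."""
        lines = [f"# Model Card: {title}", ""]
        for f in fields(self):
            heading = f.name.replace("_", " ").title()
            lines += [f"## {heading}", "", getattr(self, f.name) or "TODO", ""]
        return "\n".join(lines)


card = ModelCard(
    intended_use="Hypothetical example: ranking support tickets by urgency.",
    factors="Evaluated separately per customer region and language.",
)
print(card.to_markdown("ticket-triage-v1"))
```

Rendering empty sections as TODO keeps missing documentation visible instead of silently omitting it.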

## References

- [AwesomeResponsibleAI](https://github.com/AthenaCore/AwesomeResponsibleAI)
- [Fairlearn](https://fairlearn.org/)
- [EU AI Act](https://artificialintelligenceact.eu/)