cohere-data-handling

Implement data privacy for Cohere API calls with PII redaction and compliance. Use when handling sensitive data, implementing PII redaction before API calls, or ensuring GDPR/CCPA compliance with Cohere integrations. Trigger with phrases like "cohere data", "cohere PII", "cohere GDPR", "cohere data retention", "cohere privacy", "cohere redact".

by ComeOnOliver

Best use case

cohere-data-handling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cohere-data-handling should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/cohere-data-handling/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/cohere-data-handling/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/cohere-data-handling/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How cohere-data-handling Compares

| Feature / Agent | cohere-data-handling | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# Cohere Data Handling

## Overview
Handle sensitive data when calling Cohere APIs. Cohere processes text server-side for Chat, Embed, Rerank, and Classify — any PII in your input reaches their servers. This skill covers pre-call redaction, post-call scrubbing, and compliance patterns.

## Prerequisites
- Understanding of GDPR/CCPA requirements
- `cohere-ai` SDK installed
- Database for audit logging

## Data Flow Awareness

```
Your App → [PII Redaction] → Cohere API → [Response Scrubbing] → Your App → User
```

Key point: everything you send to `cohere.chat()`, `cohere.embed()`, etc. is processed on Cohere's servers. Redact BEFORE the API call.
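
The `[Response Scrubbing]` stage in the flow above has no code of its own in this skill. A minimal sketch of what it could look like follows, assuming the same regex-based approach as the detection step. The `scrubResponse` name, the `SCRUB_PATTERNS` list, and the `[REDACTED_*]` labels are hypothetical and not part of the Cohere SDK.

```typescript
// Hypothetical helper: pattern list and labels are illustrative only.
const SCRUB_PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: 'EMAIL', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { label: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Remove PII-looking strings from model output before it reaches the user.
function scrubResponse(text: string): string {
  let scrubbed = text;
  for (const { label, regex } of SCRUB_PATTERNS) {
    // replace() with a global regex replaces every match in the string
    scrubbed = scrubbed.replace(regex, `[REDACTED_${label}]`);
  }
  return scrubbed;
}
```

Scrubbing the response is a second line of defense: a model can produce PII-shaped text even when the input was redacted, for example by echoing text from retrieved documents.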

## Instructions

### Step 1: PII Detection

```typescript
interface PIIFinding {
  type: string;
  match: string;
  start: number;
  end: number;
}

const PII_PATTERNS: Array<{ type: string; regex: RegExp }> = [
  { type: 'email', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  // (?<!\w) instead of a leading \b so an optional "+<country code>" prefix is captured too
  { type: 'phone', regex: /(?<!\w)(?:\+\d{1,3}[-.]?)?\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
  { type: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { type: 'credit_card', regex: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g },
  { type: 'ip_address', regex: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g },
];

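// Scan the text against every pattern and return each match with its position.
function detectPII(text: string): PIIFinding[] {
  const findings: PIIFinding[] = [];
  for (const { type, regex } of PII_PATTERNS) {
    // Clone the regex so the shared pattern's lastIndex is never mutated
    for (const match of text.matchAll(new RegExp(regex))) {
      const start = match.index ?? 0;
      findings.push({
        type,
        match: match[0],
        start,
        end: start + match[0].length,
      });
    }
  }
  return findings;
}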
```
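
The `credit_card` pattern above matches any run of 16 digits, so order numbers and other long IDs can trigger false positives. One common way to narrow this is a Luhn checksum on each candidate. The sketch below is a hypothetical add-on (`passesLuhn` is not part of the skill or the Cohere SDK) and assumes candidates contain 13 to 19 digits.

```typescript
// Hypothetical helper: returns true when the digits in `candidate`
// satisfy the Luhn checksum used by payment card numbers.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ''); // drop spaces and dashes
  if (digits.length < 13 || digits.length > 19) return false;

  let sum = 0;
  let doubleNext = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

A finding of type `credit_card` could then be kept only when `passesLuhn(finding.match)` is true. The trade-off is that mistyped card numbers will no longer be flagged.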

### Step 2: Pre-Call Redaction

```typescript
import { CohereClientV2 } from 'cohere-ai';

// Assumes the API key is available in the CO_API_KEY environment variable
const cohere = new CohereClientV2({ token: process.env.CO_API_KEY });

// Replace each PII match with a numbered placeholder and remember the original value.
function redactPII(text: string): { redacted: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let redacted = text;
  let counter = 0;

  for (const { type, regex } of PII_PATTERNS) {
    redacted = redacted.replace(new RegExp(regex), (match) => {
      const placeholder = `[${type.toUpperCase()}_${counter++}]`;
      map.set(placeholder, match);
      return placeholder;
    });
  }

  return { redacted, map };
}

// Usage: redact before sending to Cohere
async function safeCohereChat(userInput: string) {
  const { redacted, map } = redactPII(userInput);

  const response = await cohere.chat({
    model: 'command-a-03-2025',
    messages: [{ role: 'user', content: redacted }],
  });

  // Optionally restore PII in response (for internal use only)
  let answer = response.message?.content?.[0]?.text ?? '';
  for (const [placeholder, original] of map) {
    // split/join replaces every occurrence and treats `original` literally
    answer = answer.split(placeholder).join(original);
  }

  return answer;
}
```
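
`redactPII` above gives every match a fresh placeholder, so the same email appearing twice becomes two different tokens and the model can no longer tell they refer to the same thing. If that matters for your prompts, one option is to reuse a placeholder per distinct value. The following is a sketch of that variation, not the skill's own method. `redactStable` and `STABLE_PATTERNS` are hypothetical names, and only two patterns are included to keep the example short.

```typescript
// Hypothetical variation: identical values share one placeholder.
const STABLE_PATTERNS: Array<{ type: string; regex: RegExp }> = [
  { type: 'email', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { type: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function redactStable(text: string): { redacted: string; map: Map<string, string> } {
  const placeholderByValue = new Map<string, string>(); // original -> placeholder
  const map = new Map<string, string>();                // placeholder -> original
  let redacted = text;

  for (const { type, regex } of STABLE_PATTERNS) {
    redacted = redacted.replace(regex, (match: string) => {
      let placeholder = placeholderByValue.get(match);
      if (placeholder === undefined) {
        // First time this value is seen: allocate the next index
        placeholder = `[${type.toUpperCase()}_${placeholderByValue.size}]`;
        placeholderByValue.set(match, placeholder);
        map.set(placeholder, match);
      }
      return placeholder;
    });
  }

  return { redacted, map };
}
```

The returned `map` has the same placeholder-to-original shape as the one from `redactPII`, so the restore loop in `safeCohereChat` would work unchanged.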

### Step 3: Safe Embedding

```typescript
// Embeddings are stored long-term in vector DBs — ensure no PII
async function safeEmbed(texts: string[]): Promise<number[][]> {
  // Check for PII before embedding
  for (const text of texts) {
    const pii = detectPII(text);
    if (pii.length > 0) {
      console.warn(`PII detected in embed input: ${pii.map(p => p.type).join(', ')}`);
      // Option 1: Redact and embed
      // Option 2: Reject and throw (used here)
      throw new Error(`PII found in embedding input: ${pii.map(p => p.type).join(', ')}`);
    }
  }

  const response = await cohere.embed({
    model: 'embed-v4.0',
    texts,
    inputType: 'search_document',
    embeddingTypes: ['float'],
  });
  // embeddings.float is typed as optional, so fall back to an empty array
  return response.embeddings.float ?? [];
}
```
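
`safeEmbed` above implements only "Option 2: Reject and throw". A sketch of how both options could sit behind one switch follows, as a pure pre-processing step that runs before any `cohere.embed()` call. `prepareEmbedInputs`, `PiiPolicy`, and `EMBED_PII_PATTERNS` are hypothetical names, and the pattern list is shortened for illustration.

```typescript
type PiiPolicy = 'redact' | 'reject';

// Shortened, illustrative pattern list
const EMBED_PII_PATTERNS: Array<{ type: string; regex: RegExp }> = [
  { type: 'email', regex: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g },
  { type: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Returns texts that are safe to embed, or throws under the 'reject' policy.
function prepareEmbedInputs(texts: string[], policy: PiiPolicy): string[] {
  return texts.map((text, index) => {
    const found: string[] = [];
    let clean = text;
    for (const { type, regex } of EMBED_PII_PATTERNS) {
      clean = clean.replace(regex, () => {
        if (found.indexOf(type) === -1) found.push(type);
        // Untyped-index placeholder: embeddings never need restoring
        return `[${type.toUpperCase()}]`;
      });
    }
    if (found.length > 0 && policy === 'reject') {
      throw new Error(`PII found in embedding input ${index}: ${found.join(', ')}`);
    }
    return clean;
  });
}
```

Under `'redact'`, no placeholder map is kept, since an embedding cannot be "un-redacted" later. Under `'reject'`, the caller decides what to do with the offending document.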

### Step 4: Classify with Data Minimization

```typescript
// Classify endpoint receives text + examples — minimize both
async function safeClassify(inputs: string[]) {
  // Redact PII from classification inputs
  const safeInputs = inputs.map(text => redactPII(text).redacted);

  return cohere.classify({
    model: 'embed-english-v3.0',
    inputs: safeInputs,
    examples: [
      // Examples should never contain real PII
      { text: 'This product is great', label: 'positive' },
      { text: 'Amazing experience', label: 'positive' },
      { text: 'Terrible service', label: 'negative' },
      { text: 'Very disappointed', label: 'negative' },
    ],
  });
}
```

### Step 5: Audit Logging

```typescript
interface CohereAuditEntry {
  timestamp: Date;
  endpoint: string;
  model: string;
  piiDetected: string[];
  redacted: boolean;
  tokensUsed: { input: number; output: number };
  userId?: string;
}

async function auditCohereCall(entry: CohereAuditEntry): Promise<void> {
  // Log to structured storage rather than the console.
  // `db` stands for your own database client; it is not defined in this skill.
  // The entry holds metadata only: never log the actual API input or output.
  await db.cohereAudit.insert({ ...entry });
}

// Usage
async function auditedChat(userId: string, message: string) {
  const pii = detectPII(message);
  const { redacted } = redactPII(message);

  const response = await cohere.chat({
    model: 'command-a-03-2025',
    messages: [{ role: 'user', content: redacted }],
  });

  await auditCohereCall({
    timestamp: new Date(),
    endpoint: 'chat',
    model: 'command-a-03-2025',
    piiDetected: pii.map(p => p.type),
    redacted: pii.length > 0,
    tokensUsed: {
      input: response.usage?.billedUnits?.inputTokens ?? 0,
      output: response.usage?.billedUnits?.outputTokens ?? 0,
    },
    userId,
  });

  return response;
}
```

### Step 6: Safety Modes for Content Filtering

```typescript
// Cohere's built-in safety modes are separate from PII handling: they filter harmful content.
// Safety modes do not remove PII, so still redact the input first.
const { redacted } = redactPII(userInput);
await cohere.chat({
  model: 'command-a-03-2025',
  messages: [{ role: 'user', content: redacted }],
  safetyMode: 'STRICT',  // Maximum content filtering
  // Options: 'CONTEXTUAL' (default), 'STRICT', 'OFF'
  // Note: Not configurable when using tools or documents
});
```
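
Given the note above that the safety mode is not configurable together with tools or documents, a request builder can make that rule explicit instead of leaving it to each caller. The sketch below is an assumption-laden illustration: `buildChatRequest` and `ChatRequestSketch` are hypothetical, and the request shape is a simplified stand-in for the SDK's real types (documents are plain strings here).

```typescript
type SafetyMode = 'CONTEXTUAL' | 'STRICT' | 'OFF';

// Simplified request shape for illustration; not the SDK's own type.
interface ChatRequestSketch {
  model: string;
  messages: Array<{ role: 'user'; content: string }>;
  safetyMode?: SafetyMode;
  documents?: string[];
}

function buildChatRequest(
  userInput: string,
  opts: { safetyMode?: SafetyMode; documents?: string[] } = {},
): ChatRequestSketch {
  const request: ChatRequestSketch = {
    model: 'command-a-03-2025',
    messages: [{ role: 'user', content: userInput }],
  };
  const documents = opts.documents ?? [];
  if (documents.length > 0) {
    // RAG call: attach documents and leave safetyMode unset
    request.documents = documents;
  } else if (opts.safetyMode !== undefined) {
    request.safetyMode = opts.safetyMode;
  }
  return request;
}
```

With this shape, a caller that asks for `STRICT` on a document-grounded call gets a request without `safetyMode`, which makes the limitation visible in one place.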

## Data Retention Guidelines

| Data | Retention | Action |
|------|-----------|--------|
| API request logs (redacted) | 30 days | Auto-delete |
| Audit entries | 7 years | Archive to cold storage |
| Cached embeddings | Until source changes | Invalidate on update |
| Cohere API responses (other than cached embeddings) | Do not persist | Process in memory only |
| PII mappings | Per-request only | Never persist |
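
The time-based rows of the table can be enforced with a small scheduled job. A minimal sketch of the expiry check follows, assuming each stored record carries a `kind` and a `createdAt` timestamp. `RETENTION_DAYS`, `StoredRecord`, `isExpired`, and `partitionByRetention` are hypothetical names, and 7 years is approximated as 7 × 365 days.

```typescript
type DataKind = 'request_log' | 'audit_entry';

// Retention windows from the table above, in days (7 years approximated)
const RETENTION_DAYS: Record<DataKind, number> = {
  request_log: 30,
  audit_entry: 7 * 365,
};

interface StoredRecord {
  kind: DataKind;
  createdAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the record is older than the retention window for its kind.
function isExpired(record: StoredRecord, now: Date): boolean {
  const ageDays = (now.getTime() - record.createdAt.getTime()) / MS_PER_DAY;
  return ageDays > RETENTION_DAYS[record.kind];
}

// Split records into those to keep and those past their window.
function partitionByRetention<T extends StoredRecord>(
  records: T[],
  now: Date,
): { keep: T[]; expired: T[] } {
  const keep: T[] = [];
  const expired: T[] = [];
  for (const record of records) {
    (isExpired(record, now) ? expired : keep).push(record);
  }
  return { keep, expired };
}
```

Expired request logs would then be deleted and expired audit entries moved to cold storage, matching the Action column. Passing `now` in as a parameter keeps the check deterministic and easy to test.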

## Compliance Checklist

- [ ] PII redacted before all Cohere API calls
- [ ] Embeddings verified PII-free before vector DB storage
- [ ] Audit trail for all API calls with PII metadata
- [ ] Safety mode set to STRICT for user-facing applications
- [ ] API responses not persisted (processed in memory), apart from cached embeddings
- [ ] Data retention policy enforced with automated cleanup
- [ ] Classify examples use synthetic data (no real PII)

## Output
- PII detection and redaction pipeline
- Safe wrappers for Chat, Embed, and Classify
- Audit logging with PII metadata (not content)
- Data retention policy with automated cleanup

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| PII in embeddings | Missing pre-check | Add detectPII before embed |
| Redaction breaks context | Over-aggressive regex | Use domain-specific patterns |
| Audit gap | Async logging failed | Use sync fallback |
| Safety mode ignored | Used with tools/docs | Separate safety from RAG calls |
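
For the "Redaction breaks context" row, one simple form of domain-specific tuning is an allowlist of values that are public and safe to send, such as a company support address. The sketch below shows the idea for emails only. `redactEmails` and the allowlist contents are hypothetical.

```typescript
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Redact email addresses except those explicitly allowlisted (compared in lowercase).
function redactEmails(text: string, allowlist: ReadonlySet<string>): string {
  return text.replace(EMAIL_REGEX, (match: string) =>
    allowlist.has(match.toLowerCase()) ? match : '[EMAIL]',
  );
}

// Usage: keep the public support address, redact everything else
const publicAddresses = new Set(['support@example.com']);
const cleaned = redactEmails('Write to support@example.com or jo@home.net', publicAddresses);
```

Keep the allowlist short and reviewed: every entry is a value that will reach Cohere's servers unredacted.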

## Resources
- [Cohere Safety Modes](https://docs.cohere.com/docs/safety-modes)
- [Cohere Privacy Policy](https://cohere.com/privacy)
- [GDPR Developer Guide](https://gdpr.eu/developers/)

## Next Steps
For enterprise access control, see `cohere-enterprise-rbac`.
