langsmith-dataset

INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.

25 stars

Best use case

langsmith-dataset is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using langsmith-dataset should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/langsmith-dataset/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/Harmeet10000/skills/langsmith-dataset/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/langsmith-dataset/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How langsmith-dataset Compares

| Feature | langsmith-dataset | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It helps an AI agent create evaluation datasets, upload them to LangSmith, and manage existing ones. It covers the main dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management, all via the langsmith CLI tool.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>

<setup>
Environment Variables

```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here          # Required
LANGSMITH_PROJECT=your-project-name                   # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id              # Optional: for org-scoped keys
```

**IMPORTANT:** Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.
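The project check above can be sketched in plain Python; note that `resolve_project` is a hypothetical helper for illustration, not part of the langsmith SDK:

```python
import os

def resolve_project(fallback: str = "default") -> str:
    """Prefer LANGSMITH_PROJECT from the environment; fall back to a guess."""
    project = os.environ.get("LANGSMITH_PROJECT")
    if project:
        return project
    # Env var missing: fall back to a best-judgment default
    return fallback

os.environ["LANGSMITH_PROJECT"] = "my-project"
print(resolve_project())  # → my-project
```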

Python Dependencies
```bash
pip install langsmith
```

JavaScript Dependencies
```bash
npm install langsmith
```

CLI Tool

```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>

<usage>
Use the `langsmith` CLI to manage datasets and examples.

### Dataset Commands

- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset

### Example Commands

- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example

### Experiment Commands

- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results

### Common Flags

- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)

**IMPORTANT - Safety Prompts:**
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- **If you are running with user input:** ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- **If you are running non-interactively:** Use `--yes` to skip confirmation prompts
</usage>

<dataset_types_overview>
Common evaluation dataset types:

- **final_response** - Full conversation with expected output. Tests complete agent behavior.
- **single_step** - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- **trajectory** - Tool call sequence. Tests execution path (ordered list of tool names).
- **rag** - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>

<creating_datasets>
## Creating Datasets

Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.
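As a minimal sketch, such a file can be produced with nothing but the standard library:

```python
import json

# A dataset is just an array of examples, each with "inputs" and "outputs"
examples = [
    {"inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"inputs": {"query": "Explain RAG"}, "outputs": {"answer": "RAG is..."}},
]

# Write the dataset to a local file, ready for `langsmith dataset upload`
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```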

### From Exported Traces (Programmatic)

Export traces first, then process them into dataset format using code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```

<python>
```python
import json
from pathlib import Path

# 2. Process exported traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    text = jsonl_file.read_text().strip()
    if not text:
        continue  # skip empty export files
    runs = [json.loads(line) for line in text.split("\n")]
    # The root run (no parent) holds the top-level inputs/outputs
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"],
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

// 2. Process exported traces into dataset examples
const examples: Array<{ trace_id?: string; inputs: Record<string, any>; outputs: Record<string, any> }> = [];
const files = readdirSync("./traces").filter((f) => f.endsWith(".jsonl"));

for (const file of files) {
  const text = readFileSync(join("./traces", file), "utf-8").trim();
  if (!text) continue; // skip empty export files
  const runs = text.split("\n").map((line) => JSON.parse(line));
  // The root run (no parent) holds the top-level inputs/outputs
  const root = runs.find((r) => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

### Upload to LangSmith

```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```

### Using the SDK Directly

<python>
```python
from langsmith import Client

client = Client()

# Create dataset and add examples in one step
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")

client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});

await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>

<dataset_structures>
## Dataset Structures by Type

### Final Response
```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

### Single Step
```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```

### Trajectory
```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```

### RAG
```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
</dataset_structures>
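The four shapes above can be sanity-checked before upload. The following is a sketch; `validate_example` and `REQUIRED_OUTPUT_KEYS` are hypothetical helpers, not part of the SDK:

```python
# Hypothetical lookup of the output keys each dataset type expects
REQUIRED_OUTPUT_KEYS = {
    "final_response": {"response"},
    "single_step": {"content"},
    "trajectory": {"expected_trajectory"},
    "rag": {"answer", "retrieved_chunks", "cited_chunks"},
}

def validate_example(example: dict, dataset_type: str) -> bool:
    """True if the example has non-empty inputs and the expected output keys."""
    required = REQUIRED_OUTPUT_KEYS[dataset_type]
    return bool(example.get("inputs")) and required <= set(example.get("outputs", {}))

print(validate_example(
    {"inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a"]}},
    "trajectory",
))  # → True
```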

<script_usage>
## CLI Usage

```bash
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```
</script_usage>

<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response"
```
</example_workflow>

<troubleshooting>
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete existing first with `langsmith dataset delete`
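The JSON-shape check can be run locally before retrying the upload; `check_dataset_file` is a hypothetical helper for illustration:

```python
import json

def check_dataset_file(path: str) -> int:
    """Raise if the file is not a JSON array of objects with 'inputs'; return the count."""
    with open(path) as f:
        data = json.load(f)
    assert isinstance(data, list), "top level must be a JSON array"
    for i, example in enumerate(data):
        assert isinstance(example, dict) and example.get("inputs"), \
            f"example {i} is missing a non-empty 'inputs'"
    return len(data)
```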

**Empty dataset after upload:**
- Verify JSON file contains an array of objects with `inputs` key
- Check file isn't empty: `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check remote count
- Compare with local file to verify upload completeness
</troubleshooting>

Related Skills

fiftyone-dataset-inference

25
from ComeOnOliver/skillshub

Create a FiftyOne dataset from a directory of media files (images, videos, point clouds), optionally import labels in common formats (COCO, YOLO, VOC), run model inference, and store predictions. Use when users want to load local files into FiftyOne, apply ML models for detection, classification, or segmentation, or build end-to-end inference pipelines.

fiftyone-dataset-import

25
from ComeOnOliver/skillshub

Universal dataset import for FiftyOne supporting all media types (images, videos, point clouds, 3D scenes), all label formats (COCO, YOLO, VOC, CVAT, KITTI, etc.), and multimodal grouped datasets. Use when users want to import any dataset regardless of format, automatically detect folder structure, handle autonomous driving data with multiple cameras and LiDAR, or create grouped datasets from multimodal data. Requires FiftyOne MCP server.

LangSmith

25
from ComeOnOliver/skillshub


Azure Open Datasets Skill

25
from ComeOnOliver/skillshub

This skill provides expert guidance for Azure Open Datasets. Covers limits & quotas. It combines local quick-reference content with remote documentation fetching capabilities.

langsmith-trace

25
from ComeOnOliver/skillshub

INVOKE THIS SKILL when working with LangSmith tracing OR querying traces. Covers adding tracing to applications and querying/exporting trace data. Uses the langsmith CLI tool.

langsmith-evaluator

25
from ComeOnOliver/skillshub

INVOKE THIS SKILL when building evaluation pipelines for LangSmith. Covers three core components: (1) Creating Evaluators - LLM-as-Judge, custom code; (2) Defining Run Functions - how to capture outputs and trajectories from your agent; (3) Running Evaluations - locally with evaluate() or auto-run via LangSmith. Uses the langsmith CLI tool.

langsmith-fetch

25
from ComeOnOliver/skillshub

Debug LangChain and LangGraph agents by fetching execution traces from LangSmith Studio. Use when debugging agent behavior, investigating errors, analyzing tool calls, checking memory operations, or examining agent performance. Automatically fetches recent traces and analyzes execution patterns. Requires langsmith-fetch CLI installed.

Role Skill Wrapper

25
from ComeOnOliver/skillshub

This file is the entry point for Manus's role skill.

Semantic Scholar API Skill

25
from ComeOnOliver/skillshub


Paper Summary & Review Skill

25
from ComeOnOliver/skillshub


paper-expert-generator

25
from ComeOnOliver/skillshub

Generate a specialized domain-expert research agent modeled on PaperClaw architecture. Use this skill when a user wants to create an AI agent that can automatically search, filter, summarize, and evaluate academic papers in a specific research field. Trigger phrases include help me create a paper tracking agent for my field, I want an agent to monitor latest papers in bioinformatics, build me a paper review agent for computer vision, create a PaperClaw-style agent for my domain, generate a domain-specific paper expert agent. The generated agent is a complete OpenClaw agent with all required skills (arxiv-search, semantic-scholar, paper-review, daily-search, weekly-report) fully adapted for the target domain.

Daily Paper Search Skill

25
from ComeOnOliver/skillshub
