agentdb-reinforcement-learning-training
Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production.
Best use case
agentdb-reinforcement-learning-training is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using agentdb-reinforcement-learning-training should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/agentdb-reinforcement-learning-training/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How agentdb-reinforcement-learning-training Compares
| Feature / Agent | agentdb-reinforcement-learning-training | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# AgentDB Reinforcement Learning Training
## Overview
Train AI agents with AgentDB's 9 reinforcement learning algorithms, including Decision Transformer, Q-Learning, SARSA, Actor-Critic, PPO, and more. Build self-learning agents, implement RL training loops, and optimize agent behavior through experience.
## When to Use This Skill
Use this skill when you need to:
- Train autonomous agents that learn from experience
- Implement reinforcement learning systems
- Optimize agent behavior through trial and error
- Build self-improving AI systems
- Deploy RL agents in production environments
- Benchmark and compare RL algorithms
## Available RL Algorithms
1. **Q-Learning** - Value-based, off-policy
2. **SARSA** - Value-based, on-policy
3. **Deep Q-Network (DQN)** - Deep RL with experience replay
4. **Actor-Critic** - Policy gradient with value baseline
5. **Proximal Policy Optimization (PPO)** - Trust region policy optimization
6. **Decision Transformer** - Offline RL with transformers
7. **Advantage Actor-Critic (A2C)** - Synchronous advantage estimation
8. **Twin Delayed DDPG (TD3)** - Continuous control
9. **Soft Actor-Critic (SAC)** - Maximum entropy RL
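For orientation, the update that the value-based entries above are built on is the tabular Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s′,a′) − Q(s,a)]. A minimal sketch, independent of the AgentDB API (the `QTable` type and `qUpdate` helper are illustrative, not part of the library):

```typescript
// Minimal tabular Q-learning sketch (illustrative; not the AgentDB API).
// Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
type QTable = Map<string, number[]>;

function qUpdate(
  q: QTable,
  state: string,
  action: number,
  reward: number,
  nextState: string,
  numActions: number,
  alpha = 0.1,  // learning rate
  gamma = 0.99  // discount factor
): void {
  const row = q.get(state) ?? new Array(numActions).fill(0);
  const nextRow = q.get(nextState) ?? new Array(numActions).fill(0);
  // Off-policy: bootstrap from the greedy action in the next state
  const target = reward + gamma * Math.max(...nextRow);
  row[action] += alpha * (target - row[action]);
  q.set(state, row);
}

// One update from a single transition:
const q: QTable = new Map();
qUpdate(q, 's0', 1, 1.0, 's1', 4);
console.log(q.get('s0')); // action 1 now has a positive value estimate
```

SARSA differs only in the bootstrap term: it uses the value of the action actually taken next rather than the greedy maximum, which is what makes it on-policy.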
## SOP Framework: 5-Phase RL Training Deployment
### Phase 1: Initialize Learning Environment (1-2 hours)
**Objective:** Set up AgentDB learning infrastructure with environment configuration
**Agent:** ml-developer
**Steps:**
1. **Install AgentDB Learning Module**
```bash
npm install agentdb-learning@latest
npm install @agentdb/rl-algorithms @agentdb/environments
```
2. **Initialize learning database**
```typescript
import { AgentDB, LearningPlugin } from 'agentdb-learning';
const learningDB = new AgentDB({
name: 'rl-training-db',
dimensions: 512, // State embedding dimension
learning: {
enabled: true,
persistExperience: true,
replayBufferSize: 100000
}
});
await learningDB.initialize();
// Create learning plugin
const learningPlugin = new LearningPlugin({
database: learningDB,
algorithms: ['q-learning', 'dqn', 'ppo', 'actor-critic'],
config: {
batchSize: 64,
learningRate: 0.001,
discountFactor: 0.99,
explorationRate: 1.0,
explorationDecay: 0.995
}
});
await learningPlugin.initialize();
```
3. **Define environment**
```typescript
import { Environment } from '@agentdb/environments';
const environment = new Environment({
name: 'grid-world',
stateSpace: {
type: 'continuous',
shape: [10, 10],
bounds: [[0, 10], [0, 10]]
},
actionSpace: {
type: 'discrete',
actions: ['up', 'down', 'left', 'right']
},
rewardFunction: (state, action, nextState) => {
// Distance to goal reward
const goalDistance = Math.sqrt(
Math.pow(nextState[0] - 9, 2) +
Math.pow(nextState[1] - 9, 2)
);
return -goalDistance + (goalDistance < 1e-6 ? 100 : 0); // goal bonus; avoid exact float equality in a continuous space
},
terminalCondition: (state) => {
return Math.abs(state[0] - 9) < 1e-6 && Math.abs(state[1] - 9) < 1e-6; // reached goal (tolerant compare)
}
});
await environment.initialize();
```
4. **Set up monitoring**
```typescript
const monitor = learningPlugin.createMonitor({
metrics: ['reward', 'loss', 'exploration-rate', 'episode-length'],
logInterval: 100, // Log every 100 episodes
saveCheckpoints: true,
checkpointInterval: 1000
});
monitor.on('episode-complete', (episode) => {
console.log('Episode:', episode.number, 'Reward:', episode.totalReward);
});
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/environment', {
name: environment.name,
stateSpace: environment.stateSpace,
actionSpace: environment.actionSpace,
initialized: Date.now()
});
```
**Validation:**
- Learning database initialized
- Environment configured and tested
- Monitor capturing metrics
- Configuration stored in memory
### Phase 2: Configure RL Algorithm (1-2 hours)
**Objective:** Select and configure RL algorithm for the learning task
**Agent:** ml-developer
**Steps:**
1. **Select algorithm**
```typescript
// Example: Deep Q-Network (DQN)
const dqnAgent = learningPlugin.createAgent({
algorithm: 'dqn',
config: {
networkArchitecture: {
layers: [
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: environment.actionSpace.size, activation: 'linear' }
]
},
learningRate: 0.001,
batchSize: 64,
replayBuffer: {
size: 100000,
prioritized: true,
alpha: 0.6,
beta: 0.4
},
targetNetwork: {
updateFrequency: 1000,
tauSync: 0.001 // Soft update
},
exploration: {
initial: 1.0,
final: 0.01,
decay: 0.995
},
training: {
startAfter: 1000, // Start training after 1000 experiences
updateFrequency: 4
}
}
});
await dqnAgent.initialize();
```
2. **Configure hyperparameters**
```typescript
const hyperparameters = {
// Learning parameters
learningRate: 0.001,
discountFactor: 0.99, // Gamma
batchSize: 64,
// Exploration
epsilonStart: 1.0,
epsilonEnd: 0.01,
epsilonDecay: 0.995,
// Experience replay
replayBufferSize: 100000,
minReplaySize: 1000,
prioritizedReplay: true,
// Training
maxEpisodes: 10000,
maxStepsPerEpisode: 1000,
targetUpdateFrequency: 1000,
// Evaluation
evalFrequency: 100,
evalEpisodes: 10
};
dqnAgent.setHyperparameters(hyperparameters);
```
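With the settings above, and assuming the plugin applies one multiplicative decay step per episode (an assumption consistent with `explorationDecay: 0.995` and the `decayExploration()` call in the training loop), the exploration rate follows ε(n) = max(ε_end, ε_start · decay^n):

```typescript
// Sketch: multiplicative epsilon decay, one decay step per episode (assumed).
function epsilonAt(episode: number, start = 1.0, end = 0.01, decay = 0.995): number {
  return Math.max(end, start * Math.pow(decay, episode));
}

// With decay = 0.995, epsilon hits the 0.01 floor after roughly
// ln(0.01) / ln(0.995), i.e. around 920 episodes.
console.log(epsilonAt(0));    // 1.0
console.log(epsilonAt(1000)); // clamped at the 0.01 floor
```

This is worth checking against `maxEpisodes`: if decay reaches the floor long before training ends, most episodes run with near-greedy behavior.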
3. **Set up experience replay**
```typescript
import { PrioritizedReplayBuffer } from '@agentdb/rl-algorithms';
const replayBuffer = new PrioritizedReplayBuffer({
capacity: 100000,
alpha: 0.6, // Prioritization exponent
beta: 0.4, // Importance sampling
betaIncrement: 0.001,
epsilon: 0.01 // Small constant for stability
});
dqnAgent.setReplayBuffer(replayBuffer);
```
4. **Configure training loop**
```typescript
const trainingConfig = {
episodes: 10000,
stepsPerEpisode: 1000,
warmupSteps: 1000,
trainFrequency: 4,
targetUpdateFrequency: 1000,
saveFrequency: 1000,
evalFrequency: 100,
earlyStoppingPatience: 500,
earlyStoppingThreshold: 0.01
};
dqnAgent.setTrainingConfig(trainingConfig);
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/algorithm-config', {
algorithm: 'dqn',
hyperparameters: hyperparameters,
trainingConfig: trainingConfig,
configured: Date.now()
});
```
**Validation:**
- Algorithm selected and configured
- Hyperparameters validated
- Replay buffer initialized
- Training config set
### Phase 3: Train Agents (3-4 hours)
**Objective:** Execute training iterations and optimize agent behavior
**Agent:** safla-neural
**Steps:**
1. **Start training loop**
```typescript
async function trainAgent() {
console.log('Starting RL training...');
const trainingStats = {
episodes: [],
totalReward: [],
episodeLength: [],
loss: [],
explorationRate: []
};
for (let episode = 0; episode < trainingConfig.episodes; episode++) {
let state = await environment.reset();
let episodeReward = 0;
let episodeLength = 0;
let episodeLoss = 0;
for (let step = 0; step < trainingConfig.stepsPerEpisode; step++) {
// Select action
const action = await dqnAgent.selectAction(state, {
explore: true
});
// Execute action
const { nextState, reward, done } = await environment.step(action);
// Store experience
await dqnAgent.storeExperience({
state,
action,
reward,
nextState,
done
});
// Train if enough experiences
if (dqnAgent.canTrain()) {
const loss = await dqnAgent.train();
episodeLoss += loss;
}
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) break;
}
// Update target network
if (episode % trainingConfig.targetUpdateFrequency === 0) {
await dqnAgent.updateTargetNetwork();
}
// Decay exploration
dqnAgent.decayExploration();
// Log progress
trainingStats.episodes.push(episode);
trainingStats.totalReward.push(episodeReward);
trainingStats.episodeLength.push(episodeLength);
trainingStats.loss.push(episodeLoss / episodeLength);
trainingStats.explorationRate.push(dqnAgent.getExplorationRate());
if (episode % 100 === 0) {
console.log(`Episode ${episode}:`, {
reward: episodeReward.toFixed(2),
length: episodeLength,
loss: (episodeLoss / episodeLength).toFixed(4),
epsilon: dqnAgent.getExplorationRate().toFixed(3)
});
}
// Save checkpoint
if (episode % trainingConfig.saveFrequency === 0) {
await dqnAgent.save(`checkpoint-${episode}`);
}
// Evaluate
if (episode % trainingConfig.evalFrequency === 0) {
const evalReward = await evaluateAgent(dqnAgent, environment);
console.log(`Evaluation at episode ${episode}: ${evalReward.toFixed(2)}`);
}
// Early stopping
if (checkEarlyStopping(trainingStats, episode)) {
console.log('Early stopping triggered');
break;
}
}
return trainingStats;
}
const trainingStats = await trainAgent();
```
2. **Monitor training progress**
```typescript
monitor.on('training-update', (stats) => {
// Calculate moving averages
const window = 100;
const recentRewards = stats.totalReward.slice(-window);
const avgReward = recentRewards.reduce((a, b) => a + b, 0) / recentRewards.length;
// Store metrics
agentDB.memory.store('agentdb/learning/training-progress', {
episode: stats.episodes[stats.episodes.length - 1],
avgReward: avgReward,
explorationRate: stats.explorationRate[stats.explorationRate.length - 1],
timestamp: Date.now()
});
// Plot learning curve (if visualization enabled)
if (monitor.visualization) {
monitor.plot('reward-curve', stats.episodes, stats.totalReward);
monitor.plot('loss-curve', stats.episodes, stats.loss);
}
});
```
3. **Handle convergence**
```typescript
function checkConvergence(stats, windowSize = 100, threshold = 0.01) {
if (stats.totalReward.length < windowSize * 2) {
return false;
}
const recent = stats.totalReward.slice(-windowSize);
const previous = stats.totalReward.slice(-windowSize * 2, -windowSize);
const recentAvg = recent.reduce((a, b) => a + b, 0) / recent.length;
const previousAvg = previous.reduce((a, b) => a + b, 0) / previous.length;
const improvement = (recentAvg - previousAvg) / Math.abs(previousAvg);
return improvement < threshold;
}
```
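The training loop in step 1 calls `checkEarlyStopping`, which is never defined in this skill. A plausible sketch that mirrors `checkConvergence` is below; the module-level state is for brevity, and the `patience`/`threshold` defaults are taken from `trainingConfig` (this is an assumed implementation, not AgentDB's):

```typescript
// Hypothetical early-stopping check: stop when the moving-average reward has
// not improved by more than `threshold` for `patience` consecutive checks.
let bestAvgReward = -Infinity;
let episodesSinceImprovement = 0;

function checkEarlyStopping(
  stats: { totalReward: number[] },
  episode: number,
  window = 100,
  patience = 500,   // trainingConfig.earlyStoppingPatience
  threshold = 0.01  // trainingConfig.earlyStoppingThreshold
): boolean {
  if (stats.totalReward.length < window) return false;
  const recent = stats.totalReward.slice(-window);
  const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
  if (avg > bestAvgReward + threshold) {
    bestAvgReward = avg;       // new best: reset the patience counter
    episodesSinceImprovement = 0;
  } else {
    episodesSinceImprovement += 1;
  }
  return episodesSinceImprovement >= patience;
}
```

In production code the counters would live in a closure or class rather than at module scope, so multiple training runs do not share state.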
4. **Save trained model**
```typescript
await dqnAgent.save('trained-agent-final', {
includeReplayBuffer: false,
includeOptimizer: false,
metadata: {
trainingStats: trainingStats,
hyperparameters: hyperparameters,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1]
}
});
console.log('Training complete. Model saved.');
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/training-results', {
algorithm: 'dqn',
episodes: trainingStats.episodes.length,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1],
converged: checkConvergence(trainingStats),
modelPath: 'trained-agent-final',
timestamp: Date.now()
});
```
**Validation:**
- Training completed or converged
- Reward curve shows improvement
- Model saved successfully
- Training stats stored
### Phase 4: Validate Performance (1-2 hours)
**Objective:** Benchmark trained agent and validate performance
**Agent:** performance-benchmarker
**Steps:**
1. **Load trained agent**
```typescript
const trainedAgent = await learningPlugin.loadAgent('trained-agent-final');
```
2. **Run evaluation episodes**
```typescript
async function evaluateAgent(agent, env, numEpisodes = 100) {
const results = {
rewards: [],
episodeLengths: [],
successRate: 0
};
for (let i = 0; i < numEpisodes; i++) {
let state = await env.reset();
let episodeReward = 0;
let episodeLength = 0;
let success = false;
for (let step = 0; step < 1000; step++) {
const action = await agent.selectAction(state, { explore: false });
const { nextState, reward, done } = await env.step(action);
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) {
success = env.isSuccessful(state);
break;
}
}
results.rewards.push(episodeReward);
results.episodeLengths.push(episodeLength);
if (success) results.successRate += 1;
}
results.successRate /= numEpisodes;
return {
meanReward: results.rewards.reduce((a, b) => a + b, 0) / results.rewards.length,
stdReward: calculateStd(results.rewards),
meanLength: results.episodeLengths.reduce((a, b) => a + b, 0) / results.episodeLengths.length,
successRate: results.successRate,
results: results
};
}
const evalResults = await evaluateAgent(trainedAgent, environment, 100);
console.log('Evaluation results:', evalResults);
```
3. **Compare with baseline**
```typescript
// Random policy baseline
const randomAgent = learningPlugin.createAgent({ algorithm: 'random' });
const randomResults = await evaluateAgent(randomAgent, environment, 100);
// Calculate improvement
const improvement = {
rewardImprovement: (evalResults.meanReward - randomResults.meanReward) / Math.abs(randomResults.meanReward),
lengthImprovement: (randomResults.meanLength - evalResults.meanLength) / randomResults.meanLength,
successImprovement: evalResults.successRate - randomResults.successRate
};
console.log('Improvement over random:', improvement);
```
4. **Run comprehensive benchmarks**
```typescript
const benchmarks = {
performanceMetrics: {
meanReward: evalResults.meanReward,
stdReward: evalResults.stdReward,
successRate: evalResults.successRate,
meanEpisodeLength: evalResults.meanLength
},
algorithmComparison: {
dqn: evalResults,
random: randomResults,
improvement: improvement
},
inferenceTiming: {
actionSelection: 0,
totalEpisode: 0
}
};
// Measure inference speed
const timingTrials = 1000;
const startTime = performance.now();
for (let i = 0; i < timingTrials; i++) {
const state = await environment.randomState();
await trainedAgent.selectAction(state, { explore: false });
}
const endTime = performance.now();
benchmarks.inferenceTiming.actionSelection = (endTime - startTime) / timingTrials;
await agentDB.memory.store('agentdb/learning/benchmarks', benchmarks);
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/validation', {
evaluated: true,
meanReward: evalResults.meanReward,
successRate: evalResults.successRate,
improvement: improvement,
timestamp: Date.now()
});
```
**Validation:**
- Evaluation completed (100 episodes)
- Mean reward exceeds threshold
- Success rate acceptable
- Improvement over baseline demonstrated
### Phase 5: Deploy Trained Agents (1-2 hours)
**Objective:** Deploy trained agents to production environment
**Agent:** ml-developer
**Steps:**
1. **Export production model**
```typescript
await trainedAgent.export('production-agent', {
format: 'onnx', // or 'tensorflowjs', 'pytorch'
optimize: true,
quantize: 'int8', // Quantization for faster inference
includeMetadata: true
});
```
2. **Create inference API**
```typescript
import express from 'express';
const app = express();
app.use(express.json());
// Load production agent
const productionAgent = await learningPlugin.loadAgent('production-agent');
app.post('/api/predict', async (req, res) => {
try {
const { state } = req.body;
const action = await productionAgent.selectAction(state, {
explore: false,
returnProbabilities: true
});
res.json({
action: action.action,
probabilities: action.probabilities,
confidence: action.confidence
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000, () => {
console.log('RL agent API running on port 3000');
});
```
3. **Set up monitoring**
```typescript
import { ProductionMonitor } from '@agentdb/monitoring';
const prodMonitor = new ProductionMonitor({
agent: productionAgent,
metrics: ['inference-latency', 'action-distribution', 'reward-feedback'],
alerting: {
latencyThreshold: 100, // ms
anomalyDetection: true
}
});
await prodMonitor.start();
```
4. **Create deployment pipeline**
```typescript
const deploymentPipeline = {
stages: [
{
name: 'validation',
steps: [
'Load trained model',
'Run validation suite',
'Check performance metrics',
'Verify inference speed'
]
},
{
name: 'export',
steps: [
'Export to production format',
'Optimize model',
'Quantize weights',
'Package artifacts'
]
},
{
name: 'deployment',
steps: [
'Deploy to staging',
'Run smoke tests',
'Deploy to production',
'Monitor performance'
]
}
]
};
await agentDB.memory.store('agentdb/learning/deployment-pipeline', deploymentPipeline);
```
**Memory Pattern:**
```typescript
await agentDB.memory.store('agentdb/learning/production', {
deployed: true,
modelPath: 'production-agent',
apiEndpoint: 'http://localhost:3000/api/predict',
monitoring: true,
timestamp: Date.now()
});
```
**Validation:**
- Model exported successfully
- API running and responding
- Monitoring active
- Deployment pipeline documented
## Integration Scripts
### Complete Training Script
```bash
#!/bin/bash
# train-rl-agent.sh
set -e
echo "AgentDB RL Training Script"
echo "=========================="
# Phase 1: Initialize
echo "Phase 1: Initializing learning environment..."
npm install agentdb-learning @agentdb/rl-algorithms
# Phase 2: Configure
echo "Phase 2: Configuring algorithm..."
node ./config-algorithm.js
# Phase 3: Train
echo "Phase 3: Training agent..."
node ./train-agent.js
# Phase 4: Validate
echo "Phase 4: Validating performance..."
node ./evaluate-agent.js
# Phase 5: Deploy
echo "Phase 5: Deploying to production..."
node ./deploy-agent.js
echo "Training complete!"
```
### Quick Start Script
```typescript
// quickstart-rl.ts
import { setupRLTraining } from './setup';
async function quickStart() {
console.log('Starting RL training quick setup...');
// Setup
const { learningDB, environment, agent } = await setupRLTraining({
algorithm: 'dqn',
environment: 'grid-world',
episodes: 1000
});
// Train
console.log('Training agent...');
const stats = await agent.train(environment, {
episodes: 1000,
logInterval: 100
});
// Evaluate
console.log('Evaluating agent...');
const results = await agent.evaluate(environment, {
episodes: 100
});
console.log('Results:', results);
// Save
await agent.save('quickstart-agent');
console.log('Quick start complete!');
}
quickStart().catch(console.error);
```
## Evidence-Based Success Criteria
1. **Training Convergence (Self-Consistency)**
- Reward curve stabilizes
- Moving average improvement < 1%
- Agent achieves consistent performance
2. **Performance Benchmarks (Quantitative)**
- Mean reward exceeds baseline by 50%
- Success rate > 80%
- Inference time < 10ms per action
3. **Algorithm Validation (Chain-of-Verification)**
- Hyperparameters validated
- Exploration-exploitation balanced
- Experience replay functioning
4. **Production Readiness (Multi-Agent Consensus)**
- Model exported successfully
- API responds within latency threshold
- Monitoring active and alerting
- Deployment pipeline documented
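The quantitative criteria above can be encoded as a single deployment gate at the end of Phase 4. The `EvalSummary` shape and `meetsSuccessCriteria` function below are illustrative assumptions; the thresholds come from this list (baseline + 50%, success rate > 80%, inference < 10ms):

```typescript
// Sketch: gate deployment on the success criteria listed above.
interface EvalSummary {
  meanReward: number;
  successRate: number;        // 0..1, from evaluateAgent
  inferenceMs: number;        // mean per-action latency, from the timing benchmark
  baselineMeanReward: number; // e.g. the random-policy baseline
}

function meetsSuccessCriteria(r: EvalSummary): boolean {
  // "Mean reward exceeds baseline by 50%"
  const beatsBaseline =
    r.meanReward >= r.baselineMeanReward + 0.5 * Math.abs(r.baselineMeanReward);
  return beatsBaseline && r.successRate > 0.8 && r.inferenceMs < 10;
}

console.log(
  meetsSuccessCriteria({
    meanReward: 90,
    successRate: 0.92,
    inferenceMs: 4.2,
    baselineMeanReward: 40
  })
); // true: all three criteria satisfied
```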
## Additional Resources
- AgentDB Learning Documentation: https://agentdb.dev/docs/learning
- RL Algorithms Guide: https://agentdb.dev/docs/rl-algorithms
- Training Best Practices: https://agentdb.dev/docs/training
- Production Deployment: https://agentdb.dev/docs/deployment
Related Skills
adapting-transfer-learning-models
This skill automates the adaptation of pre-trained machine learning models using transfer learning techniques. It is triggered when the user requests assistance with fine-tuning a model, adapting a pre-trained model to a new dataset, or performing transfer learning. It analyzes the user's requirements, generates code for adapting the model, includes data validation and error handling, provides performance metrics, and saves artifacts with documentation. Use this skill when you need to leverage existing models for new tasks or datasets, optimizing for performance and efficiency.
training-machine-learning-models
Build and train machine learning models with automated workflows. Analyzes datasets, selects model types (classification, regression), configures parameters, trains with cross-validation, and saves model artifacts. Use when asked to "train model" or "evalua...
evaluating-machine-learning-models
This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics. Trigger this skill when the user mentions "evaluate model", "model performance", "testing metrics", "validation results", or requests a comprehensive "model evaluation".
deploying-machine-learning-models
This skill enables Claude to deploy machine learning models to production environments. It automates the deployment workflow, implements best practices for serving models, optimizes performance, and handles potential errors. Use this skill when the user requests to deploy a model, serve a model via an API, or put a trained model into a production environment. The skill is triggered by requests containing terms like "deploy model," "productionize model," "serve model," or "model deployment."
learning-rate-scheduler
Learning Rate Scheduler: auto-activating skill for ML training. Triggers on "learning rate scheduler". Part of the ML Training skill category.
engineering-features-for-machine-learning
This skill empowers Claude to perform feature engineering tasks for machine learning. It creates, selects, and transforms features to improve model performance. Use this skill when the user requests feature creation, feature selection, feature transformation, or any request that involves improving the features used in a machine learning model. Trigger terms include "feature engineering", "feature selection", "feature transformation", "create features", "select features", "transform features", "improve model performance", and similar phrases related to feature manipulation.
explaining-machine-learning-models
This skill enables the AI assistant to provide interpretability and explainability for machine learning models. It is triggered when the user requests explanations for model predictions, insights into feature importance, or help understanding model behavior...
distributed-training-setup
Distributed Training Setup: auto-activating skill for ML training. Triggers on "distributed training setup". Part of the ML Training skill category.
optimizing-deep-learning-models
This skill optimizes deep learning models using various techniques. It is triggered when the user requests improvements to model performance, such as increasing accuracy, reducing training time, or minimizing resource consumption. The skill leverages advanced optimization algorithms like Adam, SGD, and learning rate scheduling. It analyzes the existing model architecture, training data, and performance metrics to identify areas for enhancement. The skill then automatically applies appropriate optimization strategies and generates optimized code. Use this skill when the user mentions "optimize deep learning model", "improve model accuracy", "reduce training time", or "optimize learning rate".
learning-a-tool
Create learning paths for programming tools, and define what information should be researched to create learning guides. Use when user asks to learn, understand, or get started with any programming tool, library, or framework.
skill-learning
Extracts actionable knowledge from external sources and enhances existing skills using a 4-tier novelty framework. Use when learning from URLs, documentation, or codebases. Proactively use when the user asks to extract patterns from a reference repository or skill marketplace.
machine-learning-ops-ml-pipeline
Design and implement a complete ML pipeline for: $ARGUMENTS