ai-native-development
Build AI-first applications with RAG pipelines, embeddings, vector databases, agentic workflows, and LLM integration. Master prompt engineering, function calling, streaming responses, and cost optimization for 2025+ AI development.
Best use case
ai-native-development is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Build AI-first applications with RAG pipelines, embeddings, vector databases, agentic workflows, and LLM integration. Master prompt engineering, function calling, streaming responses, and cost optimization for 2025+ AI development.
Build AI-first applications with RAG pipelines, embeddings, vector databases, agentic workflows, and LLM integration. Master prompt engineering, function calling, streaming responses, and cost optimization for 2025+ AI development.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "ai-native-development" skill to help with this workflow task. Context: Build AI-first applications with RAG pipelines, embeddings, vector databases, agentic workflows, and LLM integration. Master prompt engineering, function calling, streaming responses, and cost optimization for 2025+ AI development.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ai-native-development/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ai-native-development Compares
| Feature / Agent | ai-native-development | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Build AI-first applications with RAG pipelines, embeddings, vector databases, agentic workflows, and LLM integration. Master prompt engineering, function calling, streaming responses, and cost optimization for 2025+ AI development.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
SKILL.md Source
# AI-Native Development
## Overview
AI-Native Development focuses on building applications where AI is a first-class citizen, not an afterthought. This skill provides comprehensive patterns for integrating LLMs, implementing RAG (Retrieval-Augmented Generation), using vector databases, building agentic workflows, and optimizing AI application performance and cost.
**When to use this skill:**
- Building chatbots, Q&A systems, or conversational interfaces
- Implementing semantic search or recommendation engines
- Creating AI agents that can use tools and take actions
- Integrating LLMs (OpenAI, Anthropic, open-source models) into applications
- Building RAG systems for knowledge retrieval
- Optimizing AI costs and latency
- Implementing AI observability and monitoring
---
## Why AI-Native Development Matters
Traditional software is deterministic; AI-native applications are probabilistic:
- **Context is Everything**: LLMs need relevant context to provide accurate answers
- **RAG Over Fine-Tuning**: Retrieval is cheaper and more flexible than fine-tuning
- **Embeddings Enable Semantic Search**: Move beyond keyword matching to understanding meaning
- **Agentic Workflows**: LLMs can reason, plan, and use tools autonomously
- **Cost Management**: Token usage directly impacts operational costs
- **Observability**: Debugging probabilistic systems requires new approaches
- **Prompt Engineering**: How you ask matters as much as what you ask
---
## Core Concepts
### 1. Embeddings & Vector Search
Embeddings are vector representations of text that capture semantic meaning. Similar concepts have similar vectors.
**Key Capabilities:**
- Convert text to high-dimensional vectors (1536 or 3072 dimensions)
- Measure semantic similarity using cosine similarity
- Find relevant documents through vector search
- Batch process for efficiency
**Detailed Implementation:** See `references/vector-databases.md` for:
- OpenAI embeddings setup and batch processing
- Cosine similarity algorithms
- Chunking strategies (500-1000 tokens with 10-20% overlap)
### 2. Vector Databases
Store and retrieve embeddings efficiently at scale.
**Popular Options:**
- **Pinecone**: Serverless, managed service ($0.096/hour)
- **Chroma**: Open source, self-hosted
- **Weaviate**: Flexible schema, hybrid search
- **Qdrant**: Rust-based, high performance
**Detailed Implementation:** See `references/vector-databases.md` for:
- Complete setup guides for each database
- Upsert, query, update, delete operations
- Metadata filtering and hybrid search
- Cost comparison and best practices
### 3. RAG (Retrieval-Augmented Generation)
RAG combines retrieval systems with LLMs to provide accurate, grounded answers.
**Core Pattern:**
1. Retrieve relevant documents from vector database
2. Construct context from top results
3. Generate answer with LLM using retrieved context
**Advanced Patterns:**
- RAG with citations and source tracking
- Hybrid search (semantic + keyword)
- Multi-query RAG for better recall
- HyDE (Hypothetical Document Embeddings)
- Contextual compression for relevance
**Detailed Implementation:** See `references/rag-patterns.md` for:
- Basic and advanced RAG patterns with full code
- Citation strategies
- Hybrid search with Reciprocal Rank Fusion
- Conversation memory patterns
- Error handling and validation
### 4. Function Calling & Tool Use
Enable LLMs to use external tools and APIs reliably.
**Capabilities:**
- Define tools with JSON schemas
- Execute functions based on LLM decisions
- Handle parallel tool calls
- Stream responses with tool use
**Detailed Implementation:** See `references/function-calling.md` for:
- Tool definition patterns (OpenAI and Anthropic)
- Function calling loops
- Parallel and streaming tool execution
- Input validation with Zod
- Error handling and fallback strategies
### 5. Agentic Workflows
Enable LLMs to reason, plan, and take autonomous actions.
**Patterns:**
- **ReAct**: Reasoning + Acting loop with observations
- **Tree of Thoughts**: Explore multiple reasoning paths
- **Multi-Agent**: Specialized agents collaborating on complex tasks
- **Autonomous Agents**: Self-directed goal achievement
**Detailed Implementation:** See `references/agentic-workflows.md` for:
- Complete ReAct loop implementation
- Tree of Thoughts exploration
- Multi-agent coordinator patterns
- Agent memory management
- Error recovery and safety guards
### 5.1 Multi-Agent Orchestration (Opus 4.5)
Advanced multi-agent patterns leveraging Opus 4.5's extended thinking capabilities.
**When to Use Extended Thinking:**
- Coordinating 3+ specialized agents
- Complex dependency resolution between agent outputs
- Dynamic task allocation based on agent capabilities
- Conflict resolution when agents produce contradictory results
**Orchestrator Pattern:**
```typescript
interface AgentTask {
id: string;
type: 'research' | 'code' | 'review' | 'design';
input: unknown;
dependencies: string[]; // Task IDs that must complete first
}
interface AgentResult {
taskId: string;
output: unknown;
confidence: number;
reasoning: string;
}
async function orchestrateAgents(
goal: string,
availableAgents: Agent[]
): Promise<AgentResult[]> {
// Step 1: Use extended thinking to decompose goal into tasks
const taskPlan = await planTasks(goal, availableAgents);
// Step 2: Build dependency graph
const dependencyGraph = buildDependencyGraph(taskPlan.tasks);
// Step 3: Execute tasks respecting dependencies
const results: AgentResult[] = [];
const completed = new Set<string>();
while (completed.size < taskPlan.tasks.length) {
// Find tasks with satisfied dependencies
const ready = taskPlan.tasks.filter(task =>
!completed.has(task.id) &&
task.dependencies.every(dep => completed.has(dep))
);
// Execute ready tasks in parallel
const batchResults = await Promise.all(
ready.map(task => executeAgentTask(task, availableAgents))
);
// Validate results - use extended thinking for conflicts
const validatedResults = await validateAndResolveConflicts(
batchResults,
results
);
results.push(...validatedResults);
ready.forEach(task => completed.add(task.id));
}
return results;
}
```
**Task Planning with Extended Thinking:**
Based on [Anthropic's Extended Thinking documentation](https://platform.claude.com/docs/en/build-with-claude/extended-thinking):
```typescript
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic();
async function planTasks(
goal: string,
agents: Agent[]
): Promise<{ tasks: AgentTask[]; rationale: string }> {
// Extended thinking requires budget_tokens < max_tokens
// Minimum budget: 1,024 tokens
const response = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101', // Or claude-sonnet-4-5-20250929
max_tokens: 16000,
thinking: {
type: 'enabled',
budget_tokens: 10000 // Extended thinking for complex planning
},
messages: [{
role: 'user',
content: `
Goal: ${goal}
Available agents and their capabilities:
${agents.map(a => `- ${a.name}: ${a.capabilities.join(', ')}`).join('\n')}
Decompose this goal into tasks. For each task, specify:
1. Which agent should handle it
2. What input it needs
3. Which other tasks it depends on
4. Expected output format
Think carefully about:
- Optimal parallelization opportunities
- Potential conflicts between agent outputs
- Information that needs to flow between tasks
`
}]
});
// Response contains thinking blocks followed by text blocks
// content: [{ type: 'thinking', thinking: '...' }, { type: 'text', text: '...' }]
return parseTaskPlan(response);
}
```
**Conflict Resolution:**
```typescript
async function validateAndResolveConflicts(
newResults: AgentResult[],
existingResults: AgentResult[]
): Promise<AgentResult[]> {
// Check for conflicts with existing results
const conflicts = detectConflicts(newResults, existingResults);
if (conflicts.length === 0) {
return newResults;
}
// Use extended thinking to resolve conflicts
const resolution = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 8000,
thinking: {
type: 'enabled',
budget_tokens: 5000
},
messages: [{
role: 'user',
content: `
The following agent outputs conflict:
${conflicts.map(c => `
Conflict: ${c.description}
Agent A (${c.agentA.name}): ${JSON.stringify(c.resultA)}
Agent B (${c.agentB.name}): ${JSON.stringify(c.resultB)}
`).join('\n\n')}
Analyze each conflict and determine:
1. Which output is more likely correct and why
2. If both have merit, how to synthesize them
3. What additional verification might be needed
`
}]
});
return applyResolutions(newResults, resolution);
}
```
**Adaptive Agent Selection:**
```typescript
async function selectOptimalAgent(
task: AgentTask,
agents: Agent[],
context: ExecutionContext
): Promise<Agent> {
// Score each agent based on:
// - Capability match
// - Current load
// - Historical performance on similar tasks
// - Cost (model tier)
const scores = agents.map(agent => ({
agent,
score: calculateAgentScore(agent, task, context)
}));
// For complex tasks, use Opus; for simple tasks, use Haiku
const complexity = assessTaskComplexity(task);
if (complexity > 0.7) {
// Filter to agents that can use Opus
const opusCapable = scores.filter(s => s.agent.supportsOpus);
return opusCapable.sort((a, b) => b.score - a.score)[0].agent;
}
return scores.sort((a, b) => b.score - a.score)[0].agent;
}
```
**Agent Communication Protocol:**
```typescript
interface AgentMessage {
from: string;
to: string | 'broadcast';
type: 'request' | 'response' | 'update' | 'conflict';
payload: unknown;
timestamp: Date;
}
class AgentCommunicationBus {
private messages: AgentMessage[] = [];
private subscribers: Map<string, (msg: AgentMessage) => void> = new Map();
send(message: AgentMessage): void {
this.messages.push(message);
if (message.to === 'broadcast') {
this.subscribers.forEach(callback => callback(message));
} else {
this.subscribers.get(message.to)?.(message);
}
}
subscribe(agentId: string, callback: (msg: AgentMessage) => void): void {
this.subscribers.set(agentId, callback);
}
getHistory(agentId: string): AgentMessage[] {
return this.messages.filter(
m => m.from === agentId || m.to === agentId || m.to === 'broadcast'
);
}
}
```
### 6. Streaming Responses
Deliver real-time AI responses for better UX.
**Capabilities:**
- Stream LLM output token-by-token
- Server-Sent Events (SSE) for web clients
- Streaming with function calls
- Backpressure handling
**Detailed Implementation:** See `../streaming-api-patterns/SKILL.md` for streaming patterns
### 7. Cost Optimization
**Strategies:**
- Use smaller models for simple tasks (GPT-3.5 vs GPT-4)
- Implement prompt caching (Anthropic's ephemeral cache)
- Batch requests when possible
- Set max_tokens to prevent runaway generation
- Monitor usage with alerts
**Token Counting:**
```typescript
import { encoding_for_model } from 'tiktoken'
function countTokens(text: string, model = 'gpt-4'): number {
const encoder = encoding_for_model(model)
const tokens = encoder.encode(text)
encoder.free()
return tokens.length
}
```
**Detailed Implementation:** See `references/observability.md` for:
- Cost estimation and budget tracking
- Model selection strategies
- Prompt caching patterns
### 8. Observability & Monitoring
Track LLM performance, costs, and quality in production.
**Tools:**
- **LangSmith**: Tracing, evaluation, monitoring
- **LangFuse**: Open-source observability
- **Custom Logging**: Structured logs with metrics
**Key Metrics:**
- Throughput (requests/minute)
- Latency (P50, P95, P99)
- Token usage and cost
- Error rate
- Quality scores (relevance, coherence, factuality)
**Detailed Implementation:** See `references/observability.md` for:
- LangSmith and LangFuse integration
- Custom logger implementation
- Performance monitoring
- Quality evaluation
- Debugging and error analysis
---
## Searching References
This skill includes detailed reference material. Use grep to find specific patterns:
```bash
# Find RAG patterns
grep -r "RAG" references/
# Search for specific vector database
grep -A 10 "Pinecone Setup" references/vector-databases.md
# Find agentic workflow examples
grep -B 5 "ReAct Pattern" references/agentic-workflows.md
# Locate function calling patterns
grep -n "parallel.*tool" references/function-calling.md
# Search for cost optimization
grep -i "cost\|pricing\|budget" references/observability.md
# Find all code examples for embeddings
grep -A 20 "async function.*embedding" references/
```
---
## Best Practices
### Context Management
- ✅ Keep context windows under 75% of model limit
- ✅ Use sliding window for long conversations
- ✅ Summarize old messages before they scroll out
- ✅ Remove redundant or irrelevant context
### Embedding Strategy
- ✅ Chunk documents to 500-1000 tokens
- ✅ Overlap chunks by 10-20% for continuity
- ✅ Include metadata (title, source, date) with chunks
- ✅ Re-embed when source data changes
### RAG Quality
- ✅ Use hybrid search (semantic + keyword)
- ✅ Re-rank results for relevance
- ✅ Include citation/source in context
- ✅ Set temperature low (0.1-0.3) for factual answers
- ✅ Validate answers against retrieved context
### Function Calling
- ✅ Provide clear, concise function descriptions
- ✅ Use strict JSON schema for parameters
- ✅ Handle missing or invalid parameters gracefully
- ✅ Limit to 10-20 tools to avoid confusion
- ✅ Validate function outputs before returning to LLM
### Cost Optimization
- ✅ Use smaller models for simple tasks
- ✅ Implement prompt caching for repeated content
- ✅ Batch requests when possible
- ✅ Set max_tokens to prevent runaway generation
- ✅ Monitor usage with alerts for anomalies
### Security
- ✅ Validate and sanitize user inputs
- ✅ Never include secrets in prompts
- ✅ Implement rate limiting
- ✅ Filter outputs for harmful content
- ✅ Use separate API keys per environment
---
## Templates
Use the provided templates for common AI patterns:
- **`templates/rag-pipeline.ts`** - Basic RAG implementation
- **`templates/agentic-workflow.ts`** - ReAct agent pattern
---
## Examples
### Complete RAG Chatbot
See `examples/chatbot-with-rag/` for a full-stack implementation:
- Vector database setup with document ingestion
- RAG query with citations
- Streaming chat interface
- Cost tracking and monitoring
---
## Checklists
### AI Implementation Checklist
See `checklists/ai-implementation.md` for comprehensive validation covering:
- [ ] Vector database setup and configuration
- [ ] Embedding generation and chunking strategy
- [ ] RAG pipeline with quality validation
- [ ] Function calling with error handling
- [ ] Streaming response implementation
- [ ] Cost monitoring and budget alerts
- [ ] Observability and logging
- [ ] Security and input validation
---
## Common Patterns
### Semantic Caching
Reduce costs by caching similar queries:
```typescript
const cache = new Map<string, { embedding: number[]; response: string }>()
async function cachedRAG(query: string) {
const queryEmbedding = await createEmbedding(query)
// Check if similar query exists in cache
for (const [cachedQuery, cached] of cache.entries()) {
const similarity = cosineSimilarity(queryEmbedding, cached.embedding)
if (similarity > 0.95) {
return cached.response
}
}
// Not cached, perform RAG
const response = await ragQuery(query)
cache.set(query, { embedding: queryEmbedding, response })
return response
}
```
### Conversational Memory
Maintain context across multiple turns:
```typescript
interface ConversationMemory {
messages: Message[] // Last 10 messages
summary?: string // Summary of older messages
}
async function getConversationContext(userId: string): Promise<Message[]> {
const memory = await db.memory.findUnique({ where: { userId } })
return [
{ role: 'system', content: `Previous conversation summary: ${memory.summary}` },
...memory.messages.slice(-5) // Last 5 messages
]
}
```
---
## Prompt Engineering
### Few-Shot Learning
Provide examples to guide LLM behavior:
```typescript
const fewShotExamples = `
Example 1:
Input: "I love this product!"
Sentiment: Positive
Example 2:
Input: "It's okay, nothing special"
Sentiment: Neutral
`
// Include in system prompt
```
### Chain of Thought (CoT)
Ask LLM to show reasoning:
```typescript
const prompt = `${problem}\n\nLet's think step by step:`
```
---
## Resources
- [OpenAI API Documentation](https://platform.openai.com/docs)
- [Anthropic Claude API](https://docs.anthropic.com)
- [LangChain Documentation](https://python.langchain.com/docs/)
- [Pinecone Documentation](https://docs.pinecone.io/)
- [Chroma Documentation](https://docs.trychroma.com/)
- [LangSmith Observability](https://docs.smith.langchain.com/)
---
## Next Steps
After mastering AI-Native Development:
1. Explore **Streaming API Patterns** skill for real-time AI responses
2. Use **Type Safety & Validation** skill for AI input/output validation
3. Apply **Edge Computing Patterns** skill for global AI deployment
4. Reference **Observability Patterns** for production monitoringRelated Skills
react-native-design
Master React Native styling, navigation, and Reanimated animations for cross-platform mobile development. Use when building React Native apps, implementing navigation patterns, or creating performant animations.
vue-development-guides
A collection of best practices and tips for developing applications using Vue.js. This skill MUST be apply when developing, refactoring or reviewing Vue.js or Nuxt projects.
wordpress-woocommerce-development
WooCommerce store development workflow covering store setup, payment integration, shipping configuration, and customization.
wordpress-theme-development
WordPress theme development workflow covering theme architecture, template hierarchy, custom post types, block editor support, and responsive design.
wordpress-plugin-development
WordPress plugin development workflow covering plugin architecture, hooks, admin interfaces, REST API, and security best practices.
voice-ai-engine-development
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
voice-ai-development
Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. Knows how to build low-latency, production-ready voice experiences. Use when: voice ai, voice agent, speech to text, text to speech, realtime voice.
salesforce-development
Expert patterns for Salesforce platform development including Lightning Web Components (LWC), Apex triggers and classes, REST/Bulk APIs, Connected Apps, and Salesforce DX with scratch orgs and 2nd generation packages (2GP). Use when: salesforce, sfdc, apex, lwc, lightning web components.
react-native-architecture
Build production React Native apps with Expo, navigation, native modules, offline sync, and cross-platform patterns. Use when developing mobile apps, implementing native integrations, or architecting React Native projects.
python-fastapi-development
Python FastAPI backend development with async patterns, SQLAlchemy, Pydantic, authentication, and production API patterns.
python-development-python-scaffold
You are a Python project architecture expert specializing in scaffolding production-ready Python applications. Generate complete project structures with modern tooling (uv, FastAPI, Django), type hint
moodle-external-api-development
Create custom external web service APIs for Moodle LMS. Use when implementing web services for course management, user tracking, quiz operations, or custom plugin functionality. Covers parameter validation, database operations, error handling, service registration, and Moodle coding standards.