onenote-rate-limits
Implement proper rate limit handling for OneNote Graph API with queue-based throttling. Use when building high-throughput OneNote integrations or debugging 429 errors. Trigger with "onenote rate limit", "onenote 429", "onenote throttling", "graph api throttle".
Best use case
onenote-rate-limits is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using onenote-rate-limits should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/onenote-rate-limits/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How onenote-rate-limits Compares
| Feature / Agent | onenote-rate-limits | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# OneNote — Rate Limit Handling & Request Throttling
## Overview
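Microsoft Graph throttles OneNote requests. This skill budgets for **600 requests per 60 seconds per user** and **10,000 requests per 10 minutes per app/tenant**; Microsoft publishes service-specific OneNote limits (including a cap on concurrent requests) in its Graph throttling documentation, so confirm the current figures there and treat the numbers used here as configurable defaults. When you exceed a limit, the API returns `429 Too Many Requests` with a `Retry-After` header specifying how many seconds to wait. Many implementations either ignore this header (retrying immediately, which makes things worse) or use a fixed backoff that wastes capacity.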
This skill implements a token bucket rate limiter, queue-based request throttling, and proper `Retry-After` header parsing. For multi-user apps, it tracks per-user and per-tenant budgets independently.
Key pain points addressed:
- The `Retry-After` header value is in seconds (not milliseconds) — many implementations parse this wrong
- The per-user limit (600/60s) is separate from the per-tenant limit (10,000/10min) — you can hit one without the other
- Batch requests (`$batch`) save HTTP round trips but not quota: Microsoft Graph evaluates each operation inside a batch individually against the throttling limits, and individual items can fail with 429
- After a 429, subsequent requests to ANY OneNote endpoint are throttled — not just the endpoint that triggered it
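
The first pain point can be made concrete with a small parsing helper. This is a minimal sketch, not part of any SDK: `retryAfterToMs` and its 30-second fallback are hypothetical names and defaults chosen for illustration. HTTP permits `Retry-After` to be either a whole number of seconds or an HTTP-date, so the sketch accepts both forms even though the skill expects integer seconds from Graph.

```typescript
// Hypothetical helper: convert a Retry-After header value into milliseconds.
function retryAfterToMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
  fallbackMs: number = 30_000 // assumed default when the header is missing or unreadable
): number {
  if (value == null || value.trim() === "") return fallbackMs;
  const trimmed = value.trim();

  // Delta-seconds form: a non-negative integer number of SECONDS, not milliseconds.
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10) * 1000;

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);

  return fallbackMs;
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests; production callers can omit it.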
## Prerequisites
- Azure app registration with delegated permissions: `Notes.ReadWrite`
- App-only auth was deprecated on March 31, 2025; use delegated auth only
- Python: `pip install msgraph-sdk azure-identity`
- Node/TypeScript: `npm install @microsoft/microsoft-graph-client @azure/identity @azure/msal-node`
- Optional: `npm install p-queue` for production queue management
## Instructions
### Step 1 — Understand the Rate Limit Structure
| Limit | Scope | Window | Threshold |
|-------|-------|--------|-----------|
| Per-user | Single user's delegated token | 60 seconds (rolling) | 600 requests |
| Per-tenant | All users + all apps in the tenant | 10 minutes (rolling) | 10,000 requests |
When either limit is hit:
- Response status: `429 Too Many Requests`
- Response header: `Retry-After: <seconds>` (integer, not milliseconds)
- All subsequent OneNote requests for that scope are blocked until the window resets
- Non-OneNote Graph endpoints (Outlook, OneDrive) are **not** affected
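
To see how the two limits interact, it helps to convert both to a sustained rate. The sketch below uses the thresholds from the table above; the `RateLimit` type and both function names are illustrative, not part of any SDK.

```typescript
// Illustrative types and helpers; the numbers come from the table above.
interface RateLimit {
  requests: number;
  windowSeconds: number;
}

const perUser: RateLimit = { requests: 600, windowSeconds: 60 };
const perTenant: RateLimit = { requests: 10_000, windowSeconds: 600 };

// Average requests per second that a limit allows over its window.
function sustainedRate(limit: RateLimit): number {
  return limit.requests / limit.windowSeconds;
}

// Number of users, each running at the full per-user rate, needed to
// reach the tenant threshold within one tenant window.
function usersToReachTenantLimit(user: RateLimit, tenant: RateLimit): number {
  const perUserInTenantWindow = sustainedRate(user) * tenant.windowSeconds;
  return Math.ceil(tenant.requests / perUserInTenantWindow);
}
```

With these figures a single user sustains 10 requests per second, the tenant about 16.7, so two users running flat out are enough to exhaust the tenant budget while each stays within the per-user limit.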
### Step 2 — Token Bucket Rate Limiter (TypeScript)
A token bucket preemptively throttles requests to stay below the limit, avoiding most 429s before they happen:
```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond

  constructor(maxTokens: number, refillWindowMs: number) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
    this.refillRate = maxTokens / refillWindowMs;
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

  async acquire(): Promise<void> {
    this.refill();
    // Reserve the token before waiting so that concurrent callers queue
    // behind each other instead of all waking up for the same token.
    this.tokens -= 1;
    if (this.tokens >= 0) return;
    // A negative balance is the deficit to wait out at the refill rate.
    const waitMs = Math.ceil(-this.tokens / this.refillRate);
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }

  get available(): number {
    this.refill();
    // The balance can be negative while callers are waiting; report 0 then.
    return Math.max(0, Math.floor(this.tokens));
  }
}

// Per-user bucket: 600 requests per 60 seconds
const userBucket = new TokenBucket(600, 60_000);

// Use with a safety margin (80% of limit)
const safeUserBucket = new TokenBucket(480, 60_000);
```
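
The refill arithmetic is worth checking by hand, because a bucket that starts full can release its whole capacity as a burst and then keep refilling during the same window. The helpers below are a sketch of that arithmetic; the function names are hypothetical, and the worst-case bound assumes the server enforces a true rolling window, as the table in Step 1 states.

```typescript
// Milliseconds until `needed` tokens exist, given the current balance and a
// bucket that refills `capacity` tokens every `windowMs`.
function msUntilTokens(
  needed: number,
  balance: number,
  capacity: number,
  windowMs: number
): number {
  const deficit = Math.max(0, needed - balance);
  return Math.ceil((deficit * windowMs) / capacity);
}

// Upper bound on requests released in one window by a bucket that starts
// full: the initial burst plus everything refilled during the window.
function worstCaseInWindow(capacity: number, refilledPerWindow: number): number {
  return capacity + refilledPerWindow;
}

// Largest capacity that keeps the worst case within `limit`, assuming the
// bucket refills exactly `capacity` tokens per window.
function capacityForHardLimit(limit: number): number {
  return Math.floor(limit / 2);
}
```

For the 480-token bucket, an empty bucket yields one token every 125 ms, but the worst case in the first minute is 960 requests, which is above the 600 threshold. If bursts from a cold start are likely, a capacity of 300 (or a smaller capacity with the same refill rate) is the conservative choice under these assumptions.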
### Step 3 — Queue-Based Request Throttling
Wrap all OneNote API calls through a throttled queue that respects both the token bucket and `Retry-After` headers:
```typescript
import { Client } from "@microsoft/microsoft-graph-client";

type QueueItem = {
  resolve: (value: any) => void;
  reject: (error: any) => void;
  fn: () => Promise<any>;
  attempts: number; // how many times this request has been throttled
};

// Retry-After is in seconds. Depending on the SDK version, the error's
// headers may be a fetch Headers object or a plain object, so handle both.
function retryAfterSeconds(err: any, fallback: number = 30): number {
  const headers = err?.headers;
  const raw =
    typeof headers?.get === "function" ? headers.get("retry-after") : headers?.["retry-after"];
  const seconds = parseInt(raw ?? "", 10);
  return Number.isNaN(seconds) ? fallback : seconds;
}

class ThrottledOneNoteClient {
  private bucket: TokenBucket;
  private queue: QueueItem[] = [];
  private processing = false;
  private retryAfterUntil = 0; // Timestamp when retry-after expires

  constructor(
    private client: Client,
    maxRequestsPerMinute: number = 480, // 80% safety margin
    private maxThrottleRetries: number = 5 // give up instead of re-queueing forever
  ) {
    this.bucket = new TokenBucket(maxRequestsPerMinute, 60_000);
  }

  async request<T>(fn: (client: Client) => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push({ resolve, reject, fn: () => fn(this.client), attempts: 0 });
      void this.processQueue();
    });
  }

  private async processQueue(): Promise<void> {
    if (this.processing) return;
    this.processing = true;
    try {
      while (this.queue.length > 0) {
        // Respect Retry-After if we've been throttled
        const now = Date.now();
        if (this.retryAfterUntil > now) {
          const waitMs = this.retryAfterUntil - now;
          console.warn(`Rate limited: waiting ${Math.ceil(waitMs / 1000)}s`);
          await new Promise((r) => setTimeout(r, waitMs));
        }

        await this.bucket.acquire();
        const item = this.queue.shift()!;
        try {
          item.resolve(await item.fn());
        } catch (err: any) {
          if (err?.statusCode === 429 && item.attempts < this.maxThrottleRetries) {
            item.attempts++;
            const retryAfter = retryAfterSeconds(err);
            this.retryAfterUntil = Date.now() + retryAfter * 1000;
            // Re-queue the failed request at the front
            this.queue.unshift(item);
            console.warn(`429 received, Retry-After: ${retryAfter}s`);
          } else {
            item.reject(err);
          }
        }
      }
    } finally {
      this.processing = false;
    }
  }
}

// Usage
const throttled = new ThrottledOneNoteClient(client);
const notebooks = await throttled.request((c) =>
  c.api("/me/onenote/notebooks").get()
);
```
### Step 4 — Per-User Tracking for Multi-User Apps
Multi-user apps must track rate limits per user, not globally:
```typescript
class MultiUserRateLimiter {
  private userBuckets: Map<string, TokenBucket> = new Map();
  private tenantBucket: TokenBucket;

  constructor() {
    // Tenant-wide: 10,000 per 10 minutes
    this.tenantBucket = new TokenBucket(8_000, 600_000); // 80% safety margin
  }

  async acquire(userId: string): Promise<void> {
    // Get or create per-user bucket
    if (!this.userBuckets.has(userId)) {
      this.userBuckets.set(userId, new TokenBucket(480, 60_000));
    }
    const userBucket = this.userBuckets.get(userId)!;

    // Must acquire from BOTH buckets
    await userBucket.acquire();
    await this.tenantBucket.acquire();
  }

  getStatus(userId: string): { userRemaining: number; tenantRemaining: number } {
    const userBucket = this.userBuckets.get(userId);
    return {
      userRemaining: userBucket?.available ?? 480,
      tenantRemaining: this.tenantBucket.available,
    };
  }
}
```
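
One practical concern with a per-user map is that it grows with every user ever seen. The sketch below shows one possible way to evict idle entries; `BucketRegistry`, `evictIdle`, and the idle threshold are assumptions introduced here, not part of the skill. Eviction is harmless when the idle threshold is at least one refill window, because a bucket left alone that long would be full again anyway.

```typescript
// Hypothetical registry that tracks when each user's bucket was last used.
interface TrackedBucket<B> {
  bucket: B;
  lastUsedMs: number;
}

class BucketRegistry<B> {
  private entries = new Map<string, TrackedBucket<B>>();

  constructor(
    private create: () => B, // factory for a fresh per-user bucket
    private maxIdleMs: number // assumed idle threshold, e.g. one refill window
  ) {}

  // Get or create the bucket for a user and mark it as used now.
  get(userId: string, nowMs: number = Date.now()): B {
    let entry = this.entries.get(userId);
    if (!entry) {
      entry = { bucket: this.create(), lastUsedMs: nowMs };
      this.entries.set(userId, entry);
    }
    entry.lastUsedMs = nowMs;
    return entry.bucket;
  }

  // Remove buckets that have been idle longer than maxIdleMs; returns the count removed.
  evictIdle(nowMs: number = Date.now()): number {
    let removed = 0;
    this.entries.forEach((entry, userId) => {
      if (nowMs - entry.lastUsedMs > this.maxIdleMs) {
        this.entries.delete(userId);
        removed++;
      }
    });
    return removed;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A periodic `evictIdle()` call (for example once per minute) would keep memory bounded by the number of recently active users.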
### Step 5 — Exponential Backoff with Jitter
For 429 responses without a `Retry-After` header (rare but possible), use exponential backoff with jitter:
```typescript
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.statusCode !== 429 || attempt === maxRetries) throw err;

      // The headers may be a fetch Headers object or a plain object,
      // depending on the SDK version, so handle both.
      const headers = err.headers;
      const raw =
        typeof headers?.get === "function" ? headers.get("retry-after") : headers?.["retry-after"];
      const retryAfterSec = parseInt(raw ?? "", 10);

      let delayMs: number;
      if (!Number.isNaN(retryAfterSec)) {
        // Prefer server-specified delay (in seconds)
        delayMs = retryAfterSec * 1000;
      } else {
        // Exponential backoff: 1s, 2s, 4s, 8s, 16s + jitter
        const base = Math.pow(2, attempt) * 1000;
        const jitter = Math.random() * 1000;
        delayMs = base + jitter;
      }

      console.warn(`Retry ${attempt + 1}/${maxRetries} in ${Math.ceil(delayMs / 1000)}s`);
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const pages = await withBackoff(() =>
  client.api("/me/onenote/pages").top(50).get()
);
```
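
The delay schedule can be pulled out into a pure function so it is easy to unit test. This sketch mirrors the additive jitter used above; the function name, the injectable `random` parameter, and the 32-second cap are assumptions added for illustration.

```typescript
// Hypothetical pure helper: delay for a given retry attempt (0-based).
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random, // injectable so tests are deterministic
  baseMs: number = 1000,
  jitterMs: number = 1000,
  capMs: number = 32_000 // assumed ceiling so large attempt counts stay bounded
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return exponential + random() * jitterMs;
}
```

With `random` fixed at 0 the schedule is 1s, 2s, 4s, 8s, 16s, then 32s for every later attempt. The jitter spreads out clients that were throttled at the same moment so they do not all retry together.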
### Step 6 — Batch Requests to Reduce Call Count
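The Graph `$batch` endpoint lets you send up to 20 operations in a single HTTP request. This cuts round trips and connection overhead, but Microsoft Graph evaluates each operation in a batch individually against the throttling limits, so batching does not stretch your quota and individual items can come back with status `429`: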
```typescript
async function batchGetPages(client: Client, pageIds: string[]): Promise<any[]> {
  const batchSize = 20; // Graph batch limit
  const allResults: any[] = [];

  for (let i = 0; i < pageIds.length; i += batchSize) {
    const chunk = pageIds.slice(i, i + batchSize);
    const batchBody = {
      requests: chunk.map((id, idx) => ({
        id: String(idx + 1),
        method: "GET",
        url: `/me/onenote/pages/${id}?$select=id,title,lastModifiedDateTime`,
      })),
    };

    const batchResponse = await client.api("/$batch").post(batchBody);

    for (const response of batchResponse.responses) {
      if (response.status === 200) {
        allResults.push(response.body);
      } else {
        // Batch ids are 1-based positions within the chunk; map back to the page id.
        const pageId = chunk[Number(response.id) - 1];
        const hint =
          response.status === 429 ? " (throttled; retry this item after its Retry-After)" : "";
        console.warn(`Batch item ${response.id} (page ${pageId}) failed: ${response.status}${hint}`);
      }
    }
  }
  return allResults;
}

// 100 pages = 5 HTTP requests instead of 100
const pages = await batchGetPages(client, hundredPageIds);
```
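
Because items inside a batch can be throttled individually, it is useful to separate a batch response into successes, throttled items to retry, and other failures. The following is a sketch under stated assumptions: the `BatchItemResponse` shape follows the fields used above (`id`, `status`, `body`) plus a per-item `headers` map, and `splitBatchResponses` and its 30-second fallback are hypothetical.

```typescript
interface BatchItemResponse {
  id: string;
  status: number;
  headers?: Record<string, string>;
  body?: unknown;
}

interface BatchOutcome {
  succeeded: BatchItemResponse[];
  throttledIds: string[]; // ids of items to resend in a later batch
  failed: BatchItemResponse[];
  retryAfterSeconds: number; // longest wait requested by any throttled item
}

// Case-insensitive header lookup on a plain object.
function headerValue(
  headers: Record<string, string> | undefined,
  name: string
): string | undefined {
  if (!headers) return undefined;
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return undefined;
}

function splitBatchResponses(
  responses: BatchItemResponse[],
  fallbackSeconds: number = 30 // assumed wait when a throttled item has no usable header
): BatchOutcome {
  const outcome: BatchOutcome = {
    succeeded: [],
    throttledIds: [],
    failed: [],
    retryAfterSeconds: 0,
  };
  for (const r of responses) {
    if (r.status >= 200 && r.status < 300) {
      outcome.succeeded.push(r);
    } else if (r.status === 429) {
      const parsed = parseInt(headerValue(r.headers, "retry-after") ?? "", 10);
      const seconds = Number.isNaN(parsed) ? fallbackSeconds : parsed;
      outcome.throttledIds.push(r.id);
      outcome.retryAfterSeconds = Math.max(outcome.retryAfterSeconds, seconds);
    } else {
      outcome.failed.push(r);
    }
  }
  return outcome;
}
```

A caller would wait `retryAfterSeconds`, then rebuild a batch containing only the requests whose ids appear in `throttledIds`.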
### Step 7 — Python Rate Limiter with asyncio
```python
import asyncio
import time


class RateLimiter:
    """Token bucket rate limiter for OneNote Graph API."""

    def __init__(self, max_requests: int = 480, window_seconds: int = 60):
        self.max_tokens = max_requests
        self.tokens = float(max_requests)
        self.refill_rate = max_requests / window_seconds  # tokens per second
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    async def acquire(self) -> None:
        # Holding the lock while sleeping keeps waiters in arrival order.
        async with self._lock:
            self._refill()
            if self.tokens < 1:
                await asyncio.sleep((1 - self.tokens) / self.refill_rate)
                self._refill()  # credit the time spent sleeping exactly once
            self.tokens -= 1


def _retry_after_seconds(headers, default: int = 30) -> int:
    """Case-insensitive Retry-After lookup; the value is in seconds."""
    for name, value in dict(headers or {}).items():
        if str(name).lower() == "retry-after":
            try:
                return int(value)
            except (TypeError, ValueError):
                break
    return default


# Usage: combines token bucket with Retry-After handling
limiter = RateLimiter(max_requests=480, window_seconds=60)


async def safe_get_pages(client, section_id: str, max_retries: int = 3):
    for attempt in range(max_retries):
        await limiter.acquire()
        try:
            return await client.me.onenote.sections.by_onenote_section_id(
                section_id
            ).pages.get()
        except Exception as e:
            # Assumption: the SDK error exposes response_status_code and
            # response_headers; verify the attribute names for your SDK version.
            status = getattr(e, "response_status_code", None)
            if status == 429 and attempt < max_retries - 1:
                headers = getattr(e, "response_headers", None)
                await asyncio.sleep(_retry_after_seconds(headers))
            else:
                raise
    raise RuntimeError("Max retries exceeded for OneNote API call")
```
### Step 8 — Monitor and Adjust Preemptively
Track your 429 rate over time and adjust thresholds:
```typescript
class RateLimitMonitor {
  private requestCount = 0;
  private throttleCount = 0;
  private windowStart = Date.now();

  record(wasThrottled: boolean): void {
    this.requestCount++;
    if (wasThrottled) this.throttleCount++;
  }

  getMetrics(): { total: number; throttled: number; throttleRate: number; windowMinutes: number } {
    const windowMinutes = (Date.now() - this.windowStart) / 60_000;
    return {
      total: this.requestCount,
      throttled: this.throttleCount,
      throttleRate: this.throttleCount / Math.max(this.requestCount, 1),
      windowMinutes: Math.round(windowMinutes * 10) / 10,
    };
  }

  // Alert if throttle rate exceeds threshold
  shouldReduceRate(): boolean {
    return this.getMetrics().throttleRate > 0.05; // >5% throttled = slow down
  }
}
```
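
The monitor reports when to slow down but not by how much. One common policy, swapped in here as an illustration rather than taken from the skill, is additive increase and multiplicative decrease (AIMD): halve the request budget when the throttle rate crosses the threshold, and creep back up in small steps otherwise. The function name, bounds, and step size below are assumptions.

```typescript
interface RateBounds {
  min: number; // never drop below this many requests per minute
  max: number; // never exceed the safe budget
}

// Hypothetical AIMD policy: returns the next requests-per-minute budget.
function nextRequestsPerMinute(
  current: number,
  throttleRate: number,
  bounds: RateBounds = { min: 60, max: 480 },
  threshold: number = 0.05, // same 5% threshold as shouldReduceRate()
  step: number = 10
): number {
  if (throttleRate > threshold) {
    // Multiplicative decrease: back off quickly when throttled.
    return Math.max(bounds.min, Math.floor(current / 2));
  }
  // Additive increase: recover slowly while things are healthy.
  return Math.min(bounds.max, current + step);
}
```

The result would be fed into a new token bucket (or a bucket with an adjustable rate) at each monitoring interval.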
## Output
Rate limit handling produces:
- Preemptive throttling via token bucket — requests are delayed before sending, not after 429
- `Retry-After` compliance — exact server-specified delays honored
- Batch consolidation — 20 operations per HTTP request for bulk workloads
- Monitoring metrics — request count, throttle count, throttle rate percentage
## Error Handling
| Status | Cause | Fix |
|--------|-------|-----|
| 429 (with Retry-After) | Per-user or per-tenant limit exceeded | Wait exactly `Retry-After` seconds; do not retry sooner |
| 429 (no Retry-After) | Rare edge case, limit exceeded | Exponential backoff with jitter starting at 1 second |
| 503 | Service throttling under load | Treat like 429 — backoff and retry |
| 500 | Internal error during throttled state | Do not count as rate limit; retry with normal backoff |
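
The table can be expressed as a small decision function so that every call site handles statuses the same way. This is a sketch: the type and function names are hypothetical, and honoring `Retry-After` on a 503 when the header is present is an assumption that extends the "treat like 429" row.

```typescript
type RetryAction =
  | "wait-retry-after" // sleep for the server-specified seconds
  | "backoff-with-jitter" // exponential backoff starting at 1 second
  | "retry-normal" // ordinary retry, not a throttling signal
  | "do-not-retry";

interface RetryDecision {
  action: RetryAction;
  countsAsThrottle: boolean; // whether to record it in throttle metrics
}

function decideRetry(status: number, hasRetryAfter: boolean): RetryDecision {
  switch (status) {
    case 429:
    case 503:
      return {
        action: hasRetryAfter ? "wait-retry-after" : "backoff-with-jitter",
        countsAsThrottle: true,
      };
    case 500:
      return { action: "retry-normal", countsAsThrottle: false };
    default:
      return { action: "do-not-retry", countsAsThrottle: false };
  }
}
```

Keeping `countsAsThrottle` separate from the action lets monitoring ignore 500s, as the table recommends, while still retrying them.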
## Examples
**Calculate request budget for polling + CRUD:**
```typescript
const BUDGET_PER_MINUTE = 600;
const SAFETY_MARGIN = 0.8; // Use 80% of limit
const safeBudget = BUDGET_PER_MINUTE * SAFETY_MARGIN; // 480
// Allocate budget
const pollingSections = 20;
const pollIntervalSec = 30;
const pollRequestsPerMin = pollingSections * (60 / pollIntervalSec); // 40/min
const remainingForCrud = safeBudget - pollRequestsPerMin; // 440/min for user operations
console.log(`Polling: ${pollRequestsPerMin}/min | CRUD: ${remainingForCrud}/min`);
```
**Production health check:**
```typescript
const monitor = new RateLimitMonitor();

// After each API call:
monitor.record(/* wasThrottled */ false);

// Periodic check
setInterval(() => {
  const metrics = monitor.getMetrics();
  if (monitor.shouldReduceRate()) {
    console.warn(`High throttle rate: ${(metrics.throttleRate * 100).toFixed(1)}%`);
    // Dynamically increase poll interval or reduce batch concurrency
  }
}, 60_000);
```
## Resources
- [OneNote API Overview](https://learn.microsoft.com/en-us/graph/api/resources/onenote-api-overview)
- [Error Codes](https://learn.microsoft.com/en-us/graph/onenote-error-codes)
- [Best Practices](https://learn.microsoft.com/en-us/graph/onenote-best-practices)
- [Known Issues](https://learn.microsoft.com/en-us/graph/known-issues)
- [Graph API Reference](https://learn.microsoft.com/en-us/graph/api/overview)
- [Graph Explorer](https://developer.microsoft.com/en-us/graph/graph-explorer)
## Next Steps
- See `onenote-webhooks-events` for polling patterns that consume rate budget
- See `onenote-performance-tuning` for batch operations and `$select` to reduce payload size
- See `onenote-core-workflow-a` for CRUD operations that benefit from throttled clients