onenote-rate-limits

Implement proper rate limit handling for OneNote Graph API with queue-based throttling. Use when building high-throughput OneNote integrations or debugging 429 errors. Trigger with "onenote rate limit", "onenote 429", "onenote throttling", "graph api throttle".

by jeremylongshore

Best use case

onenote-rate-limits is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using onenote-rate-limits should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/onenote-rate-limits/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/onenote-pack/skills/onenote-rate-limits/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/onenote-rate-limits/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

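What does this skill do?

It adds rate limit handling for OneNote integrations on the Microsoft Graph API: a token bucket limiter, a request queue that honors Retry-After, per-user and per-tenant budgets, exponential backoff, $batch consolidation, and throttle-rate monitoring.

Where can I find the source code?

The SKILL.md lives in the jeremylongshore/claude-code-plugins-plus-skills repository on GitHub, at plugins/saas-packs/onenote-pack/skills/onenote-rate-limits/SKILL.md (the same path the installation command downloads).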

SKILL.md Source

# OneNote — Rate Limit Handling & Request Throttling

## Overview

Microsoft Graph rate limits OneNote at **600 requests per 60 seconds per user** and **10,000 requests per 10 minutes per app/tenant**. When you exceed either limit, the API returns `429 Too Many Requests` with a `Retry-After` header specifying how many seconds to wait. Most implementations either ignore this header entirely (retrying immediately, making things worse) or use a fixed backoff that wastes capacity.

This skill implements a token bucket rate limiter, queue-based request throttling, and proper `Retry-After` header parsing. For multi-user apps, it tracks per-user and per-tenant budgets independently.

Key pain points addressed:
- The `Retry-After` header value is in seconds (not milliseconds) — many implementations parse this wrong
- The per-user limit (600/60s) is separate from the per-tenant limit (10,000/10min) — you can hit one without the other
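- Batch requests (`$batch`) reduce HTTP round trips, but Microsoft Graph still evaluates each operation inside a batch individually against throttling limits, so one batch can come back with a mix of `200` and `429` sub-responses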
- After a 429, subsequent requests to ANY OneNote endpoint are throttled — not just the endpoint that triggered it
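
Since several of these pitfalls come down to reading `Retry-After` correctly, here is a minimal parsing sketch. It assumes the header arrives either as delay-seconds (the form described above) or as an HTTP-date, which the HTTP specification also permits. `retryAfterMs` and its 30-second fallback are hypothetical choices, not part of the Graph SDK.

```typescript
// Hypothetical helper: convert a Retry-After header value into milliseconds.
// Accepts delay-seconds ("30") and HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
function retryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
  fallbackSeconds: number = 30
): number {
  if (value == null || value.trim() === "") return fallbackSeconds * 1000;
  const trimmed = value.trim();

  // delay-seconds: an integer number of SECONDS, so convert to milliseconds
  if (/^\d+$/.test(trimmed)) return Number.parseInt(trimmed, 10) * 1000;

  // HTTP-date: wait until that absolute time, never a negative delay
  const dateMs = Date.parse(trimmed);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);

  // Unparseable value: fall back to a conservative default
  return fallbackSeconds * 1000;
}
```

Converting to milliseconds at this one boundary keeps the seconds-versus-milliseconds mistake out of the rest of the code.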

## Prerequisites

- Azure app registration with delegated permissions: `Notes.ReadWrite`
- App-only auth deprecated March 31, 2025 — use delegated auth only
- Python: `pip install msgraph-sdk azure-identity`
- Node/TypeScript: `npm install @microsoft/microsoft-graph-client @azure/identity @azure/msal-node`
- Optional: `npm install p-queue` for production queue management

## Instructions

### Step 1 — Understand the Rate Limit Structure

| Limit | Scope | Window | Threshold |
|-------|-------|--------|-----------|
| Per-user | Single user's delegated token | 60 seconds (rolling) | 600 requests |
| Per-tenant | All users + all apps in the tenant | 10 minutes (rolling) | 10,000 requests |

When either limit is hit:
- Response status: `429 Too Many Requests`
- Response header: `Retry-After: <seconds>` (integer, not milliseconds)
- All subsequent OneNote requests for that scope are blocked until the window resets
- Non-OneNote Graph endpoints (Outlook, OneDrive) are **not** affected
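
The table describes rolling windows, which a token bucket only approximates. If you want a limiter that mirrors a rolling window directly, a sliding-log approach is one option. The sketch below assumes the windows really are rolling as stated above; `SlidingWindowLimiter` is a hypothetical name, and the clock is injectable so it can be tested without real waiting.

```typescript
// Hypothetical sliding-log limiter: at most `limit` sends in any rolling
// `windowMs` span. Costs one stored timestamp per request in the window.
class SlidingWindowLimiter {
  private timestamps: number[] = []; // send times inside the window, oldest first

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now
  ) {}

  // Drop timestamps that have slid out of the window ending at t
  private prune(t: number): void {
    while (this.timestamps.length > 0 && this.timestamps[0] <= t - this.windowMs) {
      this.timestamps.shift();
    }
  }

  // Milliseconds until another request fits (0 means send now)
  delayUntilAllowed(): number {
    const t = this.now();
    this.prune(t);
    if (this.timestamps.length < this.limit) return 0;
    // The oldest request must leave the window before another one fits
    return this.timestamps[0] + this.windowMs - t;
  }

  // Record a send if it fits; returns false when the window is full
  tryAcquire(): boolean {
    if (this.delayUntilAllowed() > 0) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```

For the per-user row this would be `new SlidingWindowLimiter(480, 60_000)` with the same 80% margin used later in this skill.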

### Step 2 — Token Bucket Rate Limiter (TypeScript)

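A token bucket preemptively throttles requests to stay below the limit. This avoids most 429s but not all of them: other users and apps in the tenant draw on the same per-tenant budget without this process seeing it, so `Retry-After` handling is still required.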

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond

  constructor(maxTokens: number, refillWindowMs: number) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
    this.refillRate = maxTokens / refillWindowMs;
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }

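  async acquire(): Promise<void> {
    this.refill();
    // Reserve a token synchronously, even if the balance goes into debt.
    // Concurrent callers each see the updated balance, so they queue up
    // instead of all waking after the same wait and bursting together.
    this.tokens -= 1;
    if (this.tokens >= 0) return;
    // Wait until refill has paid back the debt for this caller's token
    const waitMs = Math.ceil(-this.tokens / this.refillRate);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }

  get available(): number {
    this.refill();
    // Never report a negative count while callers are waiting on debt
    return Math.max(0, Math.floor(this.tokens));
  }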
}

// Per-user bucket: 600 requests per 60 seconds
const userBucket = new TokenBucket(600, 60_000);

// Use with a safety margin (80% of limit)
const safeUserBucket = new TokenBucket(480, 60_000);
```
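
Testing a limiter with real timers is slow and flaky. One option is the same refill arithmetic as `TokenBucket` with an injected clock and a non-blocking `tryAcquire`, so a test can advance time by hand. `ClockedTokenBucket` is a hypothetical name. At 480 tokens per 60,000 ms the refill rate is 0.008 tokens/ms, so once the initial burst is spent a new token appears roughly every 125 ms.

```typescript
// Hypothetical test-friendly token bucket: same refill math as TokenBucket,
// but with an injectable clock and a tryAcquire() that never waits.
class ClockedTokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly refillRate: number; // tokens per millisecond

  constructor(
    private readonly maxTokens: number,
    refillWindowMs: number,
    private readonly now: () => number
  ) {
    this.tokens = maxTokens; // start full, which allows an initial burst
    this.lastRefill = now();
    this.refillRate = maxTokens / refillWindowMs;
  }

  private refill(): void {
    const t = this.now();
    this.tokens = Math.min(this.maxTokens, this.tokens + (t - this.lastRefill) * this.refillRate);
    this.lastRefill = t;
  }

  // Take one token if available; returns false instead of waiting
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Milliseconds until at least one whole token is available
  msUntilNextToken(): number {
    this.refill();
    return this.tokens >= 1 ? 0 : Math.ceil((1 - this.tokens) / this.refillRate);
  }
}
```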

### Step 3 — Queue-Based Request Throttling

Wrap all OneNote API calls through a throttled queue that respects both the token bucket and `Retry-After` headers:

```typescript
import { Client } from "@microsoft/microsoft-graph-client";

class ThrottledOneNoteClient {
  private bucket: TokenBucket;
  private queue: Array<{
    resolve: (value: any) => void;
    reject: (error: any) => void;
    fn: () => Promise<any>;
  }> = [];
  private processing = false;
  private retryAfterUntil: number = 0; // Timestamp when retry-after expires

  constructor(
    private client: Client,
    maxRequestsPerMinute: number = 480 // 80% safety margin
  ) {
    this.bucket = new TokenBucket(maxRequestsPerMinute, 60_000);
  }

  async request<T>(fn: (client: Client) => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push({ resolve, reject, fn: () => fn(this.client) });
      this.processQueue();
    });
  }

  private async processQueue(): Promise<void> {
    if (this.processing) return;
    this.processing = true;

    while (this.queue.length > 0) {
      // Respect Retry-After if we've been throttled
      const now = Date.now();
      if (this.retryAfterUntil > now) {
        const waitMs = this.retryAfterUntil - now;
        console.warn(`Rate limited — waiting ${Math.ceil(waitMs / 1000)}s`);
        await new Promise((r) => setTimeout(r, waitMs));
      }

      await this.bucket.acquire();
      const item = this.queue.shift()!;

      try {
        const result = await item.fn();
        item.resolve(result);
      } catch (err: any) {
        if (err.statusCode === 429) {
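          // GraphError.headers is a fetch Headers object, so read it with .get();
          // the value is in SECONDS
          const header: string | null = err.headers?.get?.("retry-after") ?? null;
          const parsed = Number.parseInt(header ?? "", 10);
          const retryAfter = Number.isFinite(parsed) ? parsed : 30;
          this.retryAfterUntil = Date.now() + retryAfter * 1000;
          // Re-queue the failed request at the front so ordering is preserved
          this.queue.unshift(item);
          console.warn(`429 received, Retry-After: ${retryAfter}s`);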
        } else {
          item.reject(err);
        }
      }
    }

    this.processing = false;
  }
}

// Usage
const throttled = new ThrottledOneNoteClient(client);
const notebooks = await throttled.request((c) =>
  c.api("/me/onenote/notebooks").get()
);
```
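
`ThrottledOneNoteClient` awaits each call before starting the next, so its throughput is also capped by round-trip latency: at 250 ms per call that is about 4 requests/second, half of the 8/second that 480/min allows. If that becomes the bottleneck, one option is a small worker pool whose workers all respect a shared `Retry-After` deadline. `RetryAfterGate` and `runPool` are hypothetical names, and this is a sketch rather than a drop-in replacement for the class above.

```typescript
// Hypothetical shared gate: once any worker sees a 429, every worker pauses
class RetryAfterGate {
  private until = 0; // epoch ms before which no request should start

  block(retryAfterSeconds: number): void {
    // Keep the later deadline if several 429s arrive close together
    this.until = Math.max(this.until, Date.now() + retryAfterSeconds * 1000);
  }

  async wait(): Promise<void> {
    // Loop because another worker may extend the deadline while we sleep
    while (Date.now() < this.until) {
      await new Promise<void>((resolve) => setTimeout(resolve, this.until - Date.now()));
    }
  }
}

// Run tasks with at most `limit` in flight; each task first waits on the gate.
// Results are returned in input order.
async function runPool<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
  gate: RetryAfterGate
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // next task index; safe to share because JS runs one callback at a time

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      await gate.wait();
      results[i] = await tasks[i]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each task would call the token bucket's `acquire()` itself and, on catching a 429, call `gate.block(seconds)`. Keep `limit` small (2 to 4): extra concurrency only helps until the token bucket becomes the bottleneck.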

### Step 4 — Per-User Tracking for Multi-User Apps

Multi-user apps must track rate limits per user, not globally:

```typescript
class MultiUserRateLimiter {
  private userBuckets: Map<string, TokenBucket> = new Map();
  private tenantBucket: TokenBucket;

  constructor() {
    // Tenant-wide: 10,000 per 10 minutes
    this.tenantBucket = new TokenBucket(8_000, 600_000); // 80% safety margin
  }

  async acquire(userId: string): Promise<void> {
    // Get or create per-user bucket
    if (!this.userBuckets.has(userId)) {
      this.userBuckets.set(userId, new TokenBucket(480, 60_000));
    }
    const userBucket = this.userBuckets.get(userId)!;

    // Must acquire from BOTH buckets
    await userBucket.acquire();
    await this.tenantBucket.acquire();
  }

  getStatus(userId: string): { userRemaining: number; tenantRemaining: number } {
    const userBucket = this.userBuckets.get(userId);
    return {
      userRemaining: userBucket?.available ?? 480,
      tenantRemaining: this.tenantBucket.available,
    };
  }
}
```
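
Why both buckets matter follows from the safe budgets above: 480 per minute is 8 requests/second per user, while 8,000 per 10 minutes is about 13.3 requests/second for the whole tenant. The hypothetical helper below makes the crossover explicit. It assumes every active user runs flat out, which real workloads rarely do.

```typescript
// Hypothetical: steady-state aggregate throughput when every active user
// runs at their per-user cap, bounded by the shared tenant bucket.
function aggregateRatePerSec(
  activeUsers: number,
  perUserPerMin: number = 480,   // safe per-user budget used above
  tenantPer10Min: number = 8_000 // safe tenant budget used above
): { perSec: number; bindingLimit: "user" | "tenant" } {
  const userSide = activeUsers * (perUserPerMin / 60); // 8 req/s per user
  const tenantSide = tenantPer10Min / 600;             // about 13.33 req/s total
  return userSide <= tenantSide
    ? { perSec: userSide, bindingLimit: "user" }
    : { perSec: tenantSide, bindingLimit: "tenant" };
}
```

With these numbers a second busy user is already enough for the tenant bucket to bind (2 × 8 = 16 > 13.3).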

### Step 5 — Exponential Backoff with Jitter

For 429 responses without a `Retry-After` header (rare but possible), use exponential backoff with jitter:

```typescript
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err.statusCode !== 429 || attempt === maxRetries) throw err;

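      // GraphError.headers is a fetch Headers object; Retry-After is in seconds
      const retryAfter: string | null = err.headers?.get?.("retry-after") ?? null;
      let delayMs: number;

      if (retryAfter && /^\d+$/.test(retryAfter)) {
        // Prefer server-specified delay (in seconds)
        delayMs = Number.parseInt(retryAfter, 10) * 1000;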
      } else {
        // Exponential backoff: 1s, 2s, 4s, 8s, 16s + jitter
        const base = Math.pow(2, attempt) * 1000;
        const jitter = Math.random() * 1000;
        delayMs = base + jitter;
      }

      console.warn(`Retry ${attempt + 1}/${maxRetries} in ${Math.ceil(delayMs / 1000)}s`);
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const pages = await withBackoff(() =>
  client.api("/me/onenote/pages").top(50).get()
);
```
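
`withBackoff` computes its delay inline, which makes the schedule hard to test and lets late attempts grow without bound. A hedged refactor pulls the calculation into a pure function with an injectable random source and a cap. `backoffDelayMs` and the 30-second cap are hypothetical choices; the jitter is the same additive 0 to 1 s used above.

```typescript
// Hypothetical pure delay calculator for 429s that arrive without Retry-After.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs: number = 1000,
  capMs: number = 30_000,
  jitterMs: number = 1000
): number {
  // Exponential part: 1s, 2s, 4s, 8s, 16s, ... capped so late attempts stay bounded
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  // Additive jitter spreads out clients that were throttled at the same moment
  return exponential + Math.floor(random() * jitterMs);
}
```

Inside `withBackoff`, the `else` branch could then become `delayMs = backoffDelayMs(attempt);`.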

### Step 6 — Batch Requests to Reduce Call Count

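The Graph `$batch` endpoint lets you send up to 20 operations in a single HTTP request. This cuts round trips and per-request overhead, but each operation inside the batch is still evaluated individually against throttling limits, so check every sub-response for `429`: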

```typescript
async function batchGetPages(client: Client, pageIds: string[]): Promise<any[]> {
  const batchSize = 20; // Graph batch limit
  const allResults: any[] = [];

  for (let i = 0; i < pageIds.length; i += batchSize) {
    const chunk = pageIds.slice(i, i + batchSize);
    const batchBody = {
      requests: chunk.map((id, idx) => ({
        id: String(idx + 1),
        method: "GET",
        url: `/me/onenote/pages/${id}?$select=id,title,lastModifiedDateTime`,
      })),
    };

    const batchResponse = await client.api("/$batch").post(batchBody);
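    // Sub-responses can arrive in any order, and each carries its own status
    for (const response of batchResponse.responses) {
      if (response.status === 200) {
        allResults.push(response.body);
      } else if (response.status === 429) {
        // Throttled individually: retry this page later, honoring its Retry-After
        console.warn(`Page ${chunk[Number(response.id) - 1]} throttled inside batch`);
      } else {
        console.warn(`Batch item ${response.id} failed: ${response.status}`);
      }
    }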
  }
  return allResults;
}

// 100 pages = 5 HTTP requests instead of 100
const pages = await batchGetPages(client, hundredPageIds);
```
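
Sub-responses in a batch carry their own status codes (including `429`) and may come back in a different order than the requests, so it helps to keep chunking pure and to map throttled items back to their inputs by `id`. The interfaces below are hypothetical and model only the fields this sketch uses.

```typescript
// Hypothetical shapes for Graph JSON batch payloads (only the fields used here)
interface BatchRequest { id: string; method: "GET"; url: string }
interface BatchResponseItem { id: string; status: number; body?: unknown }

// Split page IDs into $batch bodies of at most `size` requests
function buildPageBatches(pageIds: string[], size: number = 20): { requests: BatchRequest[] }[] {
  const batches: { requests: BatchRequest[] }[] = [];
  for (let i = 0; i < pageIds.length; i += size) {
    const chunk = pageIds.slice(i, i + size);
    batches.push({
      requests: chunk.map((pageId, idx) => ({
        id: String(idx + 1), // ids only need to be unique within one batch
        method: "GET" as const,
        url: `/me/onenote/pages/${pageId}?$select=id,title,lastModifiedDateTime`,
      })),
    });
  }
  return batches;
}

// Map throttled (429) sub-responses back to page IDs, by id rather than position
function throttledPageIds(chunk: string[], responses: BatchResponseItem[]): string[] {
  return responses
    .filter((r) => r.status === 429)
    .map((r) => chunk[Number(r.id) - 1]);
}
```

The returned IDs can be fed into a later batch once the `Retry-After` delay has passed.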

### Step 7 — Python Rate Limiter with asyncio

```python
import asyncio
import time

from kiota_abstractions.api_error import APIError  # base class of msgraph's ODataError

class RateLimiter:
    """Token bucket rate limiter for OneNote Graph API."""

    def __init__(self, max_requests: int = 480, window_seconds: int = 60):
        self.max_tokens = max_requests
        self.tokens = float(max_requests)
        self.refill_rate = max_requests / window_seconds
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self):
        # The lock serializes waiters, so each caller waits its turn for a token
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now

            if self.tokens < 1:
                wait = (1 - self.tokens) / self.refill_rate
                await asyncio.sleep(wait)
                # The sleep produced exactly the missing fraction of a token, which
                # we just spent: restart the refill clock so it isn't counted twice
                self.last_refill = time.monotonic()
                self.tokens = 0.0
            else:
                self.tokens -= 1

# Usage: combines token bucket with Retry-After handling
limiter = RateLimiter(max_requests=480, window_seconds=60)

async def safe_get_pages(client, section_id: str, max_retries: int = 3):
    for attempt in range(max_retries):
        await limiter.acquire()
        try:
            return await client.me.onenote.sections.by_onenote_section_id(
                section_id
            ).pages.get()
        except APIError as e:
            # Handle 429 with Retry-After header (value is in seconds)
            if e.response_status_code == 429 and attempt < max_retries - 1:
                headers = {k.lower(): v for k, v in (e.response_headers or {}).items()}
                retry_after = int(headers.get("retry-after", "30"))
                await asyncio.sleep(retry_after)
            else:
                raise
    raise RuntimeError("Max retries exceeded for OneNote API call")
```

### Step 8 — Monitor and Adjust Preemptively

Track your 429 rate over time and adjust thresholds:

```typescript
class RateLimitMonitor {
  private requestCount = 0;
  private throttleCount = 0;
  private windowStart = Date.now();

  record(wasThrottled: boolean): void {
    this.requestCount++;
    if (wasThrottled) this.throttleCount++;
  }

  getMetrics(): { total: number; throttled: number; throttleRate: number; windowMinutes: number } {
    const windowMinutes = (Date.now() - this.windowStart) / 60_000;
    return {
      total: this.requestCount,
      throttled: this.throttleCount,
      throttleRate: this.throttleCount / Math.max(this.requestCount, 1),
      windowMinutes: Math.round(windowMinutes * 10) / 10,
    };
  }

  // Alert if throttle rate exceeds threshold
  shouldReduceRate(): boolean {
    return this.getMetrics().throttleRate > 0.05; // >5% throttled = slow down
  }
}
```
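
`RateLimitMonitor` accumulates counts from `windowStart` onward, so a burst of 429s an hour ago keeps influencing `shouldReduceRate()`. A hedged alternative measures the throttle rate over only the last N requests with a fixed-size ring buffer. `RollingThrottleRate` and the default of 200 samples are hypothetical.

```typescript
// Hypothetical: throttle rate over the last `capacity` requests only,
// so old bursts of 429s age out of the metric.
class RollingThrottleRate {
  private readonly outcomes: boolean[]; // ring buffer: true = throttled
  private next = 0;      // slot to overwrite next
  private count = 0;     // how many slots hold real data
  private throttled = 0; // number of `true` entries currently in the buffer

  constructor(private readonly capacity: number = 200) {
    this.outcomes = new Array<boolean>(capacity).fill(false);
  }

  record(wasThrottled: boolean): void {
    // When full, the slot at `next` holds the oldest outcome: evict it
    if (this.count === this.capacity && this.outcomes[this.next]) this.throttled--;
    this.outcomes[this.next] = wasThrottled;
    if (wasThrottled) this.throttled++;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  get rate(): number {
    return this.count === 0 ? 0 : this.throttled / this.count;
  }
}
```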

## Output

Rate limit handling produces:
- Preemptive throttling via token bucket — requests are delayed before sending, not after 429
- `Retry-After` compliance — exact server-specified delays honored
- Batch consolidation — 20 operations per HTTP request for bulk workloads
- Monitoring metrics — request count, throttle count, throttle rate percentage

## Error Handling

| Status | Cause | Fix |
|--------|-------|-----|
| 429 (with Retry-After) | Per-user or per-tenant limit exceeded | Wait exactly `Retry-After` seconds; do not retry sooner |
| 429 (no Retry-After) | Rare edge case, limit exceeded | Exponential backoff with jitter starting at 1 second |
| 503 | Service throttling under load | Treat like 429 — backoff and retry |
| 500 | Internal error during throttled state | Do not count as rate limit; retry with normal backoff |
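
One way to encode the table above is a small decision function that callers consult after each failed request. `decideRetry`, its five-attempt default, and the `countsAsThrottle` flag (for feeding a monitor) are hypothetical; jitter is omitted to keep the mapping readable.

```typescript
// Hypothetical mapping of the error table to a retry decision
type RetryDecision =
  | { retry: false }
  | { retry: true; delayMs: number; countsAsThrottle: boolean };

function decideRetry(
  status: number,
  retryAfterSeconds: number | null,
  attempt: number,
  maxAttempts: number = 5
): RetryDecision {
  if (attempt >= maxAttempts) return { retry: false };
  const backoffMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, ... (add jitter in real use)
  switch (status) {
    case 429:
    case 503:
      // Throttling: honor Retry-After exactly when present, otherwise back off
      return {
        retry: true,
        delayMs: retryAfterSeconds !== null ? retryAfterSeconds * 1000 : backoffMs,
        countsAsThrottle: true,
      };
    case 500:
      // Internal error: retry with normal backoff, but not a rate-limit hit
      return { retry: true, delayMs: backoffMs, countsAsThrottle: false };
    default:
      return { retry: false };
  }
}
```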

## Examples

**Calculate request budget for polling + CRUD:**
```typescript
const BUDGET_PER_MINUTE = 600;
const SAFETY_MARGIN = 0.8; // Use 80% of limit
const safeBudget = BUDGET_PER_MINUTE * SAFETY_MARGIN; // 480

// Allocate budget
const pollingSections = 20;
const pollIntervalSec = 30;
const pollRequestsPerMin = pollingSections * (60 / pollIntervalSec); // 40/min

const remainingForCrud = safeBudget - pollRequestsPerMin; // 440/min for user operations
console.log(`Polling: ${pollRequestsPerMin}/min | CRUD: ${remainingForCrud}/min`);
```
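
The same arithmetic can be wrapped in hypothetical helpers, including the inverse question of how often you can poll while keeping a fixed CRUD reserve. Both assume the 600/min per-user limit and 80% margin used above.

```typescript
// Hypothetical helpers generalizing the budget arithmetic above
function pollingBudget(
  sections: number,
  pollIntervalSec: number,
  limitPerMinute: number = 600,
  safetyMargin: number = 0.8
): { safeBudget: number; pollPerMin: number; crudPerMin: number } {
  const safeBudget = Math.floor(limitPerMinute * safetyMargin);
  const pollPerMin = sections * (60 / pollIntervalSec);
  return { safeBudget, pollPerMin, crudPerMin: safeBudget - pollPerMin };
}

// Shortest whole-second poll interval that still leaves `reserveForCrud` req/min free
function minPollIntervalSec(
  sections: number,
  reserveForCrud: number,
  limitPerMinute: number = 600,
  safetyMargin: number = 0.8
): number {
  const available = Math.floor(limitPerMinute * safetyMargin) - reserveForCrud;
  if (available <= 0) return Infinity; // no room left for polling
  return Math.ceil((sections * 60) / available);
}
```

For example, 20 sections with 400 req/min reserved for CRUD leaves 80 req/min for polling, i.e. one poll per section every 15 seconds.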

**Production health check:**
```typescript
const monitor = new RateLimitMonitor();
// After each API call:
monitor.record(/* wasThrottled */ false);

// Periodic check
setInterval(() => {
  const metrics = monitor.getMetrics();
  if (monitor.shouldReduceRate()) {
    console.warn(`High throttle rate: ${(metrics.throttleRate * 100).toFixed(1)}%`);
    // Dynamically increase poll interval or reduce batch concurrency
  }
}, 60_000);
```
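
The health check suggests increasing the poll interval when throttling climbs. One hedged way to do that is multiplicative increase with additive decrease: double the interval when the throttle rate crosses the same 5% threshold `shouldReduceRate()` uses, then shrink it gently while healthy. `nextPollIntervalSec` and its bounds are hypothetical.

```typescript
// Hypothetical adaptive poll interval: back off fast, recover slowly, stay in bounds
function nextPollIntervalSec(
  currentSec: number,
  throttleRate: number,
  opts = { minSec: 30, maxSec: 600, threshold: 0.05, stepDownSec: 5 }
): number {
  if (throttleRate > opts.threshold) {
    return Math.min(opts.maxSec, currentSec * 2); // throttled: double the interval
  }
  return Math.max(opts.minSec, currentSec - opts.stepDownSec); // healthy: ease back
}
```

Inside the `setInterval` callback this would be `pollInterval = nextPollIntervalSec(pollInterval, metrics.throttleRate);`.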

## Resources

- [OneNote API Overview](https://learn.microsoft.com/en-us/graph/api/resources/onenote-api-overview)
- [Error Codes](https://learn.microsoft.com/en-us/graph/onenote-error-codes)
- [Best Practices](https://learn.microsoft.com/en-us/graph/onenote-best-practices)
- [Known Issues](https://learn.microsoft.com/en-us/graph/known-issues)
- [Graph API Reference](https://learn.microsoft.com/en-us/graph/api/overview)
- [Graph Explorer](https://developer.microsoft.com/en-us/graph/graph-explorer)

## Next Steps

- See `onenote-webhooks-events` for polling patterns that consume rate budget
- See `onenote-performance-tuning` for batch operations and `$select` to reduce payload size
- See `onenote-core-workflow-a` for CRUD operations that benefit from throttled clients

Related Skills

workhuman-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Workhuman rate limits for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman rate limits".

wispr-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow rate limits for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr rate limits".

windsurf-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Understand and manage Windsurf credit system, usage limits, and model selection. Use when running out of credits, optimizing AI usage costs, or understanding the credit-per-model pricing structure. Trigger with phrases like "windsurf credits", "windsurf rate limit", "windsurf usage", "windsurf out of credits", "windsurf model costs".

webflow-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Handle Webflow Data API v2 rate limits — per-key limits, Retry-After headers, exponential backoff, request queuing, and bulk endpoint optimization. Use when hitting 429 errors, implementing retry logic, or optimizing API request throughput. Trigger with phrases like "webflow rate limit", "webflow throttling", "webflow 429", "webflow retry", "webflow backoff", "webflow too many requests".

vercel-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Handle Vercel API rate limits, implement retry logic, and configure WAF rate limiting. Use when hitting 429 errors, implementing retry logic, or setting up rate limiting for your Vercel-deployed API endpoints. Trigger with phrases like "vercel rate limit", "vercel throttling", "vercel 429", "vercel retry", "vercel backoff", "vercel WAF rate limit".

veeva-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault rate limits for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva rate limits".

vastai-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Handle Vast.ai API rate limits with backoff and request optimization. Use when encountering 429 errors, implementing retry logic, or optimizing API request throughput. Trigger with phrases like "vastai rate limit", "vastai throttling", "vastai 429", "vastai retry", "vastai backoff".

twinmind-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Implement TwinMind rate limiting, backoff, and optimization patterns. Use when handling rate limit errors, implementing retry logic, or optimizing API request throughput for TwinMind. Trigger with phrases like "twinmind rate limit", "twinmind throttling", "twinmind 429", "twinmind retry", "twinmind backoff".

together-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Together AI rate limits for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together rate limits".

techsmith-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

TechSmith rate limits for Snagit COM API and Camtasia automation. Use when working with TechSmith screen capture and video editing automation. Trigger: "techsmith rate limits".

supabase-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

Manage Supabase rate limits and quotas across all plan tiers. Use when hitting 429 errors, configuring connection pooling, optimizing API throughput, or understanding tier-specific quotas for Auth, Storage, Realtime, and Edge Functions. Trigger: "supabase rate limit", "supabase 429", "supabase throttle", "supabase quota", "supabase connection pool", "supabase too many requests".

stackblitz-rate-limits

from jeremylongshore/claude-code-plugins-plus-skills

WebContainer resource limits: memory, CPU, file system size, process count. Use when working with WebContainers or StackBlitz SDK. Trigger: "webcontainer limits".