prompt-repetition

A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

242 stars

byaiskillstore

View on GitHub Installation ↓

Best use case

prompt-repetition is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "prompt-repetition" skill to help with this workflow task. Context: A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/prompt-repetition/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/supercent-io/prompt-repetition/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/prompt-repetition/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How prompt-repetition Compares

Feature / Agent	prompt-repetition	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Prompt Repetition

## Problem Being Solved

LLMs are trained as **Causal Language Models**, where each token attends only to **previous tokens**. This leads to:

1. **Context-Question Problem**: The question is unknown when processing context
2. **Options-First MCQ Problem**: Cannot fully understand the question context when viewing answer choices
3. **Position/Index Problem**: Attention weights weaken for specific position information in long lists

**Prompt repetition** enables the second pass to reference the entire first pass, effectively **mimicking some benefits of bidirectional attention**.

---

## When to use this skill

- **When using lightweight models**: claude-haiku, gemini-flash, gpt-4o-mini, etc.
- **Options-First MCQ**: Multiple choice where answer choices appear before the question
- **Context + Question**: Searching for specific information in long contexts
- **Index/Position Tasks**: Position-based queries in inventories or lists
- **NPC Dialogue**: Maintaining consistency for game AI characters
- **Non-Reasoning Tasks**: Tasks that do not use Chain-of-Thought

---

## How It Works

### Limitations of Causal Attention

```
[Context] → [Question]
    ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear
```

### How Prompt Repetition Solves This

```
[First Pass]                [Second Pass]
Context → Question    →    Context' → Question'
                              ↑         ↑
                          Can reference entire first pass
```

In the second repetition, the model **reprocesses information across the entire first prompt** and **strengthens attention weights on key concepts**, resulting in improved performance.

> **Note**: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.

---

## Research Results (Google Research 2025)

| Metric | Result |
|--------|--------|
| **Significant improvement** (p < 0.1) | 47 / 70 benchmarks |
| **Performance degradation** | 0 |
| **Neutral** | 23 |
| **Improvement rate** | 67% |

**Most dramatic improvement:** Gemini 2.0 Flash-Lite on NameIndex: **21.33% → 97.33%** (+76%p)

### Tested Models

- Gemini 2.0 Flash / Flash Lite
- GPT-4o / GPT-4o-mini
- Claude 3.7 Sonnet / Claude 3 Haiku
- Deepseek V3

### Tested Benchmarks

- ARC (Challenge) - Scientific reasoning
- OpenBookQA - Open-domain QA
- GSM8K - Math problems
- MMLU-Pro - Multitask language understanding
- MATH - Mathematical problem solving
- NameIndex / MiddleMatch - Custom position tasks

---

## Application Procedure

### Step 1: Verify Auto-Apply Target Models

| Provider | Auto-apply models | Excluded models |
|----------|------------------|-----------------|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |

### Step 2: Determine Repetition Count by Task Type

| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|-----------|----------------|-------------|----------------------|
| Options-First MCQ | `A. B. C. D.` choices first | 2× | +15-40%p |
| Index/Position | `slot`, `position`, `index`, `N-th` | **3×** | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | `step by step`, `think through` | **0×** (not applied) | ~0% |

### Step 3: Check Token Limits

```python
# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```

### Step 4: Prompt Transformation

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```

---

## Practical Examples

### Example 1: Options-First MCQ (Greatest Effect)

**Before:**
```
A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.
```

**After (repetition ×2 applied):**
```
A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.
```

**Expected output:**
```
A
```
Accuracy: original 78% → after repetition 93% (+15%p)

---

### Example 2: Index/Position Tasks (Maximum Effect)

**Before:**
```
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map

What item is in slot 25?
```

**After (repetition ×3 applied):**
Prompt repeated 3 times

**Expected output:**
```
Dragon Scale
```
Accuracy: original 21% → after repetition 97% (+76%p)

---

### Example 3: Tool Call Prompt Handling

**Note**: Prompts containing tool call instructions are also **repeated in their entirety**. The full-repetition approach was adopted for implementation simplicity and consistency.

**Before:**
```
Use the calculator tool to compute 234 * 567.
What is the result?
```

**After (repetition ×2):**
```
Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?
```

> Research results show that full repetition including tool call sections is also effective.

---

## Production-Ready Implementation

### Auto-Apply Transformer

```python
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]

@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"

class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False

        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False

        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False

        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()

        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions

        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt

        repetitions = self.determine_repetitions(prompt, model)

        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break

        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1

        if repetitions <= 1:
            return prompt

        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```

---

## How to Measure Effectiveness (Verification)

### A/B Testing Method

```python
def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()

    results = {"baseline": [], "repeated": []}

    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)

        # With Repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)

    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)

    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
```

### Key Metrics

| Metric | Measurement Method |
|--------|-------------------|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |

---

## When NOT to Use

| Case | Reason |
|------|--------|
| **Using CoT** | Reasoning process already provides context |
| **Reasoning models** (opus, sonnet) | Already optimized; minimal effect |
| **Very long prompts** | Risk of exceeding context limit |
| **Already repeated** | Duplicate application wastes tokens |

---

## Cost-Accuracy Analysis

| Metric | Baseline | With Repetition | Change |
|--------|----------|----------------|--------|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | **+2%** |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | **+14%p** |
| Cost per correct answer | $0.019 | $0.020 | +5% |

**Key insight:** The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.

---

## Multi-Agent Integration

### Auto-Apply Strategy Per Agent

| Agent | Model | Repetition Applied | Applied At |
|-------|-------|--------------------|------------|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | **haiku** | **Auto** | skill_loader.py |
| Gemini Analyst | **flash** | **Auto** | On MCP call |
| OpenAI | **gpt-4o-mini** | **Auto** | skill_loader.py |

### Preventing Duplicate Application

To prevent duplicate application in multi-agent pipelines:

1. **Use markers**: Detect already-applied prompts with `<!-- prompt-repetition-applied -->` marker
2. **Pass metadata**: Pass `x-prompt-repetition-applied: true` header between agents
3. **Orchestrator management**: Claude Orchestrator tracks whether repetition is applied when calling sub-agents

### Application Pattern

```
[Claude Sonnet] Planning (no repetition needed)
    ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
    ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)
```

---

## skill_loader.py Integration Guide

### Recommended Implementation

```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)

        return prompt
```

---

## Constraints

### Required Rules

1. **Lightweight models first**: Most effective for haiku, flash, mini series
2. **Limit repetitions**: 2× for general tasks, max 3× for position tasks
3. **Context monitoring**: Be cautious of context overflow due to repetition
4. **Check markers**: Mandatory marker check to prevent duplicate application

### Prohibited Rules

1. **No padding substitution**: Increasing length with `.` etc. has no effect (per research)
2. **Do not combine with CoT**: Effects cancel out
3. **Do not force-apply to reasoning models**: Already optimized
4. **No duplicate application**: Consecutive application without markers wastes tokens

---

## Quick Reference

```
=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->
```

---

## References

- [Prompt Repetition Improves Non-Reasoning LLMs (Leviathan et al., 2025)](https://arxiv.org/)
- [Chain-of-Thought Prompting Elicits Reasoning (Wei et al., 2023)](https://arxiv.org/)
- [Re-Reading Improves Reasoning in LLMs (Xu et al., 2024)](https://arxiv.org/)

Related Skills

prompt-optimize

242

from aiskillstore/marketplace

Expert prompt engineering skill that transforms Claude into "Alpha-Prompt" - a master prompt engineer who collaboratively crafts high-quality prompts through flexible dialogue. Activates when user asks to "optimize prompt", "improve system instruction", "enhance AI instruction", or mentions prompt engineering tasks.

prompt-library

242

from aiskillstore/marketplace

Curated collection of high-quality prompts for various use cases. Includes role-based prompts, task-specific templates, and prompt refinement techniques. Use when user needs prompt templates, role-play prompts, or ready-to-use prompt examples for coding, writing, analysis, or creative tasks.

prompt-engineering-patterns

242

from aiskillstore/marketplace

Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.

prompt-engineer

242

from aiskillstore/marketplace

Transforms user prompts into optimized prompts using frameworks (RTF, RISEN, Chain of Thought, RODES, Chain of Density, RACE, RISE, STAR, SOAP, CLEAR, GROW)

ClaudeCodex

prompt-caching

242

from aiskillstore/marketplace

Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.

llm-application-dev-prompt-optimize

242

from aiskillstore/marketplace

You are an expert prompt engineer specializing in crafting effective prompts for LLMs through advanced techniques including constitutional AI, chain-of-thought reasoning, and model-specific optimizati

seedance-prompt

242

from aiskillstore/marketplace

为 Seedance 2.0 视频生成模型撰写产品宣传动画提示词（Motion Graphics Prompt）。生成 15 秒一镜到底的动态影像提示词，支持 4 种美学风格（Apple/Microsoft/Bauhaus/Vercel）。支持垫图（参考图）模式，自动标注图片占位符。生成提示词后可通过即梦 Dreamina CLI 直接生成视频（自动检测安装状态）。 TRIGGER when: 用户需要生成 Seedance 视频提示词、产品宣传动画提示词、motion graphics prompt，或提到 "Seedance"、"视频提示词"、"产品动画"、"宣传视频提示词"、"video prompt"。 Keywords: Seedance, 视频提示词, video prompt, motion graphics, 产品动画, 宣传动画, 垫图, Dreamina

video-prompting-guide

242

from aiskillstore/marketplace

Best practices and techniques for writing effective AI video generation prompts. Covers: Veo, Seedance, Wan, Grok, Kling, Runway, Pika, Sora prompting strategies. Learn: shot types, camera movements, lighting, pacing, style keywords, negative prompts. Use for: improving video quality, getting consistent results, professional video prompts. Triggers: video prompt, how to prompt video, veo prompts, video generation tips, better ai video, video prompt engineering, video prompt guide, video prompt template, ai video tips, video prompt best practices, video prompt examples, cinematography prompts

enhance-prompt

242

from aiskillstore/marketplace

Transforms vague UI ideas into polished, Stitch-optimized prompts. Enhances specificity, adds UI/UX keywords, injects design system context, and structures output for better generation results.

mcp-prompts-guide

242

from aiskillstore/marketplace

Create powerful MCP prompts that guide AI interactions with templates, arguments, and context injection

prompt-engineering

242

from aiskillstore/marketplace

Advanced prompt engineering techniques for optimal AI responses. Use this when crafting prompts, optimizing AI interactions, or designing system prompts for applications.

when-optimizing-prompts-use-prompt-optimization-analyzer

242

from aiskillstore/marketplace

Active diagnostic tool for analyzing prompt quality, detecting anti-patterns, identifying token waste, and providing optimization recommendations