nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

242 stars

Best use case

nowait-reasoning-optimizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "nowait-reasoning-optimizer" skill to help with this workflow task. Context: Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

  • Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

  • Do not use this when you only need a one-off answer and do not need a reusable workflow.
  • Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nowait-reasoning-optimizer/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/davila7/nowait-reasoning-optimizer/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/nowait-reasoning-optimizer/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How nowait-reasoning-optimizer Compares

Feature / Agentnowait-reasoning-optimizerStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# NOWAIT Reasoning Optimizer

Implements the NOWAIT technique from the paper "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency" (Wang et al., 2025).

## Overview

NOWAIT is a training-free inference-time intervention that suppresses self-reflection tokens (e.g., "Wait", "Hmm", "Alternatively") during generation, reducing chain-of-thought (CoT) trajectory length by **27-51%** without compromising model utility.

## When to Use

- Deploying R1-style reasoning models with limited compute
- Reducing inference latency for production systems
- Optimizing token costs for reasoning tasks
- Working with verbose CoT outputs that need streamlining

## Supported Models

| Model Series | Type | Token Reduction |
|--------------|------|-----------------|
| QwQ-32B | RL-based | 16-31% |
| Phi4-Reasoning-Plus | RL-based | 23-28% |
| Qwen3-32B | RL-based | 13-16% |
| Kimi-VL-A3B | Multimodal | 40-60% |
| QvQ-72B-Preview | Multimodal | 20-30% |

**Important**: NOWAIT works best with RL-based models. Distilled models (Qwen3-4B/8B/14B) show degraded performance when reflection tokens are suppressed.

## Quick Start

### 1. Basic Implementation

```python
from scripts.nowait_processor import NOWAITLogitProcessor

# Initialize processor for your model's tokenizer
processor = NOWAITLogitProcessor(tokenizer)

# Use during generation
outputs = model.generate(
    inputs,
    logits_processor=[processor],
    max_new_tokens=32768
)
```

### 2. Keywords Suppressed

See `references/keywords.md` for the complete list. Core keywords:

```
wait, alternatively, hmm, but, however, check, 
double-check, maybe, verify, again, oh, ah
```

## How It Works

1. **Initialize Keywords**: Identify reflection keywords from empirical analysis
2. **Expand to Token Variants**: Map keywords to all token variants in vocabulary (e.g., "wait" → " wait", "Wait", " Wait", ".wait", "WAIT")
3. **Suppress During Inference**: Set logits of reflection tokens to large negative values during decoding

```
Logits (Before)         Logits (After)
Wait     0.8     →     Wait     -inf
First    0.6     →     First    0.6
Hmm      0.5     →     Hmm      -inf
Let      0.4     →     Let      0.4
```

## Key Findings

### Why It Works

- NOWAIT doesn't eliminate self-reflection entirely—it guides models to skip **unnecessary** "waiting" reasoning
- Models still perform essential verification at key decision points
- Results in more linear, straightforward reasoning paths

### RL vs Distilled Models

| Model Type | NOWAIT Effect | Recommendation |
|------------|---------------|----------------|
| RL-based (QwQ, Phi4, Qwen3-32B) | Stable accuracy, significant token reduction | ✅ Recommended |
| Distilled (Qwen3-4B/8B/14B) | Accuracy degradation on hard tasks | ⚠️ Use with caution |

Distilled models rely heavily on CoT structure from training data—removing reflection tokens disrupts their reasoning patterns.

## Integration Examples

### HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.nowait_processor import NOWAITLogitProcessor

model = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

processor = NOWAITLogitProcessor(tokenizer)

response = model.generate(
    tokenizer(prompt, return_tensors="pt").input_ids,
    logits_processor=[processor],
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.7
)
```

### vLLM

```python
from vllm import LLM, SamplingParams
from scripts.nowait_processor import get_nowait_bad_words_ids

llm = LLM(model="Qwen/QwQ-32B")
bad_words_ids = get_nowait_bad_words_ids(llm.get_tokenizer())

sampling_params = SamplingParams(
    max_tokens=32768,
    bad_words_ids=bad_words_ids
)
```

## Expected Results

| Task Type | Original Tokens | NOWAIT Tokens | Reduction |
|-----------|-----------------|---------------|-----------|
| Math (AIME) | 15,000 | 10,500 | 30% |
| Visual QA (MMMU) | 2,900 | 1,450 | 50% |
| Video QA (MMVU) | 1,700 | 1,250 | 27% |

## Limitations

- Less effective on very simple problems where CoT overhead is already minimal
- Distilled models may suffer accuracy loss on challenging tasks
- Some domains may require model-specific keyword tuning

## References

- Paper: arXiv:2506.08343v2
- Complete keyword list: `references/keywords.md`
- Implementation: `scripts/nowait_processor.py`

Related Skills

seo-meta-optimizer

242
from aiskillstore/marketplace

Creates optimized meta titles, descriptions, and URL suggestions based on character limits and best practices. Generates compelling, keyword-rich metadata. Use PROACTIVELY for new content.

dx-optimizer

242
from aiskillstore/marketplace

Developer Experience specialist. Improves tooling, setup, and workflows. Use PROACTIVELY when setting up new projects, after team feedback, or when development friction is noticed.

database-optimizer

242
from aiskillstore/marketplace

Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.

aws-cost-optimizer

242
from aiskillstore/marketplace

Comprehensive AWS cost analysis and optimization recommendations using AWS CLI and Cost Explorer

cold-start-optimizer

242
from aiskillstore/marketplace

Provides guidance on reducing Lambda cold start times through binary optimization, lazy initialization, and deployment strategies. Activates when users discuss cold starts or deployment configuration.

when-optimizing-agent-learning-use-reasoningbank-intelligence

242
from aiskillstore/marketplace

Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement

reasoningbank-with-agentdb

242
from aiskillstore/marketplace

Implement ReasoningBank adaptive learning with AgentDB's 150x faster vector database. Includes trajectory tracking, verdict judgment, memory distillation, and pattern recognition. Use when building self-learning agents, optimizing decision-making, or implementing experience replay systems.

reasoningbank-intelligence

242
from aiskillstore/marketplace

Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement. Use when building self-learning agents, optimizing workflows, or implementing meta-cognitive systems.

reasoningbank-adaptive-learning-with-agentdb

242
from aiskillstore/marketplace

Implement ReasoningBank adaptive learning with AgentDB for trajectory tracking, verdict judgment, memory distillation, and pattern recognition to build self-learning agents that improve decision-making through experience.

query-optimizer

242
from aiskillstore/marketplace

Analyze and optimize SQL queries for better performance and efficiency.

seo-optimizer

242
from aiskillstore/marketplace

Audit and optimize WordPress SEO (Yoast/Rank Math) - checks focus keywords, meta descriptions, featured images. Uses Unsplash API for missing images. Run on all pages/posts to identify and fix SEO issues.

docker-optimizer

242
from aiskillstore/marketplace

Reviews Dockerfiles for best practices, security issues, and image size optimizations including multi-stage builds and layer caching. Use when working with Docker, containers, or deployment.