backend-hang-debug

Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.

242 stars

by aiskillstore

Best use case

backend-hang-debug is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "backend-hang-debug" skill to help with this workflow task. Context: Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/backend-hang-debug/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/benderfendor/backend-hang-debug/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/backend-hang-debug/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How backend-hang-debug Compares

Feature / Agent	backend-hang-debug	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.

Where can I find the source code?

The source code is on GitHub in the aiskillstore/marketplace repository, at skills/benderfendor/backend-hang-debug/SKILL.md.

SKILL.md Source

# Backend Hang Debug

## Purpose
- Detect and resolve event-loop hangs where the FastAPI app stops responding (e.g., `curl http://localhost:8000/` times out) due to synchronous executor shutdown in the SSE news stream.
- Provide a repeatable triage flow using `py-spy` to capture live stacks and pinpoint blocking code.

## Scope
- Backend: `backend/app/api/routes/stream.py` (news stream), `backend/app/services/rss_ingestion.py` (RSS workers), startup processes.
- Tooling: `py-spy` for live stack dumps; `curl` with timeouts for smoke tests.

## Quick Triage
1. **Reproduce hang**: `curl -m 5 http://localhost:8000/` and `curl -m 5 http://localhost:8000/health`; note timeouts.
2. **Process check**: `ss -tlnp | grep 8000` to confirm the listener; `ls /proc/$(pgrep -f "uvicorn app.main")/fd | wc -l` to rule out an FD leak. If `pgrep` prints more than one pid (for example with multiple workers), run the check once per pid.
3. **Stack capture** (inside backend venv): `uv pip install py-spy`, then `sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid $(pgrep -f "uvicorn app.main")` (substitute the path to your own venv's `py-spy`, and repeat for each worker pid if multiprocess). Look for a `ThreadPoolExecutor.shutdown` frame reached from `api/routes/stream.py`.
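
The reproduce step can also be scripted so that a hang is distinguished from a refused connection or an ordinary HTTP error. The following is a minimal sketch using only the standard library; the function name `probe` and its return strings are hypothetical and not part of the skill.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 5.0) -> str:
    """Classify one GET as 'ok:<status>', 'http-error:<status>', 'timeout' or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return f"ok:{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so the event loop is alive even though the route failed.
        return f"http-error:{exc.code}"
    except (socket.timeout, TimeoutError):
        # Connection accepted but no response arrived in time: the hang signature.
        return "timeout"
    except urllib.error.URLError as exc:
        # urllib wraps connect-phase failures (including timeouts) in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
```

Calling `probe("http://localhost:8000/health")` against a hung backend would be expected to return `"timeout"`, because the listening socket still accepts connections while the event loop is blocked.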

## Fix Pattern (non-blocking executor)
- Replace synchronous context manager `with ThreadPoolExecutor(...):` inside `event_generator` with a long-lived executor plus explicit **non-blocking** shutdown:
  - Create executor outside the context manager.
  - On client disconnect, cancel pending futures instead of awaiting shutdown.
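  - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)` (`cancel_futures` requires Python 3.9+).
- Rationale: leaving the `with` block calls `shutdown(wait=True)`, which blocks the event loop until every worker thread returns, so a single RSS worker hung on network I/O is enough to freeze the whole app. Non-blocking shutdown cancels queued work but cannot interrupt a thread that is already running, so worker-side network timeouts still matter.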
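
The difference between the two shutdown modes is easy to measure in isolation. The sketch below uses only the standard library; `slow_worker` is a hypothetical stand-in for an RSS fetch stuck on network I/O, and both timing helpers are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_worker(seconds: float) -> None:
    """Hypothetical stand-in for an RSS fetch that is stuck on network I/O."""
    time.sleep(seconds)


def exit_with_context_manager(work_s: float) -> float:
    """Return how long it takes to leave a `with ThreadPoolExecutor` block."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(slow_worker, work_s)
    # __exit__ called shutdown(wait=True), so this line runs only after the worker ends.
    return time.monotonic() - start


def exit_non_blocking(work_s: float) -> float:
    """Return how long an explicit non-blocking shutdown takes."""
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    pool.submit(slow_worker, work_s)
    # Returns immediately; the running worker is left to finish on its own.
    pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+
    return time.monotonic() - start
```

With a worker that sleeps for `work_s` seconds, the first helper takes at least that long to return, while the second returns almost immediately. Inside an async route, the first case is time during which the event loop serves no other request.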

## Implementation Steps
1. **Update stream executor usage** in `backend/app/api/routes/stream.py`:
   - Instantiate `executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)`.
   - Dispatch work via `loop.run_in_executor(executor, _process_source_with_debug, ...)`.
   - On disconnect, `cancel()` pending futures.
   - In `finally`, `executor.shutdown(wait=False, cancel_futures=True)`.
2. **Keep RSS executor as-is** (`rss_ingestion.py`) since it runs in background threads, but ensure request timeouts remain reasonable (currently 60s per RSS `requests.get`).
3. **Retest**:
   - Restart uvicorn; `curl -m 5 http://localhost:8000/health` should respond.
   - Start a stream request and abort the client; server must stay responsive.
   - Re-run `py-spy dump` to verify that the main thread is no longer parked in `ThreadPoolExecutor.shutdown` (the dump shows function names, not arguments, unless run with `--locals`).
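
Step 1 can be sketched with plain `asyncio`, leaving FastAPI out so the example stays self-contained. This is an illustration under assumptions, not the code in `stream.py`: `_process_source` stands in for `_process_source_with_debug`, and the `sources` mapping of name to simulated delay is hypothetical. In the real route the generator would presumably be handed to a streaming response, and a client disconnect would surface as the generator being closed or cancelled.

```python
import asyncio
import concurrent.futures
import time
from typing import AsyncIterator, Mapping


def _process_source(name: str, delay_s: float) -> str:
    """Hypothetical blocking worker; the sleep simulates slow network I/O."""
    time.sleep(delay_s)
    return f"data: {name}\n\n"


async def event_generator(sources: Mapping[str, float]) -> AsyncIterator[str]:
    loop = asyncio.get_running_loop()
    # Created outside a `with` block so leaving the generator never waits on workers.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    futures = [
        loop.run_in_executor(executor, _process_source, name, delay_s)
        for name, delay_s in sources.items()
    ]
    try:
        for next_done in asyncio.as_completed(futures):
            yield await next_done
    finally:
        # Runs on normal completion and when the consumer closes or cancels us.
        for fut in futures:
            fut.cancel()  # no effect on futures that have already finished
        executor.shutdown(wait=False, cancel_futures=True)  # Python 3.9+
```

Closing the generator after the first event (for instance with `aclose()`) returns promptly even while a slow worker is still running. That worker thread is not interrupted; it finishes in the background, which is why the per-request timeout in `rss_ingestion.py` still bounds how long it lingers.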

## Verification Checklist
- [ ] `curl -m 5 http://localhost:8000/` returns a response (no hang).
- [ ] `curl -m 5 http://localhost:8000/health` succeeds.
- [ ] Aborting `/news/stream` does **not** freeze subsequent requests.
- [ ] `py-spy dump` shows event loop not blocked on `ThreadPoolExecutor.shutdown`.
- [ ] Frontend no longer stalls waiting on root/health while backend is busy with streams.

## Notes & Future Hardening
- Consider adding request timeout middleware to fail fast on slow handlers.
- Add per-source network timeouts and shorter retries for RSS feeds to reduce long-lived threads.
- If multi-worker uvicorn is used, run `py-spy` on each worker pid when diagnosing hangs.
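
One way the request timeout idea could look is a pure ASGI middleware, sketched below with the standard library only. The class name `TimeoutMiddleware`, its parameters, and the choice of a 504 response are assumptions made for illustration; the skill does not prescribe an implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


class TimeoutMiddleware:
    """Hypothetical pure-ASGI middleware: reply 504 when a handler is too slow."""

    def __init__(self, app: Any, timeout_s: float = 10.0,
                 skip_prefixes: Tuple[str, ...] = ("/news/stream",)) -> None:
        self.app = app
        self.timeout_s = timeout_s
        self.skip_prefixes = skip_prefixes  # long-lived SSE routes must not be cut off

    async def __call__(self, scope: Dict[str, Any], receive: Receive, send: Send) -> None:
        if scope["type"] != "http" or scope["path"].startswith(self.skip_prefixes):
            await self.app(scope, receive, send)
            return

        started = False

        async def tracking_send(message: Message) -> None:
            nonlocal started
            if message["type"] == "http.response.start":
                started = True
            await send(message)

        try:
            await asyncio.wait_for(self.app(scope, receive, tracking_send), self.timeout_s)
        except asyncio.TimeoutError:
            if started:
                raise  # headers already sent; a clean 504 is no longer possible
            await send({"type": "http.response.start", "status": 504,
                        "headers": [(b"content-type", b"text/plain")]})
            await send({"type": "http.response.body", "body": b"Gateway Timeout"})
```

In a FastAPI app this would likely be registered with `app.add_middleware(TimeoutMiddleware, timeout_s=10.0)`. It only helps with handlers that are slow while awaiting: the timer runs on the event loop, so it cannot fire while the loop is blocked by a synchronous call such as `shutdown(wait=True)`.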
