backend-hang-debug
Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.
Best use case
backend-hang-debug is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.
Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "backend-hang-debug" skill to help with this workflow task. Context: Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/backend-hang-debug/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How backend-hang-debug Compares
| Feature / Agent | backend-hang-debug | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Diagnose and fix FastAPI hangs caused by blocking ThreadPoolExecutor shutdown in the news stream route; includes py-spy capture and non-blocking executor pattern.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Backend Hang Debug ## Purpose - Detect and resolve event-loop hangs where the FastAPI app stops responding (e.g., `curl http://localhost:8000/` times out) due to synchronous executor shutdown in the SSE news stream. - Provide a repeatable triage flow using `py-spy` to capture live stacks and pinpoint blocking code. ## Scope - Backend: `backend/app/api/routes/stream.py` (news stream), `backend/app/services/rss_ingestion.py` (RSS workers), startup processes. - Tooling: `py-spy` for live stack dumps; `curl` with timeouts for smoke tests. ## Quick Triage 1. **Reproduce hang**: `curl -m 5 http://localhost:8000/` and `curl -m 5 http://localhost:8000/health`; note timeouts. 2. **Process check**: `ss -tlnp | grep 8000` to confirm listener; `ls /proc/$(pgrep -f "uvicorn app.main")/fd | wc -l` to rule out FD leak. 3. **Stack capture** (inside backend venv): `uv pip install py-spy` then `sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid $(pgrep -f "uvicorn app.main")` (and worker pid if multiprocess). Look for `ThreadPoolExecutor.shutdown` in `api/routes/stream.py` frames. ## Fix Pattern (non-blocking executor) - Replace synchronous context manager `with ThreadPoolExecutor(...):` inside `event_generator` with a long-lived executor plus explicit **non-blocking** shutdown: - Create executor outside the context manager. - On client disconnect, cancel pending futures instead of awaiting shutdown. - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`. - Rationale: context manager calls `shutdown(wait=True)`, blocking the event loop if RSS worker threads hang on network I/O. ## Implementation Steps 1. **Update stream executor usage** in `backend/app/api/routes/stream.py`: - Instantiate `executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)`. - Dispatch work via `loop.run_in_executor(executor, _process_source_with_debug, ...)`. - On disconnect, `cancel()` pending futures. - In `finally`, `executor.shutdown(wait=False, cancel_futures=True)`. 2. **Keep RSS executor as-is** (`rss_ingestion.py`) since it runs in background threads, but ensure request timeouts remain reasonable (currently 60s per RSS `requests.get`). 3. **Retest**: - Restart uvicorn; `curl -m 5 http://localhost:8000/health` should respond. - Start a stream request and abort the client; server must stay responsive. - Re-run `py-spy dump` to verify no `ThreadPoolExecutor.shutdown(wait=True)` frames in main thread. ## Verification Checklist - [ ] `curl -m 5 http://localhost:8000/` returns a response (no hang). - [ ] `curl -m 5 http://localhost:8000/health` succeeds. - [ ] Aborting `/news/stream` does **not** freeze subsequent requests. - [ ] `py-spy dump` shows event loop not blocked on `ThreadPoolExecutor.shutdown`. - [ ] Frontend no longer stalls waiting on root/health while backend is busy with streams. ## Notes & Future Hardening - Consider adding request timeout middleware to fail fast on slow handlers. - Add per-source network timeouts and shorter retries for RSS feeds to reduce long-lived threads. - If multi-worker uvicorn is used, run `py-spy` on each worker pid when diagnosing hangs.
Related Skills
req-change-workflow
Standardize requirement/feature changes in an existing codebase (especially Chrome extensions) by turning "改需求/需求变更/调整交互/改功能/重构流程" into a repeatable loop: clarify acceptance criteria, confirm current behavior from code, assess impact/risk, design the new logic, implement with small diffs, run a fixed regression checklist, and update docs/decision log. Use when the user feels the change process is chaotic, when edits tend to sprawl across files, or when changes touch manifest/service worker/OAuth/storage/UI and need reliable verification + rollback planning.
woocommerce-backend-dev
Add or modify WooCommerce backend PHP code following project conventions. Use when creating new classes, methods, hooks, or modifying existing backend code. **MUST be invoked before writing any PHP unit tests.**
changelog-maintenance
Maintain a clear and informative changelog for software releases. Use when documenting version changes, tracking features, or communicating updates to users. Handles semantic versioning, changelog formats, and release notes.
backend-testing
Write comprehensive backend tests including unit tests, integration tests, and API tests. Use when testing REST APIs, database operations, authentication flows, or business logic. Handles Jest, Pytest, Mocha, testing strategies, mocking, and test coverage.
game-changing-features
Find 10x product opportunities and high-leverage improvements. Use when user wants strategic product thinking, mentions '10x', wants to find high-impact features, or says 'what would make this 10x better', 'product strategy', or 'what should we build next'.
wiki-changelog
Analyzes git commit history and generates structured changelogs categorized by change type. Use when the user asks about recent changes, wants a changelog, or needs to understand what changed in the repository.
nodejs-backend-patterns
Build production-ready Node.js backend services with Express/Fastify, implementing middleware patterns, error handling, authentication, database integration, and API design best practices. Use when creating Node.js servers, REST APIs, GraphQL backends, or microservices architectures.
error-diagnostics-smart-debug
Use when working with error diagnostics smart debug
error-debugging-multi-agent-review
Use when working with error debugging multi agent review
error-debugging-error-trace
You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.
error-debugging-error-analysis
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
dotnet-backend
Build ASP.NET Core 8+ backend services with EF Core, auth, background jobs, and production API patterns.