monitor
Use this skill when the user needs to set up production monitoring, track app health, configure error alerts, or respond to incidents. Also use when the user says 'my app went down,' 'how do I know if something breaks,' 'set up alerts,' 'is my app healthy,' or 'I found out from a user that my site was down.' Covers error tracking, uptime monitoring, performance metrics, and incident response for SaaS applications.
Best use case
monitor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using monitor should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/monitor/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How monitor Compares
| Feature / Agent | monitor | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It sets up production monitoring and guides incident response for SaaS applications: error tracking, uptime monitoring, performance metrics, and alert configuration.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Monitor

**This skill is for production monitoring and incident response.** For debugging specific bugs, use **debug**. For pre-launch readiness checks, use **go-live**. For security-specific monitoring (auth events, API abuse), use **secure**. For analytics and user behavior tracking, use **analytics**.

### Don't Do Yet

- **Don't pay for monitoring tools** until you've outgrown the free tiers. UptimeRobot + Sentry free handles most early-stage apps.
- **Don't set up DataDog, New Relic, or Grafana.** These are enterprise tools. You don't need them with < 1,000 users.
- **Don't build custom dashboards.** Your hosting platform (Vercel, Railway) has built-in metrics. Use those first.
- **Don't monitor everything.** Three things matter at launch: is it up, are there errors, is it slow. That's it.

## Monitoring Checklist

```
Basic Monitoring:
- [ ] Uptime monitoring (is site up?)
- [ ] Error tracking (are errors happening?)
- [ ] Performance monitoring (is it slow?)
- [ ] User activity (are people using it?)
- [ ] Critical alerts configured
- [ ] Check dashboard daily
```

See [MONITORING-SETUP.md](MONITORING-SETUP.md) for implementation.

---

## Why Monitor?

**Without monitoring:**
- Users hit errors, you don't know
- Site goes down, you find out from Twitter
- Slow performance, users leave silently
- Security issues, no alert

**With monitoring:**
- Errors show in dashboard immediately
- Get text when site goes down
- See performance degradation
- Catch issues before users complain

**Goal: Know about problems before users tell you.**

---

## Three Essential Monitors

### 1. Is It Up?

**Uptime monitoring** - Pings your app at a regular interval (typically every 1 to 5 minutes)

**Free tools:**
- UptimeRobot (free, 50 monitors)
- Pingdom (limited free tier)
- Vercel/Netlify (built-in for deployed apps)

**Setup:**
```
1. Sign up for UptimeRobot
2. Add monitor for https://yourapp.com
3. Add your email for alerts
4. Get alerted if the site goes down
```

### 2. Are There Errors?
**Error tracking** - Captures JavaScript errors and API failures

**Free tools:**
- Sentry (free tier: 5k errors/month)
- LogRocket (limited free)
- Vercel/Netlify logs (for deployed apps)

**Claude Code:**
```
Add Sentry error tracking to my app:
- Install @sentry/nextjs (or appropriate package)
- Capture all frontend errors and API errors
- Include user context (email, ID)
- Configure source maps for readable stack traces
- Set up Sentry.init in both client and server entry points
```

**Lovable / Replit** (paste into chat):
```
Add error tracking to my app. I want to be notified when errors happen.
Use Sentry (free tier). Show me how to:
1. Create a Sentry account and project
2. Add the tracking code to my app
3. Test that errors are being captured
```

### 3. Is It Slow?

**Performance monitoring** - Tracks page load times

**Free tools:**
- Vercel Analytics (built-in)
- Google PageSpeed Insights (free)
- Cloudflare Analytics (free tier)

**Setup:**
- Usually automatic with hosting platform
- Check dashboard weekly

---

## What to Monitor

### Critical Metrics

**Must monitor:**
- Site uptime (99%+)
- Error rate (< 1% of requests)
- API response time (< 500ms)
- Page load time (< 3s)

**Nice to have:**
- Active users
- Feature usage
- Conversion rates
- User paths

**For MVP:** Focus on the "must monitor" only.

---

## Setting Up Alerts

**Configure alerts for:**

**Critical (text me immediately):**
- Site is down
- Error rate spike (10x normal)
- Database connection lost
- Payment processing failing

**Important (email within hour):**
- API slow (>2 seconds)
- Error rate elevated (2x normal)
- Disk space low (>80%)

**Informational (daily digest):**
- New errors discovered
- Performance trending down
- Traffic patterns

**Tell AI:**
```
Configure monitoring alerts:
- Critical: Text to [phone]
- Important: Email to [email]
- Send summary: Daily at 9am
```

---

## Daily Monitoring Routine

**5-minute morning check:**

```
Daily Check:
1. Open monitoring dashboard
2. Check uptime (should be 100% yesterday)
3. Check error count (any spikes?)
4. Check performance (slower than usual?)
5. Review any alerts from overnight
```

**If all green:** You're done, 5 minutes.

**If red:** Investigate using debug skill.

---

## Reading Monitoring Dashboards

### Uptime Dashboard

**Green:** Site responding
**Red:** Site down or slow to respond

**What to check:**
- Uptime percentage (target: 99%+)
- Response time (target: <500ms)
- Recent downtime incidents

### Error Dashboard

**Look for:**
- Error count spikes (sudden jump)
- New error types (didn't see before)
- Affected users (how many hit this?)
- Error frequency (happening a lot?)

**Priority:**
- Affecting many users → High priority
- Blocking key features → High priority
- Edge case error → Lower priority

### Performance Dashboard

**Look for:**
- Load time trending up (getting slower)
- Slow endpoints (which API calls)
- Slow pages (which routes)
- Geographic differences (slow in specific regions)

---

## Error Investigation

**When errors spike:**

```
1. Open error tracking dashboard (Sentry)
2. Find the most frequent error
3. Read error message and stack trace
4. Note: How many users affected?
5. Note: Started when?
6. Check: Did we deploy recently?
```

**Give to AI:**
```
Error in production:
[Paste error message and stack trace]

Affected: [X] users in last [Y] hours
Started: [timestamp]
Recent deploys: [any?]

Please:
1. Explain what's wrong
2. Propose hotfix
3. How to test before deploying
```

---

## User-Reported Issues

**When user reports problem:**

```
User Report Investigation:
1. Can you reproduce it?
2. Check monitoring for errors at that time
3. Check logs for that user
4. Check if others affected
5. Determine severity

Then use debug skill to fix.
```

**Tell AI:**
```
User reported: [issue description]
User: [email or ID]
Timestamp: [when it happened]

Check monitoring and logs for this user at this time.
What errors or issues do you see?
```

---

## Proactive Monitoring

**Catch issues before users:**

**Weekly checks:**
```
Weekly Review:
- [ ] Error trends (going up or down?)
- [ ] Performance trends (slower?)
- [ ] New error types introduced
- [ ] Uptime issues resolved
- [ ] Alert noise (too many false alerts?)
```

**Monthly checks:**
```
Monthly Health:
- [ ] Compare to last month
- [ ] Any degradation?
- [ ] Any improvements?
- [ ] Monitoring gaps (what's not tracked?)
```

---

## Free Monitoring Stack

**Recommended for MVP:**

**Uptime:**
- UptimeRobot (free) - 50 monitors

**Errors:**
- Sentry (free) - 5k errors/month

**Performance:**
- Vercel Analytics (free on Vercel)
- Cloudflare Analytics (free)

**Logs:**
- Platform logs (Vercel, Netlify, Railway)

**Cost: $0/month until you need more.**

---

## When to Upgrade Monitoring

**Upgrade when:**
- Hitting free tier limits
- Need more detailed analytics
- Need faster alert response
- Need advanced features (session replay, etc.)

**Paid tiers (typically $20-50/mo):**
- Sentry Pro ($26/mo)
- LogRocket ($99/mo - session replay)
- DataDog ($15/host/mo)

**For < 1000 users:** Free tiers sufficient.

---

## Common Monitoring Mistakes

| Mistake | Fix |
|---------|-----|
| No monitoring set up | Set up before launch |
| Alert fatigue (too many alerts) | Only alert on critical issues |
| Checking once a month | Check daily (5 minutes) |
| Ignoring trends | Watch for degradation over time |
| No alerts configured | Set up text alerts for downtime |
| Monitoring but not acting | Use monitoring to find and fix issues |

---

## Interpreting Trends

**Good trends:**
- Errors decreasing
- Performance improving
- Uptime stable at 99.9%+

**Warning trends:**
- Errors slowly increasing
- Performance slowly degrading
- Uptime dipping below 99%

**Critical trends:**
- Sudden error spike
- Sudden performance drop
- Multiple downtime incidents

**Action:** Address warning trends before they become critical.
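
The alert tiers from Setting Up Alerts (10x normal is critical, 2x normal is important, everything else goes in the daily digest) can be expressed as a small classifier. This is a minimal sketch, not part of the skill itself: the function name `classify_error_rate`, the sample numbers, and the handling of a zero baseline are all assumptions made for illustration.

```python
def classify_error_rate(current: float, baseline: float) -> str:
    """Map an error rate to an alert tier using the 10x / 2x multiples."""
    if current < 0 or baseline < 0:
        raise ValueError("error rates must be non-negative")
    if baseline == 0:
        # Assumption: with no baseline yet, any error at all rates an email.
        return "important" if current > 0 else "informational"
    ratio = current / baseline
    if ratio >= 10:
        return "critical"       # text immediately
    if ratio >= 2:
        return "important"      # email within the hour
    return "informational"      # daily digest


if __name__ == "__main__":
    # Hypothetical numbers: errors per 1,000 requests against a baseline of 4.
    for current in (3.0, 8.0, 45.0):
        print(current, classify_error_rate(current, baseline=4.0))
```

Comparing against a baseline instead of a fixed number keeps the thresholds meaningful as traffic grows, which is one way to limit the alert noise mentioned in the weekly review.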
---

## Logging vs Monitoring

**Logging:**
- Records what happened
- For debugging specific issues
- Detailed, verbose
- Review when investigating

**Monitoring:**
- Tracks overall health
- For catching issues early
- High-level metrics
- Review daily

**Both needed:** Monitoring alerts you, logs help debug.

---

## Setting Up Logging

**Tell AI:**
```
Add application logging:
- Log all errors with context
- Log API requests/responses
- Log slow operations (>1s)
- Log authentication events
- Don't log sensitive data

Format: JSON with timestamp, level, message, context
Send to: [Platform logs or external service]
```

**Log levels:**
- ERROR: Something broke
- WARN: Something concerning
- INFO: Normal operations
- DEBUG: Detailed debugging info

**Production:** Log ERROR and WARN only.

---

## Monitoring Integrations

**Third-party services:**

**Payments (Stripe):**
- Failed payments alert
- Refund requests alert
- Subscription cancellations (daily digest)

**Email (SendGrid):**
- Delivery failures alert
- Bounce rate elevated alert
- Spam complaints alert

**Database:**
- Connection pool exhausted
- Slow queries (>1s)
- Disk space low

**Tell AI:**
```
Add monitoring for [service]:
- Alert on failures
- Track success rate
- Log errors with context
```

---

## Incident Response

**When alerts fire:**

```
Incident Response:
1. Acknowledge alert (mark as seen)
2. Assess severity:
   - Critical: Site down, payments failing
   - High: Errors affecting many users
   - Medium: Isolated issues
3. Immediate action:
   - Critical: Hotfix or rollback
   - High: Fix within hours
   - Medium: Fix in next deploy
4. Update users if needed
5. Post-mortem after resolved
```

**Critical incidents:**
```
1. Assess impact (how many affected?)
2. Quick fix or rollback
3. Deploy hotfix
4. Verify fixed
5. Monitor closely for an hour
6. Update status page if you have one
```

---

## Success Looks Like

✅ Know about issues before users report them
✅ Uptime >99.9%
✅ Errors caught and fixed quickly
✅ Performance trends stable or improving
✅ Daily monitoring routine (5 minutes)
✅ Alerts configured and actionable
✅ Issues resolved proactively

---

## Related Skills

- **debug** - Investigate and fix specific bugs
- **deploy** - Hosting setup and rollback procedures
- **secure** - Security monitoring and hardening
- **analytics** - User behavior tracking and conversion funnels
- **go-live** - Pre-launch readiness (includes monitoring as a checklist item)
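
The log format described under Setting Up Logging (JSON with timestamp, level, message, context) might look like the sketch below, using only Python's standard `logging` module. The class name `JsonFormatter`, the helper `build_logger`, and the `context` key passed through `extra=` are hypothetical names chosen for this example; the skill does not prescribe a language or library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # "context" is a hypothetical field supplied via `extra=`.
            # Keep sensitive data (passwords, tokens) out of it.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


def build_logger(name: str = "app", level: int = logging.WARNING) -> logging.Logger:
    """Return a logger that emits JSON; WARNING and above by default."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.warning(
        "slow operation",
        extra={"context": {"route": "/api/orders", "duration_ms": 1340}},
    )
    log.info("dropped, because the level is WARNING")
```

The default level of `WARNING` follows the production guidance above (ERROR and WARN only); passing `level=logging.INFO` would be the assumed way to capture request and authentication events while debugging.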
Related Skills
validate
Use this skill when the user needs to validate a business idea, test demand before building, run a smoke test, create an MVP experiment, or decide whether an idea is worth pursuing. Covers demand validation, smoke tests, fake-door tests, landing page experiments, and go/no-go decision frameworks for bootstrapped founders.
ux-design
Use this skill when flows feel clunky, users are confused, navigation needs planning, onboarding needs design, or accessibility needs implementation. Covers information architecture, user flows, interaction patterns, progressive disclosure, and error handling UX.
ui-patterns
Use this skill when the user needs to build a dashboard, settings page, data table, or any page layout. Also use when choosing component libraries, implementing responsive design, dark mode, or handling UI states (loading, empty, error). Covers component selection, page composition, and responsive implementation.
translate
Use this skill when the user is a domain expert (lawyer, doctor, contractor, accountant, etc.) who wants to turn their professional knowledge into a software product. Also use when the user says 'I have an idea for my industry,' 'I know this problem exists,' 'I want to build something for [profession],' or is struggling to describe what they want the software to do. Helps identify which professional pain is worth building for, then translates it into requirements AI tools can execute.
test
Use this skill when the user needs to test features before deployment, create test scenarios, find edge cases, or verify bug fixes. Covers manual testing workflows, cross-browser testing, edge case identification, and testing checklists for non-technical founders.
technical-seo
Use this skill to implement technical SEO optimizations in code — meta tags, schema markup, Core Web Vitals, crawlability, robots.txt, sitemaps, and GEO (Generative Engine Optimization) for AI search engines. This is the implementation skill — for strategy see seo, for content writing see seo-content, for auditing see seo-audit.
support
Use this skill when the user needs to create help docs, build a knowledge base, set up self-serve support, or reduce support tickets. Covers documentation strategy, help center structure, support tone, and scaling support without hiring.
social-media
Use this skill when the user needs to grow a social media presence, create content for Twitter/X, LinkedIn, or other platforms, build a founder brand, or use social media as a distribution channel. Covers platform strategy, content frameworks, posting cadence, and audience building for bootstrapped SaaS founders.
seo
Use this skill when the user needs to plan SEO content, do keyword research, build a content calendar, map search intent to page types, or create an internal linking strategy. Also use when the user says 'how do I rank higher,' 'what should I write about for SEO,' 'SEO plan,' 'what keywords should I target,' or 'how to get organic traffic.' This is the strategy and planning skill — for writing content see seo-content, for technical implementation see technical-seo, for auditing see seo-audit.
seo-content
Use this skill when the user needs to write SEO content — blog posts, landing pages, feature pages, comparison pages, how-to guides, or any content meant to rank in search and get cited by AI. Covers content briefs, humanized writing that avoids AI detection, SERP feature targeting, entity optimization, content refresh, and quality self-checks. This is the writing skill — for strategy see seo, for technical implementation see technical-seo, for auditing see seo-audit.
seo-audit
Audit a codebase for SEO and AI-answer visibility, then produce a prioritized fix-it plan. Use this skill whenever a user says things like "audit my SEO", "check my site for search visibility", "how do I rank better", "optimize for Google", "optimize for AI answers", "SEO review", "GEO audit", "run the SEO agent", or anything about improving organic traffic or search rankings. Also trigger when someone mentions wanting visibility in AI-generated answers (ChatGPT, Gemini, Perplexity, Claude). Works on any web project — static sites, Next.js, Astro, Hugo, WordPress themes, or anything that outputs HTML.
secure
Use this skill when the user needs to secure their SaaS app, implement authentication, protect user data, secure APIs, or check for vulnerabilities. Also use when the user says 'is my app secure,' 'security check,' 'I'm worried about hackers,' 'how do I protect user data,' or 'security before launch.' Covers OWASP Top 10, auth best practices, data protection, and security checklists for apps built with AI tools.