Posted Jul 27, 2025
closed
Problem
The site experiences frequent H12 timeout errors and crashes after deployments. The web dynos need to be manually restarted to restore service. This is causing significant downtime and user frustration.
Root Causes Identified
- Sidekiq Middleware Bug: Module.new was being used instead of Class.new in sidekiq_memory_killer.rb, causing NoMethodError exceptions
- Memory Issues: Web dynos approaching memory limits (512MB for Performance-M)
- H12 Timeouts: Requests taking longer than Heroku's 30-second limit
- Post-deployment instability: Site crashes within minutes of deployment
Solution Implemented
Created an automatic restart system that runs within the app itself:
1. Auto-Restart Monitor (config/initializers/auto_restart_monitor.rb)
- Monitors for H12 errors (restarts after 3 errors in 5 minutes)
- Tracks request timeouts
- Monitors memory usage (restarts if > 450MB)
- Implements cooldown period (10 minutes) to prevent restart loops
- Gracefully shuts down Puma when restart needed
- Coordinates restarts across multiple web dynos
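The thresholds listed above can be expressed as a small decision object. The following is only a sketch under stated assumptions, not the contents of config/initializers/auto_restart_monitor.rb: the class name RestartPolicy, its method names, and the injected clock are hypothetical, and the memory reading is passed in by the caller rather than measured here.

```ruby
# Hypothetical sketch of the restart decision. The numeric thresholds come
# from the list above; every name is an assumption made for illustration.
class RestartPolicy
  ERROR_LIMIT     = 3        # timeouts tolerated inside the window
  ERROR_WINDOW    = 5 * 60   # seconds
  MEMORY_LIMIT_MB = 450
  COOLDOWN        = 10 * 60  # seconds between restarts

  # The clock is injectable so the policy can be exercised without waiting.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @errors = []
    @last_restart_at = nil
    @lock = Mutex.new
  end

  # Called whenever a request is observed to time out.
  def record_timeout
    @lock.synchronize { @errors << @clock.call }
  end

  # Returns :memory or :timeouts when a restart is warranted, otherwise nil.
  def restart_reason(memory_mb:)
    @lock.synchronize do
      now = @clock.call
      # Forget timeouts that have fallen out of the 5-minute window.
      @errors.reject! { |t| now - t > ERROR_WINDOW }
      return nil if @last_restart_at && now - @last_restart_at < COOLDOWN
      return :memory if memory_mb > MEMORY_LIMIT_MB
      return :timeouts if @errors.size >= ERROR_LIMIT
      nil
    end
  end

  # Starts the cooldown and resets the error window.
  def mark_restarted
    @lock.synchronize do
      @last_restart_at = @clock.call
      @errors.clear
    end
  end
end
```

Note that state held in memory is lost when the process exits, so a cooldown that has to survive the restart itself would need to be stored outside the process. Where that timestamp lives is not stated in this post.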
2. Monitoring Endpoint (app/controllers/monitor_controller.rb)
- Provides /monitor/status endpoint for health checks
- Shows current memory usage, error counts, and system status
- Protected by token authentication
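A token-protected status endpoint of this kind might look roughly like the Rack-style object below. This is a sketch, not the actual app/controllers/monitor_controller.rb: the class name MonitorStatusApp, the X-Monitor-Token header, and the payload keys are all assumptions. It needs only the standard library, since a Rack app is any object whose call method takes an env Hash and returns a status, headers, and body.

```ruby
require "json"
require "digest"

# Hypothetical sketch: the header name, class name, and payload keys are
# illustrative assumptions, not taken from the real controller.
class MonitorStatusApp
  def initialize(token:, stats:)
    raise ArgumentError, "token must not be empty" if token.to_s.empty?
    @token = token
    @stats = stats # callable returning a Hash of current metrics
  end

  # Rack-compatible entry point: env Hash in, [status, headers, body] out.
  def call(env)
    supplied = env["HTTP_X_MONITOR_TOKEN"].to_s
    return respond(401, { error: "unauthorized" }) unless secure_equal?(supplied, @token)

    respond(200, @stats.call)
  end

  private

  def respond(status, payload)
    [status, { "content-type" => "application/json" }, [JSON.generate(payload)]]
  end

  # Compare fixed-length digests byte by byte without exiting early, so the
  # comparison time does not reveal where the two tokens first differ.
  def secure_equal?(a, b)
    da = Digest::SHA256.digest(a).bytes
    db = Digest::SHA256.digest(b).bytes
    da.zip(db).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end
```

In a Rails app the same check would more likely sit in a before_action, with the token read from an environment variable. Both of those details are assumptions as well.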
3. Fixed Sidekiq Bug
- Changed Module.new to Class.new in sidekiq_memory_killer.rb
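The difference between the two constructors can be reproduced in plain Ruby. The middleware bodies below are hypothetical stand-ins for whatever sidekiq_memory_killer.rb defines; only the Module.new versus Class.new distinction is taken from this post. Sidekiq instantiates each registered server middleware, which is why an anonymous module fails where an anonymous class works.

```ruby
# Hypothetical middleware bodies; only the Module.new / Class.new
# difference is taken from the description above.
broken_middleware = Module.new do
  def call(_worker, _job, _queue)
    yield
  end
end

fixed_middleware = Class.new do
  def call(_worker, _job, _queue)
    yield
  end
end

# A module cannot be instantiated, so calling .new raises NoMethodError.
error = begin
  broken_middleware.new
  nil
rescue NoMethodError => e
  e
end

# A class can be instantiated, and the instance responds to #call with the
# three-argument, block-yielding shape of a Sidekiq server middleware.
result = fixed_middleware.new.call(nil, {}, "default") { :performed }
```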
How It Works
- The monitor runs in a background thread on each web dyno
- It tracks H12 errors and timeouts via middleware
- When thresholds are exceeded, it gracefully terminates the process
- Heroku automatically restarts the terminated dyno
- Multiple dynos coordinate to stagger restarts
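The flow above could be wired together roughly as follows. This is a sketch under several assumptions: TimeoutTracker, stagger_delay, and start_monitor are hypothetical names, and the stagger is derived from Heroku's DYNO environment variable (values such as "web.1"), which may not be how the real monitor coordinates dynos. The app cannot observe H12 errors directly, because the router generates them, so the middleware here simply flags requests that run past the 30-second limit.

```ruby
# Hypothetical sketch; names and defaults are assumptions for illustration.
class TimeoutTracker
  def initialize(app, on_timeout:, limit: 30)
    @app = app
    @on_timeout = on_timeout
    @limit = limit # seconds; Heroku's router gives up after 30
  end

  # Rack middleware: time the downstream call and report slow requests.
  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @app.call(env)
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @on_timeout.call(elapsed) if elapsed >= @limit
  end
end

# "web.1" waits 0s, "web.2" waits one step, and so on, so that dynos do
# not all terminate at the same moment.
def stagger_delay(dyno_name, step_seconds: 30)
  index = dyno_name.to_s[/\.(\d+)\z/, 1].to_i
  [index - 1, 0].max * step_seconds
end

# Background thread: poll the restart condition, wait out the stagger
# delay, then ask the process to shut down.
def start_monitor(should_restart:, terminate:, delay:, interval: 10)
  Thread.new do
    loop do
      if should_restart.call
        sleep delay
        terminate.call
        break
      end
      sleep interval
    end
  end
end

# In production the terminate callable might be:
#   -> { Process.kill("TERM", Process.pid) }
# Puma shuts down gracefully on SIGTERM, and Heroku then replaces the dyno.
```

Injecting the terminate action as a callable keeps the loop testable without killing the process that runs the test.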
Benefits
- Automatic recovery from crashes
- No manual intervention required
- Minimal downtime (Heroku restarts dynos in ~10 seconds)
- Prevents extended outages
- Provides visibility into system health
Next Steps
- Deploy this temporary fix to production
- Continue investigating root cause of performance issues
- Consider upgrading to larger dynos if memory is the constraint
- Optimize slow database queries and ActiveStorage operations