URGENT: Implement auto-restart monitor to handle H12 timeouts and crashes

Posted Jul 27, 2025

closed

Problem

The site experiences frequent H12 timeout errors and crashes after deployments. The web dynos need to be manually restarted to restore service. This is causing significant downtime and user frustration.

Root Causes Identified

  1. Sidekiq Middleware Bug: Module.new was being used instead of Class.new in sidekiq_memory_killer.rb, causing NoMethodError exceptions
  2. Memory Issues: Web dynos approaching memory limits (512MB for Performance-M)
  3. H12 Timeouts: Requests taking longer than Heroku's 30-second limit
  4. Post-deployment instability: Site crashes within minutes of deployment

Solution Implemented

Created an automatic restart system that runs within the app itself:

1. Auto-Restart Monitor (config/initializers/auto_restart_monitor.rb)

  • Monitors for H12 errors (restarts after 3 errors in 5 minutes)
  • Tracks request timeouts
  • Monitors memory usage (restarts if > 450MB)
  • Implements cooldown period (10 minutes) to prevent restart loops
  • Gracefully shuts down Puma when restart needed
  • Coordinates restarts across multiple web dynos
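
A minimal sketch of what such an initializer could look like is below. The method names, check interval, and stagger delay are illustrative assumptions; only the 3-errors / 5-minute / 450MB / 10-minute thresholds come from this issue.

```ruby
# config/initializers/auto_restart_monitor.rb (illustrative sketch, not the exact production code)
module AutoRestartMonitor
  H12_THRESHOLD   = 3        # restart after 3 H12/timeout errors...
  ERROR_WINDOW    = 5 * 60   # ...within a 5-minute window
  MEMORY_LIMIT_MB = 450      # restart if resident memory exceeds 450MB
  COOLDOWN        = 10 * 60  # at least 10 minutes between restarts
  CHECK_INTERVAL  = 30       # seconds between background checks

  @errors       = []
  @last_restart = Time.at(0)
  @mutex        = Mutex.new

  class << self
    # Called by request middleware whenever an H12 / timeout is observed.
    def record_error(time = Time.now)
      @mutex.synchronize { @errors << time }
    end

    # Number of errors recorded within the last ERROR_WINDOW seconds.
    def recent_error_count
      cutoff = Time.now - ERROR_WINDOW
      @mutex.synchronize do
        @errors.reject! { |t| t < cutoff }
        @errors.size
      end
    end

    def memory_mb
      # Resident set size of this process in MB (Linux dynos).
      `ps -o rss= -p #{Process.pid}`.to_i / 1024
    end

    def start
      Thread.new do
        loop do
          sleep CHECK_INTERVAL
          restart! if should_restart?
        end
      end
    end

    private

    def should_restart?
      return false if Time.now - @last_restart < COOLDOWN # cooldown guard
      recent_error_count >= H12_THRESHOLD || memory_mb > MEMORY_LIMIT_MB
    end

    def restart!
      @last_restart = Time.now
      # Stagger restarts so dynos don't all cycle at once: web.1 waits 5s, web.2 waits 10s, etc.
      sleep(ENV["DYNO"].to_s[/\d+/].to_i * 5)
      Rails.logger.warn("[AutoRestartMonitor] thresholds exceeded, shutting down Puma")
      # SIGTERM lets Puma finish in-flight requests; once the process exits,
      # Heroku brings the dyno back up automatically.
      Process.kill("TERM", Process.pid)
    end
  end
end

AutoRestartMonitor.start if ENV["DYNO"].to_s.start_with?("web.")
```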

2. Monitoring Endpoint (app/controllers/monitor_controller.rb)

  • Provides /monitor/status endpoint for health checks
  • Shows current memory usage, error counts, and system status
  • Protected by token authentication
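
Sketch of the endpoint, assuming a route like get "/monitor/status", to: "monitor#status" and the helper methods from the monitor sketch above; the token header name, env var, and JSON fields are assumptions.

```ruby
# app/controllers/monitor_controller.rb (illustrative sketch)
class MonitorController < ActionController::Base
  before_action :authenticate_token!

  # GET /monitor/status
  def status
    render json: {
      status:        "ok",
      dyno:          ENV["DYNO"],
      memory_mb:     AutoRestartMonitor.memory_mb,
      recent_errors: AutoRestartMonitor.recent_error_count,
      checked_at:    Time.now.utc.iso8601
    }
  end

  private

  # Simple shared-token check so the endpoint is not publicly readable.
  def authenticate_token!
    provided = request.headers["X-Monitor-Token"].presence || params[:token]
    expected = ENV.fetch("MONITOR_TOKEN", "")
    head :unauthorized unless ActiveSupport::SecurityUtils.secure_compare(provided.to_s, expected)
  end
end
```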

3. Fixed Sidekiq Bug

  • Changed Module.new to Class.new in sidekiq_memory_killer.rb
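
The fix itself is a one-word change. Sidekiq instantiates each entry in its server middleware chain with .new, and an anonymous Module cannot be instantiated, hence the NoMethodError. Sketched below with a minimal middleware body; only the Module.new to Class.new change comes from this issue, the rest is illustrative.

```ruby
# config/initializers/sidekiq_memory_killer.rb (sketch)
require "sidekiq"

# Before (broken): a Module does not respond to .new, so the middleware
# chain raised NoMethodError when it tried to instantiate it.
#
#   SidekiqMemoryKiller = Module.new do
#     def call(_worker, _job, _queue)
#       yield
#     end
#   end

# After (fixed): an anonymous Class can be instantiated by the chain.
SidekiqMemoryKiller = Class.new do
  def call(_worker, _job, _queue)
    yield # run the job; the real middleware would also check memory here
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add SidekiqMemoryKiller
  end
end
```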

How It Works

  1. The monitor runs in a background thread on each web dyno
  2. It tracks H12 errors and timeouts via middleware
  3. When thresholds are exceeded, it gracefully terminates the process
  4. Heroku automatically restarts the terminated dyno
  5. Multiple dynos coordinate to stagger restarts
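
As a concrete illustration of step 2, a Rack middleware along these lines could feed the monitor; the class name, constant, and insertion point are assumptions.

```ruby
# Illustrative Rack middleware: times each request and reports anything that
# hits Heroku's 30-second router limit to the monitor sketched earlier.
class TimeoutTracker
  ROUTER_LIMIT = 30 # seconds; past this, the router has already returned an H12

  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @app.call(env)
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    # Count the request toward the restart threshold whether it finished late
    # or raised (e.g. Rack::Timeout) -- either way the user saw a failure.
    AutoRestartMonitor.record_error if elapsed >= ROUTER_LIMIT
  end
end

# config/application.rb
#   config.middleware.insert_before 0, TimeoutTracker
```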

Benefits

  • Automatic recovery from crashes
  • No manual intervention required
  • Minimal downtime (Heroku restarts dynos in ~10 seconds)
  • Prevents extended outages
  • Provides visibility into system health

Next Steps

  • Deploy this temporary fix to production
  • Continue investigating root cause of performance issues
  • Consider upgrading to larger dynos if memory is the constraint
  • Optimize slow database queries and ActiveStorage operations
