Implement graceful shutdown with readiness endpoints
- Status: Accepted
- Deciders: Gabriel Radureau, AI Agent
- Date: 2026-04-03
Context and Problem Statement
We needed to implement a shutdown mechanism for dance-lessons-coach that provides:
- Clean resource cleanup
- Proper handling of in-flight requests
- Kubernetes/service mesh compatibility
- Minimal downtime for users
- Proper orchestration signaling
Decision Drivers
- Need for zero-data-loss shutdowns
- Desire for Kubernetes compatibility
- Requirement for proper resource cleanup
- Need for minimal user impact
- Desire for proper orchestration integration
Considered Options
- Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
- Immediate shutdown - Simple but disruptive
- Delayed shutdown with queue draining - Complex but thorough
- Signal-based shutdown only - Basic graceful shutdown
Decision Outcome
Chosen option: "Graceful shutdown with readiness endpoints", because it offers the best combination of Kubernetes compatibility, proper resource cleanup, and minimal user impact, and it follows common practice for containerized services.
Pros and Cons of the Options
Graceful shutdown with readiness endpoints
- Good, because Kubernetes/service mesh compatible
- Good, because minimal user impact
- Good, because proper resource cleanup
- Good, because follows industry best practices
- Good, because allows proper orchestration
- Bad, because more complex to implement
- Bad, because requires additional endpoints
Immediate shutdown
- Good, because simplest to implement
- Bad, because disruptive to users
- Bad, because can lose in-flight requests
- Bad, because no resource cleanup
Delayed shutdown with queue draining
- Good, because very thorough
- Good, because minimal data loss
- Bad, because very complex
- Bad, because overkill for simple services
Signal-based shutdown only
- Good, because better than immediate shutdown
- Good, because allows some cleanup
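- Bad, because the orchestrator gets no readiness signal, so Kubernetes may keep routing new traffic to an instance that is already shutting down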
- Bad, because still somewhat disruptive
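The "can lose in-flight requests" point above can be reproduced with the standard library alone. The following is a minimal sketch, not project code: `inFlightResult` is a hypothetical helper, and the 200 ms handler delay is an arbitrary stand-in for real work. It stops a test server while one request is still running, once with `http.Server.Shutdown` (graceful) and once with `http.Server.Close` (immediate), and reports what the client observed.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// inFlightResult (hypothetical helper) starts one slow request, stops the
// server while that request is still being handled, and returns what the
// client saw: the response body, or "error" if the request failed.
func inFlightResult(graceful bool) string {
	started := make(chan struct{})
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // tell the caller the request is in flight
		time.Sleep(200 * time.Millisecond) // simulated work
		io.WriteString(w, "done")
	}))

	result := make(chan string, 1)
	go func() {
		resp, err := http.Get(ts.URL)
		if err != nil {
			result <- "error"
			return
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			result <- "error"
			return
		}
		result <- string(body)
	}()

	<-started // the handler is now running
	if graceful {
		// Shutdown stops accepting connections and waits for active requests.
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		if err := ts.Config.Shutdown(ctx); err != nil {
			return "shutdown error"
		}
	} else {
		// Close drops listeners and active connections immediately.
		ts.Config.Close()
	}
	return <-result
}

func main() {
	fmt.Println("graceful: ", inFlightResult(true))
	fmt.Println("immediate:", inFlightResult(false))
}
```

In the graceful case the client should receive the full response; in the immediate case the connection is closed underneath it and the request fails.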
Implementation Details
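// Readiness context management (created at startup and stored on the Server;
// the readyCancel field is assumed here so shutdown() can reach it)
readyCtx, readyCancel := context.WithCancel(context.Background())
s.readyCtx, s.readyCancel = readyCtx, readyCancel

// Readiness endpoint handler
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	select {
	case <-s.readyCtx.Done():
		// Shutdown has started: report not ready so traffic is routed elsewhere
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		w.Write([]byte(`{"ready":true}`))
	}
}

// Shutdown sequence
func (s *Server) shutdown() error {
	// Cancel readiness so the readiness endpoint starts returning 503
	s.readyCancel()

	// Give in-flight requests up to 30 seconds to complete
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Graceful server shutdown: stop listening, then wait for active requests
	return s.server.Shutdown(shutdownCtx)
}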
Links
Monitoring and Verification
# Check readiness during shutdown
while true; do curl -sS http://localhost:8080/api/ready | jq -c .; sleep 1; done
# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false} # When shutdown starts
# {"ready":false}
# ... (connection refused) # When server fully stopped