# Implement graceful shutdown with readiness endpoints

* Status: Accepted
* Deciders: Gabriel Radureau, AI Agent
* Date: 2026-04-03

## Context and Problem Statement

We needed to implement a shutdown mechanism for dance-lessons-coach that provides:

- Clean resource cleanup
- Proper handling of in-flight requests
- Kubernetes/service mesh compatibility
- Minimal downtime for users
- Proper orchestration signaling

## Decision Drivers

* Need for zero-data-loss shutdowns
* Desire for Kubernetes compatibility
* Requirement for proper resource cleanup
* Need for minimal user impact
* Desire for proper orchestration integration

## Considered Options

* Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
* Immediate shutdown - Simple but disruptive
* Delayed shutdown with queue draining - Complex but thorough
* Signal-based shutdown only - Basic graceful shutdown

## Decision Outcome

Chosen option: "Graceful shutdown with readiness endpoints" because it provides the best combination of Kubernetes compatibility, proper resource cleanup, minimal user impact, and follows industry best practices for containerized services.

## Pros and Cons of the Options

### Graceful shutdown with readiness endpoints

* Good, because Kubernetes/service mesh compatible
* Good, because minimal user impact
* Good, because proper resource cleanup
* Good, because follows industry best practices
* Good, because allows proper orchestration
* Bad, because more complex to implement
* Bad, because requires additional endpoints

### Immediate shutdown

* Good, because simplest to implement
* Bad, because disruptive to users
* Bad, because can lose in-flight requests
* Bad, because no resource cleanup

### Delayed shutdown with queue draining

* Good, because very thorough
* Good, because minimal data loss
* Bad, because very complex
* Bad, because overkill for simple services

### Signal-based shutdown only

* Good, because better than immediate shutdown
* Good, because allows some cleanup
* Bad, because not fully Kubernetes-compatible (no readiness signal to stop traffic before shutdown)
* Bad, because still somewhat disruptive

## Implementation Details

```go
import (
	"context"
	"log"
	"net/http"
	"time"
)

// Server fields are inferred from the usage below; the real struct may hold more.
type Server struct {
	server      *http.Server
	readyCtx    context.Context
	readyCancel context.CancelFunc
}

// Readiness context management (done once, when the server is constructed):
//   s.readyCtx, s.readyCancel = context.WithCancel(context.Background())

// Readiness endpoint handler
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	select {
	case <-s.readyCtx.Done():
		// Readiness was cancelled: report 503 so the orchestrator stops routing traffic here
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		// Implicit 200 OK
		w.Write([]byte(`{"ready":true}`))
	}
}

// Shutdown sequence
func (s *Server) shutdown() {
	// Cancel readiness so the endpoint starts reporting not-ready
	s.readyCancel()

	// Give in-flight requests at most 30 seconds to complete
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Graceful server shutdown: stop listening, then wait for active requests
	if err := s.server.Shutdown(shutdownCtx); err != nil {
		log.Printf("graceful shutdown did not complete: %v", err)
	}
}
```

## Links

* [Kubernetes Graceful Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown/)
* [VictoriaMetrics Readiness Patterns](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-shut-down)
* [Go HTTP Server Shutdown](https://pkg.go.dev/net/http#Server.Shutdown)

## Monitoring and Verification

```bash
# Check readiness during shutdown (-c keeps each response on one line)
while true; do curl -s http://localhost:8080/api/ready | jq -c .; sleep 1; done

# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false}   # When shutdown starts
# {"ready":false}
# ... (connection refused, no further output)   # When server fully stopped
```
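
As a complement to the curl loop above, the same 200-to-503 flip can be checked in-process without starting a real listener. The following is a minimal sketch, not the project's actual test code: the `Server` struct layout and the `NewServer` and `probe` helpers are hypothetical names introduced for illustration; only `handleReadiness` and the `/api/ready` path come from this ADR.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Server is a stripped-down stand-in for the real server type (assumed layout).
type Server struct {
	readyCtx    context.Context
	readyCancel context.CancelFunc
}

// NewServer is a hypothetical constructor that wires up the readiness context.
func NewServer() *Server {
	ctx, cancel := context.WithCancel(context.Background())
	return &Server{readyCtx: ctx, readyCancel: cancel}
}

// handleReadiness reports 200 while ready and 503 once readiness is cancelled.
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	select {
	case <-s.readyCtx.Done():
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		w.Write([]byte(`{"ready":true}`))
	}
}

// probe is a hypothetical helper: it sends one in-memory GET to the handler
// and returns the recorded status code and body.
func probe(s *Server) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/ready", nil)
	s.handleReadiness(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	s := NewServer()

	code, body := probe(s)
	fmt.Println(code, body) // 200 {"ready":true}

	// Simulate the first step of the shutdown sequence.
	s.readyCancel()

	code, body = probe(s)
	fmt.Println(code, body) // 503 {"ready":false}
}
```

Because the handler only inspects the readiness context, this check covers the signaling half of the shutdown sequence; draining in-flight requests through `http.Server.Shutdown` still needs an end-to-end check such as the curl loop.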