dance-lessons-coach/adr/0005-graceful-shutdown.md
Gabriel Radureau 52065c9cf3
refactor: convert all DanceLessonsCoach mentions to kebab-case
2026-04-07 19:11:39 +02:00

Implement graceful shutdown with readiness endpoints

  • Status: Accepted
  • Deciders: Gabriel Radureau, AI Agent
  • Date: 2026-04-03

Context and Problem Statement

We needed to implement a shutdown mechanism for dance-lessons-coach that provides:

  • Clean resource cleanup
  • Proper handling of in-flight requests
  • Kubernetes/service mesh compatibility
  • Minimal downtime for users
  • Proper orchestration signaling

Decision Drivers

  • Need for zero-data-loss shutdowns
  • Desire for Kubernetes compatibility
  • Requirement for proper resource cleanup
  • Need for minimal user impact
  • Desire for proper orchestration integration

Considered Options

  • Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
  • Immediate shutdown - Simple but disruptive
  • Delayed shutdown with queue draining - Complex but thorough
  • Signal-based shutdown only - Basic graceful shutdown

Decision Outcome

Chosen option: "Graceful shutdown with readiness endpoints", because it offers the best combination of Kubernetes compatibility, proper resource cleanup, and minimal user impact, and it follows industry best practices for containerized services.

Pros and Cons of the Options

Graceful shutdown with readiness endpoints

  • Good, because Kubernetes/service mesh compatible
  • Good, because minimal user impact
  • Good, because proper resource cleanup
  • Good, because follows industry best practices
  • Good, because allows proper orchestration
  • Bad, because more complex to implement
  • Bad, because requires additional endpoints

Immediate shutdown

  • Good, because simplest to implement
  • Bad, because disruptive to users
  • Bad, because can lose in-flight requests
  • Bad, because no resource cleanup

Delayed shutdown with queue draining

  • Good, because very thorough
  • Good, because minimal data loss
  • Bad, because very complex
  • Bad, because overkill for simple services

Signal-based shutdown only

  • Good, because better than immediate shutdown
  • Good, because allows some cleanup
  • Bad, because it gives Kubernetes no readiness signal, so traffic can still be routed to an instance that is shutting down
  • Bad, because still somewhat disruptive

Implementation Details

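// Readiness context management. Both values are stored on the Server so the
// handler and the shutdown sequence share the same state (this assumes the
// Server struct has readyCtx and readyCancel fields).
s.readyCtx, s.readyCancel = context.WithCancel(context.Background())

// Readiness endpoint handler
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    select {
    case <-s.readyCtx.Done():
        // Shutdown has started: report not ready so traffic is routed elsewhere
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte(`{"ready":false}`))
    default:
        w.Write([]byte(`{"ready":true}`))
    }
}

// Shutdown sequence
func (s *Server) shutdown() {
    // Cancel readiness: /api/ready now returns 503, which tells the
    // orchestrator to stop routing new requests to this instance
    s.readyCancel()

    // Bound the shutdown: in-flight requests get at most 30 seconds
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Graceful server shutdown: closes the listeners first, then waits for
    // active connections to finish or for the timeout to expire
    if err := s.server.Shutdown(shutdownCtx); err != nil {
        log.Printf("graceful shutdown failed: %v", err)
    }
}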

Monitoring and Verification

# Check readiness during shutdown (-c prints compact, one-line JSON)
while true; do curl -s http://localhost:8080/api/ready | jq -c .; sleep 1; done

# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false}  # When shutdown starts
# {"ready":false}
# ... (connection refused)  # When server fully stopped