# Implement graceful shutdown with readiness endpoints

* Status: Accepted
* Deciders: Gabriel Radureau, AI Agent
* Date: 2026-04-03

## Context and Problem Statement

We needed to implement a shutdown mechanism for dance-lessons-coach that provides:
- Clean resource cleanup
- Completion of in-flight requests before the process exits
- Kubernetes/service mesh compatibility
- Minimal downtime for users
- A readiness signal that tells the orchestrator to stop routing traffic before exit

## Decision Drivers

* Need for zero-data-loss shutdowns
* Desire for Kubernetes compatibility
* Requirement for proper resource cleanup
* Need for minimal user impact
* Desire for proper orchestration integration

## Considered Options

* Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
* Immediate shutdown - Simple but disruptive
* Delayed shutdown with queue draining - Complex but thorough
* Signal-based shutdown only - Basic graceful shutdown

## Decision Outcome

Chosen option: "Graceful shutdown with readiness endpoints" because it provides the best combination of Kubernetes compatibility, proper resource cleanup, minimal user impact, and follows industry best practices for containerized services.

## Pros and Cons of the Options

### Graceful shutdown with readiness endpoints

* Good, because Kubernetes/service mesh compatible
* Good, because minimal user impact
* Good, because proper resource cleanup
* Good, because follows industry best practices
* Good, because the orchestrator can stop routing traffic before the process exits
* Bad, because more complex to implement
* Bad, because requires additional endpoints

### Immediate shutdown

* Good, because simplest to implement
* Bad, because disruptive to users
* Bad, because can lose in-flight requests
* Bad, because no resource cleanup

### Delayed shutdown with queue draining

* Good, because very thorough
* Good, because minimal data loss
* Bad, because very complex
* Bad, because overkill for simple services

### Signal-based shutdown only

* Good, because better than immediate shutdown
* Good, because allows some cleanup
* Bad, because it gives the orchestrator no readiness signal, so traffic may still be routed to an instance that is shutting down
* Bad, because still somewhat disruptive

## Implementation Details

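```go
package server

import (
	"context"
	"net/http"
	"time"
)

// Server wraps the HTTP server together with its readiness state.
type Server struct {
	server      *http.Server
	readyCtx    context.Context    // cancelled once shutdown begins
	readyCancel context.CancelFunc // flips readiness to false
}

// Readiness context management (newServer is an illustrative constructor name).
func newServer(srv *http.Server) *Server {
	readyCtx, readyCancel := context.WithCancel(context.Background())
	return &Server{server: srv, readyCtx: readyCtx, readyCancel: readyCancel}
}

// Readiness endpoint handler: 200 while ready, 503 once shutdown has begun.
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	select {
	case <-s.readyCtx.Done():
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		w.Write([]byte(`{"ready":true}`))
	}
}

// Shutdown sequence
func (s *Server) shutdown() error {
	// Fail readiness so the orchestrator stops routing new traffic here.
	// The listener keeps accepting connections until Shutdown is called.
	s.readyCancel()

	// Give in-flight requests at most 30 seconds to finish.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Close listeners, then wait for active requests to finish. A non-nil
	// error means a listener failed to close or the deadline expired first.
	return s.server.Shutdown(shutdownCtx)
}
```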

## Links

* [Kubernetes Graceful Node Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown/)
* [VictoriaMetrics Readiness Patterns](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-shut-down)
* [Go HTTP Server Shutdown](https://pkg.go.dev/net/http#Server.Shutdown)

## Monitoring and Verification

```bash
# Check readiness during shutdown
while true; do curl -s http://localhost:8080/api/ready | jq -c .; sleep 1; done

# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false}  # When shutdown starts
# {"ready":false}
# (no further output: curl -s hides the connection-refused error)  # When server fully stopped
```