# Implement graceful shutdown with readiness endpoints
* Status: Accepted
* Deciders: Gabriel Radureau, AI Agent
* Date: 2026-04-03
## Context and Problem Statement
We needed to implement a shutdown mechanism for dance-lessons-coach that provides:
- Clean resource cleanup
- Proper handling of in-flight requests
- Kubernetes/service mesh compatibility
- Minimal downtime for users
- Proper orchestration signaling
## Decision Drivers
* Need for zero-data-loss shutdowns
* Desire for Kubernetes compatibility
* Requirement for proper resource cleanup
* Need for minimal user impact
* Desire for proper orchestration integration
## Considered Options
* Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
* Immediate shutdown - Simple but disruptive
* Delayed shutdown with queue draining - Complex but thorough
* Signal-based shutdown only - Basic graceful shutdown
## Decision Outcome
Chosen option: "Graceful shutdown with readiness endpoints" because it provides the best combination of Kubernetes compatibility, proper resource cleanup, minimal user impact, and follows industry best practices for containerized services.
## Pros and Cons of the Options
### Graceful shutdown with readiness endpoints
* Good, because Kubernetes/service mesh compatible
* Good, because minimal user impact
* Good, because proper resource cleanup
* Good, because follows industry best practices
* Good, because allows proper orchestration
* Bad, because more complex to implement
* Bad, because requires additional endpoints
### Immediate shutdown
* Good, because simplest to implement
* Bad, because disruptive to users
* Bad, because can lose in-flight requests
* Bad, because no resource cleanup
### Delayed shutdown with queue draining
* Good, because very thorough
* Good, because minimal data loss
* Bad, because very complex
* Bad, because overkill for simple services
### Signal-based shutdown only
* Good, because better than immediate shutdown
* Good, because allows some cleanup
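* Bad, because it gives Kubernetes no readiness signal, so traffic can still be routed to the pod while it shuts down
* Bad, because requests arriving during that window can still fail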
## Implementation Details
```go
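package server

import (
	"context"
	"net/http"
	"time"
)

// Server wraps the HTTP server together with its readiness state.
type Server struct {
	server      *http.Server
	readyCtx    context.Context    // cancelled once shutdown begins
	readyCancel context.CancelFunc // flips the readiness endpoint to not-ready
}

// Readiness context management (NewServer is an illustrative constructor)
func NewServer(srv *http.Server) *Server {
	readyCtx, readyCancel := context.WithCancel(context.Background())
	return &Server{server: srv, readyCtx: readyCtx, readyCancel: readyCancel}
}

// Readiness endpoint handler: 200 while ready, 503 once shutdown has started
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	select {
	case <-s.readyCtx.Done():
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		w.Write([]byte(`{"ready":true}`))
	}
}

// Shutdown sequence
func (s *Server) shutdown() error {
	// Mark the service not-ready so the orchestrator stops routing new traffic
	s.readyCancel()
	// Bound how long in-flight requests may take to finish
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	// Close listeners, then wait for active requests to complete
	return s.server.Shutdown(shutdownCtx)
}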
```
## Links
* [Kubernetes Graceful Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown/)
* [VictoriaMetrics Readiness Patterns](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-shut-down)
* [Go HTTP Server Shutdown](https://pkg.go.dev/net/http#Server.Shutdown)
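## Monitoring and Verification
`http.Server.Shutdown` closes all listeners as soon as it is called, so the `{"ready":false}` responses below are only observable if the server keeps serving for a short period after readiness is cancelled and before `Shutdown` runs.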
```bash
# Check readiness during shutdown
while true; do curl -sS http://localhost:8080/api/ready | jq -c .; sleep 1; done
# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false} # When shutdown starts
# {"ready":false}
# ... (connection refused) # When server fully stopped
```