The 2026-05-02 audit (Task 6, Phase A) identified 3 inconsistent formats across the ADR corpus:

- F1 list bullets: `* Status:` / `* Date:` / `* Deciders:` (11 ADRs)
- F2 bold fields: `**Status:**` / `**Date:**` / `**Authors:**` (9 ADRs)
- F3 dedicated section: `## Status\n**Value** ✅` (5 ADRs)

Mixed metadata names (Authors / Deciders / Decision Date / Implementation Date / Implementation Status / Last Updated) and decorative emojis on status values made the corpus hard to scan or to template against.

Canonical format adopted (see adr/README.md for the full template):

    # NN. Title
    **Status:** <Proposed|Accepted|Implemented|Partially Implemented|Approved|Rejected|Deferred|Deprecated|Superseded by ADR-NNNN>
    **Date:** YYYY-MM-DD
    **Authors:** Name(s)
    [optional **Field:** ... lines]
    ## Context...

Transformations applied (via /tmp/homogenize-adrs.py):

- F1 list bullets → bold fields
- F2 cleanup: `**Deciders:**` → `**Authors:**`, strip status emojis
- F3 sections: `## Status\n**Value** ✅` → `**Status:** Value`
- Strip decorative emojis from `**Status:**` and `**Implementation Status:**`
- Convert any `* Implementation Status:` / `* Last Updated:` / `* Decision Drivers:` / `* Decision Date:` to bold equivalents
- Date typo fix: `2024-04-XX` → `2026-04-XX` for ADRs 0018 and 0019 (already noted in PR #17, but re-applied here because this branch starts from origin/main before PR #17)
- Normalize multiple blank lines after the header (max 1)

21 / 23 ADRs modified. 0010 and 0012 already conformed. 0011 and 0014 do not exist in the repo (cf. README index update).

The body content of each ADR is preserved unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Implement graceful shutdown with readiness endpoints

**Status:** Accepted
**Authors:** Gabriel Radureau, AI Agent
**Date:** 2026-04-03

## Context and Problem Statement

We needed to implement a shutdown mechanism for dance-lessons-coach that provides:
- Clean resource cleanup
- Proper handling of in-flight requests
- Kubernetes/service mesh compatibility
- Minimal downtime for users
- Proper orchestration signaling

## Decision Drivers

* Need for zero-data-loss shutdowns
* Desire for Kubernetes compatibility
* Requirement for proper resource cleanup
* Need for minimal user impact
* Desire for proper orchestration integration

## Considered Options

* Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
* Immediate shutdown - Simple but disruptive
* Delayed shutdown with queue draining - Complex but thorough
* Signal-based shutdown only - Basic graceful shutdown

## Decision Outcome

Chosen option: "Graceful shutdown with readiness endpoints", because it provides the best combination of Kubernetes compatibility, proper resource cleanup, and minimal user impact, and it follows industry best practices for containerized services.

## Pros and Cons of the Options

### Graceful shutdown with readiness endpoints

* Good, because Kubernetes/service mesh compatible
* Good, because minimal user impact
* Good, because proper resource cleanup
* Good, because follows industry best practices
* Good, because allows proper orchestration
* Bad, because more complex to implement
* Bad, because requires additional endpoints

### Immediate shutdown

* Good, because simplest to implement
* Bad, because disruptive to users
* Bad, because can lose in-flight requests
* Bad, because no resource cleanup

### Delayed shutdown with queue draining

* Good, because very thorough
* Good, because minimal data loss
* Bad, because very complex
* Bad, because overkill for simple services

### Signal-based shutdown only

* Good, because better than immediate shutdown
* Good, because allows some cleanup
* Bad, because not Kubernetes-compatible
* Bad, because still somewhat disruptive

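## Implementation Details

```go
package server

import (
	"context"
	"log"
	"net/http"
	"time"
)

// Server holds the HTTP server and its readiness state.
// The struct layout is assumed; the original snippet only shows the fields in use.
type Server struct {
	server      *http.Server
	readyCtx    context.Context
	readyCancel context.CancelFunc
}

// Readiness context management: the context is cancelled when shutdown begins.
func (s *Server) initReadiness() {
	s.readyCtx, s.readyCancel = context.WithCancel(context.Background())
}

// Readiness endpoint handler
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
	select {
	case <-s.readyCtx.Done():
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte(`{"ready":false}`))
	default:
		w.Write([]byte(`{"ready":true}`))
	}
}

// Shutdown sequence
func (s *Server) shutdown() {
	// Cancel readiness so the endpoint reports 503 and the orchestrator stops routing new requests
	s.readyCancel()

	// Give in-flight requests up to 30 seconds to complete
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Graceful server shutdown
	if err := s.server.Shutdown(shutdownCtx); err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
}
```
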
## Links

* [Kubernetes Graceful Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown/)
* [VictoriaMetrics Readiness Patterns](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-shut-down)
* [Go HTTP Server Shutdown](https://pkg.go.dev/net/http#Server.Shutdown)

## Monitoring and Verification

```bash
# Check readiness during shutdown
# (-sS keeps curl quiet but still prints connection errors; jq -c prints compact JSON)
while true; do curl -sS http://localhost:8080/api/ready | jq -c .; sleep 1; done

# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false}            # When shutdown starts
# {"ready":false}
# ... (connection refused)   # When server fully stopped
```