dance-lessons-coach/adr/0005-graceful-shutdown.md

Implement graceful shutdown with readiness endpoints

  • Status: Accepted
  • Deciders: Gabriel Radureau, AI Agent
  • Date: 2026-04-03

Context and Problem Statement

We needed to implement a shutdown mechanism for DanceLessonsCoach that provides:

  • Clean resource cleanup
  • Proper handling of in-flight requests
  • Kubernetes/service mesh compatibility
  • Minimal downtime for users
  • Proper orchestration signaling

Decision Drivers

  • Need for zero-data-loss shutdowns
  • Desire for Kubernetes compatibility
  • Requirement for proper resource cleanup
  • Need for minimal user impact
  • Desire for proper orchestration integration

Considered Options

  • Graceful shutdown with readiness endpoints - Kubernetes-style shutdown
  • Immediate shutdown - Simple but disruptive
  • Delayed shutdown with queue draining - Complex but thorough
  • Signal-based shutdown only - Basic graceful shutdown

Decision Outcome

Chosen option: "Graceful shutdown with readiness endpoints", because it offers the best combination of Kubernetes compatibility, resource cleanup, and minimal user impact, and it follows established practice for containerized services.

Pros and Cons of the Options

Graceful shutdown with readiness endpoints

  • Good, because Kubernetes/service mesh compatible
  • Good, because minimal user impact
  • Good, because proper resource cleanup
  • Good, because follows industry best practices
  • Good, because allows proper orchestration
  • Bad, because more complex to implement
  • Bad, because requires additional endpoints

Immediate shutdown

  • Good, because simplest to implement
  • Bad, because disruptive to users
  • Bad, because can lose in-flight requests
  • Bad, because no resource cleanup

Delayed shutdown with queue draining

  • Good, because very thorough
  • Good, because minimal data loss
  • Bad, because very complex
  • Bad, because overkill for simple services

Signal-based shutdown only

  • Good, because better than immediate shutdown
  • Good, because allows some cleanup
  • Bad, because it exposes no readiness signal for Kubernetes or a service mesh to act on
  • Bad, because still somewhat disruptive

Implementation Details

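// Readiness context management: the Server keeps both the context and its
// cancel function so the handler and the shutdown sequence share them.
s.readyCtx, s.readyCancel = context.WithCancel(context.Background())

// Readiness endpoint handler
func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    select {
    case <-s.readyCtx.Done():
        // Shutdown has started: report not ready so traffic is routed elsewhere
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte(`{"ready":false}`))
    default:
        w.Write([]byte(`{"ready":true}`))
    }
}

// Shutdown sequence
func (s *Server) shutdown() error {
    // Cancel readiness: the readiness endpoint now returns 503
    s.readyCancel()

    // Allow in-flight requests up to 30 seconds to complete
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Graceful server shutdown: stop accepting new connections and wait for
    // active requests; the error is returned instead of being dropped
    return s.server.Shutdown(shutdownCtx)
}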

Monitoring and Verification

# Check readiness during shutdown
while true; do curl -s http://localhost:8080/api/ready | jq; sleep 1; done

# Expected output during shutdown:
# {"ready":true}
# {"ready":true}
# {"ready":false}  # When shutdown starts
# {"ready":false}
# ... (connection refused)  # When server fully stopped