## Summary

Homogenize all 23 ADRs to a single canonical header format, and rewrite `adr/README.md` to match the actual state of the corpus.

This is **Task 7** of the ARCODANGE Phase 1 migration (Claude Code → Mistral Vibe). It is independent from PR #17 (Task 6: restructure AGENTS.md); both can merge in any order. No code changes; only documentation.

## Changes

### 1. Homogenize 21 ADR headers (commit `db09d0a`)

The audit (Task 6 Phase A, Mistral intent-router agent, 2026-05-02) had identified **3 inconsistent header formats**:

- **F1**, list bullets (`* Status:` / `* Date:` / `* Deciders:`): 11 ADRs (0001-0008, 0011, 0014, 0023)
- **F2**, bold fields (`**Status:**` / `**Date:**` / `**Authors:**`): 9 ADRs (0009, 0010, 0012, 0013, 0015, 0016, 0017, 0018, 0019)
- **F3**, dedicated section (`## Status\n**Value** ✅`): 5 ADRs (0020, 0021, 0022, 0024, 0025)

In addition, mixed metadata names (Authors / Deciders / Decision Date / Implementation Date / Implementation Status / Last Updated) and decorative emojis on status values made the corpus hard to scan or template against.

**Canonical format adopted** (see `adr/README.md` for the full template):

```markdown
# NN. Title

**Status:** <Proposed | Accepted | Implemented | Partially Implemented | Approved | Rejected | Deferred | Deprecated | Superseded by ADR-NNNN>
**Date:** YYYY-MM-DD
**Authors:** Name(s)

[optional **Field:** ... lines]

## Context...
```

**Transformations applied** (via the `/tmp/homogenize-adrs.py` script; 23 files scanned, 21 modified; 0010 and 0012 already conformed):

- F1 list bullets → bold fields
- F2 cleanup: `**Deciders:**` → `**Authors:**`, strip status emojis
- F3 sections: `## Status\n**Value** ✅` → `**Status:** Value` (single line)
- Strip decorative emojis from `**Status:**` and `**Implementation Status:**`
- Convert `* Last Updated:` / `* Implementation Status:` / `* Decision Drivers:` / `* Decision Date:` to bold
- Date typo fix: `2024-04-XX` → `2026-04-XX` for ADRs 0018, 0019 (off by two years in the original)
- Normalize multiple blank lines after the header (max 1)

**ADR body content is preserved unchanged.** Only headers were transformed.

### 2. Rewrite `adr/README.md` (commit `d64ab02`)

The previous README had multiple inconsistencies:

- The index table listed wrong titles for ADRs 0010-0021 (it looked like an aspirational forecast that never matched reality; e.g. "0011 = Trunk-Based Development", but the real 0011 is absent and Trunk-Based Development is actually 0017)
- It listed entries for ADRs 0011 (validation library) and 0014 (gRPC), but **these files do not exist** in the repo
- 0024 (BDD Test Organization) was missing from the detail list
- The template still showed the obsolete F1 format (`* Status:`)
- Decorative emojis on every status entry

Rewrite:

- Index table **regenerated from actual file contents** (title from H1, status from the `**Status:**` line), emoji-free and accurate
- Notes that 0011 / 0014 are not currently in use (reserved)
- Updated template block matches the canonical format
- Status Legend extended with `Approved`, `Partially Implemented`, `Deferred`
- Added a note that 0026 is the next free number for new ADRs

## Test plan

- [x] All 23 ADRs follow `**Status:**` / `**Date:**` / `**Authors:**` (verified via grep)
- [x] No more occurrences of `* Status:` (F1) or `## Status` (F3) in any ADR header
- [x] No more emojis on `**Status:**` lines
- [x] `adr/README.md` index links resolve to existing files (no more 0011 / 0014 dead links)
- [x] Pre-commit hooks pass (`go mod tidy`, `go fmt`, `swag fmt`)

## Migration context

Part of Phase 1 of the ARCODANGE migration from Claude Code to Mistral Vibe. Task 7 of the curriculum. Independent from PR #17 (which restructures `AGENTS.md`). The two PRs touch disjoint files, so no merge conflict is expected when both are merged.

🤖 Generated with [Claude Code](https://claude.com/claude-code) (Opus 4.7, 1M context). Mistral Vibe (intent-router agent / mistral-medium-3.5) did the original audit identifying the 3 formats during Task 6 Phase A.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Mistral Vibe (devstral-2 / mistral-medium-3.5)
Reviewed-on: #18
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>

# 10. JWT Secret Retention Policy

**Status:** Proposed

## Context

The dance-lessons-coach application requires a robust JWT secret management system that balances security and user experience. As implemented in [ADR-0009](0009-hybrid-testing-approach.md), the system supports multiple JWT secrets for graceful rotation. However, the current implementation lacks a clear policy for secret retention and cleanup.

### Current State

- ✅ Multiple JWT secrets supported
- ✅ Graceful rotation implemented
- ✅ Backward compatibility maintained
- ❌ No automatic cleanup of old secrets
- ❌ No configurable retention periods
- ❌ No expiration-based secret management

### Problem Statement

Without a retention policy:

1. **Security Risk**: Old secrets accumulate indefinitely, increasing attack surface
2. **Memory Bloat**: Unbounded growth of secret storage
3. **Operational Overhead**: Manual cleanup required
4. **Compliance Issues**: May violate security policies requiring regular key rotation

### Requirements

1. **Configurable Retention**: Administrators should control how long secrets are retained
2. **Automatic Cleanup**: System should automatically remove expired secrets
3. **Backward Compatibility**: Existing tokens should continue working during retention period
4. **Sensible Defaults**: Should work out-of-the-box with secure defaults
5. **Performance**: Cleanup should not impact runtime performance

## Decision

### JWT Secret Retention Policy

Implement a configurable retention policy based on JWT TTL (Time-To-Live) with the following components:

#### 1. Configuration Structure

```yaml
jwt:
  # Token time-to-live (default: 24h)
  ttl: 24h

  # Secret retention configuration
  secret_retention:
    # Retention factor multiplier (default: 2.0)
    # Retention period = JWT TTL × retention_factor
    retention_factor: 2.0

    # Maximum retention period (safety limit, default: 72h)
    max_retention: 72h

    # Cleanup frequency for expired secrets (default: 1h)
    cleanup_interval: 1h
```

#### 2. Retention Period Calculation

```
retention_period = min(JWT_TTL × retention_factor, max_retention)
```

**Examples:**
- Default (24h TTL, 2.0 factor): `min(48h, 72h) = 48h`
- Short-lived tokens (1h TTL, 3.0 factor): `min(3h, 72h) = 3h`
- Long-lived tokens (72h TTL, 2.0 factor): `min(144h, 72h) = 72h`

#### 3. Secret Lifecycle

```mermaid
graph LR
    A[Secret Created] --> B[Active Period]
    B --> C{Retention Period}
    C -->|Expired| D[Marked for Cleanup]
    C -->|Valid| B
    D --> E[Automatic Removal]
```

#### 4. Cleanup Process

- **Frequency**: Configurable interval (default: 1 hour)
- **Scope**: Remove secrets older than retention period
- **Safety**: Never remove current primary secret
- **Logging**: Audit trail of cleanup operations

### Implementation Strategy

#### Phase 1: Configuration Framework

1. **Extend Config Package** (`pkg/config/config.go`)
   - Add JWT TTL configuration
   - Add secret retention parameters
   - Implement validation

2. **Environment Variables**
   ```bash
   # JWT Token TTL
   DLC_JWT_TTL=24h

   # Secret Retention
   DLC_JWT_SECRET_RETENTION_FACTOR=2.0
   DLC_JWT_SECRET_MAX_RETENTION=72h
   DLC_JWT_SECRET_CLEANUP_INTERVAL=1h
   ```

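
One way these variables could be parsed is sketched below, using only the standard library. Everything except the four `DLC_JWT_*` names and their documented defaults is an assumption: `RetentionSettings` and `LoadRetentionSettings` are hypothetical, and the real config package may load values differently.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// RetentionSettings mirrors the four DLC_JWT_* variables (hypothetical type).
type RetentionSettings struct {
	TTL             time.Duration
	RetentionFactor float64
	MaxRetention    time.Duration
	CleanupInterval time.Duration
}

// LoadRetentionSettings reads values through lookup (for example os.LookupEnv)
// and falls back to the documented defaults for anything that is unset.
func LoadRetentionSettings(lookup func(string) (string, bool)) (RetentionSettings, error) {
	s := RetentionSettings{
		TTL:             24 * time.Hour,
		RetentionFactor: 2.0,
		MaxRetention:    72 * time.Hour,
		CleanupInterval: time.Hour,
	}
	durations := []struct {
		key string
		dst *time.Duration
	}{
		{"DLC_JWT_TTL", &s.TTL},
		{"DLC_JWT_SECRET_MAX_RETENTION", &s.MaxRetention},
		{"DLC_JWT_SECRET_CLEANUP_INTERVAL", &s.CleanupInterval},
	}
	for _, d := range durations {
		if raw, ok := lookup(d.key); ok {
			v, err := time.ParseDuration(raw) // accepts values such as "24h" or "30m"
			if err != nil {
				return s, fmt.Errorf("%s: %w", d.key, err)
			}
			*d.dst = v
		}
	}
	if raw, ok := lookup("DLC_JWT_SECRET_RETENTION_FACTOR"); ok {
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return s, fmt.Errorf("DLC_JWT_SECRET_RETENTION_FACTOR: %w", err)
		}
		s.RetentionFactor = v
	}
	return s, nil
}

func main() {
	// A map stands in for the process environment to keep the example deterministic.
	env := map[string]string{"DLC_JWT_TTL": "1h", "DLC_JWT_SECRET_RETENTION_FACTOR": "1.5"}
	s, err := LoadRetentionSettings(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	fmt.Println(s.TTL, s.RetentionFactor, s.MaxRetention, s.CleanupInterval, err)
}
```

Injecting the lookup function rather than calling `os.LookupEnv` directly lets tests supply values without mutating the process environment.
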
#### Phase 2: Secret Manager Enhancement

1. **Enhance JWTSecret Struct**
   ```go
   // JWTSecret is one signing secret together with its lifecycle metadata.
   type JWTSecret struct {
   	Secret          string
   	IsPrimary       bool
   	CreatedAt       time.Time
   	ExpiresAt       *time.Time    // Now properly calculated
   	RetentionPeriod time.Duration // How long the secret stays valid after creation
   }
   ```

2. **Add Expiration Logic**
   ```go
   // AddSecret registers a secret and records when it was created and when it expires.
   func (m *JWTSecretManager) AddSecret(secret string, isPrimary bool, expiresIn time.Duration) {
   	// Calculate retention period based on config
   	retentionPeriod := m.calculateRetentionPeriod()

   	// Read the clock once so CreatedAt and ExpiresAt share the same base.
   	now := time.Now()
   	expiresAt := now.Add(expiresIn)

   	m.secrets = append(m.secrets, JWTSecret{
   		Secret:          secret,
   		IsPrimary:       isPrimary,
   		CreatedAt:       now,
   		ExpiresAt:       &expiresAt,
   		RetentionPeriod: retentionPeriod,
   	})
   }
   ```

#### Phase 3: Automatic Cleanup

1. **Background Cleanup Job**
   ```go
   // StartCleanupJob runs CleanupExpiredSecrets every interval until ctx is cancelled.
   func (m *JWTSecretManager) StartCleanupJob(ctx context.Context, interval time.Duration) {
   	ticker := time.NewTicker(interval)
   	go func() {
   		defer ticker.Stop()
   		for {
   			select {
   			case <-ticker.C:
   				m.CleanupExpiredSecrets()
   			case <-ctx.Done():
   				return
   			}
   		}
   	}()
   }
   ```

2. **Cleanup Implementation**
   ```go
   // CleanupExpiredSecrets drops non-primary secrets whose retention period has elapsed.
   func (m *JWTSecretManager) CleanupExpiredSecrets() {
   	now := time.Now()
   	var activeSecrets []JWTSecret

   	for _, secret := range m.secrets {
   		if secret.IsPrimary {
   			// Never remove current primary
   			activeSecrets = append(activeSecrets, secret)
   			continue
   		}

   		// Check if secret is within retention period
   		if now.Sub(secret.CreatedAt) <= secret.RetentionPeriod {
   			activeSecrets = append(activeSecrets, secret)
   		} else {
   			// Log only a masked form; the raw secret must never reach the logs.
   			log.Info().
   				Str("secret", maskSecret(secret.Secret)).
   				Msg("Removed expired JWT secret")
   		}
   	}

   	m.secrets = activeSecrets
   }
   ```

#### Phase 4: Integration

1. **Server Initialization**
   ```go
   func (s *Server) InitializeJWT() error {
   	// Load config
   	jwtConfig := s.config.GetJWTConfig()

   	// Create secret manager with retention policy
   	secretManager := NewJWTSecretManager(
   		jwtConfig.Secret,
   		WithRetentionFactor(jwtConfig.RetentionFactor),
   		WithMaxRetention(jwtConfig.MaxRetention),
   	)

   	// Start cleanup job
   	secretManager.StartCleanupJob(s.ctx, jwtConfig.CleanupInterval)

   	return nil
   }
   ```

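
The `WithRetentionFactor` and `WithMaxRetention` arguments suggest the functional-options pattern. A minimal sketch of how such options could be defined follows; the `manager` type, the `Option` type, and `newManager` are assumptions for illustration, and only the two option names come from the snippet above.

```go
package main

import (
	"fmt"
	"time"
)

// manager is a simplified stand-in for JWTSecretManager (hypothetical).
type manager struct {
	primary         string
	retentionFactor float64
	maxRetention    time.Duration
}

// Option mutates a manager during construction.
type Option func(*manager)

// WithRetentionFactor overrides the default retention factor.
func WithRetentionFactor(f float64) Option {
	return func(m *manager) { m.retentionFactor = f }
}

// WithMaxRetention overrides the default retention cap.
func WithMaxRetention(d time.Duration) Option {
	return func(m *manager) { m.maxRetention = d }
}

// newManager applies the documented defaults first, then each option in order.
func newManager(primary string, opts ...Option) *manager {
	m := &manager{primary: primary, retentionFactor: 2.0, maxRetention: 72 * time.Hour}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

func main() {
	m := newManager("example-secret", WithRetentionFactor(1.5))
	fmt.Println(m.retentionFactor, m.maxRetention) // 1.5 72h0m0s
}
```

With this shape, callers that pass no options get the documented defaults, which matches the "Sensible Defaults" requirement.
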
### Validation

#### 1. Configuration Validation

```go
// ValidateJWTConfig rejects non-positive or out-of-range retention settings.
func (c *Config) ValidateJWTConfig() error {
	if c.JWT.TTL <= 0 {
		return fmt.Errorf("jwt.ttl must be positive")
	}

	if c.JWT.SecretRetention.RetentionFactor < 1.0 {
		return fmt.Errorf("jwt.secret_retention.retention_factor must be ≥ 1.0")
	}

	if c.JWT.SecretRetention.MaxRetention <= 0 {
		return fmt.Errorf("jwt.secret_retention.max_retention must be positive")
	}

	if c.JWT.SecretRetention.CleanupInterval <= 0 {
		return fmt.Errorf("jwt.secret_retention.cleanup_interval must be positive")
	}

	// Ensure max retention is reasonable. Go has no duration literals,
	// so the 30-day limit is written as 720*time.Hour.
	if c.JWT.SecretRetention.MaxRetention > 720*time.Hour {
		return fmt.Errorf("jwt.secret_retention.max_retention exceeds maximum of 720h")
	}

	return nil
}
```

#### 2. Runtime Validation

```go
func (m *JWTSecretManager) ValidateSecret(secret string) error {
	// Check minimum length
	if len(secret) < 16 {
		return fmt.Errorf("jwt secret must be at least 16 characters")
	}

	// Check entropy (basic check)
	if !hasSufficientEntropy(secret) {
		return fmt.Errorf("jwt secret must have sufficient entropy")
	}

	return nil
}
```

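
The ADR does not define `hasSufficientEntropy`. One possible "basic check" is sketched below under stated assumptions: it estimates Shannon entropy over the secret's bytes and compares it with an illustrative threshold of 3.0 bits per byte. Both the helper `shannonBitsPerByte` and the threshold are hypothetical. This measures how evenly characters are distributed, not how unpredictable the secret is, so it only catches obviously weak values.

```go
package main

import (
	"fmt"
	"math"
)

// shannonBitsPerByte estimates the Shannon entropy of s in bits per byte.
func shannonBitsPerByte(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// hasSufficientEntropy applies an illustrative threshold (an assumption,
// not a value taken from the ADR).
func hasSufficientEntropy(secret string) bool {
	const minBitsPerByte = 3.0
	return shannonBitsPerByte(secret) >= minBitsPerByte
}

func main() {
	fmt.Println(hasSufficientEntropy("aaaaaaaaaaaaaaaa")) // false: a single repeated byte
	fmt.Println(hasSufficientEntropy("k3J!x9Qp_2LmZ7wR")) // true: 16 distinct bytes, 4 bits per byte
}
```

Secrets generated from a cryptographic random source pass such a check trivially, so generating them with `crypto/rand` is the stronger guarantee.
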
### Monitoring and Observability

#### 1. Metrics

```go
// Prometheus metrics
var (
	jwtSecretsActive = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "jwt_secrets_active_count",
		Help: "Number of active JWT secrets",
	})

	jwtSecretsExpired = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "jwt_secrets_expired_total",
		Help: "Total number of expired JWT secrets removed",
	})

	jwtSecretRetentionDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "jwt_secret_retention_duration_seconds",
		Help:    "Duration of JWT secret retention periods",
		Buckets: prometheus.ExponentialBuckets(3600, 2, 6), // 1h to 32h
	})
)
```

#### 2. Logging

```go
func (m *JWTSecretManager) logSecretEvent(secret string, event string, details ...interface{}) {
	log.Info().
		Str("secret", maskSecret(secret)).
		Str("event", event).
		Interface("details", details).
		Msg("JWT secret event")
}

// maskSecret keeps the first and last four characters for correlation.
// Secrets of eight characters or fewer are fully masked, because showing
// both ends would otherwise reveal the whole value.
func maskSecret(secret string) string {
	if len(secret) <= 8 {
		return "****"
	}
	return secret[:4] + "****" + secret[len(secret)-4:]
}
```

## Consequences

### Positive

1. **Enhanced Security**: Automatic cleanup reduces attack surface
2. **Reduced Memory Usage**: Prevents unbounded growth of secret storage
3. **Operational Efficiency**: No manual cleanup required
4. **Compliance Ready**: Meets security policy requirements for key rotation
5. **Flexibility**: Configurable to meet different security requirements

### Negative

1. **Complexity**: Adds configuration and cleanup logic
2. **Performance Overhead**: Background cleanup job (minimal impact)
3. **Migration**: Existing deployments need configuration updates
4. **Debugging**: More moving parts to troubleshoot

### Neutral

1. **Backward Compatibility**: Existing tokens continue to work
2. **Learning Curve**: New configuration options to understand
3. **Monitoring**: Additional metrics to track

## Alternatives Considered

### Alternative 1: Fixed Retention Period

**Proposal**: Use fixed retention period (e.g., 48 hours) instead of TTL-based calculation

**Rejected Because**:
- Less flexible for different use cases
- Doesn't scale with JWT TTL changes
- May be too short for long-lived tokens or too long for short-lived ones

### Alternative 2: Manual Cleanup Only

**Proposal**: Require administrators to manually clean up old secrets

**Rejected Because**:
- Operational overhead
- Security risk if cleanup is forgotten
- Doesn't scale for frequent rotations

### Alternative 3: No Retention (Current State)

**Proposal**: Keep current behavior with no automatic cleanup

**Rejected Because**:
- Security concerns with accumulating secrets
- Memory management issues
- Compliance violations

## Success Metrics

1. **Security**: No old secrets remain beyond retention period
2. **Reliability**: 99.9% of valid tokens continue to work during rotation
3. **Performance**: Cleanup job completes in <100ms with <1000 secrets
4. **Adoption**: Configuration used in 100% of deployments within 3 months

## Migration Plan

### Phase 1: Preparation (1 week)
- ✅ Create this ADR
- ✅ Update documentation
- ✅ Add configuration to config package
- ✅ Implement basic retention logic

### Phase 2: Testing (2 weeks)
- ✅ Write BDD scenarios for retention
- ✅ Add unit tests for secret manager
- ✅ Test with various TTL/factor combinations
- ✅ Performance testing with large secret counts

### Phase 3: Rollout (1 week)
- ✅ Update default configuration
- ✅ Add feature flag for gradual rollout
- ✅ Monitor metrics in staging
- ✅ Gradual production rollout

### Phase 4: Optimization (Ongoing)
- ✅ Monitor cleanup performance
- ✅ Adjust defaults based on real-world usage
- ✅ Add alerts for cleanup failures
- ✅ Document troubleshooting guide

## References

- [ADR-0009: Hybrid Testing Approach](0009-hybrid-testing-approach.md)
- [ADR-0008: BDD Testing](0008-bdd-testing.md)
- [RFC 7519: JSON Web Tokens](https://tools.ietf.org/html/rfc7519)
- [OWASP Key Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Key_Management_Cheat_Sheet.html)

## Appendix

### Configuration Examples

**Development Environment** (short retention for testing):
```yaml
jwt:
  ttl: 1h
  secret_retention:
    retention_factor: 1.5
    max_retention: 3h
    cleanup_interval: 30m
```

**Production Environment** (secure defaults):
```yaml
jwt:
  ttl: 24h
  secret_retention:
    retention_factor: 2.0
    max_retention: 72h
    cleanup_interval: 1h
```

**High-Security Environment** (aggressive rotation):
```yaml
jwt:
  ttl: 8h
  secret_retention:
    retention_factor: 1.5
    max_retention: 24h
    cleanup_interval: 30m
```

### Troubleshooting

**Issue**: Secrets being removed too quickly
- **Check**: Retention factor and JWT TTL settings
- **Fix**: Increase `retention_factor` or JWT TTL

**Issue**: Too many old secrets accumulating
- **Check**: Cleanup job logs and interval
- **Fix**: Decrease `cleanup_interval` or `retention_factor`

**Issue**: Performance degradation during cleanup
- **Check**: Number of secrets and cleanup frequency
- **Fix**: Optimize cleanup algorithm or increase interval

### FAQ

**Q: What happens to tokens signed with expired secrets?**
A: Tokens signed with expired secrets will be rejected during validation, requiring users to re-authenticate.

**Q: Can I disable automatic cleanup?**
A: Yes, set `cleanup_interval` to a very high value (e.g., `8760h` for 1 year).

**Q: How does this affect existing deployments?**
A: Existing deployments will use sensible defaults. The feature is backward compatible.

**Q: What's the recommended retention factor?**
A: Start with 2.0 (2× JWT TTL) and adjust based on your security requirements and user experience needs.

**Q: How often should cleanup run?**
A: For most deployments, every 1 hour is sufficient. High-volume systems may need more frequent cleanup.

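
The first answer above can be illustrated with a small sketch of signature checking against several retained secrets. It uses a bare HMAC-SHA256 over a placeholder string rather than real JWT parsing, and `sign` and `verifyWithAny` are hypothetical helpers, not functions from the codebase. A JWT library would additionally handle encoding, headers, and claims.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sign returns the HMAC-SHA256 of payload under secret.
func sign(payload, secret string) []byte {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	return mac.Sum(nil)
}

// verifyWithAny reports whether sig matches payload under any retained secret.
func verifyWithAny(payload string, sig []byte, secrets []string) bool {
	for _, s := range secrets {
		// hmac.Equal compares in constant time.
		if hmac.Equal(sig, sign(payload, s)) {
			return true
		}
	}
	return false
}

func main() {
	sig := sign("header.claims", "old-secret")

	// While the old secret is still retained, the token verifies.
	fmt.Println(verifyWithAny("header.claims", sig, []string{"new-secret", "old-secret"})) // true

	// After cleanup removes the old secret, the same token is rejected.
	fmt.Println(verifyWithAny("header.claims", sig, []string{"new-secret"})) // false
}
```

This is why the retention period should be at least as long as the token TTL: a secret removed earlier would invalidate tokens that have not yet expired.
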
## Decision Record

**Approved By**:
**Approved Date**:
**Implemented By**:
**Implementation Date**:

---

*Generated by Mistral Vibe*
*Co-Authored-By: Mistral Vibe <vibe@mistral.ai>*