
19. PostgreSQL Database Integration

Date: 2024-04-07
Status: Proposed
Authors: Product Owner
Decision Drivers: Data Persistence, Scalability, Production Readiness

Context

The DanceLessonsCoach application currently uses SQLite with GORM for the user management system (ADR 0018). Since there are no existing users or production data, we can adopt PostgreSQL directly as the primary database without migration concerns.

Current State

  • Database: SQLite (in-memory mode) - no persistent data
  • ORM: GORM v1.31.1
  • Implementation: pkg/user/sqlite_repository.go
  • Usage: User management system only
  • Data: No existing users or production data

Implementation Drivers

  1. Production Readiness: PostgreSQL is enterprise-grade and production-ready
  2. Data Persistence: Proper persistent storage for user accounts
  3. Concurrency: PostgreSQL handles concurrent connections better
  4. Scalability: PostgreSQL supports horizontal scaling
  5. Features: Advanced PostgreSQL features (JSONB, full-text search)
  6. Ecosystem: Better tooling and monitoring for PostgreSQL

Decision

We will adopt PostgreSQL as the primary database, replacing the SQLite implementation, with the following characteristics:

Core Features

  1. Database Setup

    • PostgreSQL 15+ for production compatibility
    • Containerized development environment
    • Connection pooling for performance
    • SSL support for secure connections
  2. ORM Integration

    • GORM as the primary ORM
    • Interface-based repository pattern
    • Database migrations for schema management
    • Transaction support for data integrity
  3. Configuration Management

    • Viper integration for database settings
    • Environment variable support with DLC_ prefix
    • Multiple environment support (dev, staging, prod)
    • Connection health checking
  4. Integration Points

    • User management system (ADR 0018)
    • Existing greet service (for future personalization)
    • OpenTelemetry tracing integration
    • Zerolog structured logging

Technical Implementation

Database Schema Foundation

-- Users table (from ADR 0018)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP WITH TIME ZONE,
    username VARCHAR(50) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    description TEXT,
    current_goal TEXT,
    is_admin BOOLEAN DEFAULT FALSE,
    allow_password_reset BOOLEAN DEFAULT FALSE,
    last_login TIMESTAMP WITH TIME ZONE
);

-- Greet history table (future extension)
CREATE TABLE greet_history (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    user_id INTEGER REFERENCES users(id),
    message TEXT NOT NULL,
    context JSONB
);
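
In Go, the users table above would map to a model along these lines. This is a sketch assuming GORM's default tag conventions (snake_case columns, soft deletes via deleted_at); the actual pkg/user/models.go may differ.

```go
package main

import "time"

// User mirrors the users table from the schema above. Column mapping assumes
// GORM defaults; pointer fields model the nullable timestamp columns.
type User struct {
	ID                 uint       `gorm:"primaryKey"`
	CreatedAt          time.Time
	UpdatedAt          time.Time
	DeletedAt          *time.Time `gorm:"index"` // soft-delete marker
	Username           string     `gorm:"size:50;uniqueIndex;not null"`
	PasswordHash       string     `gorm:"size:255;not null"`
	Description        string
	CurrentGoal        string
	IsAdmin            bool       `gorm:"default:false"`
	AllowPasswordReset bool       `gorm:"default:false"`
	LastLogin          *time.Time
}
```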

Technology Stack

  • Database: PostgreSQL 15+ - production-ready relational database
  • ORM: GORM v1.31+ - the version already in use, aligns with interface-based design
  • Migrations: GORM AutoMigrate + custom SQL migrations
  • Connection Pooling: PgBouncer-compatible connection management
  • Configuration: Viper integration - consistent with existing patterns
  • Logging: Zerolog integration - structured database logging
  • Telemetry: OpenTelemetry database instrumentation

Architecture Alignment

The PostgreSQL integration follows established DanceLessonsCoach patterns:

  1. Interface-based Design:

    type DatabaseRepository interface {
        GetDB() *gorm.DB
        Close() error
        HealthCheck(ctx context.Context) error
        BeginTransaction(ctx context.Context) (*gorm.DB, error)
    }
    
    type UserRepository interface {
        CreateUser(ctx context.Context, user *User) error
        GetUserByUsername(ctx context.Context, username string) (*User, error)
        // ... other methods
    }
    
  2. Context-aware Services:

    func (r *PostgresUserRepository) CreateUser(ctx context.Context, user *User) error {
        log.Trace().Ctx(ctx).Str("username", user.Username).Msg("Creating user")
        return r.db.WithContext(ctx).Create(user).Error
    }
    
  3. Configuration Integration:

    type DatabaseConfig struct {
        Type            string        `mapstructure:"type"` // sqlite, postgres, auto
        Host            string        `mapstructure:"host"`
        Port            int           `mapstructure:"port"`
        User            string        `mapstructure:"user"`
        Password        string        `mapstructure:"password"`
        Name            string        `mapstructure:"name"`
        SSLMode         string        `mapstructure:"ssl_mode"`
        MaxOpenConns    int           `mapstructure:"max_open_conns"`
        MaxIdleConns    int           `mapstructure:"max_idle_conns"`
        ConnMaxLifetime time.Duration `mapstructure:"conn_max_lifetime"`
    }
    
  4. Graceful Shutdown Integration:

    func (s *Server) Shutdown(ctx context.Context) error {
        // Close database connections gracefully
        if s.userRepo != nil {
            if err := s.userRepo.Close(); err != nil {
                log.Error().Err(err).Msg("User repository shutdown failed")
                // Continue shutdown even if database fails
            }
        }
    
        // The readiness endpoint already handles shutdown detection via s.readyCtx
        // No need for atomic operations - the context-based approach is cleaner
    
        // Continue with existing HTTP server shutdown
        return s.httpServer.Shutdown(ctx)
    }
    
  5. Readiness Endpoint Integration:

    func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
        // Check database health if using persistent database
        if s.config.GetDatabaseType() != "sqlite" {
            if err := s.userRepo.CheckDatabaseHealth(r.Context()); err != nil {
                log.Warn().Err(err).Msg("Database health check failed")
                s.writeJSONResponse(w, http.StatusServiceUnavailable, map[string]interface{}{
                    "ready": false,
                    "reason": "database_unhealthy",
                    "error": err.Error(),
                })
                return
            }
        }
    
        // Existing readiness logic
        select {
        case <-s.readyCtx.Done():
            s.writeJSONResponse(w, http.StatusServiceUnavailable, map[string]interface{}{
                "ready": false,
                "reason": "shutting_down",
            })
        default:
            s.writeJSONResponse(w, http.StatusOK, map[string]interface{}{
                "ready": true,
            })
        }
    }
    
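As an illustration of how DatabaseConfig feeds the driver, a hypothetical helper can render the libpq-style key=value connection string accepted by gorm.io/driver/postgres (the helper name and the sslmode default are assumptions, not existing code):

```go
package main

import (
	"fmt"
	"time"
)

// DatabaseConfig mirrors the struct shown above, trimmed to the fields the
// DSN needs.
type DatabaseConfig struct {
	Host            string
	Port            int
	User            string
	Password        string
	Name            string
	SSLMode         string
	ConnMaxLifetime time.Duration
}

// DSN renders the config as a libpq-style key=value connection string, the
// format accepted by gorm.io/driver/postgres.
func (c DatabaseConfig) DSN() string {
	sslMode := c.SSLMode
	if sslMode == "" {
		sslMode = "disable" // assumed default for local development
	}
	return fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=%s",
		c.Host, c.Port, c.User, c.Password, c.Name, sslMode)
}
```

In production the same fields would arrive through Viper, populated from config.yaml or DLC_-prefixed environment variables.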

Implementation Strategy

Phase 1: PostgreSQL Repository Implementation

  1. Replace Dependencies:

    # Add the PostgreSQL driver; gorm.io/driver/postgres bundles pgx,
    # so a separate github.com/lib/pq dependency is not needed
    go get gorm.io/driver/postgres
    go mod tidy  # Drop SQLite dependencies that are no longer referenced
    
  2. Create PostgreSQL Repository:

    • pkg/user/postgres_repository.go - PostgreSQL implementation
    • Implement UserRepository interface directly
    • Add PostgreSQL-specific connection management
  3. Docker Setup:

    • Create docker-compose.yml with PostgreSQL 16 service (current stable version)
    • Add initialization scripts for development
    • Configure health checks and monitoring
    • Use Alpine-based image for smaller footprint
  4. Configuration:

    • Add DatabaseConfig to existing config structure
    • Environment variables with DLC_ prefix
    • Connection validation and health checking
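
Viper's environment binding is mechanical: with SetEnvPrefix("DLC") and an underscore key replacer, the key database.host resolves from DLC_DATABASE_HOST. A small helper mirroring that convention (illustrative only, not part of the codebase):

```go
package main

import "strings"

// envKey maps a Viper config key to the environment variable consulted when
// SetEnvPrefix("DLC") and SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
// are configured.
func envKey(configKey string) string {
	return "DLC_" + strings.ToUpper(strings.ReplaceAll(configKey, ".", "_"))
}
```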

Phase 2: Server Integration

  1. Update Server Initialization:

    • Modify initializeUserServices() in pkg/server/server.go
    • Replace SQLite repository with PostgreSQL repository
    • Update error handling and logging
  2. Remove SQLite Code:

    • Delete pkg/user/sqlite_repository.go
    • Clean up any SQLite-specific references
    • Update imports and dependencies
  3. Enhance Health Checks:

    • Add database health check to readiness endpoint
    • Implement connection pooling monitoring
    • Add startup health validation

Phase 3: Testing & Validation

  1. BDD Test Integration:

    • Updated test server configuration with PostgreSQL settings
    • Automatic PostgreSQL container startup in test script
    • Health checks for database readiness before tests
    • Separate BDD test database (dance_lessons_coach_bdd_test)
    • Complete isolation from development/production databases
  2. Test Script Enhancement:

    • scripts/run-bdd-tests.sh now starts PostgreSQL if needed
    • Automatic BDD database creation using createdb command
    • Checks for existing BDD database before creating
    • Waits for database readiness before running tests
    • Proper error handling and timeout management
    • Reuses existing container if already running
  3. Database Isolation Strategy:

    • Development: dance_lessons_coach (config.yaml)
    • BDD Tests: dance_lessons_coach_bdd_test (automatically created)
    • Production: Custom name per environment
    • Manual Testing: Developers can use development database
  4. Unit & Integration Tests:

    • Repository method testing with PostgreSQL
    • Transaction and error case testing
    • Performance benchmarks
    • Connection failure scenarios
  5. Graceful Shutdown Testing:

    • Database connection cleanup during shutdown
    • Readiness endpoint behavior during shutdown
    • Connection pool behavior under stress

Phase 4: Documentation & Finalization

  1. Documentation Updates:

    • Update AGENTS.md with PostgreSQL setup instructions
    • Add database configuration guide
    • Create development setup documentation
    • Update BDD test documentation
  2. Cleanup:

    • Remove all SQLite references from code
    • Update go.mod and go.sum
    • Verify no unused imports or dependencies
  3. Production Readiness:

    • Add database health monitoring
    • Configure connection pooling for production
    • Add environment-specific configurations
  4. User Model & Repository:

    • pkg/user/models.go - GORM user model
    • pkg/user/repository.go - GORM implementation
    • pkg/user/repository_mock.go - Mock for testing
  5. Database Integration:

    • Implement UserRepository interface
    • Add transaction support
    • Implement health checks
  6. Testing Setup:

    • Test container for PostgreSQL
    • Integration test suite
    • Mock-based unit tests

Phase 5: Service Integration

  1. Auth Service Integration:

    • Update auth service to use user repository
    • Implement JWT token persistence
    • Add session management
  2. Greet Service Extension:

    • Add greet history tracking
    • Implement user-specific greetings
    • Add database logging
  3. API Endpoints:

    • Health check endpoint: GET /api/health/db
    • Database metrics endpoint: GET /api/metrics/db

Phase 6: Extended Testing & Validation

  1. BDD Test Integration:

    • Temporary test database setup
    • Test container for PostgreSQL
    • Clean database between scenarios
    • Test data isolation
  2. Unit & Integration Tests:

    • Repository method testing
    • Transaction testing
    • Error case testing
    • Performance benchmarks
  3. Failure-Mode Testing:

    • Connection failure handling
    • Graceful degradation when the database is unavailable

Consequences

Positive

  1. Data Persistence: User accounts and application data properly persisted
  2. Production Ready: PostgreSQL is enterprise-grade database
  3. Scalability: Better concurrent connection handling
  4. Simplified Architecture: Direct PostgreSQL implementation without migration complexity
  5. Clean Codebase: No legacy SQLite code or dual implementation
  6. Future-Proof: Foundation for all future data-driven features

Negative

  1. Dependency Changes: Replacing SQLite with PostgreSQL dependencies
  2. Operational Overhead: Database container management
  3. Learning Curve: PostgreSQL-specific features and optimization
  4. Testing Requirements: Comprehensive testing needed for new implementation

Neutral

  1. Code Changes: Repository implementation replacement
  2. Configuration Updates: New database configuration structure
  3. Development Workflow: Docker-based database for local development

Alternatives Considered

Alternative 1: Keep SQLite with File Persistence

  • Pros: Simple, no new dependencies, works for small-scale
  • Cons: Not production-grade, limited concurrency, file-based limitations
  • Rejected: Doesn't meet long-term production requirements

Alternative 2: Dual Implementation with Fallback

  • Pros: Smooth migration path, backward compatibility
  • Cons: Complex codebase, testing overhead, maintenance burden
  • Rejected: Unnecessary complexity since no existing data or users

Alternative 3: MySQL

  • Pros: Widely used, good community support
  • Cons: Different ecosystem, licensing concerns
  • Rejected: PostgreSQL better fits our needs

Alternative 4: MongoDB

  • Pros: Flexible schema, document-oriented
  • Cons: NoSQL approach, different query patterns
  • Rejected: Relational data better suits our model

Alternative 5: Pure SQL (no ORM)

  • Pros: No ORM overhead, direct control
  • Cons: More boilerplate, manual query building
  • Rejected: GORM provides good balance

Graceful Shutdown & Readiness Integration

Database Connection Lifecycle

The PostgreSQL integration must properly handle the server lifecycle:

  1. Startup Sequence:

    • Initialize database connections
    • Run health check
    • Set readiness to true only if database is healthy
    • Log connection details at trace level
  2. Runtime Operation:

    • Monitor database connection health
    • Handle connection failures gracefully
    • Implement connection retry logic
    • Log connection issues appropriately
  3. Shutdown Sequence:

    • Set readiness to false immediately
    • Close all database connections
    • Wait for in-flight queries to complete
    • Handle shutdown timeouts gracefully
    • Log shutdown progress

Readiness Endpoint Enhancement

The existing /api/ready endpoint already has the correct nested structure for service health checks. We'll enhance it to include PostgreSQL database health:

Current Structure:

{
  "ready": true,
  "connections": {
    "database": {
      "status": "healthy"
    }
  }
}

Health Check Logic:

func (r *PostgresUserRepository) CheckDatabaseHealth(ctx context.Context) error {
    // Simple query to test connectivity
    var count int64
    result := r.db.WithContext(ctx).Model(&User{}).Count(&count)
    if result.Error != nil {
        return fmt.Errorf("database health check failed: %w", result.Error)
    }
    return nil
}

Readiness Response States:

  • Healthy: {"ready": true, "connections": {"database": {"status": "healthy"}}}
  • Database Unhealthy: {"ready": false, "reason": "database_unhealthy", "connections": {"database": {"status": "unhealthy", "error": "connection refused"}}}
  • Shutting Down: {"ready": false, "reason": "shutting_down", "connections": {"database": "not_checked"}}
  • Not Configured: {"ready": true, "connections": {"database": {"status": "not_configured"}}} (for SQLite mode)
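
The states above can be assembled by a single helper ahead of JSON encoding. A sketch (the function name is hypothetical; key layout follows the example payloads above):

```go
package main

// readinessPayload builds the /api/ready response body for the states listed
// above. dbStatus is "healthy", "unhealthy", or "not_configured"; dbErr is
// included only in the unhealthy state.
func readinessPayload(shuttingDown bool, dbStatus, dbErr string) map[string]interface{} {
	if shuttingDown {
		return map[string]interface{}{
			"ready":       false,
			"reason":      "shutting_down",
			"connections": map[string]interface{}{"database": "not_checked"},
		}
	}
	db := map[string]interface{}{"status": dbStatus}
	payload := map[string]interface{}{
		"ready":       dbStatus != "unhealthy",
		"connections": map[string]interface{}{"database": db},
	}
	if dbStatus == "unhealthy" {
		db["error"] = dbErr
		payload["reason"] = "database_unhealthy"
	}
	return payload
}
```

Keeping this pure makes the state table directly unit-testable, independent of the HTTP handler.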

Connection Pool Management

Proper connection pool configuration for graceful shutdown:

// In database initialization
sqlDB, err := db.DB()
if err != nil {
    return nil, fmt.Errorf("failed to get SQL DB: %w", err)
}

// Configure connection pool
sqlDB.SetMaxOpenConns(cfg.MaxOpenConns)
sqlDB.SetMaxIdleConns(cfg.MaxIdleConns)
sqlDB.SetConnMaxLifetime(cfg.ConnMaxLifetime)

// Recycle idle connections so shutdown never waits on stale ones
sqlDB.SetConnMaxIdleTime(time.Minute * 5)

Shutdown Timeout Handling

func (s *Server) Shutdown(ctx context.Context) error {
    // Create shutdown context with timeout
    shutdownCtx, cancel := context.WithTimeout(ctx, s.config.GetShutdownTimeout())
    defer cancel()
    
    // Close database connections with timeout
    done := make(chan struct{})
    go func() {
        if s.userRepo != nil {
            if err := s.userRepo.Close(); err != nil {
                log.Error().Err(err).Msg("Database shutdown error")
            }
        }
        close(done)
    }()
    
    select {
    case <-done:
        log.Trace().Msg("Database shutdown completed")
    case <-shutdownCtx.Done():
        log.Warn().Msg("Database shutdown timed out, forcing closure")
    }
    
    return s.httpServer.Shutdown(shutdownCtx)
}

Alignment with Existing Architecture

This implementation builds upon completed phases:

  • Phase 1-3: Uses Go 1.26.1, Chi router, Zerolog, interface-based design
  • Phase 5: Extends Viper configuration management
  • Phase 6: Integrates with graceful shutdown patterns and readiness endpoints
  • Phase 7: Maintains OpenTelemetry compatibility
  • Phase 8: Follows existing build system patterns
  • Phase 9: Preserves trace-level logging approach
  • Phase 18: Supports user management system

Backward Compatibility

The implementation maintains full backward compatibility:

  1. API Endpoints: Existing endpoints unchanged
  2. Configuration: All existing config options preserved
  3. Logging: Maintains existing Zerolog integration
  4. Telemetry: OpenTelemetry continues to work
  5. Error Handling: Consistent error patterns

Success Metrics

  1. Reliability: 99.9% database uptime
  2. Performance: <100ms average query time
  3. Scalability: Support 1000+ concurrent connections
  4. Data Integrity: Zero data corruption incidents
  5. Adoption: All new features use database storage

Open Questions

  1. What should be the connection pool size for production?
  2. Should we implement read replicas for scaling?
  3. What backup strategy should we implement?
  4. Should we add database connection health metrics?
  5. What query timeout should we set for production?

Database Cleanup Strategy

Decision: Raw SQL Cleanup Between Scenarios

Approach: Use raw SQL DELETE statements with SET CONSTRAINTS ALL DEFERRED to clean up the database between test scenarios

Rationale:

  • Black Box Principle: BDD tests should not depend on implementation details
  • Foreign Key Safety: SET CONSTRAINTS ALL DEFERRED allows proper handling of constraints (PostgreSQL docs: https://www.postgresql.org/docs/current/sql-set-constraints.html)
  • Migration Compatibility: Works regardless of schema changes
  • Transaction Safety: Uses explicit transactions with proper rollback handling

Alternatives Considered:

  1. Repository-based cleanup - Rejected: Violates black box principle
  2. Transaction rollback - Rejected: Complex with nested transactions
  3. Recreate database - Rejected: Too slow for frequent test runs
  4. Separate test database - Chosen: Combined with SQL cleanup

Implementation Details

Cleanup Process:

  1. Defer constraint checks until commit: SET CONSTRAINTS ALL DEFERRED (applies to deferrable constraints)
  2. Query all tables: From information_schema.tables
  3. Delete in reverse order: Handle foreign key dependencies
  4. Reset sequences: ALTER SEQUENCE ... RESTART WITH 1
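
The steps above can be sketched as pure SQL generation over the table list fetched from information_schema.tables (the function name is illustrative, and SET CONSTRAINTS affects only deferrable constraints):

```go
package main

// cleanupStatements returns the SQL to wipe the given tables between
// scenarios: constraint checks are deferred until commit, rows are deleted in
// reverse dependency order, and each table's SERIAL id sequence is reset.
// All statements are meant to run inside a single transaction.
func cleanupStatements(tables []string) []string {
	stmts := []string{"SET CONSTRAINTS ALL DEFERRED"}
	for i := len(tables) - 1; i >= 0; i-- { // reverse order for foreign keys
		stmts = append(stmts, "DELETE FROM "+tables[i])
	}
	for _, t := range tables {
		stmts = append(stmts, "ALTER SEQUENCE "+t+"_id_seq RESTART WITH 1")
	}
	return stmts
}
```

Because the table list comes from the catalog at runtime, the cleanup keeps working as migrations add tables.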

Execution Timing:

  • AfterSuite: Full cleanup after all scenarios
  • Between Scenarios: Individual scenario cleanup (future enhancement)

Benefits:

  • Fast execution: Milliseconds vs seconds for recreation
  • Reliable: Handles schema changes automatically
  • Isolated: Each test gets clean state
  • Maintainable: No dependency on ORM or repositories

Temporary Database Approach

For BDD testing, we'll use temporary PostgreSQL databases to ensure:

  • Isolation: Each test run gets a clean database
  • Reproducibility: Consistent starting state
  • Performance: No interference between tests
  • CI/CD Compatibility: Works in containerized environments

Implementation Plan

  1. Test Container Setup:

    # Use testcontainers-go for PostgreSQL
    go get github.com/testcontainers/testcontainers-go
    go get github.com/testcontainers/testcontainers-go/modules/postgres
    
  2. BDD Test Configuration:

    • Create features/support/database.go
    • Implement BeforeScenario and AfterScenario hooks
    • Automatic database cleanup
    • Integrate with existing test suite structure
  3. Test Data Management:

    • Schema migration before each scenario
    • Transaction rollback for data isolation
    • Seed data for specific scenarios
    • Match existing BDD test patterns
  4. Configuration:

    # config.test.yaml
    database:
      host: "localhost"
      port: 5433  # Different from dev port
      name: "dance_lessons_coach_test"
      user: "test_user"
      password: "test_password"
    

Example Test Setup

// features/support/database.go
func BeforeScenario(ctx context.Context, sc *godog.Scenario) (context.Context, error) {
    // Start PostgreSQL container
    postgresContainer, err := postgres.RunContainer(ctx,
        testcontainers.WithImage("postgres:15-alpine"),
        postgres.WithDatabase("test_db"),
        postgres.WithUsername("test_user"),
        postgres.WithPassword("test_password"),
    )
    if err != nil {
        return ctx, err
    }
    
    // Get connection string
    connStr, err := postgresContainer.ConnectionString(ctx, "sslmode=disable")
    if err != nil {
        return ctx, err
    }
    
    // Store in context for test
    ctx = context.WithValue(ctx, "postgres_container", postgresContainer)
    ctx = context.WithValue(ctx, "postgres_conn_str", connStr)
    
    // Initialize user repository with test database
    cfg := config.GetTestConfig()
    cfg.Database.DSN = connStr
    
    repo, err := user.NewPostgresRepository(cfg)
    if err != nil {
        return ctx, err
    }
    
    // Store repository in context for scenario steps
    ctx = context.WithValue(ctx, "user_repository", repo)
    
    return ctx, nil
}

func AfterScenario(ctx context.Context, sc *godog.Scenario, err error) (context.Context, error) {
    // Clean up repository
    if repo, ok := ctx.Value("user_repository").(user.UserRepository); ok {
        repo.Close()
    }
    
    // Terminate PostgreSQL container
    if container, ok := ctx.Value("postgres_container").(testcontainers.Container); ok {
        if terminateErr := container.Terminate(ctx); terminateErr != nil {
            log.Error().Err(terminateErr).Msg("Failed to terminate PostgreSQL container")
        }
    }
    return ctx, err
}

Future Considerations

Immediate Next Steps (Post-Migration)

  1. CI/CD Integration: Add PostgreSQL to CI pipeline
  2. Performance Tuning: Query optimization
  3. Monitoring: Database health metrics
  4. Backup Strategy: Regular database backups

Long-Term Enhancements

  1. Database Sharding: For horizontal scaling
  2. Read Replicas: For read-heavy workloads
  3. Advanced Caching: Redis integration
  4. Database Monitoring: Prometheus exporter
  5. Backup Automation: Regular backup scheduling
  6. Query Optimization: Performance tuning

Approved by: [Product Owner]
Approval Date: [To be determined]
Implementation Target: Q2 2024