# 19. PostgreSQL Database Integration

**Date:** 2026-04-07
**Status:** Partially Implemented
**Authors:** Product Owner
**Decision Drivers:** Data Persistence, Scalability, Production Readiness

## Context

The dance-lessons-coach application currently uses SQLite with GORM for the user management system (ADR 0018). Since there are no existing users or production data, we can adopt PostgreSQL directly as the primary database without migration concerns.

### Current State

- **Database:** SQLite (in-memory mode) - no persistent data
- **ORM:** GORM v1.31.1
- **Implementation:** `pkg/user/sqlite_repository.go`
- **Usage:** User management system only
- **Data:** No existing users or production data

### Implementation Drivers

1. **Production Readiness:** PostgreSQL is enterprise-grade and production-ready
2. **Data Persistence:** Proper persistent storage for user accounts
3. **Concurrency:** PostgreSQL handles concurrent connections better
4. **Scalability:** PostgreSQL supports horizontal scaling
5. **Features:** Advanced PostgreSQL features (JSONB, full-text search)
6. **Ecosystem:** Better tooling and monitoring for PostgreSQL

## Decision

We will implement PostgreSQL directly, replacing the SQLite implementation, with the following characteristics:

### Core Features

1. **Database Setup**
   - PostgreSQL 15+ for production compatibility
   - Containerized development environment
   - Connection pooling for performance
   - SSL support for secure connections
2. **ORM Integration**
   - GORM as the primary ORM
   - Interface-based repository pattern
   - Database migrations for schema management
   - Transaction support for data integrity
3. **Configuration Management**
   - Viper integration for database settings
   - Environment variable support with DLC_ prefix
   - Multiple environment support (dev, staging, prod)
   - Connection health checking
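For illustration, the configuration described above might look like this in `config.yaml`; the key names mirror the `mapstructure` tags of `DatabaseConfig`, and the exact `DLC_`-prefixed environment variable names are assumptions about the Viper key replacer, not verified settings:

```yaml
database:
  host: "localhost"             # DLC_DATABASE_HOST (assumed env mapping)
  port: 5432                    # DLC_DATABASE_PORT
  user: "app"                   # DLC_DATABASE_USER
  password: ""                  # DLC_DATABASE_PASSWORD - set via env, never committed
  name: "dance_lessons_coach"
  ssl_mode: "disable"           # e.g. "require" for staging/prod
  max_open_conns: 25
  max_idle_conns: 5
  conn_max_lifetime: "1h"
```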
4. **Integration Points**
   - User management system (ADR 0018)
   - Existing greet service (for future personalization)
   - OpenTelemetry tracing integration
   - Zerolog structured logging

### Technical Implementation

#### Database Schema Foundation

```sql
-- Users table (from ADR 0018)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP WITH TIME ZONE,
    username VARCHAR(50) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    description TEXT,
    current_goal TEXT,
    is_admin BOOLEAN DEFAULT FALSE,
    allow_password_reset BOOLEAN DEFAULT FALSE,
    last_login TIMESTAMP WITH TIME ZONE
);

-- Greet history table (future extension)
CREATE TABLE greet_history (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    user_id INTEGER REFERENCES users(id),
    message TEXT NOT NULL,
    context JSONB
);
```

#### Technology Stack

- **Database:** PostgreSQL 15+ - production-ready relational database
- **ORM:** GORM v1.25+ - aligns with interface-based design
- **Migrations:** GORM AutoMigrate + custom SQL migrations
- **Connection Pooling:** PgBouncer-compatible connection management
- **Configuration:** Viper integration - consistent with existing patterns
- **Logging:** Zerolog integration - structured database logging
- **Telemetry:** OpenTelemetry database instrumentation

#### Architecture Alignment

The PostgreSQL integration follows established dance-lessons-coach patterns:

1. **Interface-based Design:**

   ```go
   type DatabaseRepository interface {
       GetDB() *gorm.DB
       Close() error
       HealthCheck(ctx context.Context) error
       BeginTransaction(ctx context.Context) (*gorm.DB, error)
   }

   type UserRepository interface {
       CreateUser(ctx context.Context, user *User) error
       GetUserByUsername(ctx context.Context, username string) (*User, error)
       // ... other methods
   }
   ```
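The `users` table defined under Database Schema Foundation could map to a GORM model along these lines. This is a sketch: field names, tags, and the use of `*time.Time` in place of `gorm.DeletedAt` are assumptions, not the project's actual `User` model.

```go
package main

import "time"

// User is a sketch of a GORM model matching the users table above.
// Tags follow GORM conventions; none of this is verified against the
// project's real migration.
type User struct {
	ID                 uint       `gorm:"primaryKey"`
	CreatedAt          time.Time
	UpdatedAt          time.Time
	DeletedAt          *time.Time `gorm:"index"` // gorm.DeletedAt in a real model
	Username           string     `gorm:"size:50;uniqueIndex;not null"`
	PasswordHash       string     `gorm:"size:255;not null"`
	Description        string
	CurrentGoal        string
	IsAdmin            bool       `gorm:"default:false"`
	AllowPasswordReset bool       `gorm:"default:false"`
	LastLogin          *time.Time
}
```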
2. **Context-aware Services:**

   ```go
   func (r *PostgresUserRepository) CreateUser(ctx context.Context, user *User) error {
       log.Trace().Ctx(ctx).Str("username", user.Username).Msg("Creating user")
       return r.db.WithContext(ctx).Create(user).Error
   }
   ```

3. **Configuration Integration:**

   ```go
   type DatabaseConfig struct {
       Type            string        `mapstructure:"type"` // sqlite, postgres, auto
       Host            string        `mapstructure:"host"`
       Port            int           `mapstructure:"port"`
       User            string        `mapstructure:"user"`
       Password        string        `mapstructure:"password"`
       Name            string        `mapstructure:"name"`
       SSLMode         string        `mapstructure:"ssl_mode"`
       MaxOpenConns    int           `mapstructure:"max_open_conns"`
       MaxIdleConns    int           `mapstructure:"max_idle_conns"`
       ConnMaxLifetime time.Duration `mapstructure:"conn_max_lifetime"`
   }
   ```

4. **Graceful Shutdown Integration:**

   ```go
   func (s *Server) Shutdown(ctx context.Context) error {
       // Close database connections gracefully
       if s.userRepo != nil {
           if err := s.userRepo.Close(); err != nil {
               log.Error().Err(err).Msg("User repository shutdown failed")
               // Continue shutdown even if the database fails
           }
       }

       // The readiness endpoint already handles shutdown detection via s.readyCtx,
       // so no atomic flag is needed - the context-based approach is cleaner.

       // Continue with the existing HTTP server shutdown
       return s.httpServer.Shutdown(ctx)
   }
   ```
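The `DatabaseConfig` shown above ultimately has to be rendered as a libpq-style DSN for GORM's postgres driver. A minimal sketch, with the method name and key ordering assumed rather than taken from the codebase:

```go
package main

import "fmt"

// DatabaseConfig mirrors the connection-relevant fields of the
// Viper-backed struct above (pooling fields omitted for brevity).
type DatabaseConfig struct {
	Host     string
	Port     int
	User     string
	Password string
	Name     string
	SSLMode  string
}

// DSN renders the config as a key=value connection string as accepted
// by libpq-compatible drivers. The method name is an assumption.
func (c DatabaseConfig) DSN() string {
	return fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=%s",
		c.Host, c.Port, c.User, c.Password, c.Name, c.SSLMode)
}
```

The resulting string would then be passed to `postgres.Open(dsn)` when opening the GORM connection.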
5. **Readiness Endpoint Integration:**

   ```go
   func (s *Server) handleReadiness(w http.ResponseWriter, r *http.Request) {
       // Check database health if using a persistent database
       if s.config.GetDatabaseType() != "sqlite" {
           if err := s.userRepo.CheckDatabaseHealth(r.Context()); err != nil {
               log.Warn().Err(err).Msg("Database health check failed")
               s.writeJSONResponse(w, http.StatusServiceUnavailable, map[string]interface{}{
                   "ready":  false,
                   "reason": "database_unhealthy",
                   "error":  err.Error(),
               })
               return
           }
       }

       // Existing readiness logic
       select {
       case <-s.readyCtx.Done():
           s.writeJSONResponse(w, http.StatusServiceUnavailable, map[string]interface{}{
               "ready":  false,
               "reason": "shutting_down",
           })
       default:
           s.writeJSONResponse(w, http.StatusOK, map[string]interface{}{
               "ready": true,
           })
       }
   }
   ```

### Implementation Strategy

#### Phase 1: PostgreSQL Repository Implementation

1. **Replace Dependencies:**

   ```bash
   # Replace the SQLite driver with PostgreSQL dependencies
   go get gorm.io/driver/postgres
   go get github.com/lib/pq   # PostgreSQL driver
   go mod tidy                # Clean up unused dependencies
   ```

2. **Create PostgreSQL Repository:**
   - `pkg/user/postgres_repository.go` - PostgreSQL implementation
   - Implement the `UserRepository` interface directly
   - Add PostgreSQL-specific connection management
3. **Docker Setup:**
   - Create `docker-compose.yml` with a PostgreSQL 16 service (current stable version)
   - Add initialization scripts for development
   - Configure health checks and monitoring
   - Use an Alpine-based image for a smaller footprint
4. **Configuration:**
   - Add `DatabaseConfig` to the existing config structure
   - Environment variables with `DLC_` prefix
   - Connection validation and health checking

#### Phase 2: Server Integration

1. **Update Server Initialization:**
   - Modify `initializeUserServices()` in `pkg/server/server.go`
   - Replace the SQLite repository with the PostgreSQL repository
   - Update error handling and logging
2. **Remove SQLite Code:**
   - Delete `pkg/user/sqlite_repository.go`
   - Clean up any SQLite-specific references
   - Update imports and dependencies
3. **Enhance Health Checks:**
   - Add a database health check to the readiness endpoint
   - Implement connection pool monitoring
   - Add startup health validation

#### Phase 3: Testing & Validation

1. **BDD Test Integration:**
   - Updated test server configuration with PostgreSQL settings
   - Automatic PostgreSQL container startup in the test script
   - Health checks for database readiness before tests
   - **Separate BDD test database** (`dance_lessons_coach_bdd_test`)
   - Complete isolation from development/production databases
2. **Test Script Enhancement:**
   - `scripts/run-bdd-tests.sh` now starts PostgreSQL if needed
   - **Automatic BDD database creation** using the `createdb` command
   - Checks for an existing BDD database before creating one
   - Waits for database readiness before running tests
   - Proper error handling and timeout management
   - Reuses the existing container if already running
3. **Database Isolation Strategy:**
   - **Development:** `dance_lessons_coach` (config.yaml)
   - **BDD Tests:** `dance_lessons_coach_bdd_test` (automatically created)
   - **Production:** Custom name per environment
   - **Manual Testing:** Developers can use the development database
4. **Unit & Integration Tests:**
   - Repository method testing with PostgreSQL
   - Transaction and error case testing
   - Performance benchmarks
   - Connection failure scenarios
5. **Graceful Shutdown Testing:**
   - Database connection cleanup during shutdown
   - Readiness endpoint behavior during shutdown
   - Connection pool behavior under stress

#### Phase 4: Documentation & Finalization

1. **Documentation Updates:**
   - Update AGENTS.md with PostgreSQL setup instructions
   - Add a database configuration guide
   - Create development setup documentation
   - Update BDD test documentation
2. **Cleanup:**
   - Remove all SQLite references from code
   - Update go.mod and go.sum
   - Verify no unused imports or dependencies
3. **Production Readiness:**
   - Add database health monitoring
   - Configure connection pooling for production
   - Add environment-specific configurations

## Consequences

### Positive

1. **Data Persistence:** User accounts and application data are properly persisted
2. **Production Ready:** PostgreSQL is an enterprise-grade database
3. **Scalability:** Better concurrent connection handling
4. **Simplified Architecture:** Direct PostgreSQL implementation without migration complexity
5. **Clean Codebase:** No legacy SQLite code or dual implementation
6. **Future-Proof:** Foundation for all future data-driven features

### Negative

1. **Dependency Changes:** Replacing SQLite with PostgreSQL dependencies
2. **Operational Overhead:** Database container management
3. **Learning Curve:** PostgreSQL-specific features and optimization
4. **Testing Requirements:** Comprehensive testing needed for the new implementation

### Neutral

1. **Code Changes:** Repository implementation replacement
2. **Configuration Updates:** New database configuration structure
3. **Development Workflow:** Docker-based database for local development

## Alternatives Considered

### Alternative 1: Keep SQLite with File Persistence

- **Pros:** Simple, no new dependencies, works for small-scale use
- **Cons:** Not production-grade, limited concurrency, file-based limitations
- **Rejected:** Doesn't meet long-term production requirements

### Alternative 2: Dual Implementation with Fallback

- **Pros:** Smooth migration path, backward compatibility
- **Cons:** Complex codebase, testing overhead, maintenance burden
- **Rejected:** Unnecessary complexity since there are no existing data or users

### Alternative 3: MySQL

- **Pros:** Widely used, good community support
- **Cons:** Different ecosystem, licensing concerns
- **Rejected:** PostgreSQL better fits our needs

### Alternative 4: MongoDB

- **Pros:** Flexible schema, document-oriented
- **Cons:** NoSQL approach, different query patterns
- **Rejected:** Relational data better suits our model

### Alternative 5: Pure SQL (no ORM)

- **Pros:** No ORM overhead, direct control
- **Cons:** More boilerplate, manual query building
- **Rejected:** GORM provides a good balance

## Graceful Shutdown & Readiness Integration

### Database Connection Lifecycle

The PostgreSQL integration must properly handle the server lifecycle:

1. **Startup Sequence:**
   - Initialize database connections
   - Run a health check
   - Set readiness to true only if the database is healthy
   - Log connection details at trace level
2. **Runtime Operation:**
   - Monitor database connection health
   - Handle connection failures gracefully
   - Implement connection retry logic
   - Log connection issues appropriately
3. **Shutdown Sequence:**
   - Set readiness to false immediately
   - Close all database connections
   - Wait for in-flight queries to complete
   - Handle shutdown timeouts gracefully
   - Log shutdown progress

### Readiness Endpoint Enhancement

The existing `/api/ready` endpoint already has the correct nested structure for service health checks. We'll enhance it to include PostgreSQL database health:

**Current Structure:**

```json
{
  "ready": true,
  "connections": {
    "database": {
      "status": "healthy"
    }
  }
}
```

**Health Check Logic:**

```go
func (r *PostgresUserRepository) CheckDatabaseHealth(ctx context.Context) error {
    // Simple query to test connectivity
    var count int64
    result := r.db.WithContext(ctx).Model(&User{}).Count(&count)
    if result.Error != nil {
        return fmt.Errorf("database health check failed: %w", result.Error)
    }
    return nil
}
```

**Readiness Response States:**

- **Healthy:** `{"ready": true, "connections": {"database": {"status": "healthy"}}}`
- **Database Unhealthy:** `{"ready": false, "reason": "database_unhealthy", "connections": {"database": {"status": "unhealthy", "error": "connection refused"}}}`
- **Shutting Down:** `{"ready": false, "reason": "shutting_down", "connections": {"database": "not_checked"}}`
- **Not Configured:** `{"ready": true, "connections": {"database": {"status": "not_configured"}}}` (for SQLite mode)

### Connection Pool Management

Proper connection pool configuration for graceful shutdown:

```go
// In database initialization
sqlDB, err := db.DB()
if err != nil {
    return nil, fmt.Errorf("failed to get SQL DB: %w", err)
}

// Configure the connection pool from config
sqlDB.SetMaxOpenConns(cfg.MaxOpenConns)
sqlDB.SetMaxIdleConns(cfg.MaxIdleConns)
sqlDB.SetConnMaxLifetime(cfg.ConnMaxLifetime)

// Recycle idle connections so shutdown never waits on stale ones
sqlDB.SetConnMaxIdleTime(5 * time.Minute)
```

### Shutdown Timeout Handling

```go
func (s *Server) Shutdown(ctx context.Context) error {
    // Create a shutdown context with timeout
    shutdownCtx, cancel := context.WithTimeout(ctx, s.config.GetShutdownTimeout())
    defer cancel()

    // Close database connections with a timeout
    done := make(chan struct{})
    go func() {
        if s.userRepo != nil {
            if err := s.userRepo.Close(); err != nil {
                log.Error().Err(err).Msg("Database shutdown error")
            }
        }
        close(done)
    }()

    select {
    case <-done:
        log.Trace().Msg("Database shutdown completed")
    case <-shutdownCtx.Done():
        log.Warn().Msg("Database shutdown timed out, forcing closure")
    }

    return s.httpServer.Shutdown(shutdownCtx)
}
```

## Alignment with Existing Architecture

This implementation builds upon completed phases:

- **Phase 1-3:** Uses Go 1.26.1, Chi router, Zerolog, interface-based design
- **Phase 5:** Extends Viper configuration management
- **Phase 6:** Integrates with graceful shutdown patterns and readiness endpoints
- **Phase 7:** Maintains OpenTelemetry compatibility
- **Phase 8:** Follows existing build system patterns
- **Phase 9:** Preserves the trace-level logging approach
- **Phase 18:** Supports the user management system

## Backward Compatibility

The implementation maintains full backward compatibility:

1. **API Endpoints:** Existing endpoints unchanged
2. **Configuration:** All existing config options preserved
3. **Logging:** Maintains the existing Zerolog integration
4. **Telemetry:** OpenTelemetry continues to work
5. **Error Handling:** Consistent error patterns

## Success Metrics

1. **Reliability:** 99.9% database uptime
2. **Performance:** <100ms average query time
3. **Scalability:** Support for 1000+ concurrent connections
4. **Data Integrity:** Zero data corruption incidents
5. **Adoption:** All new features use database storage

## Open Questions

1. What should the connection pool size be for production?
2. Should we implement read replicas for scaling?
3. What backup strategy should we implement?
4. Should we add database connection health metrics?
5. What query timeout should we set for production?
## Database Cleanup Strategy

### Decision: Raw SQL Cleanup Between Scenarios

**Approach:** Use raw SQL DELETE statements with `SET CONSTRAINTS ALL DEFERRED` to clean up the database between test scenarios.

**Rationale:**

- **Black Box Principle:** BDD tests should not depend on implementation details
- **Foreign Key Safety:** `SET CONSTRAINTS ALL DEFERRED` allows proper handling of constraints ([PostgreSQL docs](https://www.postgresql.org/docs/current/sql-set-constraints.html))
- **Migration Compatibility:** Works regardless of schema changes
- **Transaction Safety:** Uses explicit transactions with proper rollback handling

**Alternatives Considered:**

1. **Repository-based cleanup** - Rejected: Violates the black box principle
2. **Transaction rollback** - Rejected: Complex with nested transactions
3. **Recreate database** - Rejected: Too slow for frequent test runs
4. **Separate test database** - Chosen: Combined with SQL cleanup

### Implementation Details

**Cleanup Process:**

1. **Defer constraints temporarily:** `SET CONSTRAINTS ALL DEFERRED`
2. **Query all tables:** From `information_schema.tables`
3. **Delete in reverse order:** Handle foreign key dependencies
4. **Reset sequences:** `ALTER SEQUENCE ... RESTART WITH 1`

**Execution Timing:**

- **AfterSuite:** Full cleanup after all scenarios
- **Between Scenarios:** Individual scenario cleanup (future enhancement)

**Benefits:**

- ✅ **Fast execution:** Milliseconds vs seconds for recreation
- ✅ **Reliable:** Handles schema changes automatically
- ✅ **Isolated:** Each test gets a clean state
- ✅ **Maintainable:** No dependency on the ORM or repositories

### Temporary Database Approach

For BDD testing, we'll use temporary PostgreSQL databases to ensure:

- **Isolation:** Each test run gets a clean database
- **Reproducibility:** Consistent starting state
- **Performance:** No interference between tests
- **CI/CD Compatibility:** Works in containerized environments
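Concretely, a single cleanup pass along the lines described above might look like this. The table and sequence names follow the schema sketched earlier and the default `SERIAL` sequence naming; a real implementation would enumerate tables dynamically rather than hard-code them, and `SET CONSTRAINTS ALL DEFERRED` only affects constraints declared `DEFERRABLE`:

```sql
BEGIN;
SET CONSTRAINTS ALL DEFERRED;

-- In practice, table names come from the catalog:
--   SELECT table_name FROM information_schema.tables
--   WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
DELETE FROM greet_history;  -- child table first (reverse FK order)
DELETE FROM users;

-- Reset identity sequences so each scenario starts from id 1
ALTER SEQUENCE greet_history_id_seq RESTART WITH 1;
ALTER SEQUENCE users_id_seq RESTART WITH 1;

COMMIT;
```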
### Implementation Plan

1. **Test Container Setup:**

   ```bash
   # Use testcontainers-go for PostgreSQL
   go get github.com/testcontainers/testcontainers-go
   go get github.com/testcontainers/testcontainers-go/modules/postgres
   ```

2. **BDD Test Configuration:**
   - Create `features/support/database.go`
   - Implement `BeforeScenario` and `AfterScenario` hooks
   - Automatic database cleanup
   - Integrate with the existing test suite structure
3. **Test Data Management:**
   - Schema migration before each scenario
   - Transaction rollback for data isolation
   - Seed data for specific scenarios
   - Match existing BDD test patterns
4. **Configuration:**

   ```yaml
   # config.test.yaml
   database:
     host: "localhost"
     port: 5433  # Different from dev port
     name: "dance_lessons_coach_test"
     user: "test_user"
     password: "test_password"
   ```

### Example Test Setup

```go
// features/support/database.go
func BeforeScenario(ctx context.Context, sc *godog.Scenario) (context.Context, error) {
	// Start a PostgreSQL container
	postgresContainer, err := postgres.RunContainer(ctx,
		testcontainers.WithImage("postgres:15-alpine"),
		postgres.WithDatabase("test_db"),
		postgres.WithUsername("test_user"),
		postgres.WithPassword("test_password"),
	)
	if err != nil {
		return ctx, err
	}

	// Get the connection string
	connStr, err := postgresContainer.ConnectionString(ctx, "sslmode=disable")
	if err != nil {
		return ctx, err
	}

	// Store in context for the test
	ctx = context.WithValue(ctx, "postgres_container", postgresContainer)
	ctx = context.WithValue(ctx, "postgres_conn_str", connStr)

	// Initialize the user repository with the test database
	cfg := config.GetTestConfig()
	cfg.Database.DSN = connStr
	repo, err := user.NewPostgresRepository(cfg)
	if err != nil {
		return ctx, err
	}

	// Store the repository in context for scenario steps
	ctx = context.WithValue(ctx, "user_repository", repo)
	return ctx, nil
}

func AfterScenario(ctx context.Context, sc *godog.Scenario, err error) (context.Context, error) {
	// Clean up the repository
	if repo, ok := ctx.Value("user_repository").(user.UserRepository); ok {
		repo.Close()
	}

	// Terminate the PostgreSQL container
	if container, ok := ctx.Value("postgres_container").(testcontainers.Container); ok {
		if terminateErr := container.Terminate(ctx); terminateErr != nil {
			log.Error().Err(terminateErr).Msg("Failed to terminate PostgreSQL container")
		}
	}
	return ctx, err
}
```

## Future Considerations

### Immediate Next Steps (Post-Migration)

1. **CI/CD Integration:** Add PostgreSQL to the CI pipeline
2. **Performance Tuning:** Query optimization
3. **Monitoring:** Database health metrics
4. **Backup Strategy:** Regular database backups

### Long-Term Enhancements

1. **Database Sharding:** For horizontal scaling
2. **Read Replicas:** For read-heavy workloads
3. **Advanced Caching:** Redis integration
4. **Database Monitoring:** Prometheus exporter
5. **Backup Automation:** Regular backup scheduling

## References

- [GORM Documentation](https://gorm.io/)
- [PostgreSQL 16 Documentation](https://www.postgresql.org/docs/16/)
- [PostgreSQL Latest Version](https://www.postgresql.org/)
- [GORM + PostgreSQL Guide](https://gorm.io/docs/connecting_to_the_database.html#PostgreSQL)
- [Database Connection Pooling](https://www.alexedwards.net/blog/configuring-sqldb)

**Approved by:** [Product Owner]
**Approval Date:** [To be determined]
**Implementation Target:** Q2 2024