diff --git a/adr/0025-bdd-scenario-isolation-strategies.md b/adr/0025-bdd-scenario-isolation-strategies.md
new file mode 100644
index 0000000..d1dfadb
--- /dev/null
+++ b/adr/0025-bdd-scenario-isolation-strategies.md
@@ -0,0 +1,240 @@
+# ADR 0025: BDD Scenario Isolation Strategies
+
+## Status
+**Proposed** 🟡
+
+## Context
+
+As our BDD test suite grows, we're encountering **test pollution** issues where scenarios interfere with each other through shared state. This is particularly problematic for:
+
+1. **Database state**: Scenarios create users, JWT secrets, and config entries that persist across scenarios
+2. **JWT secret rotation**: Multiple secrets accumulate, affecting authentication in subsequent scenarios
+3. **Config file modifications**: Feature flag changes persist between tests
+4. **Gherkin Background steps**: Data set up in Background is visible to all scenarios in the feature
+
+Our current approach clears database tables after each scenario, but it is **vulnerable to race conditions** under concurrent scenario execution.
+
+### Gherkin Background Consideration
+
+Crucially, Gherkin's `Background` section runs **before each scenario** in a feature, not once before all scenarios. This means:
+
+```gherkin
+Feature: User registration
+  Background:
+    Given the database is empty
+    And a default admin user exists
+
+  Scenario: Register new user
+    When I register user "alice"
+    Then user "alice" should exist
+
+  Scenario: Register duplicate user
+    When I register user "alice"
+    Then I should see error "user already exists"
+```
+
+As written, the second scenario only passes when user "alice" leaks over from the first scenario. Because Background steps are re-executed before each scenario, proper isolation gives the second scenario a clean database: registering "alice" then succeeds and the expected duplicate error never appears. The scenario needs its own `Given` step that creates "alice"; scenarios must never depend on state left behind by earlier ones.
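The pollution mechanics above can be illustrated without a database. The following stdlib-only Go sketch is illustrative (the hook and step names are hypothetical, a map stands in for the database, and plain functions stand in for godog's Background and step execution):

```go
package main

import "fmt"

// users stands in for shared database state that outlives a scenario.
var users = map[string]bool{}

// background mimics a Gherkin Background: it runs before EVERY scenario.
func background() { users["admin"] = true }

// registerUser mimics the "When I register user" step.
func registerUser(name string) error {
	if users[name] {
		return fmt.Errorf("user already exists")
	}
	users[name] = true
	return nil
}

func main() {
	// Scenario 1: registers alice on what it assumes is a clean slate.
	background()
	fmt.Println(registerUser("alice")) // <nil>

	// Scenario 2: Background re-runs, but nothing cleaned up scenario 1's
	// data, so this scenario's outcome now depends on scenario 1 having run.
	background()
	fmt.Println(registerUser("alice")) // user already exists
}
```

Without a cleanup (or isolation) step between the two calls, the second "scenario" observes the first one's data, which is exactly the coupling the options below try to eliminate.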
+
+## Decision Drivers
+
+* **Isolation**: Each scenario must start with a clean slate
+* **Performance**: Cleanup must be fast enough for CI/CD pipelines
+* **Concurrency**: Must work with parallel scenario execution
+* **Compatibility**: Must work with Gherkin Background steps
+* **Maintainability**: Solution should be simple to understand and debug
+
+## Considered Options
+
+### Option 1: Transaction Rollback (Rejected ❌)
+
+Wrap each scenario in a database transaction, rollback at the end.
+
+```go
+BeforeScenario: BEGIN;
+AfterScenario: ROLLBACK;
+```
+
+**Pros:**
+- Simple implementation
+- Fast - transaction rollback is nearly instant
+- No data cleanup needed
+
+**Cons:**
+- ❌ **Fails if scenario commits**: PostgreSQL has no true nested transactions (only savepoints), so a `COMMIT` inside the scenario ends the enclosing transaction and the final `ROLLBACK` is a no-op
+- Cannot handle non-database state (JWT secrets in memory, config files)
+- Doesn't solve JWT secret pollution
+
+**Verdict: Not viable** - Too many scenarios use database transactions internally.
+
+---
+
+### Option 2: Clear Tables in Public Schema (Current ✅/⚠️)
+
+Delete all rows from all tables after each scenario.
+
+```go
+AfterScenario: DELETE FROM table1; DELETE FROM table2; ...
+```
+
+**Pros:**
+- Currently implemented
+- Works with any scenario code
+- Handles database state
+
+**Cons:**
+- ⚠️ **Race conditions**: Concurrent scenarios can interleave - Scenario A deletes data while Scenario B is still using it
+- ⚠️ **Slow**: Must delete from every table and reset sequences
+- ❌ **Misses in-memory state**: JWT secrets and config changes persist
+- ❌ **Doesn't handle Background**: Background data is shared across scenarios
+
+**Verdict: Partially adequate** - Works for sequential execution but breaks under parallel execution.
+
+---
+
+### Option 3: Schema-per-Scenario (Recommended ✅)
+
+Create a unique PostgreSQL schema for each scenario, drop it after.
+
+```go
+BeforeScenario:
+  schema := "test_" + sha256(feature.Name + ":" + scenario.Name)[:8]
+  CREATE SCHEMA schema;
+  SET search_path = schema, public;
+
+AfterScenario:
+  DROP SCHEMA schema CASCADE;
+```
+
+**Pros:**
+- ✅ **True isolation**: Each scenario has its own database namespace
+- ✅ **Works with transactions**: Scenario can commit freely - entire schema is dropped
+- ✅ **Works with Background**: Background runs in scenario's schema, data is isolated
+- ✅ **Fast**: Schema drop is near-instant for the small object counts a scenario creates
+- ✅ **Handles concurrent scenarios**: Different schemas = no conflicts
+
+**Cons:**
+- Requires `CREATE/DROP SCHEMA` database privileges in test environment
+- Some ORMs may hardcode the `public` schema - need to use `SET search_path` carefully
+- Test DB must allow many schemas (typically fine for PostgreSQL)
+- We need to handle `search_path` in connection pooling (each scenario needs its own connection)
+
+**Implementation notes:**
+- Use a prefixed schema-name convention: `test_{hash}`
+- Hash: `sha256(feature + ":" + scenario)[:8]` for consistency across runs
+- Execute Background steps in the scenario's schema context
+- Set `search_path` at the connection level, not globally
+
+---
+
+### Option 4: Database-per-Feature ⚠️
+
+Create a separate database for each feature file.
+
+```go
+BeforeFeature: CREATE DATABASE feature_auth;
+AfterFeature: DROP DATABASE feature_auth;
+```
+
+**Pros:**
+- Strong isolation between features
+- Simple implementation
+
+**Cons:**
+- ❌ **Doesn't isolate scenarios within a feature** - Background data shared across scenarios
+- Database creation is slower than schema creation
+- Harder to manage in CI (more databases to create/cleanup)
+- Still need table clearing between scenarios within a feature
+
+**Verdict: Insufficient** - Doesn't solve intra-feature pollution.
+
+---
+
+### Option 5: Schema-per-Feature + Table Clearing per Scenario ⚠️
+
+Create one schema per feature, clear tables between scenarios.
+
+```go
+BeforeFeature: CREATE SCHEMA feature_auth;
+AfterFeature: DROP SCHEMA feature_auth;
+AfterScenario: DELETE FROM all_tables;
+```
+
+**Pros:**
+- Isolates features from each other
+- Simpler than per-scenario schemas
+
+**Cons:**
+- ❌ **Scenarios within a feature share state** - Background data persists
+- Still has race conditions with concurrent scenarios in the same feature
+- Requires table-clearing overhead
+
+**Verdict: Better than the current approach, but scenarios within a feature still pollute each other.**
+
+---
+
+## Decision Outcome
+
+**Chosen option: Schema-per-Scenario (Option 3)**
+
+We will implement schema-per-scenario because it:
+
+1. Provides **true isolation** for all database state
+2. **Works with Gherkin Background** - Background runs in each scenario's schema
+3. **Handles concurrent execution** - No race conditions
+4. **Works with scenario transactions** - Scenarios can commit freely
+5. Is **fast** - Schema operations are cheap
+
+### Implementation Plan
+
+**Phase 1: Foundation**
+- Add scenario-aware schema management to the test server
+- Implement schema creation/drop in BeforeScenario/AfterScenario hooks
+- Handle `search_path` configuration for each scenario's database connection
+
+**Phase 2: Connection Pooling**
+- Configure the connection pool to respect the per-scenario `search_path`
+- Each scenario gets isolated connections
+
+**Phase 3: In-Memory State**
+- Extend cleanup to handle JWT secrets (already implemented in suite.go)
+- Add config reset capability
+
+**Phase 4: Validation**
+- Run the full test suite to identify ORM/schema issues
+- Fix any hardcoded `public` schema references
+
+### Schema Naming Convention
+
+```
+Schema name: test_{sha256(feature + ":" + scenario)[:8]}
+```
+
+Example:
+- Feature: `auth`, Scenario: `Successful user authentication`
+- Hash: `sha256("auth:Successful user authentication")[:8]`, e.g. `a3f7b2c1` (illustrative value)
+- Schema: `test_a3f7b2c1`
+
+Benefits:
+- Unique per scenario (an 8-hex-char hash makes collisions unlikely, though not impossible)
+- Consistent across test runs (same scenario = same schema)
+- Short (`test_` prefix + 8 hash chars = 13 chars)
+- Identifiable for debugging
+
+## Pros and Cons Summary
+
+| Aspect | Schema-per-Scenario | Current (Clear Tables) | Transaction Rollback |
+|--------|---------------------|------------------------|----------------------|
+| Isolation | ✅ Strong | ⚠️ Medium | ❌ Weak |
+| Works with Background | ✅ Yes | ⚠️ Partial | ❌ No |
+| Concurrency safe | ✅ Yes | ❌ No | ❌ No |
+| Works with TX | ✅ Yes | ✅ Yes | ❌ No |
+| Speed | ✅ Fast | ⚠️ Slow | ✅ Fast |
+| DB privileges | ⚠️ Needs CREATE | ✅ None | ✅ None |
+| Complexity | ⚠️ Medium | ✅ Low | ✅ Low |
+
+## Links
+
+* [ADR 0008: BDD Testing](adr/0008-bdd-testing.md) - Original BDD adoption decision
+* [ADR 0024: BDD Test Organization and Isolation](adr/0024-bdd-test-organization-and-isolation.md) - Feature isolation strategy
+* [Godog Documentation](https://github.com/cucumber/godog) - BDD framework specifics
+* [PostgreSQL Schemas](https://www.postgresql.org/docs/current/ddl-schemas.html) - Schema management
diff --git a/adr/README.md b/adr/README.md
index af8470e..3cd2a3f 100644
--- a/adr/README.md
+++ b/adr/README.md
@@ -30,6 +30,7 @@ This directory contains Architecture Decision Records (ADRs) for the dance-lesso
 | 0022 | Rate Limiting and Cache Strategy | ✅ Accepted |
 | 0023 | Config Hot Reloading | 🟡 Proposed |
 | 0024 | BDD Test Organization and Isolation | 🟡 Proposed |
+| 0025 | BDD Scenario Isolation Strategies | 🟡 Proposed |
 
 ## What is an ADR?
@@ -111,6 +112,7 @@ Chosen option: "[Option 1]" because [justification]
 * [0021-jwt-secret-retention-policy.md](0021-jwt-secret-retention-policy.md) - JWT Secret Retention Policy with Configurable TTL and Retention
 * [0022-rate-limiting-cache-strategy.md](0022-rate-limiting-cache-strategy.md) - Rate Limiting and Cache Strategy with Multi-Phase Implementation
 * [0023-config-hot-reloading.md](0023-config-hot-reloading.md) - Config Hot Reloading Strategy
+* [0025-bdd-scenario-isolation-strategies.md](0025-bdd-scenario-isolation-strategies.md) - Schema-per-scenario isolation for BDD tests
 
 ## How to Add a New ADR
diff --git a/pkg/bdd/suite.go b/pkg/bdd/suite.go
index 355f115..f031d9f 100644
--- a/pkg/bdd/suite.go
+++ b/pkg/bdd/suite.go
@@ -20,6 +20,11 @@ func isCleanupLoggingEnabled() bool {
 	return os.Getenv("BDD_ENABLE_CLEANUP_LOGS") == "true"
 }
 
+// isSchemaIsolationEnabled returns true if BDD_SCHEMA_ISOLATION environment variable is set to "true"
+func isSchemaIsolationEnabled() bool {
+	return os.Getenv("BDD_SCHEMA_ISOLATION") == "true"
+}
+
 func InitializeTestSuite(ctx *godog.TestSuiteContext) {
 	ctx.BeforeSuite(func() {
 		// Small delay to ensure any previous server instances are fully cleaned up
@@ -37,8 +42,24 @@ func InitializeTestSuite(ctx *godog.TestSuiteContext) {
 	sc := ctx.ScenarioContext()
 
 	sc.BeforeScenario(func(s *godog.Scenario) {
+		// Get feature name from context or environment
+		feature := os.Getenv("FEATURE")
+		if feature == "" {
+			// Try to extract feature from scenario tags or path
+			feature = "unknown"
+		}
+
 		if isCleanupLoggingEnabled() {
-			log.Info().Str("scenario", s.Name).Msg("CLEANUP: Scenario starting")
+			log.Info().Str("feature", feature).Str("scenario", s.Name).Msg("CLEANUP: Scenario starting")
+		}
+
+		// Setup schema isolation if enabled
+		if sharedServer != nil {
+			if err := sharedServer.SetupScenarioSchema(feature, s.Name); err != nil {
+				if isCleanupLoggingEnabled() {
+					log.Warn().Err(err).Msg("ISOLATION: Failed to setup scenario schema")
+				}
+			}
 		}
 	})
 
@@ -48,17 +69,27 @@ func InitializeTestSuite(ctx *godog.TestSuiteContext) {
 		}
 
 		if sharedServer != nil {
+			// Teardown schema isolation if enabled
+			if teardownErr := sharedServer.TeardownScenarioSchema(); teardownErr != nil {
+				if isCleanupLoggingEnabled() {
+					log.Warn().Err(teardownErr).Msg("ISOLATION: Failed to teardown scenario schema")
+				}
+			}
+
 			// Reset JWT secrets after every scenario to prevent pollution
+			// Note: This is still needed for in-memory state even with schema isolation
 			if resetErr := sharedServer.ResetJWTSecrets(); resetErr != nil {
 				if isCleanupLoggingEnabled() {
 					log.Warn().Err(resetErr).Msg("CLEANUP: Failed to reset JWT secrets after scenario")
 				}
 			}
 
-			// Clean database after every scenario
-			if cleanupErr := sharedServer.CleanupDatabase(); cleanupErr != nil {
-				if isCleanupLoggingEnabled() {
-					log.Warn().Err(cleanupErr).Msg("CLEANUP: Failed to cleanup database after scenario")
+			// Clean database after every scenario (only if schema isolation is disabled)
+			if !isSchemaIsolationEnabled() {
+				if cleanupErr := sharedServer.CleanupDatabase(); cleanupErr != nil {
+					if isCleanupLoggingEnabled() {
+						log.Warn().Err(cleanupErr).Msg("CLEANUP: Failed to cleanup database after scenario")
+					}
 				}
 			}
 		}
diff --git a/pkg/bdd/testserver/server.go b/pkg/bdd/testserver/server.go
index 1895356..f93aa79 100644
--- a/pkg/bdd/testserver/server.go
+++ b/pkg/bdd/testserver/server.go
@@ -2,13 +2,16 @@ package testserver
 
 import (
 	"context"
+	"crypto/sha256"
 	"database/sql"
+	"encoding/hex"
 	"fmt"
 	"math/rand"
 	"net/http"
 	"os"
 	"strconv"
 	"strings"
+	"sync"
 	"time"
 
 	"dance-lessons-coach/pkg/config"
@@ -25,6 +28,30 @@ func isCleanupLoggingEnabled() bool {
 	return os.Getenv("BDD_ENABLE_CLEANUP_LOGS") == "true"
 }
 
+// isSchemaIsolationEnabled returns true if BDD_SCHEMA_ISOLATION environment variable is set to "true"
+func isSchemaIsolationEnabled() bool {
+	return os.Getenv("BDD_SCHEMA_ISOLATION") == "true"
+}
+
+// generateSchemaName creates a unique schema name for a scenario
+// Format: test_{sha256(feature + ":" + scenario)[:8]}
+func generateSchemaName(feature, scenario string) string {
+	hash := sha256.Sum256([]byte(feature + ":" + scenario))
+	hashStr := hex.EncodeToString(hash[:])
+	return "test_" + hashStr[:8]
+}
+
+type Server struct {
+	httpServer         *http.Server
+	port               int
+	baseURL            string
+	db                 *sql.DB
+	authService        user.AuthService // Reference to auth service for cleanup
+	schemaMutex        sync.Mutex       // Protects schema operations
+	currentSchema      string           // Current schema being used
+	originalSearchPath string           // Original search_path to restore
+}
+
 // getDatabaseHost returns the database host from environment variable or defaults to localhost
 func getDatabaseHost() string {
 	host := os.Getenv("DLC_DATABASE_HOST")
@@ -45,14 +72,6 @@ func getDatabasePort() int {
 	return port
 }
 
-type Server struct {
-	httpServer  *http.Server
-	port        int
-	baseURL     string
-	db          *sql.DB
-	authService user.AuthService // Reference to auth service for cleanup
-}
-
 func init() {
 	// Seed the random number generator for random port selection
 	rand.Seed(time.Now().UnixNano())
@@ -95,7 +114,9 @@ func NewServer() *Server {
 	}
 
 	return &Server{
-		port: port,
+		port:          port,
+		currentSchema: "public",
+		// originalSearchPath is left empty so SetupScenarioSchema captures
+		// the connection's actual search_path on first use
+	}
 }
 
@@ -430,6 +451,96 @@ func (s *Server) CleanupDatabase() error {
 	return nil
 }
 
+// SetupScenarioSchema creates and activates a unique schema for the scenario
+func (s *Server) SetupScenarioSchema(feature, scenario string) error {
+	if !isSchemaIsolationEnabled() {
+		if isCleanupLoggingEnabled() {
+			log.Info().Str("feature", feature).Str("scenario", scenario).Msg("ISOLATION: Schema isolation disabled, using public schema")
+		}
+		return nil
+	}
+
+	schemaName := generateSchemaName(feature, scenario)
+	s.schemaMutex.Lock()
+	defer s.schemaMutex.Unlock()
+
+	// Store original search path if not already stored
+	if s.originalSearchPath == "" {
+		var err error
+		s.originalSearchPath, err = s.getCurrentSearchPath()
+		if err != nil {
+			log.Warn().Err(err).Msg("ISOLATION: Failed to get current search_path")
+			s.originalSearchPath = "public"
+		}
+	}
+
+	// Create the schema. schemaName is derived from a hex-encoded hash, so
+	// interpolating it into the DDL is safe here.
+	createSQL := fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS %s", schemaName)
+	if _, err := s.db.Exec(createSQL); err != nil {
+		return fmt.Errorf("failed to create schema %s: %w", schemaName, err)
+	}
+
+	// Set search path to use the new schema.
+	// NOTE (Phase 1 limitation): s.db is a connection pool, so this SET only
+	// applies to whichever connection executes it; Phase 2 pins a dedicated
+	// connection per scenario so every query sees the scenario's schema.
+	searchPathSQL := fmt.Sprintf("SET search_path = %s, %s", schemaName, s.originalSearchPath)
+	if _, err := s.db.Exec(searchPathSQL); err != nil {
+		return fmt.Errorf("failed to set search_path: %w", err)
+	}
+
+	s.currentSchema = schemaName
+
+	if isCleanupLoggingEnabled() {
+		log.Info().Str("feature", feature).Str("scenario", scenario).Str("schema", schemaName).Msg("ISOLATION: Created and activated schema")
+	}
+
+	return nil
+}
+
+// TeardownScenarioSchema drops the scenario's schema and restores search path
+func (s *Server) TeardownScenarioSchema() error {
+	if !isSchemaIsolationEnabled() {
+		return nil
+	}
+
+	s.schemaMutex.Lock()
+	defer s.schemaMutex.Unlock()
+
+	if s.currentSchema == "" || s.currentSchema == "public" {
+		if isCleanupLoggingEnabled() {
+			log.Info().Msg("ISOLATION: No custom schema to teardown")
+		}
+		return nil
+	}
+
+	schemaName := s.currentSchema
+
+	// Restore original search path
+	restoreSQL := fmt.Sprintf("SET search_path = %s", s.originalSearchPath)
+	if _, err := s.db.Exec(restoreSQL); err != nil {
+		log.Warn().Err(err).Str("original", s.originalSearchPath).Msg("ISOLATION: Failed to restore search_path")
+	}
+
+	// Drop the schema - CASCADE ensures dependent objects are also dropped
+	dropSQL := fmt.Sprintf("DROP SCHEMA IF EXISTS %s CASCADE", schemaName)
+	if _, err := s.db.Exec(dropSQL); err != nil {
+		return fmt.Errorf("failed to drop schema %s: %w", schemaName, err)
+	}
+
+	s.currentSchema = ""
+
+	if isCleanupLoggingEnabled() {
+		log.Info().Str("schema", schemaName).Msg("ISOLATION: Dropped schema")
+	}
+
+	return nil
+}
+
+// getCurrentSearchPath retrieves the current search_path setting
+func (s *Server) getCurrentSearchPath() (string, error) {
+	var searchPath string
+	err := s.db.QueryRow("SHOW search_path").Scan(&searchPath)
+	return searchPath, err
+}
+
 // CloseDatabase closes the database connection
 func (s *Server) CloseDatabase() error {
 	if s.db != nil {
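The `generateSchemaName` helper added in `server.go` can be exercised standalone. This stdlib-only sketch mirrors that helper; the sample feature and scenario strings are the illustrative ones from the ADR's naming-convention example, and the actual hash value will differ from the illustrative `a3f7b2c1`:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// generateSchemaName mirrors the helper in pkg/bdd/testserver/server.go:
// "test_" plus the first 8 hex characters of sha256("feature:scenario").
func generateSchemaName(feature, scenario string) string {
	hash := sha256.Sum256([]byte(feature + ":" + scenario))
	return "test_" + hex.EncodeToString(hash[:])[:8]
}

func main() {
	a := generateSchemaName("auth", "Successful user authentication")
	b := generateSchemaName("auth", "Failed user authentication")

	fmt.Println(a) // always 13 chars: "test_" + 8 hex digits
	// Deterministic across runs, so a failing scenario's schema is findable:
	fmt.Println(a == generateSchemaName("auth", "Successful user authentication")) // true
	// Distinct scenarios map to distinct schemas:
	fmt.Println(a != b) // true
}
```

Determinism is what makes the naming convention debuggable: rerunning a failed scenario recreates the same `test_…` schema name, so leftover schemas can be traced back to their scenario.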