🐛 fix(bdd): revert PR #26 schema isolation + add cache flush + sequential CI tests #28

Merged
arcodange merged 1 commits from fix/bdd-cache-flush-and-revert-schema-isolation into main 2026-05-03 16:28:58 +02:00
Owner

Summary

PR #26 added BDD_SCHEMA_ISOLATION=true to CI, but this creates per-scenario schemas WITHOUT running table migrations on them, causing HTTP 500 errors on user registration. This PR reverts that and adopts a simpler approach.

Root cause analysis

Investigation locally:

  • Run AuthBDD with BDD_SCHEMA_ISOLATION=true (PR #26 mode): 0/5 runs PASS, 9 scenarios fail per run with 500 errors
  • Run AuthBDD without the flag: 5/5 PASS, deterministic
  • Reason: SetupScenarioSchema creates a fresh PostgreSQL schema and sets search_path, but the production server's PostgresRepository runs migrations once at startup (against the public schema). The new per-scenario schemas have NO tables.

Fix

  1. Revert PR #26 in .gitea/workflows/ci-cd.yaml (remove export BDD_SCHEMA_ISOLATION=true)
  2. Sequential test packages in scripts/run-bdd-tests.sh (-p 1) — prevents DB contention when multiple feature packages run their tests concurrently against the same Postgres DB
  3. Cache flush AfterScenario (defensive infra by Mistral Vibe agent):
    • pkg/server/server.go: expose GetCacheService()
    • pkg/bdd/testserver/server.go: store cacheService, add FlushCache() method
    • pkg/bdd/testserver/state_tracer.go: TraceStateCacheOperation()
    • pkg/bdd/suite.go: AfterScenario hook calls FlushCache() after ResetJWTSecrets()

Validation

  • AuthBDD only (5 consecutive runs): 5/5 PASS (was 0/5 with broken schema isolation)
  • Full features/ via run-bdd-tests.sh: ALL packages PASS (auth, config, greet, health, jwt, root)

Trade-off

This forces sequential package execution. Parallel BDD is deferred as architectural follow-up (T12) because it requires:

  • Running migrations per-schema (not just once at server start)
  • Dedicated DB connection pool per scenario (not shared via SET search_path)
  • Coordinated cleanup that doesn't race

This PR's goal is CI green TODAY. Parallel scaling is the next architectural step.

Files modified

  • .gitea/workflows/ci-cd.yaml: revert BDD_SCHEMA_ISOLATION export
  • pkg/server/server.go: +GetCacheService
  • pkg/bdd/testserver/server.go: +cacheService, +FlushCache
  • pkg/bdd/testserver/state_tracer.go: +TraceStateCacheOperation
  • pkg/bdd/suite.go: +FlushCache call
  • scripts/run-bdd-tests.sh: +-p 1

🤖 Diagnosed and fixed during the 2026-05-03 autonomous trainer day. Mistral Vibe ICM workspace: ~/Work/Vibe/workspaces/bdd-flakiness-deep-fix/. Trainer (Claude) revised the approach after Mistral's initial fix proved insufficient.

## Summary PR #26 added `BDD_SCHEMA_ISOLATION=true` to CI, but this creates per-scenario schemas WITHOUT running table migrations on them, causing HTTP 500 errors on user registration. This PR reverts that and adopts a simpler approach. ## Root cause analysis Investigation locally: - Run AuthBDD with `BDD_SCHEMA_ISOLATION=true` (PR #26 mode): 0/5 runs PASS, 9 scenarios fail per run with 500 errors - Run AuthBDD without the flag: 5/5 PASS, deterministic - Reason: `SetupScenarioSchema` creates a fresh PostgreSQL schema and sets `search_path`, but the production server's `PostgresRepository` runs migrations once at startup (against the public schema). The new per-scenario schemas have NO tables. ## Fix 1. **Revert PR #26** in `.gitea/workflows/ci-cd.yaml` (remove `export BDD_SCHEMA_ISOLATION=true`) 2. **Sequential test packages** in `scripts/run-bdd-tests.sh` (`-p 1`) — prevents DB contention when multiple feature packages run their tests concurrently against the same Postgres DB 3. **Cache flush AfterScenario** (defensive infra by Mistral Vibe agent): - `pkg/server/server.go`: expose `GetCacheService()` - `pkg/bdd/testserver/server.go`: store cacheService, add `FlushCache()` method - `pkg/bdd/testserver/state_tracer.go`: `TraceStateCacheOperation()` - `pkg/bdd/suite.go`: AfterScenario hook calls `FlushCache()` after `ResetJWTSecrets()` ## Validation - AuthBDD only (5 consecutive runs): **5/5 PASS** (was 0/5 with broken schema isolation) - Full `features/` via `run-bdd-tests.sh`: ALL packages PASS (auth, config, greet, health, jwt, root) ## Trade-off This forces sequential package execution. **Parallel BDD is deferred as architectural follow-up** (T12) because it requires: - Running migrations per-schema (not just once at server start) - Dedicated DB connection pool per scenario (not shared via `SET search_path`) - Coordinated cleanup that doesn't race This PR's goal is CI green TODAY. Parallel scaling is the next architectural step. ## Files modified - `.gitea/workflows/ci-cd.yaml`: revert BDD_SCHEMA_ISOLATION export - `pkg/server/server.go`: +GetCacheService - `pkg/bdd/testserver/server.go`: +cacheService, +FlushCache - `pkg/bdd/testserver/state_tracer.go`: +TraceStateCacheOperation - `pkg/bdd/suite.go`: +FlushCache call - `scripts/run-bdd-tests.sh`: +-p 1 🤖 Diagnosed and fixed during the 2026-05-03 autonomous trainer day. Mistral Vibe ICM workspace: `~/Work/Vibe/workspaces/bdd-flakiness-deep-fix/`. Trainer (Claude) revised the approach after Mistral's initial fix proved insufficient.
arcodange added 1 commit 2026-05-03 16:28:44 +02:00
🐛 fix(bdd): revert PR #26 schema isolation, add cache flush + sequential test execution
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 12s
CI/CD Pipeline / CI Pipeline (push) Successful in 8m44s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
63d27cc35e
PR #26 added BDD_SCHEMA_ISOLATION=true to CI but this creates per-scenario schemas
WITHOUT running migrations on them, causing 500 errors on user registration. This
PR reverts that and instead relies on:

1. The existing CleanupDatabase hook (truncates all tables AfterScenario)
2. Sequential test package execution (-p 1) to avoid contention between feature
   packages sharing the same Postgres DB

Plus defensive additions for future-proofing:
- pkg/server/server.go: GetCacheService() exposed for test cleanup
- pkg/bdd/testserver/server.go: cacheService field + FlushCache() method
- pkg/bdd/testserver/state_tracer.go: TraceStateCacheOperation
- pkg/bdd/suite.go: AfterScenario hook calls FlushCache()
- scripts/run-bdd-tests.sh: -p 1 added (sequential package execution)

Validation:
- AuthBDD alone: 5/5 PASS (was 0/5 with broken schema isolation)
- Full features/ via run-bdd-tests.sh: ALL PASS (auth, config, greet, health, jwt)

Out of scope (follow-up T12):
- Proper parallel BDD with schema migrations per scenario + dedicated connection
  pools. Required for scaling tests but architecturally significant. Tracked.

Co-Authored-By: Mistral Vibe (devstral-2 / mistral-medium-3.5) - cache flush diagnosis
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> - root cause + revert
arcodange merged commit 93bd384ca8 into main 2026-05-03 16:28:58 +02:00
arcodange deleted branch fix/bdd-cache-flush-and-revert-schema-isolation 2026-05-03 16:28:59 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: arcodange/dance-lessons-coach#28