🔍 feat(bdd): add state tracer for debugging test execution and state pollution
- Add STATE_TRACER_README.md documenting purpose, usage, limitations, and future enhancements - Add state_tracer.go with per-process file-based tracing to $TMPDIR - Trace scenario start/end, database cleanup, JWT secret operations - Integrate tracing into suite.go BeforeScenario/AfterScenario hooks - Document findings: sequential execution per feature, shared database across processes - Identify root causes: in-memory JWT secrets not isolated by schema, config reload timing Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit is contained in:
241
pkg/bdd/testserver/STATE_TRACER_README.md
Normal file
241
pkg/bdd/testserver/STATE_TRACER_README.md
Normal file
@@ -0,0 +1,241 @@
|
||||
# BDD State Tracer
|
||||
|
||||
## Overview
|
||||
|
||||
The BDD State Tracer is a debugging tool that logs scenario execution, database operations, and state modifications to a file in `$TMPDIR` for analysis of test execution order and state pollution issues.
|
||||
|
||||
## Purpose
|
||||
|
||||
### Why Tracing Was Added
|
||||
|
||||
During multi-iteration BDD test runs with `./scripts/validate-test-suite.sh`, intermittent failures occurred that were difficult to diagnose:
|
||||
- Tests passed when run individually
|
||||
- Tests failed when run together in the validation script
|
||||
- Patterns suggested database state pollution between scenarios across different feature packages
|
||||
|
||||
The tracer was created to answer key questions:
|
||||
1. **Execution Order**: Which scenarios run in which order?
|
||||
2. **State Modifications**: What database writes/cleanups occur and when?
|
||||
3. **Overlap Detection**: Are scenarios running in parallel (causing race conditions)?
|
||||
4. **Isolation Verification**: Is schema isolation working as expected?
|
||||
|
||||
### Key Findings from Tracing
|
||||
|
||||
1. **Sequential Execution**: Each feature package runs in a separate process (separate PIDs), but scenarios within each feature run sequentially
|
||||
2. **Shared Database**: All processes share the same PostgreSQL database connection
|
||||
3. **Schema Isolation Status**: When `BDD_SCHEMA_ISOLATION=false` (default in validate script), all scenarios share the `public` schema
|
||||
4. **Cleanup Operations**: Database cleanup (`CleanupDatabase`) runs after each scenario, deleting all test data from all tables
|
||||
5. **In-Memory State**: JWT secrets are stored in-memory only, not in database - schema isolation doesn't prevent JWT secret pollution
|
||||
|
||||
### Example Trace Output
|
||||
|
||||
```
|
||||
2026-04-11T10:10:53.032156 | auth | User registration | SCENARIO_START |
|
||||
2026-04-11T10:10:53.146438 | auth | User registration | SCENARIO_END | PASSED
|
||||
2026-04-11T10:10:53.152398 | auth | User registration | JWT_RESET | ok
|
||||
2026-04-11T10:10:53.162357 | auth | Failed authentication | SCENARIO_START |
|
||||
2026-04-11T10:10:53.268273 | auth | Failed authentication | SCENARIO_END | PASSED
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Enable Tracing
|
||||
|
||||
Set the environment variable `BDD_TRACE_STATE=1` before running tests:
|
||||
|
||||
```bash
|
||||
# Single run with tracing
|
||||
BDD_TRACE_STATE=1 go test ./features/auth -v
|
||||
|
||||
# Validation script with tracing
|
||||
BDD_TRACE_STATE=1 ./scripts/validate-test-suite.sh 1
|
||||
|
||||
# Multiple runs with tracing
|
||||
BDD_TRACE_STATE=1 ./scripts/validate-test-suite.sh 5
|
||||
```
|
||||
|
||||
### Trace File Location
|
||||
|
||||
Trace files are written to `$TMPDIR` (typically `/var/folders/.../T/` on macOS or `/tmp` on Linux):
|
||||
|
||||
```bash
|
||||
# Find trace files
|
||||
ls -la $TMPDIR/bdd-state-trace-*.log
|
||||
|
||||
# View a trace file
|
||||
cat $TMPDIR/bdd-state-trace-20260411-101053-12345.log
|
||||
```
|
||||
|
||||
### Trace File Format
|
||||
|
||||
```
|
||||
TIMESTAMP | FEATURE | SCENARIO | ACTION | DETAILS
|
||||
2026-04-11T10:10:53.032156 | auth | User registration | SCENARIO_START |
|
||||
2026-04-11T10:10:53.146438 | auth | User registration | SCENARIO_END | PASSED
|
||||
2026-04-11T10:10:53.152398 | auth | User registration | JWT_RESET | ok
|
||||
2026-04-11T10:10:53.162357 | auth | User registration | DB_CLEANUP | all_tables
|
||||
```
|
||||
|
||||
**Columns:**
|
||||
- `TIMESTAMP`: ISO 8601 format with microseconds
|
||||
- `FEATURE`: Feature name from `FEATURE` environment variable
|
||||
- `SCENARIO`: Scenario name (includes URI for disambiguation)
|
||||
- `ACTION`: Type of action (see below)
|
||||
- `DETAILS`: Additional context
|
||||
|
||||
**Action Types:**
|
||||
- `SCENARIO_START` - Scenario execution begins
|
||||
- `SCENARIO_END` - Scenario execution completes (PASSED or FAILED)
|
||||
- `DB_CLEANUP` - Database cleanup operation
|
||||
- `DB_SELECT` - Database read operation
|
||||
- `JWT_RESET` - JWT secrets reset to initial state
|
||||
- `DB_INSERT/UPDATE/DELETE` - Database write operations (future)
|
||||
- `SCHEMA_*` - Schema isolation operations (future)
|
||||
- `TX_*` - Transaction boundary operations (future)
|
||||
|
||||
## Implementation
|
||||
|
||||
### Architecture
|
||||
|
||||
The state tracer uses a simple file-based approach:
|
||||
|
||||
1. **Per-Process Tracing**: Each `go test` process creates its own trace file with unique filename based on timestamp and PID
|
||||
2. **Immediate Flush**: Each trace line is flushed immediately to disk using `Sync()` to prevent data loss
|
||||
3. **No Dependencies**: Uses only standard library (`os`, `fmt`, `time`, `path/filepath`)
|
||||
4. **Singleton Pattern**: Package-level functions for easy usage across the codebase
|
||||
|
||||
### Files
|
||||
|
||||
- `pkg/bdd/testserver/state_tracer.go` - Core tracing functions
|
||||
- `pkg/bdd/suite.go` - Integration with godog Before/After scenario hooks
|
||||
|
||||
### Key Functions
|
||||
|
||||
```go
|
||||
// Package-level functions (called from anywhere)
|
||||
TraceStateScenarioStart(feature, scenario string)
|
||||
TraceStateScenarioEnd(feature, scenario string, err error)
|
||||
TraceStateDBCleanup(feature, scenario, table string)
|
||||
TraceStateJWTSecretOperation(feature, scenario, operation, details string)
|
||||
TraceStateSchemaIsolation(feature, scenario, operation, details string)
|
||||
TraceStateTransaction(feature, scenario, action, details string)
|
||||
TraceStateDBRead(feature, scenario, table, details string)
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
### Current Limitations
|
||||
|
||||
1. **Per-Process Files**: Each `go test` process creates its own file, making correlation across processes manual
|
||||
2. **No Database Write Tracing**: Currently only traces cleanup, not individual INSERT/UPDATE/DELETE operations
|
||||
3. **No API Call Tracing**: Doesn't trace HTTP requests made during scenarios
|
||||
4. **No Timing Analysis**: Doesn't measure duration between operations automatically
|
||||
5. **No Schema Name in Trace**: When schema isolation is enabled, doesn't show which schema is active
|
||||
6. **File Rotation**: No automatic cleanup of old trace files
|
||||
|
||||
### Known Issues
|
||||
|
||||
1. **PID-based filenames**: If multiple runs happen in the same second, filenames could collide
|
||||
2. **Large file sizes**: High-volume tracing could create large files (mitigated by per-run files)
|
||||
3. **No header/footer**: Trace files start immediately with data, no metadata about the run
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Priority 1: Process Correlation
|
||||
- Add a unique run ID that can be passed across all processes
|
||||
- Include process start/end markers to show process lifecycle
|
||||
- Add parent PID tracking to show process hierarchy
|
||||
|
||||
### Priority 2: Database Operation Tracing
|
||||
- Add tracing for all database writes (INSERT, UPDATE, DELETE)
|
||||
- Include query text and affected rows
|
||||
- Trace transaction boundaries with IDs
|
||||
- Add schema name to all database operations when isolation is enabled
|
||||
|
||||
### Priority 3: API Call Tracing
|
||||
- Trace all HTTP requests made during scenarios
|
||||
- Include request method, path, status code, and duration
|
||||
- Mark requests that modify state (POST, PUT, DELETE vs GET)
|
||||
|
||||
### Priority 4: Analysis Tools
|
||||
- Create a `bdd-trace-analyzer` tool to:
|
||||
- Merge trace files from all processes in correct order
|
||||
- Detect overlapping scenarios (parallel execution)
|
||||
- Identify database state pollution patterns
|
||||
- Generate visualization of scenario execution timeline
|
||||
- Flag potential race conditions
|
||||
|
||||
### Priority 5: Improved Output
|
||||
- Add trace file header with metadata (run ID, start time, config, etc.)
|
||||
- Color-coded output for different action types
|
||||
- JSON output option for programmatic analysis
|
||||
- Trace level filtering (DEBUG, INFO, WARN, ERROR)
|
||||
|
||||
### Priority 6: Performance Optimization
|
||||
- Batch writes instead of per-line flush (with configurable flush interval)
|
||||
- Compress old trace files
|
||||
- Automatic cleanup of old files
|
||||
|
||||
## Analysis Use Cases
|
||||
|
||||
### Detecting State Pollution
|
||||
|
||||
Look for patterns like:
|
||||
```
|
||||
PID 1234 | auth | Scenario A | DB_CLEANUP | all_tables
|
||||
PID 5678 | greet | Scenario B | SCENARIO_START |
|
||||
# ^ Scenario B starts AFTER auth cleanup - potential issue
|
||||
```
|
||||
|
||||
### Detecting Parallel Execution
|
||||
|
||||
Check if timestamps overlap:
|
||||
```
|
||||
PID 1234 | 10:10:53.032 | auth | Scenario A | SCENARIO_START
|
||||
PID 5678 | 10:10:53.035 | greet | Scenario B | SCENARIO_START
|
||||
# ^ Both started within 3ms - likely parallel
|
||||
```
|
||||
|
||||
### Verifying Schema Isolation
|
||||
|
||||
Check that each scenario gets its own schema:
|
||||
```
|
||||
PID 1234 | auth | Scenario A | SCHEMA_CREATE | test_a1b2c3d4
|
||||
PID 1234 | auth | Scenario B | SCHEMA_CREATE | test_e5f6g7h8
|
||||
# ^ Different schemas for different scenarios - good
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Tracing Not Working
|
||||
|
||||
1. Verify `BDD_TRACE_STATE=1` is set:
|
||||
```bash
|
||||
echo $BDD_TRACE_STATE
|
||||
```
|
||||
2. Check if trace files are being created:
|
||||
```bash
|
||||
ls -la $TMPDIR/bdd-state-trace-*.log
|
||||
```
|
||||
3. Verify the `testserver` package is being used (tracing is integrated there)
|
||||
|
||||
### No Trace Files Found
|
||||
|
||||
- Tracing only works when `BDD_TRACE_STATE=1` is set before the test process starts
|
||||
- Each `go test` process creates its own file - if tests pass quickly, files may be short
|
||||
- Files are created in `$TMPDIR` which defaults to `/tmp` on Linux and a temp folder on macOS
|
||||
|
||||
### Trace Files Too Large
|
||||
|
||||
- Tracing every operation can generate large files
|
||||
- Consider filtering to specific scenarios:
|
||||
```bash
|
||||
# Run only failing scenarios with tracing
|
||||
BDD_TRACE_STATE=1 go test ./features/auth -v -run "TestAuthBDD/Password_reset"
|
||||
```
|
||||
|
||||
## Related Files
|
||||
|
||||
- `pkg/bdd/suite.go` - Godog test suite initialization with tracing hooks
|
||||
- `pkg/bdd/testserver/server.go` - Test server with tracing integration
|
||||
- `scripts/validate-test-suite.sh` - Test validation script
|
||||
Reference in New Issue
Block a user