From 189d44d70f109c7ab7ad822fd65e87a946317229 Mon Sep 17 00:00:00 2001 From: Gabriel Radureau Date: Tue, 5 May 2026 09:44:58 +0200 Subject: [PATCH] =?UTF-8?q?=F0=9F=94=A7=20chore(config):=20defense-in-dept?= =?UTF-8?q?h=20for=20the=20WatchAndApply=20test=20race=20(Q-038)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Follow-up to PR #48 after user question on whether mutex/atomic would be a cleaner fix than removing the log call. Honest answer: the racing memory location is zerolog's global gLevel, which IS already mutated atomically by zerolog itself. The race detector flags it because LoadConfig → SetupLogging writes gLevel via zerolog.SetGlobalLevel and a leaked watcher goroutine reads gLevel via log.Info() — both atomic individually, but go test -race treats the write/read pair as a happens-before violation across goroutine boundaries when there's no synchronization between them. A mutex on Config would not help: the shared state isn't on Config, it's on zerolog's package-level global. atomic.Pointer wouldn't help for the same reason. Combined fix: 1. Keep the log-removal (PR #48) — it's the actual race source: our cancel-handler goroutine's log.Info("watcher stopped") was the reading party. Add a longer comment explaining WHY it's gone. 2. Add pkg/config/main_test.go with TestMain that disables zerolog globally during the test suite. Defense in depth: any FUTURE leaked log call from a watcher-related goroutine won't trigger a race either, because no log call evaluates against the level. Production behavior unchanged. SetupLogging in production runs once at startup before any goroutine could race with it. go test -race -count=2 ./pkg/config/... passes (was failing). --- pkg/config/config.go | 13 +++++++++---- pkg/config/main_test.go | 26 ++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 4 deletions(-) create mode 100644 pkg/config/main_test.go diff --git a/pkg/config/config.go b/pkg/config/config.go index bb9d2ee..e676f19 100644 --- a/pkg/config/config.go +++ b/pkg/config/config.go @@ -730,10 +730,15 @@ func (c *Config) WatchAndApply(ctx context.Context) { // Stop the watcher on context cancel — we set a flag that the // OnConfigChange handler checks, avoiding the race with viper's // internal state that would occur if we called OnConfigChange again. - // We deliberately do NOT log here: viper's internal watcher goroutine - // has no public Stop, so it can outlive ctx, and a zerolog call here - // would race with the next test's LoadConfig → SetupLogging → - // zerolog.SetGlobalLevel under -race (observed 2026-05-05). + // + // We deliberately do NOT log inside this goroutine: this goroutine + // outlives ctx (parent's defer cancel only fires when the test's + // outer scope exits, not when t.Cleanup runs), so a log call here + // races with the next test's LoadConfig → SetupLogging → + // zerolog.SetGlobalLevel under -race (observed 2026-05-05, Q-038). + // The flag-set is the load-bearing operation; the missing log line + // is a small ops cost (operators learn the watcher stops on shutdown + // via the parent shutdown logs, not a dedicated message). go func() { <-ctx.Done() c.reloadMu.Lock() diff --git a/pkg/config/main_test.go b/pkg/config/main_test.go new file mode 100644 index 0000000..560b4dc --- /dev/null +++ b/pkg/config/main_test.go @@ -0,0 +1,26 @@ +package config + +import ( + "os" + "testing" + + "github.com/rs/zerolog" +) + +// TestMain quiets the global zerolog level for the duration of the test +// suite. Rationale (Q-038, 2026-05-05): viper's internal watcher goroutine +// (started by viper.WatchConfig in WatchAndApply) has no public Stop and +// can outlive a test's context. Any log call from a leaked goroutine +// races with the next test's LoadConfig → SetupLogging → +// zerolog.SetGlobalLevel under `go test -race`. Disabling the logger here +// is the root-cause fix: the racing memory location is zerolog's gLevel +// global, and if no log call ever evaluates against it we sidestep the +// race entirely without changing production behavior. +// +// In production, log calls happen against an unchanging global level +// (SetupLogging runs once at startup), so the race condition does not +// occur there. +func TestMain(m *testing.M) { + zerolog.SetGlobalLevel(zerolog.Disabled) + os.Exit(m.Run()) +}