🐛 fix: emit all config-loading logs in correct JSON format from the start #16
Reference in New Issue
Block a user
Delete Branch "fix/json-logging-first-message"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Closes #15
When
logging.json: true(orDLC_LOGGING_JSON=true), the logger was unconditionally initialised to console/text format at the top ofLoadConfig(), so early log lines — most visibly "Config file loaded" — were always written as human-readable text regardless of configuration.Root cause
Classic chicken-and-egg: the format flag lives inside the config that is being loaded. The format-switch block only ran after
v.Unmarshal(), too late for the config-file log.Changes
pkg/config/config.gopeekJSONLogging(): resolves the JSON flag before any log is emitted by (1) checkingDLC_LOGGING_JSONdirectly viaos.Getenv, then (2) doing a minimal throwaway Viper pre-read of the config file for thelogging.jsonkey. This mirrors Viper's own priority order without parsing the full config twice.Unmarshal().scripts/start-server.sh,test-graceful-shutdown.sh,test-opentelemetry.shPROJECT_DIRpath with a dynamicSCRIPTS_DIR=$(dirname $(realpath ${BASH_SOURCE[0]}))derivation so scripts work from any worktree or clone location.Test plan
go test ./pkg/...— all passscripts/test-graceful-shutdown.sh— all JSON valid, all startup logs present{"level":"info",...,"message":"Logging configured"}, every line is valid JSON🔍 Key decisions & observations
Why
peekJSONLogging()instead of deferring the early logsTwo approaches were possible:
Option 2 is cleaner — log order stays natural and there is no buffering logic. The cost is reading the config file twice at startup, but that is negligible for a one-time boot path.
Why env var is checked with
os.Getenvdirectly instead of ViperViper bindings are not active until
v.AutomaticEnv()/v.BindEnv()are called, which happens well after the first log. ReadingDLC_LOGGING_JSONviaos.Getenvfirst is zero-cost and covers the primary deployment scenario (env var set in container/systemd). Only when the env var is absent do we pay for the config file pre-read.Why the redundant format-switch block after
Unmarshal()was removedThe block set the same format that
peekJSONLogging()already applied. Keeping it would have been misleading (the comment said "Configure log output format first" — it was no longer first), and it could silently shadow a future change topeekJSONLogging. Removed for clarity.Pre-existing test failure in
test-graceful-shutdown.shThe script fires a background
curl /api/readyat line 64 and immediately callsstop. The curl resolves while the server is still healthy, so "Readiness check: not ready" is never logged. This race condition existed before this PR — confirmed by readingpkg/server/server.go:239where that message only fires when the readiness context is already cancelled. Not in scope here.Script
PROJECT_DIRfix included opportunisticallyThe three scripts used a hardcoded absolute path that broke in any worktree or clone at a different location. Fixed in the same commit since it was the only way to run
test-graceful-shutdown.shfor local validation.Three related issues fixed together: 1. Readiness context was never cancelled during shutdown server.Run() had a type assertion for a Cancel() method that no standard context.Context implements, so readiness stayed "ready" through the entire shutdown window. Added CancelableContext to pkg/server — a thin wrapper that exposes Cancel() — and switched cmd/server/main.go to use it. Test servers and CLI continue passing context.Background() unchanged. 2. "Server exited" log was never emitted The test script expected it; main.go had no log after server.Run() returned. Added log.Trace().Msg("Server exited") after the Run() call. 3. Double-SIGTERM caused non-JSON "signal: terminated" in server.log test-graceful-shutdown.sh sent SIGTERM, then called $SERVER_CMD stop which sent a second SIGTERM. After signal.NotifyContext is cancelled, the second signal hits the default handler and Go prints "signal: terminated" to stderr, breaking the all-JSON-lines assertion. Fixed by waiting for the PID to exit ourselves instead of re-invoking the stop script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Second commit — readiness cancellation + shutdown test stabilisation
Why
CancelableContextinstead of changingNewServerto accept aCancelFuncNewServerhas 5 callers (cmd/server, cmd/cli, and 3 in pkg/bdd/testserver). Adding a cancel func parameter would require updating all of them. The existing code already had the right intention —s.readyCtx.(interface{ Cancel() })— it just needed a concrete type to match.CancelableContextsatisfies that interface while leaving every caller that passescontext.Background()unchanged.Why
Server exitedgoes inmain.gonotserver.goRun()returns an error; if the caller wants to handle it differently (e.g. the CLI sub-command), it should decide whether to log.main.gois the natural place for the final lifecycle log, matching the symmetry of"Server running"being the last startup log.Why the test no longer calls
$SERVER_CMD stopsignal.NotifyContextstops forwarding signals once the context is cancelled. A second SIGTERM hits Go's default signal handler, which printssignal: terminatedto stderr — one non-JSON line that breaks the all-JSON assertion. Waiting on the PID directly avoids this and is cleaner: we own the signal, we own the wait.