🐛 fix: emit all config-loading logs in correct JSON format from the start #16

Merged
arcodange merged 5 commits from fix/json-logging-first-message into main 2026-04-12 23:28:36 +02:00
Owner

Summary

Closes #15

When logging.json: true (or DLC_LOGGING_JSON=true), the logger was unconditionally initialised to console/text format at the top of LoadConfig(), so early log lines — most visibly "Config file loaded" — were always written as human-readable text regardless of configuration.

Root cause

Classic chicken-and-egg: the format flag lives inside the config that is being loaded. The format-switch block only ran after v.Unmarshal(), too late for the config-file log.

Changes

pkg/config/config.go

  • Add peekJSONLogging(): resolves the JSON flag before any log is emitted by (1) checking DLC_LOGGING_JSON directly via os.Getenv, then (2) doing a minimal throwaway Viper pre-read of the config file for the logging.json key. This mirrors Viper's own priority order without parsing the full config twice.
  • Apply the resolved format immediately and emit "Logging configured" as the very first log line.
  • Remove the now-redundant format-switch block that ran after Unmarshal().

scripts/start-server.sh, test-graceful-shutdown.sh, test-opentelemetry.sh

  • Replace hardcoded PROJECT_DIR path with a dynamic SCRIPTS_DIR=$(dirname $(realpath ${BASH_SOURCE[0]})) derivation so scripts work from any worktree or clone location.

Test plan

  • go test ./pkg/... — all pass
  • scripts/test-graceful-shutdown.sh — all JSON valid, all startup logs present
  • Manual smoke test: first line is {"level":"info",...,"message":"Logging configured"}, every line is valid JSON
## Summary Closes #15 When `logging.json: true` (or `DLC_LOGGING_JSON=true`), the logger was unconditionally initialised to console/text format at the top of `LoadConfig()`, so early log lines — most visibly **"Config file loaded"** — were always written as human-readable text regardless of configuration. ## Root cause Classic chicken-and-egg: the format flag lives inside the config that is being loaded. The format-switch block only ran *after* `v.Unmarshal()`, too late for the config-file log. ## Changes ### `pkg/config/config.go` - Add `peekJSONLogging()`: resolves the JSON flag **before** any log is emitted by (1) checking `DLC_LOGGING_JSON` directly via `os.Getenv`, then (2) doing a minimal throwaway Viper pre-read of the config file for the `logging.json` key. This mirrors Viper's own priority order without parsing the full config twice. - Apply the resolved format immediately and emit **"Logging configured"** as the very first log line. - Remove the now-redundant format-switch block that ran after `Unmarshal()`. ### `scripts/start-server.sh`, `test-graceful-shutdown.sh`, `test-opentelemetry.sh` - Replace hardcoded `PROJECT_DIR` path with a dynamic `SCRIPTS_DIR=$(dirname $(realpath ${BASH_SOURCE[0]}))` derivation so scripts work from any worktree or clone location. ## Test plan - [x] `go test ./pkg/...` — all pass - [x] `scripts/test-graceful-shutdown.sh` — all JSON valid, all startup logs present - [x] Manual smoke test: first line is `{"level":"info",...,"message":"Logging configured"}`, every line is valid JSON
arcodange added 1 commit 2026-04-12 19:37:45 +02:00
🐛 fix: emit all config-loading logs in correct format from the start
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 10s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m5s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
18ac000d30
The logger was initialised to console format unconditionally, so the
"Config file loaded" message (and similar early logs) were always written
as human-readable text even when JSON logging was configured.

Root cause: classic chicken-and-egg — the format flag lives inside the
config that is being loaded.

Fix: add peekJSONLogging() which resolves the format before any log is
emitted by (1) checking DLC_LOGGING_JSON directly via os.Getenv, then
(2) doing a minimal throwaway Viper pre-read of the config file for the
logging.json key.  The redundant format-switch block that ran after
Unmarshal() is removed.

Also add the "Logging configured" log as the very first line, and
replace the hardcoded PROJECT_DIR path in start-server.sh,
test-graceful-shutdown.sh, and test-opentelemetry.sh with a dynamic
derivation from BASH_SOURCE[0].

Closes #15

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

🔍 Key decisions & observations

Why peekJSONLogging() instead of deferring the early logs

Two approaches were possible:

  1. Store the early log messages and replay them after the format is known
  2. Resolve the format before the first log is emitted

Option 2 is cleaner — log order stays natural and there is no buffering logic. The cost is reading the config file twice at startup, but that is negligible for a one-time boot path.

Why env var is checked with os.Getenv directly instead of Viper

Viper bindings are not active until v.AutomaticEnv() / v.BindEnv() are called, which happens well after the first log. Reading DLC_LOGGING_JSON via os.Getenv first is zero-cost and covers the primary deployment scenario (env var set in container/systemd). Only when the env var is absent do we pay for the config file pre-read.

Why the redundant format-switch block after Unmarshal() was removed

The block set the same format that peekJSONLogging() already applied. Keeping it would have been misleading (the comment said "Configure log output format first" — it was no longer first), and it could silently shadow a future change to peekJSONLogging. Removed for clarity.

Pre-existing test failure in test-graceful-shutdown.sh

The script fires a background curl /api/ready at line 64 and immediately calls stop. The curl resolves while the server is still healthy, so "Readiness check: not ready" is never logged. This race condition existed before this PR — confirmed by reading pkg/server/server.go:239 where that message only fires when the readiness context is already cancelled. Not in scope here.

Script PROJECT_DIR fix included opportunistically

The three scripts used a hardcoded absolute path that broke in any worktree or clone at a different location. Fixed in the same commit since it was the only way to run test-graceful-shutdown.sh for local validation.

## 🔍 Key decisions & observations **Why `peekJSONLogging()` instead of deferring the early logs** Two approaches were possible: 1. Store the early log messages and replay them after the format is known 2. Resolve the format *before* the first log is emitted Option 2 is cleaner — log order stays natural and there is no buffering logic. The cost is reading the config file twice at startup, but that is negligible for a one-time boot path. **Why env var is checked with `os.Getenv` directly instead of Viper** Viper bindings are not active until `v.AutomaticEnv()` / `v.BindEnv()` are called, which happens well after the first log. Reading `DLC_LOGGING_JSON` via `os.Getenv` first is zero-cost and covers the primary deployment scenario (env var set in container/systemd). Only when the env var is absent do we pay for the config file pre-read. **Why the redundant format-switch block after `Unmarshal()` was removed** The block set the same format that `peekJSONLogging()` already applied. Keeping it would have been misleading (the comment said "Configure log output format first" — it was no longer first), and it could silently shadow a future change to `peekJSONLogging`. Removed for clarity. **Pre-existing test failure in `test-graceful-shutdown.sh`** The script fires a background `curl /api/ready` at line 64 and immediately calls `stop`. The curl resolves while the server is still healthy, so "Readiness check: not ready" is never logged. This race condition existed before this PR — confirmed by reading `pkg/server/server.go:239` where that message only fires when the readiness context is already cancelled. Not in scope here. **Script `PROJECT_DIR` fix included opportunistically** The three scripts used a hardcoded absolute path that broke in any worktree or clone at a different location. Fixed in the same commit since it was the only way to run `test-graceful-shutdown.sh` for local validation.
arcodange added 1 commit 2026-04-12 19:49:20 +02:00
🐛 fix: wire up readiness cancellation and stabilise graceful shutdown test
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 12s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m21s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
928fa9170c
Three related issues fixed together:

1. Readiness context was never cancelled during shutdown
   server.Run() had a type assertion for a Cancel() method that no standard
   context.Context implements, so readiness stayed "ready" through the entire
   shutdown window. Added CancelableContext to pkg/server — a thin wrapper that
   exposes Cancel() — and switched cmd/server/main.go to use it. Test servers
   and CLI continue passing context.Background() unchanged.

2. "Server exited" log was never emitted
   The test script expected it; main.go had no log after server.Run() returned.
   Added log.Trace().Msg("Server exited") after the Run() call.

3. Double-SIGTERM caused non-JSON "signal: terminated" in server.log
   test-graceful-shutdown.sh sent SIGTERM, then called $SERVER_CMD stop which
   sent a second SIGTERM. After signal.NotifyContext is cancelled, the second
   signal hits the default handler and Go prints "signal: terminated" to stderr,
   breaking the all-JSON-lines assertion. Fixed by waiting for the PID to exit
   ourselves instead of re-invoking the stop script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

Second commit — readiness cancellation + shutdown test stabilisation

Why CancelableContext instead of changing NewServer to accept a CancelFunc

NewServer has 5 callers (cmd/server, cmd/cli, and 3 in pkg/bdd/testserver). Adding a cancel func parameter would require updating all of them. The existing code already had the right intention — s.readyCtx.(interface{ Cancel() }) — it just needed a concrete type to match. CancelableContext satisfies that interface while leaving every caller that passes context.Background() unchanged.

Why Server exited goes in main.go not server.go

Run() returns an error; if the caller wants to handle it differently (e.g. the CLI sub-command), it should decide whether to log. main.go is the natural place for the final lifecycle log, matching the symmetry of "Server running" being the last startup log.

Why the test no longer calls $SERVER_CMD stop

signal.NotifyContext stops forwarding signals once the context is cancelled. A second SIGTERM hits Go's default signal handler, which prints signal: terminated to stderr — one non-JSON line that breaks the all-JSON assertion. Waiting on the PID directly avoids this and is cleaner: we own the signal, we own the wait.

## Second commit — readiness cancellation + shutdown test stabilisation **Why `CancelableContext` instead of changing `NewServer` to accept a `CancelFunc`** `NewServer` has 5 callers (cmd/server, cmd/cli, and 3 in pkg/bdd/testserver). Adding a cancel func parameter would require updating all of them. The existing code already had the right intention — `s.readyCtx.(interface{ Cancel() })` — it just needed a concrete type to match. `CancelableContext` satisfies that interface while leaving every caller that passes `context.Background()` unchanged. **Why `Server exited` goes in `main.go` not `server.go`** `Run()` returns an error; if the caller wants to handle it differently (e.g. the CLI sub-command), it should decide whether to log. `main.go` is the natural place for the final lifecycle log, matching the symmetry of `"Server running"` being the last startup log. **Why the test no longer calls `$SERVER_CMD stop`** `signal.NotifyContext` stops forwarding signals once the context is cancelled. A second SIGTERM hits Go's default signal handler, which prints `signal: terminated` to stderr — one non-JSON line that breaks the all-JSON assertion. Waiting on the PID directly avoids this and is cleaner: we own the signal, we own the wait.
arcodange added 1 commit 2026-04-12 19:57:05 +02:00
🔍 test: show readiness HTTP response during shutdown window
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 9s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m15s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
165e082eb9
Print the full response body and status code from the /api/ready probe
that fires during the graceful shutdown window, so the 200→503 transition
is visible in test output rather than silently discarded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arcodange added 1 commit 2026-04-12 20:02:03 +02:00
🔍 test: print formatted server log at end of graceful shutdown test
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 12s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m10s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
18b2f0a87f
Display the full log as a readable summary before cleanup so the
complete server lifecycle is visible in test output without needing
to preserve the log file manually.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arcodange added 1 commit 2026-04-12 23:22:17 +02:00
Cut AGENTS.md from 1296 to ~210 lines: removed development timeline,
changelog, stale AI agent section, verbose architecture examples, and
duplicate content. Kept tech stack, project structure, key commands,
config reference, API table, ADR index, and commit conventions.

Cut README.md from 423 to ~80 lines: removed duplicate CI/CD sections
(one had merge conflict markers), non-existent Cobra CLI and
documentation/ references, and the AI agent usage section. Kept
features, quick start, config table, API table, testing, and Gitea
client setup.

Also includes gitea-client.sh fixes from earlier session:
create-pr sub-command and safe jq-based JSON body in comment-pr.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arcodange merged commit c17fb4f9b4 into main 2026-04-12 23:28:36 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: arcodange/dance-lessons-coach#16