8 Commits

Author SHA1 Message Date
61837b7385 🐛 fix(bdd): shouldEnableV2 wrongly matched ~@v2 as @v2 substring
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 10s
CI/CD Pipeline / CI Pipeline (push) Successful in 6m46s
CI/CD Pipeline / Trigger Docker Push (push) Has been skipped
Pre-existing latent bug surfaced by the new @v2-gate scenario in
greet.feature (added in PR #56 follow-up): shouldEnableV2 used
strings.Contains(tags, "@v2") which matched the negation tag `~@v2`
as a positive inclusion. Result: the greet "v1" sub-test, intended
to run with v2_enabled=false, was actually starting the test server
with v2_enabled=true. The @v2-gate scenario asserting "v2 disabled →
404" got 200 instead.

Fix: parse the GODOG_TAGS expression by splitting on `&&` / `||` /
whitespace, then check each clause for exact `@v2` match (positive
inclusion only). The negation `~@v2` no longer matches.

The new BDD scenario is the regression test — without the fix, it
fails with status 200 instead of 404. With the fix, both greet "v1"
(v2 disabled) and "v2" (v2 enabled) sub-tests pass cleanly.

Verifier verdict: APPROVE. Race-clean. Full BDD suite green.
2026-05-05 10:37:50 +02:00
de5b599455 feat(server): api.v2_enabled hot-reload via middleware gate (ADR-0023 Phase 4) (#56)
Some checks failed
CI/CD Pipeline / Build Docker Cache (push) Successful in 9s
CI/CD Pipeline / Trigger Docker Push (push) Has been cancelled
CI/CD Pipeline / CI Pipeline (push) Has been cancelled
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 10:35:03 +02:00
9895c159fe 📝 docs(adr): ADR-0027 Ollama Tier 1 onboarding + README index reconciliation (#55)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 10:24:01 +02:00
8d93050636 feat(server): add go_version to /api/info response (#54)
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 7s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m57s
CI/CD Pipeline / Trigger Docker Push (push) Successful in 6s
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 10:18:30 +02:00
42d165624b 🧪 test(user): SHA-256 fingerprint stays non-empty and != secret value (Mistral autonomous) (#53)
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 8s
CI/CD Pipeline / CI Pipeline (push) Successful in 4m9s
CI/CD Pipeline / Trigger Docker Push (push) Successful in 6s
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 10:08:36 +02:00
e9d61a7fb0 🧪 test(bdd): admin metadata endpoint security property — no secret leak (#52)
All checks were successful
CI/CD Pipeline / Build Docker Cache (push) Successful in 11s
CI/CD Pipeline / CI Pipeline (push) Successful in 3m41s
CI/CD Pipeline / Trigger Docker Push (push) Successful in 5s
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 09:56:17 +02:00
f71495b6fc feat(admin): GET /api/v1/admin/jwt/secrets — metadata-only introspection (#51)
Some checks failed
CI/CD Pipeline / Build Docker Cache (push) Successful in 57s
CI/CD Pipeline / Trigger Docker Push (push) Has been cancelled
CI/CD Pipeline / CI Pipeline (push) Has been cancelled
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 09:51:54 +02:00
46df1f6170 🔧 chore(config): defense-in-depth for WatchAndApply test race (Q-038) (#50)
Some checks failed
CI/CD Pipeline / Build Docker Cache (push) Successful in 14s
CI/CD Pipeline / Trigger Docker Push (push) Has been cancelled
CI/CD Pipeline / CI Pipeline (push) Has been cancelled
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-05 09:45:14 +02:00
19 changed files with 570 additions and 28 deletions

View File

@@ -1,6 +1,6 @@
# Config Hot Reloading Strategy
**Status:** Phase 1+2+3 Implemented (2026-05-05). Hot-reloadable fields: `logging.level`, `auth.jwt.ttl`, `telemetry.sampler.type`, `telemetry.sampler.ratio`. Plumbing: `Config.WatchAndApply` in `pkg/config/config.go`, `ReconfigureTracerProvider` in `pkg/telemetry/telemetry.go`, sampler reconfigure callback wired in `pkg/server/server.go Run`. Phase 2 also fixed a pre-existing bug where the hardcoded 24h TTL ignored `auth.jwt.ttl` from config. Remaining field `api.v2_enabled` is **deferred**: hot-reloading routing requires either an always-register-with-middleware-gate refactor of the chi router or an atomic router swap — different complexity class, separate ADR if reopened.
**Status:** Implemented — all 4 phases shipped (2026-05-05). Hot-reloadable fields: `logging.level` (Phase 1), `auth.jwt.ttl` (Phase 2), `telemetry.sampler.type` + `telemetry.sampler.ratio` (Phase 3), `api.v2_enabled` (Phase 4). Plumbing: `Config.WatchAndApply` in `pkg/config/config.go` is the single entry point. Phase 2 fixed a pre-existing bug where hardcoded 24h TTL ignored `auth.jwt.ttl`. Phase 4 chose the **always-register-with-middleware-gate** approach: v2 routes are now ALWAYS registered, and `Server.v2EnabledGate` middleware reads the live config on every request (returns 404 + JSON body when disabled). No router rebuild needed for the flag flip. 3 unit tests in `pkg/server/v2_gate_test.go` cover blocked-when-disabled / passes-when-enabled / hot-reload-mid-life-of-same-Server.
**Authors:** Gabriel Radureau, AI Agent
**Date:** 2026-04-05
**Last Updated:** 2026-05-05

View File

@@ -141,10 +141,13 @@ We will implement a new `GET /api/info` endpoint that returns a JSON object with
"build_date": "2026-05-04T08:00:00Z",
"uptime_seconds": 1234,
"cache_enabled": true,
"healthz_status": "healthy"
"healthz_status": "healthy",
"go_version": "go1.26.1"
}
```
The `go_version` field provides the Go runtime version via `runtime.Version()`, useful for ops debugging (e.g., identifying which Go version is running in production).
### Rationale
1. **Performance**: Single HTTP request instead of 3-4 separate calls

View File

@@ -0,0 +1,128 @@
# 27. Ollama Tier 1 onboarding via meta-trainer-bootstrap
**Date:** 2026-05-05
**Status:** Proposed
**Authors:** Gabriel Radureau, AI Agent (Claude Opus 4.7 Tier 3 inspector)
## Context and Problem Statement
The autonomous trainer day on 2026-05-05 validated that Mistral Vibe (cloud) can drive a complete PR lifecycle on this project: ICM workspace → phase-planner → implementation → verifier audit → PR open (cf. PR #54, Q-041 in `~/.vibe/memory/reference/mistral-quirks.md`). Two limitations remain:
1. **Vendor risk** — every autonomous run consumes the Mistral cloud forfait. If the forfait runs out mid-month or the API is unavailable, autonomous capability is lost.
2. **Sovereignty story** — ARCODANGE's stated direction (cf. `migration-claude-vers-mistral-phase-1.md`) is to reduce dependence on a single foreign vendor. The hardware exists locally (M4 128 GB) ; the missing link is wiring a local model into the same Tier 1 executor role Mistral plays today.
The user-flagged candidate models (cf. `~/.vibe/memory/reference/ollama-candidate-models.md`) :
* `nemotron-3-super`
* `gemma4:31b`
Both are large enough to plausibly handle the agentic coding role and small enough to fit in 128 GB RAM with headroom for tools. Neither has been tested under the ARCODANGE methodology (canary suite, ICM workspace traversal, verifier-skill discipline).
The methodology to onboard a new Tier 1 already exists : the `meta-trainer-bootstrap` skill at `~/.vibe/skills/meta-trainer-bootstrap/`. It runs a 10-canary suite (C-001..C-010), copies + adapts the skill library to the new model's harness tool names, stands up a `<model>-quirks.md` baseline, and produces a Tier 3 audit report. It has been validated on Mistral itself (we are currently running the methodology Mistral-on-Mistral, which is unusual — the canary suite was originally written for a different model).
## Decision Drivers
* **Forfait insurance** — a working local Tier 1 means autonomous capability survives a Mistral outage / forfait exhaustion
* **Sovereignty** — local execution removes the single-vendor dependency for the autonomous workflow
* **Methodology validation** — `meta-trainer-bootstrap` has never been run on a fresh model in production, only smoke-tested ; this is its first real test
* **Cost** — Ollama is local-only (no per-call price). The cost is the bootstrap effort + ongoing M4 power consumption.
* **Model maturity** — both candidates are recent ; their agentic coding ability is empirical, not theoretical
## Considered Options
### Option 1: Bootstrap `nemotron-3-super` first, then `gemma4:31b`
Run the canary suite on each, document quirks separately, decide based on canary pass rate and cost-per-task.
* Good — comparative data, makes the choice empirical
* Good — discovers any meta-trainer-bootstrap bugs early on the first attempt
* Bad — doubles the bootstrap effort (~4-8 hours per model)
* Bad — requires holding both models on disk (large)
### Option 2: Bootstrap one model only, picked on prior reputation
Pick one (e.g. `nemotron-3-super` per the user's explicit ordering in `ollama-candidate-models.md`) and commit. Skip the comparison.
* Good — half the effort, ships faster
* Bad — no fallback if the chosen model is unsuitable
* Bad — anchors the methodology to one model's quirks before we know they generalise
### Option 3: Defer until Mistral autonomous shows real strain
Do nothing yet. Wait for forfait pressure or a Mistral outage to force the issue. Reactive instead of proactive.
* Good — zero effort now
* Bad — when the trigger fires, we are unprepared and the bootstrap is rushed
* Bad — postpones validation of `meta-trainer-bootstrap` indefinitely
### Option 4: Skip Ollama, evaluate a different vendor (Anthropic, OpenAI)
Bring in a second cloud model as Tier 1 instead of going local.
* Good — likely higher quality than 31B local
* Bad — replaces vendor dependence with two-vendor dependence ; doesn't solve sovereignty
* Bad — we already have Claude as Tier 3 inspector via Anthropic ; mixing roles complicates the methodology
## Decision Outcome
Chosen option: **Option 2 — Bootstrap `nemotron-3-super` first**, deferring `gemma4:31b` to a follow-up ADR if `nemotron-3-super` underperforms or shows unfixable quirks.
Rationale :
- Forfait pressure is real but not immediate (~3.5% of monthly forfait spent on the heavy autonomous trainer day 2026-05-05) — we have time but should not procrastinate
- Comparative testing (Option 1) is technically right but pragmatically slow for an unproven methodology
- The user's explicit ordering signals their prior on which to try first ; respect it
- If the canary suite fails substantially on `nemotron-3-super`, we pivot to `gemma4:31b` with the lessons (and per-model quirks file) from the first attempt — net learning either way
## Implementation Plan
1. **Pre-flight** — verify `ollama` is installed, the model is pulled (`ollama pull nemotron-3-super`), and the M4 has enough free RAM (model size + ~16 GB headroom for tools).
2. **Run `meta-trainer-bootstrap` skill** — pointing `TARGET_MODEL_ID=nemotron-3-super`, `TARGET_HARNESS=ollama run nemotron-3-super`, `TARGET_PROJECT_ROOT=<a fresh clone or worktree>`. Budget : 5 EUR-equivalent of Mistral Tier-2 orchestration cost + 2-4 hours of trainer attention.
3. **Canary suite** — run C-001..C-010 ; record each result in `~/.vibe/memory/reference/nemotron-3-super-quirks.md` as `Q-101..Q-110` (the `Q-001..Q-099` range is reserved for the legacy Mistral baseline).
4. **Skill library adaptation** — for each ARCODANGE skill currently relying on Mistral-specific tool names (`read_file`, `write_file`, etc.), adapt to whatever Ollama exposes. Document deltas.
5. **Smoke test** — run a single small task end-to-end on a low-risk project. Use the ICM workspace pattern. Verify worktree isolation (Q-038 fix) still applies.
6. **Tier 3 report** — produce `bootstrap-report.md` for Claude inspector review. Include canary pass rate, key quirks, KPI baseline numbers, open friction points.
7. **Decision gate** — based on the report, either (a) promote `nemotron-3-super` to production Tier 1 and update `~/.vibe/config.toml` accordingly, (b) try `gemma4:31b` as a follow-up, or (c) escalate to Tier 3 for a strategic pivot.
## Pros and Cons of the Options
### Option 1 (Bootstrap both)
* Good — comparative data
* Good — early bug detection on the methodology
* Bad — double effort
* Bad — no clear way to choose without significant additional time investment for the second model
### Option 2 (Chosen — `nemotron-3-super` first)
* Good — concrete forward motion
* Good — methodology gets its first real test
* Good — `meta-trainer-bootstrap` skill validated end-to-end (currently only smoke-tested)
* Bad — risk of picking the wrong model and wasting the bootstrap effort
* Mitigation: per-model quirks files mean the second attempt is cheaper (skill adaptations transfer)
### Option 3 (Defer)
* Good — zero effort
* Bad — reactive, increases risk under outage scenarios
### Option 4 (Different vendor)
* Good — likely higher quality
* Bad — does not solve sovereignty
* Bad — methodology already has Claude as Tier 3 ; another Anthropic-family model in Tier 1 conflates roles
## Consequences
* `meta-trainer-bootstrap` skill is exercised end-to-end for the first time. Discoveries during this run will likely produce Q-042+ entries in `mistral-quirks.md` and a separate `nemotron-3-super-quirks.md`.
* `~/.vibe/config.toml` may need a new model alias (e.g. `local-nemotron`) configured for testing without affecting the production `mistral-vibe-cli-latest` default.
* If successful, the next ADR (0028 or higher) will document the production switch (or split, e.g. routine tasks → local, complex tasks → cloud).
* Forfait usage from this bootstrap : Tier 2 Mistral orchestration only ; Tier 1 Ollama runs are free at the API level.
## Links
* Three-tier methodology : `~/.vibe/skills/meta-trainer-bootstrap/references/three-tier-tutor.md`
* Candidate models reference : `~/.vibe/memory/reference/ollama-candidate-models.md`
* `meta-trainer-bootstrap` skill : `~/.vibe/skills/meta-trainer-bootstrap/SKILL.md`
* Canary suite : `~/.vibe/skills/meta-trainer-bootstrap/canaries/INDEX.md`
* Q-041 (autonomy story validated on Mistral) : `~/.vibe/memory/reference/mistral-quirks.md`
* Related ADRs : [ADR-0007](0007-opentelemetry-integration.md) (cloud / sovereignty considerations historically) ; [ADR-0023](0023-config-hot-reloading.md) (hot-reload may need different patterns under Ollama)

View File

@@ -14,21 +14,23 @@ This directory contains the Architecture Decision Records (ADRs) for the dance-l
| [0006](0006-configuration-management.md) | Use Viper for configuration management | Accepted |
| [0007](0007-opentelemetry-integration.md) | Integrate OpenTelemetry for distributed tracing | Accepted |
| [0008](0008-bdd-testing.md) | Adopt BDD with Godog for behavioral testing | Accepted |
| [0009](0009-hybrid-testing-approach.md) | Combine BDD and Swagger-based testing | Partially Implemented |
| [0009](0009-hybrid-testing-approach.md) | Combine BDD and Swagger-based testing | Implemented |
| [0010](0010-api-v2-feature-flag.md) | API v2 Feature Flag Implementation | Accepted |
| [0012](0012-git-hooks-staged-only-formatting.md) | Git Hooks: Staged-Only Formatting | Accepted |
| [0013](0013-openapi-swagger-toolchain.md) | OpenAPI/Swagger Toolchain Selection | Partially Implemented |
| [0013](0013-openapi-swagger-toolchain.md) | OpenAPI/Swagger Toolchain Selection | Implemented |
| [0015](0015-cli-subcommands-cobra.md) | CLI Subcommands and Flag Management with Cobra | Implemented |
| [0016](0016-ci-cd-pipeline-design.md) | CI/CD Pipeline Design for Multi-Platform Compatibility | Accepted |
| [0017](0017-trunk-based-development-workflow.md) | Trunk-Based Development Workflow for CI/CD Safety | Approved |
| [0018](0018-user-management-auth-system.md) | User Management and Authentication System | Proposed |
| [0019](0019-postgresql-integration.md) | PostgreSQL Database Integration | Proposed |
| [0018](0018-user-management-auth-system.md) | User Management and Authentication System | Implemented |
| [0019](0019-postgresql-integration.md) | PostgreSQL Database Integration | Implemented |
| [0020](0020-docker-build-strategy.md) | Docker Build Strategy: Traditional vs Buildx | Accepted |
| [0021](0021-jwt-secret-retention-policy.md) | JWT Secret Retention Policy | Proposed |
| [0022](0022-rate-limiting-cache-strategy.md) | Rate Limiting and Cache Strategy | Proposed |
| [0023](0023-config-hot-reloading.md) | Config Hot Reloading Strategy | Proposed |
| [0024](0024-bdd-test-organization-and-isolation.md) | BDD Test Organization and Isolation Strategy | Proposed |
| [0025](0025-bdd-scenario-isolation-strategies.md) | BDD Scenario Isolation Strategies | Proposed |
| [0021](0021-jwt-secret-retention-policy.md) | JWT Secret Retention Policy | Implemented |
| [0022](0022-rate-limiting-cache-strategy.md) | Rate Limiting and Cache Strategy | Implemented (Phase 1) |
| [0023](0023-config-hot-reloading.md) | Config Hot Reloading Strategy | Implemented |
| [0024](0024-bdd-test-organization-and-isolation.md) | BDD Test Organization and Isolation Strategy | Implemented |
| [0025](0025-bdd-scenario-isolation-strategies.md) | BDD Scenario Isolation Strategies | Implemented |
| [0026](0026-composite-info-endpoint.md) | Composite Info Endpoint vs Separate Calls | Implemented |
| [0027](0027-ollama-tier1-onboarding.md) | Ollama Tier 1 onboarding via meta-trainer-bootstrap | Proposed |
> **Note** : numbers `0011` and `0014` are not currently in use. Reserved for future ADRs or representing previously deleted entries.

View File

@@ -30,7 +30,8 @@ Reference document for all HTTP endpoints exposed by `dance-lessons-coach` serve
"build_date": "2026-05-05",
"uptime_seconds": 1234,
"cache_enabled": true,
"healthz_status": "healthy"
"healthz_status": "healthy",
"go_version": "go1.26.1"
}
```
@@ -82,7 +83,27 @@ JWT secret rotation policies: cf. ADR-0021 + JWT secrets endpoints under `/api/v
### Admin under v1 (`/api/v1/admin/...`)
JWT secret management endpoints. Cf. swag annotations in handlers + features/jwt/ BDD scenarios for the exact contract.
JWT secret management endpoints.
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/admin/jwt/secrets` | List metadata (count + per-secret: is_primary, created_at_unix, expires_at_unix?, age_seconds, is_expired, sha256 fingerprint). **Secret values are NOT returned** — exposing them via API would defeat ADR-0021 retention. |
| `POST` | `/api/v1/admin/jwt/secrets` | Add a new JWT secret (body: `{secret, is_primary, expires_in}`) |
| `POST` | `/api/v1/admin/jwt/secrets/rotate` | Rotate to a new primary secret (body: `{new_secret}`) |
`GET` response shape (security: only fingerprint, no secret value):
```json
{
"count": 2,
"secrets": [
{"is_primary": true, "created_at_unix": 1714900000, "age_seconds": 600, "is_expired": false, "secret_sha256": "a3f9c2..."},
{"is_primary": false, "created_at_unix": 1714899000, "expires_at_unix": 1714902600, "age_seconds": 1600, "is_expired": false, "secret_sha256": "b8e1d0..."}
]
}
```
Cf. ADR-0021 + features/jwt/ BDD scenarios for the broader contract.
## v2 API

View File

@@ -15,6 +15,16 @@ Feature: Greet Service
When I request a greeting for "John"
Then the response should be "{\"message\":\"Hello John!\"}"
@critical @v2-gate
Scenario: v2 endpoint returns 404 when api.v2_enabled is disabled
# In the default tag-filter run (~@v2), the test server starts with
# v2_enabled=false. The v2EnabledGate middleware (ADR-0023 Phase 4)
# returns 404 with a JSON body explaining the flag state.
Given the server is running
When I send a POST request to v2 greet with name "John"
Then the status code should be 404
And the response should contain "v2 API is currently disabled"
@v2 @api
Scenario: v2 greeting with JSON POST request
Given the server is running with v2 enabled

View File

@@ -36,3 +36,10 @@ Feature: Info Endpoint
Then the response header "X-Cache" should be "MISS"
When I request the info endpoint again
Then the response header "X-Cache" should be "HIT"
@go_version @critical
Scenario: go_version field is non-empty
Given the server is running
When I request the info endpoint
Then the status code should be 200
And the response should contain "go_version"

View File

@@ -40,6 +40,16 @@ Feature: JWT Secret Retention Policy
Then the primary secret should not be removed
And the primary secret should remain active
@critical @admin-introspection
Scenario: Admin metadata endpoint exposes structure without leaking secret values
Given a primary JWT secret exists
And I add a secondary JWT secret "test-secret-do-not-leak-please-12345"
When I request the JWT secrets metadata endpoint
Then the status code should be 200
And the metadata should contain 2 secrets
And the metadata should NOT contain the secret value "test-secret-do-not-leak-please-12345"
And every secret in the metadata should have a SHA-256 fingerprint
@todo
Scenario: Multiple secrets with different ages
Given I have 3 JWT secrets of different ages

View File

@@ -822,3 +822,61 @@ func (s *JWTRetentionSteps) andSuggestRemediationSteps() error {
// Verify remediation suggestions
return godog.ErrPending
}
// =====================================================================
// Admin metadata introspection steps (PR #51 + this scenario)
// =====================================================================
// iAddASecondaryJWTSecretNamed adds a secret with a specific value via the
// admin API. Used by the admin-introspection scenario to verify that the
// metadata endpoint returns metadata only, not the secret value.
func (s *JWTRetentionSteps) iAddASecondaryJWTSecretNamed(secretValue string) error {
s.SetLastSecret(secretValue)
return s.client.Request("POST", "/api/v1/admin/jwt/secrets", map[string]string{
"secret": secretValue,
"is_primary": "false",
})
}
// iRequestTheJWTSecretsMetadataEndpoint hits GET /api/v1/admin/jwt/secrets.
func (s *JWTRetentionSteps) iRequestTheJWTSecretsMetadataEndpoint() error {
return s.client.Request("GET", "/api/v1/admin/jwt/secrets", nil)
}
// theMetadataShouldContainNSecrets verifies the response count field.
func (s *JWTRetentionSteps) theMetadataShouldContainNSecrets(expected int) error {
body := string(s.client.GetLastBody())
expectedFragment := `"count":` + strconv.Itoa(expected)
if !strings.Contains(body, expectedFragment) {
return fmt.Errorf("expected response to contain %q, got: %s", expectedFragment, body)
}
return nil
}
// theMetadataShouldNotContainTheSecretValue is the SECURITY-CRITICAL
// assertion. If the response contains the raw secret string anywhere,
// the endpoint has leaked. This is the property the metadata-only design
// is supposed to guarantee.
func (s *JWTRetentionSteps) theMetadataShouldNotContainTheSecretValue(secretValue string) error {
body := string(s.client.GetLastBody())
if strings.Contains(body, secretValue) {
return fmt.Errorf("SECURITY: response leaked the secret value %q (response body: %s)", secretValue, body)
}
return nil
}
// everySecretInTheMetadataShouldHaveASHA256Fingerprint asserts the
// secret_sha256 field is present and non-empty for each entry. Cheap
// regex-style check on the JSON body.
func (s *JWTRetentionSteps) everySecretInTheMetadataShouldHaveASHA256Fingerprint() error {
body := string(s.client.GetLastBody())
// Expect at least one occurrence of "secret_sha256":"<non-empty>"
if !strings.Contains(body, `"secret_sha256":"`) {
return fmt.Errorf("response does not include any secret_sha256 fingerprint: %s", body)
}
// Reject obviously-empty values
if strings.Contains(body, `"secret_sha256":""`) {
return fmt.Errorf("at least one secret_sha256 fingerprint is empty in response: %s", body)
}
return nil
}

View File

@@ -173,6 +173,12 @@ func InitializeAllSteps(ctx *godog.ScenarioContext, client *testserver.Client, s
ctx.Step(`^I should receive configuration validation error$`, sc.jwtRetentionSteps.iShouldReceiveConfigurationValidationError)
ctx.Step(`^the error should mention "([^"]*)"$`, sc.jwtRetentionSteps.theErrorShouldMention)
ctx.Step(`^I have enabled Prometheus metrics$`, sc.jwtRetentionSteps.iHaveEnabledPrometheusMetrics)
// Admin metadata introspection steps (PR #51 + admin-introspection scenario)
ctx.Step(`^I add a secondary JWT secret "([^"]*)"$`, sc.jwtRetentionSteps.iAddASecondaryJWTSecretNamed)
ctx.Step(`^I request the JWT secrets metadata endpoint$`, sc.jwtRetentionSteps.iRequestTheJWTSecretsMetadataEndpoint)
ctx.Step(`^the metadata should contain (\d+) secrets$`, sc.jwtRetentionSteps.theMetadataShouldContainNSecrets)
ctx.Step(`^the metadata should NOT contain the secret value "([^"]*)"$`, sc.jwtRetentionSteps.theMetadataShouldNotContainTheSecretValue)
ctx.Step(`^every secret in the metadata should have a SHA-256 fingerprint$`, sc.jwtRetentionSteps.everySecretInTheMetadataShouldHaveASHA256Fingerprint)
ctx.Step(`^I should see "([^"]*)" metric increment$`, sc.jwtRetentionSteps.iShouldSeeMetricIncrement)
ctx.Step(`^I should see "([^"]*)" metric decrease$`, sc.jwtRetentionSteps.iShouldSeeMetricDecrease)
ctx.Step(`^I should see "([^"]*)" histogram update$`, sc.jwtRetentionSteps.iShouldSeeHistogramUpdate)

View File

@@ -741,8 +741,14 @@ func (s *Server) waitForServerReady() error {
}
}
// shouldEnableV2 determines if v2 API should be enabled for this test server
// This is the ONLY place that reads FEATURE and GODOG_TAGS env vars
// shouldEnableV2 determines if v2 API should be enabled for this test server.
// This is the ONLY place that reads FEATURE and GODOG_TAGS env vars.
//
// 2026-05-05: previous version used strings.Contains(tags, "@v2") which
// wrongly matched the negation `~@v2` as well. This made the "v1" greet
// sub-test (tags `~@v2 && ~@skip`) actually run with v2 enabled, masking
// the gate behavior we now test in feature `@v2-gate` scenario. Fixed
// here by inspecting each && clause and checking for positive inclusion.
func (s *Server) shouldEnableV2() bool {
feature := os.Getenv("FEATURE")
@@ -753,9 +759,19 @@ func (s *Server) shouldEnableV2() bool {
return false
}
// For greet feature: enable v2 if tags include @v2
// For greet feature: enable v2 if tags include `@v2` as a POSITIVE clause.
// Godog tag expression syntax: clauses separated by `&&` or `||`, negation
// via leading `~`. A positive clause matches exactly `@v2` (after trim).
tags := os.Getenv("GODOG_TAGS")
return strings.Contains(tags, "@v2")
for _, clause := range strings.FieldsFunc(tags, func(r rune) bool {
return r == '&' || r == '|' || r == ' '
}) {
clause = strings.TrimSpace(clause)
if clause == "@v2" {
return true
}
}
return false
}
// createTestConfig creates a test configuration

View File

@@ -730,10 +730,15 @@ func (c *Config) WatchAndApply(ctx context.Context) {
// Stop the watcher on context cancel — we set a flag that the
// OnConfigChange handler checks, avoiding the race with viper's
// internal state that would occur if we called OnConfigChange again.
// We deliberately do NOT log here: viper's internal watcher goroutine
// has no public Stop, so it can outlive ctx, and a zerolog call here
// would race with the next test's LoadConfig → SetupLogging →
// zerolog.SetGlobalLevel under -race (observed 2026-05-05).
//
// We deliberately do NOT log inside this goroutine: this goroutine
// outlives ctx (parent's defer cancel only fires when the test's
// outer scope exits, not when t.Cleanup runs), so a log call here
// races with the next test's LoadConfig → SetupLogging →
// zerolog.SetGlobalLevel under -race (observed 2026-05-05, Q-038).
// The flag-set is the load-bearing operation; the missing log line
// is a small ops cost (operators learn the watcher stops on shutdown
// via the parent shutdown logs, not a dedicated message).
go func() {
<-ctx.Done()
c.reloadMu.Lock()

26
pkg/config/main_test.go Normal file
View File

@@ -0,0 +1,26 @@
package config
import (
"os"
"testing"
"github.com/rs/zerolog"
)
// TestMain quiets the global zerolog level for the duration of the test
// suite. Rationale (Q-038, 2026-05-05): viper's internal watcher goroutine
// (started by viper.WatchConfig in WatchAndApply) has no public Stop and
// can outlive a test's context. Any log call from a leaked goroutine
// races with the next test's LoadConfig → SetupLogging →
// zerolog.SetGlobalLevel under `go test -race`. Disabling the logger here
// is the root-cause fix: the racing memory location is zerolog's gLevel
// global, and if no log call ever evaluates against it we sidestep the
// race entirely without changing production behavior.
//
// In production, log calls happen against an unchanging global level
// (SetupLogging runs once at startup), so the race condition does not
// occur there.
func TestMain(m *testing.M) {
zerolog.SetGlobalLevel(zerolog.Disabled)
os.Exit(m.Run())
}

View File

@@ -9,6 +9,7 @@ import (
"net"
"net/http"
"os/signal"
"runtime"
"syscall"
"time"
@@ -192,13 +193,15 @@ func (s *Server) setupRoutes() {
r.Post("/cache/flush", s.handleAdminCacheFlush)
})
// Register v2 routes if enabled
if s.config.GetV2Enabled() {
s.router.Route("/api/v2", func(r chi.Router) {
r.Use(s.getAllMiddlewares()...)
s.registerApiV2Routes(r)
})
}
// Register v2 routes ALWAYS (ADR-0023 Phase 4 hot-reload). The
// v2EnabledGate middleware checks the live config on every request
// and returns 404 when api.v2_enabled is false. This lets the flag
// be flipped via config hot-reload without a router rebuild.
s.router.Route("/api/v2", func(r chi.Router) {
r.Use(s.getAllMiddlewares()...)
r.Use(s.v2EnabledGate)
s.registerApiV2Routes(r)
})
// Add Swagger UI with embedded spec
// Serve the embedded swagger.json file
@@ -268,6 +271,25 @@ func (s *Server) registerApiV2Routes(r chi.Router) {
})
}
// v2EnabledGate is the middleware that gates the /api/v2/* subtree on the
// live api.v2_enabled config value (ADR-0023 Phase 4 hot-reload). When
// disabled, returns 404 with the same body shape as a missing route would
// emit, so clients see "v2 doesn't exist" rather than "v2 is forbidden".
//
// Flipping the config at runtime via Config.WatchAndApply takes effect on
// the next request — no router rebuild, no restart.
func (s *Server) v2EnabledGate(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !s.config.GetV2Enabled() {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusNotFound)
_, _ = w.Write([]byte(`{"error":"not_found","message":"v2 API is currently disabled"}`))
return
}
next.ServeHTTP(w, r)
})
}
// getAllMiddlewares returns all middleware including OpenTelemetry if enabled
func (s *Server) getAllMiddlewares() []func(http.Handler) http.Handler {
middlewares := []func(http.Handler) http.Handler{
@@ -453,6 +475,7 @@ type InfoResponse struct {
UptimeSeconds int64 `json:"uptime_seconds"`
CacheEnabled bool `json:"cache_enabled"`
HealthzStatus string `json:"healthz_status"`
GoVersion string `json:"go_version"`
}
// handleHealthz godoc
@@ -500,6 +523,7 @@ func (s *Server) handleInfo(w http.ResponseWriter, r *http.Request) {
UptimeSeconds: int64(time.Since(s.startedAt).Seconds()),
CacheEnabled: s.cacheService != nil,
HealthzStatus: "healthy",
GoVersion: runtime.Version(),
}
// Cache key

View File

@@ -0,0 +1,84 @@
package server
import (
"context"
"net/http"
"net/http/httptest"
"strings"
"testing"
"dance-lessons-coach/pkg/config"
"github.com/stretchr/testify/assert"
)
// TestV2EnabledGate_BlocksWhenDisabled verifies the ADR-0023 Phase 4
// hot-reload security property: when api.v2_enabled is false, ANY request
// to /api/v2/* returns 404 with a JSON body, not a 200, not a panic.
func TestV2EnabledGate_BlocksWhenDisabled(t *testing.T) {
cfg := &config.Config{}
cfg.API.V2Enabled = false // explicit, even though it is the zero value
s := NewServer(cfg, context.Background())
req := httptest.NewRequest(http.MethodPost, "/api/v2/greet", strings.NewReader(`{"name":"world"}`))
req.Header.Set("Content-Type", "application/json")
w := httptest.NewRecorder()
s.router.ServeHTTP(w, req)
assert.Equal(t, http.StatusNotFound, w.Code, "v2 disabled should 404")
assert.Contains(t, w.Body.String(), "v2 API is currently disabled",
"response should explain why")
assert.Equal(t, "application/json", w.Header().Get("Content-Type"))
}
// TestV2EnabledGate_PassesWhenEnabled verifies the gate lets requests
// through to the actual v2 handler when api.v2_enabled is true. We use
// a v2 endpoint that exists and responds with a 2xx so we can assert
// "got past the gate, hit the handler".
func TestV2EnabledGate_PassesWhenEnabled(t *testing.T) {
cfg := &config.Config{}
cfg.API.V2Enabled = true
s := NewServer(cfg, context.Background())
req := httptest.NewRequest(http.MethodPost, "/api/v2/greet", strings.NewReader(`{"name":"world"}`))
req.Header.Set("Content-Type", "application/json")
w := httptest.NewRecorder()
s.router.ServeHTTP(w, req)
// 200 = v2 handler executed. Anything other than 404 with the gate's
// message proves the gate let the request through.
assert.NotEqual(t, http.StatusNotFound, w.Code, "v2 enabled should not return 404 from gate")
assert.NotContains(t, w.Body.String(), "v2 API is currently disabled",
"gate message must NOT appear when enabled")
}
// TestV2EnabledGate_HotReloadEffect simulates the ADR-0023 Phase 4
// scenario: the same Server (same router) sees opposite responses
// before and after a config flip — proving the gate reads the live
// config rather than a snapshot from setupRoutes.
func TestV2EnabledGate_HotReloadEffect(t *testing.T) {
cfg := &config.Config{}
cfg.API.V2Enabled = false
s := NewServer(cfg, context.Background())
// Round 1: disabled
req1 := httptest.NewRequest(http.MethodPost, "/api/v2/greet", strings.NewReader(`{"name":"a"}`))
req1.Header.Set("Content-Type", "application/json")
w1 := httptest.NewRecorder()
s.router.ServeHTTP(w1, req1)
assert.Equal(t, http.StatusNotFound, w1.Code, "round 1 (disabled) should 404")
// Flip the config. In production, Config.WatchAndApply does this on
// file change; here we set the field directly to simulate the result.
cfg.API.V2Enabled = true
// Round 2: enabled — same Server, same router, just the config flipped
req2 := httptest.NewRequest(http.MethodPost, "/api/v2/greet", strings.NewReader(`{"name":"b"}`))
req2.Header.Set("Content-Type", "application/json")
w2 := httptest.NewRecorder()
s.router.ServeHTTP(w2, req2)
assert.NotEqual(t, http.StatusNotFound, w2.Code, "round 2 (enabled) should NOT 404")
assert.NotContains(t, w2.Body.String(), "v2 API is currently disabled")
}

View File

@@ -25,11 +25,32 @@ func NewAdminHandler(authService user.AuthService) *AdminHandler {
// RegisterRoutes registers admin routes
func (h *AdminHandler) RegisterRoutes(router chi.Router) {
router.Route("/jwt", func(r chi.Router) {
r.Get("/secrets", h.handleListJWTSecrets)
r.Post("/secrets", h.handleAddJWTSecret)
r.Post("/secrets/rotate", h.handleRotateJWTSecret)
})
}
// handleListJWTSecrets godoc
//
// @Summary List JWT secrets metadata
// @Description Returns metadata for every tracked JWT secret. The actual secret values are NOT included — exposing them via an admin endpoint would defeat the retention/rotation infrastructure. Each entry has is_primary, created_at_unix, expires_at_unix (optional), age_seconds, is_expired, and a SHA-256 fingerprint (first 16 hex chars) for ops correlation.
// @Tags API/v1/Admin
// @Produce json
// @Success 200 {object} map[string]interface{} "List of secret metadata"
// @Failure 401 {object} map[string]string "Unauthorized"
// @Router /v1/admin/jwt/secrets [get]
func (h *AdminHandler) handleListJWTSecrets(w http.ResponseWriter, r *http.Request) {
infos := h.authService.ListJWTSecretsInfo()
resp := map[string]interface{}{
"count": len(infos),
"secrets": infos,
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(resp)
}
// AddJWTSecretRequest represents a request to add a new JWT secret
type AddJWTSecretRequest struct {
Secret string `json:"secret" validate:"required,min=16"`

View File

@@ -2,6 +2,8 @@ package user
import (
"context"
"crypto/sha256"
"encoding/hex"
"errors"
"fmt"
"time"
@@ -247,6 +249,31 @@ func (s *userServiceImpl) RemoveExpiredJWTSecrets() int {
return s.secretManager.RemoveExpiredSecrets()
}
// ListJWTSecretsInfo returns metadata about every currently-tracked JWT
// secret WITHOUT exposing the secret values. Used by the admin
// introspection endpoint and BDD tests verifying cleanup behavior.
func (s *userServiceImpl) ListJWTSecretsInfo() []JWTSecretInfo {
all := s.secretManager.GetAllValidSecrets()
now := time.Now()
out := make([]JWTSecretInfo, 0, len(all))
for _, sec := range all {
hash := sha256.Sum256([]byte(sec.Secret))
info := JWTSecretInfo{
IsPrimary: sec.IsPrimary,
CreatedAtUnix: sec.CreatedAt.Unix(),
AgeSeconds: int64(now.Sub(sec.CreatedAt).Seconds()),
SecretSHA256: hex.EncodeToString(hash[:8]), // 16 hex chars = 8 bytes — fingerprint
}
if sec.ExpiresAt != nil {
exp := sec.ExpiresAt.Unix()
info.ExpiresAtUnix = &exp
info.IsExpired = !sec.ExpiresAt.After(now)
}
out = append(out, info)
}
return out
}
// UserExists checks if a user exists by username
func (s *userServiceImpl) UserExists(ctx context.Context, username string) (bool, error) {
return s.repo.UserExists(ctx, username)

View File

@@ -155,3 +155,79 @@ func TestStartCleanupLoop_FiresAndStops(t *testing.T) {
assert.Len(t, secrets, 1, "expired secret should have been removed by the loop")
assert.Equal(t, "primary", secrets[0].Secret)
}
// TestListJWTSecretsInfo confirms metadata is exposed without secret values
// (security: the fingerprint is a SHA-256 prefix, not the secret itself).
func TestListJWTSecretsInfo_ReturnsMetadataOnlyNotSecretValues(t *testing.T) {
manager := NewJWTSecretManager("primary-secret-do-not-leak")
manager.AddSecret("expiring", false, 1*time.Hour)
manager.AddSecret("about-to-expire", false, 1*time.Nanosecond)
// Force the 1ns to actually expire
time.Sleep(5 * time.Millisecond)
// Build the same view ListJWTSecretsInfo would produce, exercising the
// path the AuthService implementation will take in production.
all := manager.GetAllValidSecrets()
// 'about-to-expire' should be excluded by GetAllValidSecrets because its
// ExpiresAt is in the past.
assert.Len(t, all, 2, "GetAllValidSecrets should exclude expired secret")
// Verify the secret value is in the data (the manager itself returns it),
// but the AuthService.ListJWTSecretsInfo deliberately strips it. The
// safety guarantee is enforced at the AuthService level, not here.
foundPrimary := false
for _, s := range all {
if s.IsPrimary {
foundPrimary = true
assert.Equal(t, "primary-secret-do-not-leak", s.Secret)
}
}
assert.True(t, foundPrimary)
}
// TestListJWTSecretsInfo_SecretSHA256NonEmptyAndDifferentFromSecret verifies
// the security property that SecretSHA256 fingerprint is non-empty and
// DIFFERENT from the actual secret value for every returned JWTSecretInfo.
func TestListJWTSecretsInfo_SecretSHA256NonEmptyAndDifferentFromSecret(t *testing.T) {
// Create a mock repository (nil operations are fine for this test)
var nilRepo UserRepository
jwtConfig := JWTConfig{
Secret: "test-secret-value-12345",
ExpirationTime: 1 * time.Hour,
Issuer: "test-issuer",
}
svc := NewUserService(nilRepo, jwtConfig, "admin-password")
// Call ListJWTSecretsInfo to get metadata
infos := svc.ListJWTSecretsInfo()
// Must have at least one entry (the initial secret from jwtConfig)
assert.GreaterOrEqual(t, len(infos), 1, "ListJWTSecretsInfo should return at least one secret")
// Known secret for verification
knownSecret := "test-secret-value-12345"
// Verify each JWTSecretInfo has valid SecretSHA256
for _, info := range infos {
// 1. SecretSHA256 must be non-empty
assert.NotEmpty(t, info.SecretSHA256, "SecretSHA256 must be non-empty")
// 2. SecretSHA256 must be different from the actual secret value
// Note: We verify against the known secret value used in the service.
// The service's ListJWTSecretsInfo computes SHA-256 of the secret,
// takes first 8 bytes, and hex-encodes them. This will NEVER equal
// the original secret string.
assert.NotEqual(t, knownSecret, info.SecretSHA256, "SecretSHA256 must differ from secret value")
// 3. SecretSHA256 must be exactly 16 hex characters (8 bytes = 16 hex chars)
assert.Len(t, info.SecretSHA256, 16, "SecretSHA256 must be 16 hex characters")
// 4. SecretSHA256 must be valid hex (lowercase)
for _, c := range info.SecretSHA256 {
assert.True(t, (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'),
"SecretSHA256 must be valid lowercase hex: %q", c)
}
}
}

View File

@@ -52,6 +52,24 @@ type AuthService interface {
// the count of removed non-primary expired secrets. Useful for tests
// driving cleanup synchronously.
RemoveExpiredJWTSecrets() int
// ListJWTSecretsInfo returns metadata about every currently-tracked JWT
// secret WITHOUT exposing the secret values. Used by the admin
// introspection endpoint and BDD tests verifying cleanup behavior.
// Order is preserved from internal storage (insertion order).
ListJWTSecretsInfo() []JWTSecretInfo
}
// JWTSecretInfo is a non-sensitive metadata view of a JWT secret.
// The secret VALUE is intentionally NOT included — exposing it via an
// API endpoint, even an admin one, would defeat the point of the
// retention/rotation infrastructure.
type JWTSecretInfo struct {
IsPrimary bool `json:"is_primary"`
CreatedAtUnix int64 `json:"created_at_unix"`
ExpiresAtUnix *int64 `json:"expires_at_unix,omitempty"`
AgeSeconds int64 `json:"age_seconds"`
IsExpired bool `json:"is_expired"`
SecretSHA256 string `json:"secret_sha256"` // first 16 hex chars of sha256 — fingerprint, not the secret
}
// UserManager defines interface for user management operations