dance-lessons-coach/adr/0007-opentelemetry-integration.md

Integrate OpenTelemetry for distributed tracing

Status: Accepted
Date: 2026-04-04
Authors: Gabriel Radureau, AI Agent

Context and Problem Statement

We needed an observability solution for dance-lessons-coach that provides:

  • Distributed tracing capabilities
  • Performance monitoring
  • Request flow visualization
  • Integration with existing monitoring systems
  • Minimal impact on application performance

Decision Drivers

  • Need for distributed tracing in a microservices architecture
  • Desire for performance monitoring
  • Requirement for request flow visualization
  • Need for integration with monitoring tools
  • Desire for minimal performance impact

Considered Options

  • OpenTelemetry - CNCF standard for observability
  • Jaeger client - Direct Jaeger integration
  • Zipkin - Alternative tracing system
  • Custom solution - Build our own tracing

Decision Outcome

Chosen option: "OpenTelemetry", because it is an industry-standard, vendor-neutral observability framework that performs well, can export to multiple backends, and is becoming the standard for distributed tracing.

Pros and Cons of the Options

OpenTelemetry

  • Good, because CNCF standard with broad industry adoption
  • Good, because supports multiple tracing backends (Jaeger, Zipkin, etc.)
  • Good, because good performance characteristics
  • Good, because active development and community
  • Good, because vendor-neutral
  • Bad, because more complex setup
  • Bad, because larger dependency footprint

Jaeger client

  • Good, because direct integration with Jaeger
  • Good, because simpler setup
  • Bad, because vendor-locked to Jaeger
  • Bad, because less flexible for future changes

Zipkin

  • Good, because established tracing system
  • Good, because good ecosystem
  • Bad, because less feature-rich than OpenTelemetry
  • Bad, because declining popularity

Custom solution

  • Good, because tailored to our needs
  • Good, because no external dependencies
  • Bad, because time-consuming to develop
  • Bad, because need to maintain ourselves
  • Bad, because likely less feature-rich

Implementation Approach

Middleware-only approach

We chose a middleware-only approach using otelhttp.NewHandler rather than manual instrumentation:

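// In pkg/server/server.go
import (
    "net/http"

    "github.com/go-chi/chi/v5/middleware" // assumed source of StripSlashes and Recoverer
    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// getAllMiddlewares returns the middleware chain applied to every route.
// The tracing wrapper is appended only when OpenTelemetry is enabled.
func (s *Server) getAllMiddlewares() []func(http.Handler) http.Handler {
    middlewares := []func(http.Handler) http.Handler{
        middleware.StripSlashes, // remove trailing slashes from request paths
        middleware.Recoverer,    // recover from panics raised in handlers
    }

    if s.withOTEL {
        // otelhttp.NewHandler wraps the next handler so each request is traced.
        middlewares = append(middlewares, func(next http.Handler) http.Handler {
            return otelhttp.NewHandler(next, "")
        })
    }

    return middlewares
}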

Benefits of middleware approach

  • Clean separation: Tracing logic separate from business logic
  • Consistent instrumentation: All endpoints automatically traced
  • Easy to enable/disable: Single configuration flag
  • Maintainable: No tracing boilerplate in service code
  • Upgradable: Easy to change tracing implementation
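The enable/disable property can be illustrated with the standard library alone. The sketch below is not the project's code: `tracingStandIn` is a hypothetical placeholder for `otelhttp.NewHandler`, and `buildMiddlewares`, `chain`, `tracedHeader`, and the `X-Traced` header are names invented for this illustration. It only shows that a single flag decides whether the wrapper sits in front of the business handler.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// Middleware wraps an http.Handler with extra behaviour.
type Middleware func(http.Handler) http.Handler

// tracingStandIn is a hypothetical stand-in for otelhttp.NewHandler.
// It only marks the response so that the wrapper's effect is observable.
func tracingStandIn(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Traced", "true")
		next.ServeHTTP(w, r)
	})
}

// buildMiddlewares mirrors the shape of getAllMiddlewares: the tracing
// wrapper is appended only when the flag is set.
func buildMiddlewares(withTracing bool) []Middleware {
	var mws []Middleware
	if withTracing {
		mws = append(mws, tracingStandIn)
	}
	return mws
}

// chain applies the middlewares so that the first element is outermost.
func chain(h http.Handler, mws []Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tracedHeader sends one request through the chain and returns the marker
// header; the business handler itself contains no tracing code.
func tracedHeader(withTracing bool) string {
	business := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	chain(business, buildMiddlewares(withTracing)).ServeHTTP(rec, req)
	return rec.Result().Header.Get("X-Traced")
}
```

Calling `tracedHeader(true)` exercises the wrapped path and `tracedHeader(false)` the unwrapped one. In both cases the business handler is identical, which is the separation the list above describes.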

Configuration

# config.yaml
telemetry:
  enabled: true
  otlp_endpoint: "localhost:4317"
  service_name: "dance-lessons-coach"
  insecure: true
  sampler:
    type: "parentbased_always_on"
    ratio: 1.0
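As a rough sketch of how these settings might be represented in Go, the block below mirrors the YAML keys in a struct and applies the `DLC_TELEMETRY_ENABLED` environment variable that appears in the start command later in this document. The type `TelemetryConfig`, its field names, and the functions `applyEnv` and `loadTelemetryConfig` are assumptions made for illustration; the project's actual configuration loader is not shown here and may work differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// TelemetryConfig mirrors the telemetry section of config.yaml.
// The Go type and field names are hypothetical.
type TelemetryConfig struct {
	Enabled      bool
	OTLPEndpoint string
	ServiceName  string
	Insecure     bool
	SamplerType  string
	SamplerRatio float64
}

// applyEnv overrides Enabled when DLC_TELEMETRY_ENABLED is set.
// The lookup function is injected so the logic can be tested without
// touching the process environment.
func applyEnv(cfg TelemetryConfig, lookup func(string) (string, bool)) (TelemetryConfig, error) {
	raw, ok := lookup("DLC_TELEMETRY_ENABLED")
	if !ok {
		return cfg, nil // variable not set: keep the file value
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return cfg, fmt.Errorf("DLC_TELEMETRY_ENABLED: %w", err)
	}
	cfg.Enabled = enabled
	return cfg, nil
}

// loadTelemetryConfig applies the real process environment to a base
// configuration, for example one read from config.yaml.
func loadTelemetryConfig(base TelemetryConfig) (TelemetryConfig, error) {
	return applyEnv(base, os.LookupEnv)
}
```

Rejecting an unparsable value instead of silently ignoring it is a design choice of this sketch, so that a typo in the variable does not leave tracing unexpectedly off.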

Jaeger Integration

# Start Jaeger with OTLP support
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

# Start server with OpenTelemetry
DLC_TELEMETRY_ENABLED=true ./scripts/start-server.sh start

# View traces at http://localhost:16686

Sampler Types Supported

  • always_on - Sample every trace
  • always_off - Sample no traces
  • traceidratio - Sample a fixed fraction of traces, selected by trace ID (see sampler.ratio)
  • parentbased_always_on - Follow the parent span's sampling decision; sample every root span
  • parentbased_always_off - Follow the parent span's sampling decision; drop every root span
  • parentbased_traceidratio - Follow the parent span's sampling decision; sample root spans by trace ID ratio
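The six sampler types combine two ideas: a root decision (always, never, or by trace ID ratio) and an optional parent-based wrapper. The following is a simplified model written with the standard library only, not the OpenTelemetry SDK and not the project's code. In the Go SDK the corresponding constructors are `AlwaysSample`, `NeverSample`, `TraceIDRatioBased`, and `ParentBased`; the names `spanInput`, `sampler`, and `newSampler` below are hypothetical, and how the project maps the config strings to SDK samplers is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// spanInput is what a sampler sees in this simplified model.
type spanInput struct {
	TraceID       [16]byte
	HasParent     bool
	ParentSampled bool
}

// sampler reports whether a span should be recorded.
type sampler func(spanInput) bool

func alwaysOn(spanInput) bool  { return true }
func alwaysOff(spanInput) bool { return false }

// traceIDRatio keeps roughly the given fraction of traces by comparing part
// of the trace ID with a threshold. The decision depends only on the trace
// ID, so every service reaches the same answer for the same trace.
func traceIDRatio(ratio float64) sampler {
	if ratio >= 1 {
		return alwaysOn
	}
	if ratio <= 0 {
		return alwaysOff
	}
	bound := uint64(ratio * (1 << 63))
	return func(in spanInput) bool {
		x := binary.BigEndian.Uint64(in.TraceID[8:16]) >> 1
		return x < bound
	}
}

// parentBased follows the parent's decision when there is a parent and
// delegates to the root sampler for new traces.
func parentBased(root sampler) sampler {
	return func(in spanInput) bool {
		if in.HasParent {
			return in.ParentSampled
		}
		return root(in)
	}
}

// newSampler maps a configured sampler type to a sampler.
func newSampler(kind string, ratio float64) (sampler, error) {
	switch kind {
	case "always_on":
		return alwaysOn, nil
	case "always_off":
		return alwaysOff, nil
	case "traceidratio":
		return traceIDRatio(ratio), nil
	case "parentbased_always_on":
		return parentBased(alwaysOn), nil
	case "parentbased_always_off":
		return parentBased(alwaysOff), nil
	case "parentbased_traceidratio":
		return parentBased(traceIDRatio(ratio)), nil
	default:
		return nil, fmt.Errorf("unknown sampler type %q", kind)
	}
}
```

In this model the ratio argument only influences the two ratio-based types; the other four ignore it.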

Performance Considerations

  • When telemetry is disabled, the otelhttp middleware is not added to the chain, so OpenTelemetry adds minimal overhead
  • Sampling can be used to reduce overhead in production
  • Tracing data is sent asynchronously to minimize impact
  • Context propagation is efficient using Go's context package
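The asynchronous export mentioned above can be pictured as a bounded queue drained by a background goroutine. The sketch below is a simplified illustration using only the standard library; it is not the OpenTelemetry batch span processor, and the names `span`, `asyncExporter`, `Record`, and `Shutdown` are hypothetical. It assumes that dropping a span is preferable to blocking a request when the queue is full.

```go
package main

import "sync"

// span is a hypothetical, minimal record of finished work.
type span struct{ Name string }

// asyncExporter queues spans and ships them from a background goroutine,
// so the request path only pays for a channel send.
type asyncExporter struct {
	queue   chan span
	wg      sync.WaitGroup
	mu      sync.Mutex
	sent    []span
	dropped int
}

func newAsyncExporter(queueSize int) *asyncExporter {
	e := &asyncExporter{queue: make(chan span, queueSize)}
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		for s := range e.queue {
			// A real exporter would batch spans and send them to a
			// collector here; this sketch only stores them.
			e.mu.Lock()
			e.sent = append(e.sent, s)
			e.mu.Unlock()
		}
	}()
	return e
}

// Record never blocks the caller: if the queue is full the span is dropped
// and false is returned.
func (e *asyncExporter) Record(s span) bool {
	select {
	case e.queue <- s:
		return true
	default:
		e.mu.Lock()
		e.dropped++
		e.mu.Unlock()
		return false
	}
}

// Shutdown stops accepting spans, waits for the queue to drain, and returns
// everything that was exported. Record must not be called afterwards.
func (e *asyncExporter) Shutdown() []span {
	close(e.queue)
	e.wg.Wait()
	return e.sent
}
```

The queue size bounds memory use, and sampling reduces how many spans reach the queue in the first place.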