Audit 2026-05-02 (Task 6, Phase A) identified 3 inconsistent formats across the ADR corpus:
- F1 list bullets: `* Status:` / `* Date:` / `* Deciders:` (11 ADRs)
- F2 bold fields: `**Status:**` / `**Date:**` / `**Authors:**` (9 ADRs)
- F3 dedicated section: `## Status\n**Value** ✅` (5 ADRs)

Mixed metadata names (Authors / Deciders / Decision Date / Implementation Date / Implementation Status / Last Updated) and decorative emojis on status values made the corpus hard to scan or template against.

Canonical format adopted (see adr/README.md for full template):

# NN. Title
**Status:** <Proposed|Accepted|Implemented|Partially Implemented|Approved|Rejected|Deferred|Deprecated|Superseded by ADR-NNNN>
**Date:** YYYY-MM-DD
**Authors:** Name(s)
[optional **Field:** ... lines]
## Context...

Transformations applied (via /tmp/homogenize-adrs.py):
- F1 list bullets → bold fields
- F2 cleanup: `**Deciders:**` → `**Authors:**`, strip status emojis
- F3 sections: `## Status\n**Value** ✅` → `**Status:** Value`
- Strip decorative emojis from `**Status:**` and `**Implementation Status:**`
- Convert any `* Implementation Status:` / `* Last Updated:` / `* Decision Drivers:` / `* Decision Date:` to bold equivalents
- Date typo fix: `2024-04-XX` → `2026-04-XX` for ADRs 0018, 0019 (already noted in PR #17 but re-applied here since the branch starts from origin/main pre-PR17)
- Normalize multiple blank lines after header (max 1)

21 / 23 ADRs modified. 0010 and 0012 already conformed. 0011 and 0014 do not exist in the repo (cf. README index update).

Body content of each ADR is preserved unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integrate OpenTelemetry for distributed tracing
Status: Accepted
Authors: Gabriel Radureau, AI Agent
Date: 2026-04-04
Context and Problem Statement
We needed to add observability to dance-lessons-coach that provides:
- Distributed tracing capabilities
- Performance monitoring
- Request flow visualization
- Integration with existing monitoring systems
- Minimal impact on application performance
Decision Drivers
- Need for distributed tracing in microservices architecture
- Desire for performance monitoring
- Requirement for request flow visualization
- Need for integration with monitoring tools
- Desire for minimal performance impact
Considered Options
- OpenTelemetry - CNCF standard for observability
- Jaeger client - Direct Jaeger integration
- Zipkin - Alternative tracing system
- Custom solution - Build our own tracing
Decision Outcome
Chosen option: "OpenTelemetry" because it provides industry-standard observability, good performance, flexibility for multiple backends, and is becoming the standard for distributed tracing.
Pros and Cons of the Options
OpenTelemetry
- Good, because CNCF standard with broad industry adoption
- Good, because supports multiple tracing backends (Jaeger, Zipkin, etc.)
- Good, because good performance characteristics
- Good, because active development and community
- Good, because vendor-neutral
- Bad, because more complex setup
- Bad, because larger dependency footprint
Jaeger client
- Good, because direct integration with Jaeger
- Good, because simpler setup
- Bad, because vendor-locked to Jaeger
- Bad, because less flexible for future changes
Zipkin
- Good, because established tracing system
- Good, because good ecosystem
- Bad, because less feature-rich than OpenTelemetry
- Bad, because declining popularity
Custom solution
- Good, because tailored to our needs
- Good, because no external dependencies
- Bad, because time-consuming to develop
- Bad, because need to maintain ourselves
- Bad, because likely less feature-rich
Implementation Approach
Middleware-only approach
We chose a middleware-only approach using otelhttp.NewHandler rather than manual instrumentation:
// In pkg/server/server.go

// getAllMiddlewares returns the middleware chain applied to every route.
func (s *Server) getAllMiddlewares() []func(http.Handler) http.Handler {
	// Middlewares that are always active.
	middlewares := []func(http.Handler) http.Handler{
		middleware.StripSlashes,
		middleware.Recoverer,
	}

	// Tracing is added only when OpenTelemetry is enabled.
	if s.withOTEL {
		middlewares = append(middlewares, func(next http.Handler) http.Handler {
			return otelhttp.NewHandler(next, "")
		})
	}

	return middlewares
}
Benefits of middleware approach
- Clean separation: Tracing logic separate from business logic
- Consistent instrumentation: All endpoints automatically traced
- Easy to enable/disable: Single configuration flag
- Maintainable: No tracing boilerplate in service code
- Upgradable: Easy to change tracing implementation
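The flag-gated middleware pattern can be tried without any OpenTelemetry dependency. The sketch below uses only the standard library; `fakeTracing`, `buildMiddlewares`, `wrap`, `tracedHeader`, and the `X-Traced` header are hypothetical names introduced for illustration, with `fakeTracing` standing in for `otelhttp.NewHandler`. It is not the project's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware has the same shape as the entries returned by getAllMiddlewares.
type Middleware func(http.Handler) http.Handler

// fakeTracing is a hypothetical stand-in for otelhttp.NewHandler: it only
// marks the response so the effect of the flag is observable.
func fakeTracing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Traced", "true")
		next.ServeHTTP(w, r)
	})
}

// buildMiddlewares mirrors the flag check: tracing is appended only when enabled.
func buildMiddlewares(withOTEL bool) []Middleware {
	var mws []Middleware
	if withOTEL {
		mws = append(mws, fakeTracing)
	}
	return mws
}

// wrap applies the middlewares so that the first one in the slice is outermost.
func wrap(h http.Handler, mws []Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tracedHeader sends one in-memory request and returns the marker header.
func tracedHeader(withOTEL bool) string {
	// The business handler contains no tracing code at all.
	business := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	wrap(business, buildMiddlewares(withOTEL)).ServeHTTP(rec, req)
	return rec.Header().Get("X-Traced")
}

func main() {
	fmt.Printf("enabled=%q disabled=%q\n", tracedHeader(true), tracedHeader(false))
}
```

The handler is identical in both runs; only the flag decides whether the request passes through the tracing wrapper.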
Configuration
# config.yaml
telemetry:
  enabled: true
  otlp_endpoint: "localhost:4317"
  service_name: "dance-lessons-coach"
  insecure: true
  sampler:
    type: "parentbased_always_on"
    ratio: 1.0
Jaeger Integration
# Start Jaeger with OTLP support
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

# Start server with OpenTelemetry
DLC_TELEMETRY_ENABLED=true ./scripts/start-server.sh start

# View traces at http://localhost:16686
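One way the configuration keys and the `DLC_TELEMETRY_ENABLED` variable used above could fit together is sketched below. How the project actually loads its configuration is not stated here, so the struct, `defaultTelemetryConfig`, and `applyEnv` are hypothetical; the default of `Enabled: false` is also an assumption. The other values are copied from the `config.yaml` example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// SamplerConfig and TelemetryConfig mirror the keys of the config.yaml example.
type SamplerConfig struct {
	Type  string
	Ratio float64
}

type TelemetryConfig struct {
	Enabled      bool
	OTLPEndpoint string
	ServiceName  string
	Insecure     bool
	Sampler      SamplerConfig
}

// defaultTelemetryConfig returns assumed defaults (telemetry off).
func defaultTelemetryConfig() TelemetryConfig {
	return TelemetryConfig{
		Enabled:      false,
		OTLPEndpoint: "localhost:4317",
		ServiceName:  "dance-lessons-coach",
		Insecure:     true,
		Sampler:      SamplerConfig{Type: "parentbased_always_on", Ratio: 1.0},
	}
}

// applyEnv overrides Enabled from DLC_TELEMETRY_ENABLED when it is set.
// lookup has the signature of os.LookupEnv so it can be replaced in tests.
func applyEnv(cfg TelemetryConfig, lookup func(string) (string, bool)) (TelemetryConfig, error) {
	raw, ok := lookup("DLC_TELEMETRY_ENABLED")
	if !ok {
		return cfg, nil // variable not set: keep the file or default value
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return cfg, fmt.Errorf("DLC_TELEMETRY_ENABLED=%q: %w", raw, err)
	}
	cfg.Enabled = enabled
	return cfg, nil
}

func main() {
	cfg, err := applyEnv(defaultTelemetryConfig(), os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the override logic testable without modifying the process environment.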
Links
Sampler Types Supported
- always_on - Sample all traces
- always_off - Sample no traces
- traceidratio - Sample based on trace ID ratio
- parentbased_always_on - Sample based on parent span (always on)
- parentbased_always_off - Sample based on parent span (always off)
- parentbased_traceidratio - Sample based on parent span with ratio
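The semantics of these sampler types can be modelled in a few lines. The sketch below is an illustrative model, not the OpenTelemetry SDK: `shouldSample` and its `traceFraction` parameter (a value in [0, 1) assumed to be derived from the trace ID) are hypothetical. In the Go SDK the corresponding constructors are `AlwaysSample`, `NeverSample`, `TraceIDRatioBased`, and `ParentBased` in `go.opentelemetry.io/otel/sdk/trace`.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSample models the decision made by each listed sampler type.
// parentSampled is nil for a root span, otherwise the parent's decision.
func shouldSample(samplerType string, ratio, traceFraction float64, parentSampled *bool) (bool, error) {
	switch samplerType {
	case "always_on":
		return true, nil
	case "always_off":
		return false, nil
	case "traceidratio":
		// Keep the trace when its ID falls inside the configured fraction.
		return traceFraction < ratio, nil
	case "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio":
		// With a parent, follow the parent's decision.
		if parentSampled != nil {
			return *parentSampled, nil
		}
		// Without a parent, fall back to the root sampler named in the suffix.
		root := strings.TrimPrefix(samplerType, "parentbased_")
		return shouldSample(root, ratio, traceFraction, nil)
	default:
		return false, fmt.Errorf("unknown sampler type %q", samplerType)
	}
}

func main() {
	parentDropped := false
	for _, tc := range []struct {
		name   string
		parent *bool
	}{
		{"always_on", nil},
		{"traceidratio", nil},
		{"parentbased_always_on", nil},
		{"parentbased_always_on", &parentDropped},
	} {
		sampled, err := shouldSample(tc.name, 0.25, 0.5, tc.parent)
		fmt.Println(tc.name, "hasParent:", tc.parent != nil, "sampled:", sampled, "err:", err)
	}
}
```

The parent-based variants differ only in what they do for root spans; once a parent exists, its decision is propagated so that a trace is either recorded completely or not at all.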
Performance Considerations
- OpenTelemetry adds minimal overhead when disabled, because the otelhttp middleware is not added to the chain
- Sampling can be used to reduce overhead in production
- Tracing data is sent asynchronously to minimize impact
- Context propagation is efficient using Go's context package
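The asynchronous export mentioned above can be pictured as a bounded queue drained by a background goroutine. The sketch below is a simplified, hypothetical model (`asyncExporter`, `Enqueue`, and `Shutdown` are illustrative names, and spans are plain strings); it is not the SDK's batch processor, only an illustration of why the request path does not wait for the exporter.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncExporter is a simplified model of a batching, non-blocking exporter.
type asyncExporter struct {
	queue   chan string
	wg      sync.WaitGroup
	mu      sync.Mutex
	batches [][]string // stands in for batches sent to a backend
}

func newAsyncExporter(queueSize, batchSize int) *asyncExporter {
	e := &asyncExporter{queue: make(chan string, queueSize)}
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		batch := make([]string, 0, batchSize)
		flush := func() {
			if len(batch) == 0 {
				return
			}
			e.mu.Lock()
			e.batches = append(e.batches, append([]string(nil), batch...))
			e.mu.Unlock()
			batch = batch[:0]
		}
		// Drain the queue in the background, flushing full batches.
		for span := range e.queue {
			batch = append(batch, span)
			if len(batch) == batchSize {
				flush()
			}
		}
		flush() // export whatever is left at shutdown
	}()
	return e
}

// Enqueue never blocks the caller; it reports false when the span is dropped
// because the queue is full.
func (e *asyncExporter) Enqueue(span string) bool {
	select {
	case e.queue <- span:
		return true
	default:
		return false
	}
}

// Shutdown stops accepting spans, waits for the worker, and returns the batches.
func (e *asyncExporter) Shutdown() [][]string {
	close(e.queue)
	e.wg.Wait()
	return e.batches
}

func main() {
	exp := newAsyncExporter(10, 2)
	for _, s := range []string{"a", "b", "c", "d", "e"} {
		exp.Enqueue(s)
	}
	fmt.Println(exp.Shutdown())
}
```

Dropping spans when the queue is full trades completeness for predictable request latency, which is the same trade-off sampling makes deliberately.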