# Integrate OpenTelemetry for distributed tracing

* Status: Accepted
* Deciders: Gabriel Radureau, AI Agent
* Date: 2026-04-04

## Context and Problem Statement

We needed to add observability to DanceLessonsCoach that provides:

- Distributed tracing capabilities
- Performance monitoring
- Request flow visualization
- Integration with existing monitoring systems
- Minimal impact on application performance

## Decision Drivers

* Need for distributed tracing in microservices architecture
* Desire for performance monitoring
* Requirement for request flow visualization
* Need for integration with monitoring tools
* Desire for minimal performance impact

## Considered Options

* OpenTelemetry - CNCF standard for observability
* Jaeger client - Direct Jaeger integration
* Zipkin - Alternative tracing system
* Custom solution - Build our own tracing

## Decision Outcome

Chosen option: "OpenTelemetry" because it provides industry-standard observability, good performance, flexibility for multiple backends, and is becoming the standard for distributed tracing.

## Pros and Cons of the Options

### OpenTelemetry

* Good, because CNCF standard with broad industry adoption
* Good, because supports multiple tracing backends (Jaeger, Zipkin, etc.)
* Good, because good performance characteristics
* Good, because active development and community
* Good, because vendor-neutral
* Bad, because more complex setup
* Bad, because larger dependency footprint

### Jaeger client

* Good, because direct integration with Jaeger
* Good, because simpler setup
* Bad, because vendor-locked to Jaeger
* Bad, because less flexible for future changes

### Zipkin

* Good, because established tracing system
* Good, because good ecosystem
* Bad, because less feature-rich than OpenTelemetry
* Bad, because declining popularity

### Custom solution

* Good, because tailored to our needs
* Good, because no external dependencies
* Bad, because time-consuming to develop
* Bad, because need to maintain ourselves
* Bad, because likely less feature-rich

## Implementation Approach

### Middleware-only approach

We chose a middleware-only approach using `otelhttp.NewHandler` rather than manual instrumentation:

```go
// In pkg/server/server.go
func (s *Server) getAllMiddlewares() []func(http.Handler) http.Handler {
	middlewares := []func(http.Handler) http.Handler{
		middleware.StripSlashes,
		middleware.Recoverer,
	}

	if s.withOTEL {
		middlewares = append(middlewares, func(next http.Handler) http.Handler {
			return otelhttp.NewHandler(next, "")
		})
	}

	return middlewares
}
```

### Benefits of middleware approach

* **Clean separation**: Tracing logic separate from business logic
* **Consistent instrumentation**: All endpoints automatically traced
* **Easy to enable/disable**: Single configuration flag
* **Maintainable**: No tracing boilerplate in service code
* **Upgradable**: Easy to change tracing implementation

## Configuration

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp_endpoint: "localhost:4317"
  service_name: "DanceLessonsCoach"
  insecure: true
  sampler:
    type: "parentbased_always_on"
    ratio: 1.0
```

## Jaeger Integration

```bash
# Start Jaeger with OTLP support
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

# Start server with OpenTelemetry
DLC_TELEMETRY_ENABLED=true ./scripts/start-server.sh start

# View traces at http://localhost:16686
```

## Links

* [OpenTelemetry GitHub](https://github.com/open-telemetry/opentelemetry-go)
* [OpenTelemetry Documentation](https://opentelemetry.io/docs/instrumentation/go/)
* [Jaeger Documentation](https://www.jaegertracing.io/docs/)
* [OTLP Specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md)

## Sampler Types Supported

* `always_on` - Sample all traces
* `always_off` - Sample no traces
* `traceidratio` - Sample based on trace ID ratio
* `parentbased_always_on` - Sample based on parent span (always on)
* `parentbased_always_off` - Sample based on parent span (always off)
* `parentbased_traceidratio` - Sample based on parent span with ratio

## Performance Considerations

* OpenTelemetry adds minimal overhead when disabled
* Sampling can be used to reduce overhead in production
* Tracing data is sent asynchronously to minimize impact
* Context propagation is efficient using Go's context package
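
To make the sampling point concrete, the sampler types listed earlier can be read as a small decision table. The following is a minimal sketch in plain Go using only the standard library. It is not the OpenTelemetry SDK and not DanceLessonsCoach code: `decide` and `ratioSampled` are hypothetical names introduced for illustration, and the ratio check is a simplified approximation of how a trace-ID-ratio sampler might compare part of the trace ID against a threshold.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// ratioSampled is a hypothetical helper: it keeps a trace when the low
// 8 bytes of its ID, read as a 63-bit number, fall below ratio * 2^63.
// The same trace ID therefore always yields the same decision.
func ratioSampled(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}

// decide is a hypothetical function that reports whether a span would be
// sampled for a given sampler type string from the configuration.
// hasParent and parentSampled describe the incoming trace context.
func decide(samplerType string, ratio float64, traceID [16]byte, hasParent, parentSampled bool) (bool, error) {
	switch samplerType {
	case "always_on":
		return true, nil
	case "always_off":
		return false, nil
	case "traceidratio":
		return ratioSampled(traceID, ratio), nil
	case "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio":
		// With a parent, follow the parent's decision.
		if hasParent {
			return parentSampled, nil
		}
		// Without a parent (root span), fall back to the wrapped sampler.
		root := strings.TrimPrefix(samplerType, "parentbased_")
		return decide(root, ratio, traceID, false, false)
	default:
		return false, fmt.Errorf("unknown sampler type %q", samplerType)
	}
}

func main() {
	var id [16]byte // zero-valued trace ID, used only for the demo
	for _, t := range []string{"always_on", "always_off", "traceidratio", "parentbased_always_off"} {
		sampled, err := decide(t, 0.25, id, true, true)
		fmt.Println(t, sampled, err)
	}
}
```

In the real SDK (`go.opentelemetry.io/otel/sdk/trace`), the corresponding building blocks are `AlwaysSample`, `NeverSample`, `TraceIDRatioBased`, and `ParentBased`. How the `sampler.type` string is mapped onto them in this project is not shown in this ADR, so treat the mapping above as an assumption.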