
ADR 0020: Docker Build Strategy - Traditional vs Buildx

Status

Accepted

Context

The dance-lessons-coach CI/CD pipeline initially used Docker Buildx (docker buildx build --push) for building and pushing Docker cache images. However, this approach encountered several issues:

Issues with Buildx Approach

  1. TLS Certificate Problems: Buildx had difficulty with self-signed certificates, requiring complex workaround steps
  2. Performance Concerns: Buildx setup and execution were significantly slower than expected
  3. Complexity: Buildx introduced additional complexity without providing immediate benefits
  4. Reliability Issues: Buildx builds were less reliable in the GitHub Actions environment

Working Solution Analysis

The working webapp CI/CD pipeline uses the traditional docker build + docker push approach:

# Working approach from webapp
- name: Build and push image to Gitea Container Registry
  run: |-
    docker build -t app .
    docker tag app gitea.arcodange.lab/${{ github.repository }}:$TAG
    docker push gitea.arcodange.lab/${{ github.repository }}:$TAG

This approach is simpler, more reliable, and works consistently with self-signed certificates.

Decision

Replace Docker Buildx with traditional docker build + push for the CI/CD pipeline and implement a two-stage Docker build strategy.

Implementation

1. Build Cache Strategy

# Build cache using traditional docker build
- name: Build and push Docker cache image
  if: steps.check_cache.outputs.cache_hit == 'false'
  run: |
    IMAGE_NAME="${{ env.CI_REGISTRY }}/${{ env.GITEA_ORG }}/${{ env.GITEA_REPO }}-build-cache:${{ steps.calculate_hash.outputs.deps_hash }}"
    echo "Building cache image: $IMAGE_NAME"
    
    # Build the image using traditional docker build
    docker build \
      --file Dockerfile.build \
      --tag "$IMAGE_NAME" \
      .
    
    # Push the image
    docker push "$IMAGE_NAME"
    
    echo "✅ Build cache image pushed successfully"
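
The cache_hit output consumed by the step above could be produced by a check along these lines (a sketch, assuming docker manifest inspect is available on the runner; the image name default is a placeholder and the real check_cache step may differ):

```shell
# Hypothetical sketch of the check_cache step: probe the registry for an
# existing cache image and report the result.
set -u
IMAGE_NAME="${CACHE_IMAGE:-gitea.example.lab/org/repo-build-cache:deadbeef}"

# docker manifest inspect exits 0 only if the tag already exists in the
# registry, so it doubles as a cheap existence check without pulling layers.
if docker manifest inspect "$IMAGE_NAME" > /dev/null 2>&1; then
  cache_hit=true
else
  cache_hit=false
fi

echo "cache_hit=$cache_hit"
# In CI, the value would be written to the step output file instead, e.g.:
#   echo "cache_hit=$cache_hit" >> "$GITHUB_OUTPUT"
```

On a machine without access to the registry (or without docker at all), the probe fails and the step simply reports a cache miss, which is the safe default.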

2. Production Build Strategy

# Production build using Dockerfile.prod
- name: Build and push Docker image
  if: github.ref == 'refs/heads/main'
  run: |
    source VERSION
    IMAGE_VERSION="$MAJOR.$MINOR.$PATCH${PRERELEASE:+-$PRERELEASE}"
    
    TAGS="$IMAGE_VERSION latest ${{ github.sha }}"
    echo "Building Docker image with tags: $TAGS"
    
    # Use the production Dockerfile that leverages the build cache
    docker build -t dance-lessons-coach -f Dockerfile.prod .
    
    for TAG in $TAGS; do
      IMAGE_NAME="${{ env.CI_REGISTRY }}/${{ env.GITEA_ORG }}/${{ env.GITEA_REPO }}:$TAG"
      echo "Tagging and pushing: $IMAGE_NAME"
      docker tag dance-lessons-coach "$IMAGE_NAME"
      docker push "$IMAGE_NAME"
    done
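
The IMAGE_VERSION line above relies on the POSIX ${VAR:+word} expansion to append a pre-release suffix only when PRERELEASE is non-empty. A quick self-contained illustration (the version numbers are example values, not the project's real VERSION file):

```shell
# Demonstrates the ${PRERELEASE:+-$PRERELEASE} expansion used above.
set -eu
MAJOR=1 MINOR=4 PATCH=2

PRERELEASE=""
STABLE="$MAJOR.$MINOR.$PATCH${PRERELEASE:+-$PRERELEASE}"    # empty -> no suffix

PRERELEASE="rc1"
CANDIDATE="$MAJOR.$MINOR.$PATCH${PRERELEASE:+-$PRERELEASE}" # set -> "-rc1" appended

echo "$STABLE $CANDIDATE"   # 1.4.2 1.4.2-rc1
```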

3. Dockerfile Structure

Dockerfile.build - Build environment with all dependencies:

FROM golang:1.26.1-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git bash curl make gcc musl-dev bc grep sed jq ca-certificates

# Install Go tools
RUN go install github.com/swaggo/swag/cmd/swag@latest

# Work from /workspace so cached dependencies and build outputs match the
# paths expected by Dockerfile.prod
WORKDIR /workspace

# Copy and verify dependencies
COPY go.mod go.sum ./
RUN go mod download && go mod verify

Dockerfile.prod - Minimal production image (the committed copy references the latest tag for local use only; in CI the FROM line is generated with the exact dependency hash, see the Critical Bug Fix section below):

# Use the build cache image as base
FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:latest AS builder

# Final minimal image
FROM alpine:3.18

WORKDIR /app

# Install minimal dependencies
RUN apk add --no-cache ca-certificates tzdata

# Copy binary from builder
COPY --from=builder /workspace/dance-lessons-coach /app/dance-lessons-coach

# Copy configuration
COPY config.yaml /app/config.yaml

# Set permissions and entrypoint
RUN chmod +x /app/dance-lessons-coach
ENV TZ=UTC
EXPOSE 8080
ENTRYPOINT ["/app/dance-lessons-coach"]

docker/Dockerfile - Development Dockerfile (kept for local development):

# Multi-stage build for development
FROM golang:1.26.1-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
RUN go build -o /dance-lessons-coach ./cmd/server

FROM alpine:3.18
WORKDIR /app
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /dance-lessons-coach /app/dance-lessons-coach
COPY config.yaml /app/config.yaml
RUN chmod +x /app/dance-lessons-coach
ENV TZ=UTC
EXPOSE 8080
ENTRYPOINT ["/app/dance-lessons-coach"]

File Organization

All Dockerfiles are now organized in the docker/ directory:

  • docker/Dockerfile - Development Dockerfile
  • docker/Dockerfile.build - Build cache Dockerfile
  • docker/Dockerfile.prod - Production Dockerfile (development only, uses latest)
  • docker/Dockerfile.prod.template - Template for reference

This organization keeps the root directory clean and makes it clear which files are for development vs production.

Benefits

CI/CD Pipeline Benefits

  1. Simplicity: Traditional approach is easier to understand and debug
  2. Reliability: Consistent behavior across different environments
  3. Certificate Handling: Works seamlessly with self-signed certificates
  4. Performance: Faster execution without Buildx overhead
  5. Compatibility: Better compatibility with the GitHub Actions environment

Two-Stage Build Benefits

  1. Separation of Concerns: Clear separation between build environment and production runtime
  2. Optimized Production Image: Minimal Alpine-based image with only necessary dependencies
  3. Reusable Build Cache: Build environment can be reused across multiple CI runs
  4. Faster CI Execution: Pre-built build cache reduces CI execution time
  5. Consistent Builds: All builds use the same build environment

Development vs Production Clarity

  1. Development Dockerfile: Full build environment for local development
  2. Production Dockerfile: Minimal runtime environment for deployment
  3. Build Cache Dockerfile: Optimized build environment for CI/CD
  4. Clear Documentation: Each Dockerfile has a specific purpose

Trade-offs

What We Lose

  1. Multi-platform builds: Cannot build for multiple architectures simultaneously
  2. BuildKit caching: Less sophisticated caching mechanism
  3. Advanced features: No secret mounting, SSH agents, etc.
  4. Parallel processing: Slower builds without Buildx optimizations

What We Gain

  1. Stability: More reliable CI/CD pipeline
  2. Simplicity: Easier to maintain and troubleshoot
  3. Consistency: Matches proven patterns from working projects
  4. Faster feedback: Quicker build times in practice
  5. Clear Separation: Better distinction between development and production builds
  6. Optimized Production: Smaller, more secure production images

Rationale

  1. Current Needs: We don't need multi-platform builds or advanced BuildKit features
  2. Simple Dockerfile: Our Dockerfile.build doesn't require Buildx-specific features
  3. Proven Pattern: Traditional approach works reliably in production (webapp project)
  4. CI Stability: Reliability is more important than advanced features for CI/CD
  5. Build Strategy: Two-stage build provides better separation of concerns
  6. Maintenance: Simpler approach is easier to maintain and debug

Critical Bug Fix: Dependency Hash Usage

Issue Identified

The initial implementation had a critical bug: Dockerfile.prod used the latest tag instead of the specific dependency hash:

# ❌ WRONG - this would never work
FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:latest AS builder

This approach would never work because:

  1. The build cache images are tagged with specific dependency hashes
  2. No image is ever tagged as latest
  3. The CI/CD workflow would fail to find the cache image

Solution Implemented

  1. Dynamic Dockerfile Generation: The CI/CD workflow now generates Dockerfile.prod dynamically with the correct dependency hash
  2. Dependency Hash Calculation: Added scripts/calculate-deps-hash.sh for consistent hash calculation
  3. Template Approach: Created Dockerfile.prod.template for reference

CI/CD Workflow Fix

# ✅ CORRECT - generate Dockerfile.prod with proper hash
- name: Build and push Docker image
  if: github.ref == 'refs/heads/main'
  run: |
    # Generate Dockerfile.prod with correct dependency hash
    DEPS_HASH="${{ needs.build-cache.outputs.deps_hash }}"
    
    # Create Dockerfile.prod with the correct cache image tag
    cat > Dockerfile.prod << EOF
    FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:$DEPS_HASH AS builder
    # ... rest of Dockerfile
    EOF
    
    # Build using the generated Dockerfile
    docker build -t dance-lessons-coach -f Dockerfile.prod .

CI/CD Pipeline Optimization

Changes Made

  1. Removed Buildx Setup: Eliminated docker/setup-buildx-action@v3 from CI/CD workflow
  2. Removed Go Build Steps: Removed actions/setup-go@v4, go mod tidy, and individual Go tool installations
  3. Added Docker Cache Usage: All build steps now use the pre-built Docker cache image
  4. Updated Production Build: Production Docker build now generates Dockerfile.prod dynamically with correct dependency hash

CI/CD Workflow Structure

# CI Pipeline Job Structure
jobs:
  build-cache:
    # Builds Docker cache image if needed
    # Note: No certificate configuration needed with traditional docker
    
  ci-pipeline:
    needs: build-cache
    steps:
      - name: Set up build environment
        # Sets CACHE_IMAGE variable with proper tag
        # No Buildx setup, no Go installation, no certificate configuration
        
      - name: Generate Swagger Docs using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "cd pkg/server && go generate"
        
      - name: Build all packages using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "go build ./..."
        
      - name: Run tests with coverage using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "go test ./..."
        
      - name: Build and push Docker image
        # Uses: docker build -t dance-lessons-coach -f Dockerfile.prod .
        # No Buildx, no certificate issues

Key Improvements

  1. Faster Execution: No need to set up Go environment for each job
  2. Consistent Environment: All builds use the same Docker cache image
  3. Reduced Complexity: Simpler workflow with fewer steps
  4. Better Error Handling: Docker cache handles dependency management
  5. No Certificate Configuration: Traditional docker works seamlessly with self-signed certificates
  6. Improved Reliability: Elimination of Buildx-related failures

Future Considerations

When to Reconsider Buildx

  1. Multi-platform needs: If we need ARM/AMD64 builds simultaneously
  2. Complex builds: If Dockerfile requires BuildKit-specific features
  3. Performance optimization: If build times become unacceptable
  4. Certificate issues resolved: If Docker Buildx improves self-signed certificate handling

Migration Path

If we need to reintroduce Buildx in the future:

  1. Fix certificate issues properly at the Docker daemon level
  2. Test thoroughly in staging environment
  3. Monitor performance impact
  4. Document benefits clearly for the specific use case

Alternatives Considered

Option 1: Keep Buildx with Certificate Workaround

  • Complex setup with questionable reliability
  • Slow performance in GitHub Actions
  • Ongoing maintenance burden

Option 2: Use Insecure Registry Flag

docker buildx build --allow security.insecure --push .
  • Security concerns
  • Not recommended for production
  • Temporary workaround, not solution

Option 3: Traditional Docker Build + Push (CHOSEN)

  • Simple and reliable
  • Proven in production
  • Better performance in practice
  • Easy to maintain

Decision Outcome

Chosen Option: Traditional docker build + push (Option 3)

This decision prioritizes CI/CD reliability and simplicity over advanced features we don't currently need. The traditional approach has been proven to work consistently in our environment and matches the successful pattern from the webapp project.

Success Metrics

CI/CD Pipeline Metrics

  1. CI/CD reliability: No TLS certificate failures
  2. Build consistency: Predictable build times
  3. Maintenance: Reduced complexity and debugging time
  4. Compatibility: Works across all target environments

Build Strategy Metrics

  1. Cache hit rate: Percentage of CI runs using existing cache
  2. Build time reduction: Comparison of build times with vs without cache
  3. Image size: Production image size vs development image size
  4. CI execution time: Total CI pipeline duration

Quality Metrics

  1. Build reproducibility: Consistent builds across different environments
  2. Error rate: Reduction in CI/CD failures
  3. Recovery time: Time to recover from cache misses
  4. Resource utilization: Memory and CPU usage during builds

Implementation Checklist

  • Create Dockerfile.prod for production builds
  • Update Dockerfile.build for build cache
  • Keep Dockerfile for development use
  • Remove Docker Buildx from CI/CD workflow
  • Remove Go build steps from CI/CD workflow
  • Remove certificate configuration step (no longer needed)
  • Add Docker cache usage to all build steps
  • Fix Dockerfile.prod to use proper dependency hash (not latest)
  • Create dependency hash calculation script
  • Create build cache environment test script
  • Update CI/CD workflow to generate Dockerfile.prod dynamically
  • Update ADR 0020 with comprehensive documentation
  • Test changes locally
  • Push changes to trigger CI/CD workflow
  • Monitor workflow execution
  • Verify successful completion
  • Document results and metrics

Testing and Validation

Build Cache Environment Testing

A comprehensive test script is provided to validate the build cache environment:

# Test the build cache environment (simulates Gitea act runner)
./scripts/test-build-cache-environment.sh

This script tests:

  1. Dependency hash calculation
  2. Build cache image creation
  3. Go environment inside container
  4. Swagger generation
  5. Go build and test
  6. Binary build
  7. Production Dockerfile with cache
  8. Production container runtime

Dependency Hash Calculation

# Calculate dependency hash (used for cache image tagging)
./scripts/calculate-deps-hash.sh

# Export to file for use in scripts
./scripts/calculate-deps-hash.sh deps_hash.env
source deps_hash.env
echo "Hash: $DEPS_HASH"
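
For reference, the core of such a script can be as small as hashing go.mod and go.sum together. The sketch below uses synthetic inputs in a temp directory; the real scripts/calculate-deps-hash.sh may differ:

```shell
# Sketch of a dependency-hash calculation: any change to go.mod or go.sum
# yields a new hash, and therefore a new cache image tag.
set -eu
workdir=$(mktemp -d)

# Synthetic stand-ins for the project's real module files
printf 'module example.com/app\n' > "$workdir/go.mod"
printf 'example.com/dep v1.0.0 h1:abc=\n' > "$workdir/go.sum"

# Concatenate both files, hash them, and keep a short prefix as the tag
DEPS_HASH=$(cat "$workdir/go.mod" "$workdir/go.sum" | sha256sum | cut -c1-16)
echo "DEPS_HASH=$DEPS_HASH"
```

Because the hash covers only dependency manifests, source-code edits do not invalidate the cache image; only dependency changes do.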

Workflow Monitoring

# Monitor the workflow
./scripts/gitea-client.sh monitor-workflow arcodange dance-lessons-coach 420 30

# Check job status
./scripts/gitea-client.sh job-status arcodange dance-lessons-coach 420

# List workflow jobs
./scripts/gitea-client.sh list-workflow-jobs arcodange dance-lessons-coach 420

Validation Commands

# Verify CI/CD changes
./scripts/verify-cicd-changes.sh

# Test new CI/CD workflow
./scripts/test-new-cicd.sh

# Check Dockerfile syntax
docker run --rm -i hadolint/hadolint < Dockerfile.prod

Cleanup and Organization

Files Removed

  1. docker-compose.cicd-test.yml: Unused Docker Compose file
  2. scripts/cicd/: Old CI/CD test scripts (replaced by main test scripts)

Files Organized

All Dockerfiles moved to docker/ directory:

  • docker/Dockerfile - Development
  • docker/Dockerfile.build - Build cache
  • docker/Dockerfile.prod - Production (dev only)
  • docker/Dockerfile.prod.template - Template

Utility Scripts

  • scripts/calculate-deps-hash.sh - Consistent hash calculation
  • scripts/test-local-ci-cd.sh - Main local testing
  • scripts/test-build-cache-environment.sh - Build cache testing

Expected Outcomes

  1. Successful workflow execution: Workflow completes without errors
  2. Cache image created: Build cache image pushed to registry
  3. Production image built: Final Docker image built using generated docker/Dockerfile.prod
  4. Faster CI execution: Reduced build times compared to previous approach
  5. No certificate errors: No TLS certificate verification failures
  6. Clean organization: No clutter in root directory

Approved by: @arcodange
Date: 2026-04-07
Updated: 2026-04-07
Supersedes: None
Superseded by: None