dance-lessons-coach/adr/0020-docker-build-strategy.md
Gabriel Radureau 69e7c44eb2 📝 docs: add comprehensive user management ADR and technical documentation
Added ADR-0018 for User Management and Authentication System with:
- Non-persisted admin user with master password authentication
- JWT-based authentication with bcrypt password hashing
- PostgreSQL database schema and GORM integration
- Admin-assisted password reset workflow
- Comprehensive security considerations

Added ADR-0019 for BDD Feature Structure:
- Epic/User Story organization pattern
- Unified development workflow
- Source of truth hierarchy

Added ADR-0020 for Docker Build Strategy:
- Multi-stage build approach
- Cache optimization strategy
- Production vs development build differences

Added technical documentation:
- Complete user management system specification
- API endpoints and integration details
- Security architecture and best practices

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-09 00:25:35 +02:00


# ADR 0020: Docker Build Strategy - Traditional vs Buildx
## Status
**Accepted** ✅
## Context
The dance-lessons-coach CI/CD pipeline initially used Docker Buildx (`docker buildx build --push`) for building and pushing Docker cache images. However, this approach encountered several issues:
### Issues with Buildx Approach
1. **TLS Certificate Problems**: Buildx had difficulty with self-signed certificates, requiring complex workaround steps
2. **Performance Concerns**: Buildx setup and execution were significantly slower than expected
3. **Complexity**: Buildx introduced additional complexity without providing immediate benefits
4. **Reliability Issues**: Buildx builds were less reliable in the GitHub Actions environment
### Working Solution Analysis
The working webapp CI/CD pipeline uses traditional `docker build` + `docker push` approach:
```yaml
# Working approach from webapp
- name: Build and push image to Gitea Container Registry
  run: |-
    docker build -t app .
    docker tag app gitea.arcodange.lab/${{ github.repository }}:$TAG
    docker push gitea.arcodange.lab/${{ github.repository }}:$TAG
```
This approach is simpler, more reliable, and works consistently with self-signed certificates.
## Decision
**Replace Docker Buildx with traditional docker build + push** for the CI/CD pipeline and implement a two-stage Docker build strategy.
### Implementation
#### 1. Build Cache Strategy
```yaml
# Build cache using traditional docker build
- name: Build and push Docker cache image
  if: steps.check_cache.outputs.cache_hit == 'false'
  run: |
    IMAGE_NAME="${{ env.CI_REGISTRY }}/${{ env.GITEA_ORG }}/${{ env.GITEA_REPO }}-build-cache:${{ steps.calculate_hash.outputs.deps_hash }}"
    echo "Building cache image: $IMAGE_NAME"
    # Build the image using traditional docker build
    docker build \
      --file Dockerfile.build \
      --tag "$IMAGE_NAME" \
      .
    # Push the image
    docker push "$IMAGE_NAME"
    echo "✅ Build cache image pushed successfully"
```
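The step above gates on `steps.check_cache.outputs.cache_hit`, which this ADR does not show. A plausible sketch of that step (the step `id` and output name come from the reference above; probing the registry with `docker manifest inspect` is an assumption, not the confirmed implementation):

```yaml
# Hypothetical check_cache step: probe the registry for an image already
# tagged with the current dependency hash.
- name: Check for existing build cache image
  id: check_cache
  run: |
    IMAGE_NAME="${{ env.CI_REGISTRY }}/${{ env.GITEA_ORG }}/${{ env.GITEA_REPO }}-build-cache:${{ steps.calculate_hash.outputs.deps_hash }}"
    if docker manifest inspect "$IMAGE_NAME" > /dev/null 2>&1; then
      echo "cache_hit=true" >> "$GITHUB_OUTPUT"
    else
      echo "cache_hit=false" >> "$GITHUB_OUTPUT"
    fi
```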
#### 2. Production Build Strategy
```yaml
# Production build using Dockerfile.prod
- name: Build and push Docker image
  if: github.ref == 'refs/heads/main'
  run: |
    source VERSION
    IMAGE_VERSION="$MAJOR.$MINOR.$PATCH${PRERELEASE:+-$PRERELEASE}"
    TAGS="$IMAGE_VERSION latest ${{ github.sha }}"
    echo "Building Docker image with tags: $TAGS"
    # Use the production Dockerfile that leverages the build cache
    docker build -t dance-lessons-coach -f Dockerfile.prod .
    for TAG in $TAGS; do
      IMAGE_NAME="${{ env.CI_REGISTRY }}/${{ env.GITEA_ORG }}/${{ env.GITEA_REPO }}:$TAG"
      echo "Tagging and pushing: $IMAGE_NAME"
      docker tag dance-lessons-coach "$IMAGE_NAME"
      docker push "$IMAGE_NAME"
    done
```
#### 3. Dockerfile Structure
**Dockerfile.build** - Build environment with all dependencies:
```dockerfile
FROM golang:1.26.1-alpine AS builder
# Set the working directory before copying files so the module files and the
# built binary both live under /workspace (where Dockerfile.prod expects them)
WORKDIR /workspace
# Install build dependencies
RUN apk add --no-cache git bash curl make gcc musl-dev bc grep sed jq ca-certificates
# Install Go tools
RUN go install github.com/swaggo/swag/cmd/swag@latest
# Copy and verify dependencies
COPY go.mod go.sum ./
RUN go mod download && go mod verify
```
**Dockerfile.prod** - Minimal production image:
```dockerfile
# Use the build cache image as base
FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:latest AS builder
# Final minimal image
FROM alpine:3.18
WORKDIR /app
# Install minimal dependencies
RUN apk add --no-cache ca-certificates tzdata
# Copy binary from builder
COPY --from=builder /workspace/dance-lessons-coach /app/dance-lessons-coach
# Copy configuration
COPY config.yaml /app/config.yaml
# Set permissions and entrypoint
RUN chmod +x /app/dance-lessons-coach
ENV TZ=UTC
EXPOSE 8080
ENTRYPOINT ["/app/dance-lessons-coach"]
```
**docker/Dockerfile** - Development Dockerfile (kept for local development):
```dockerfile
# Multi-stage build for development
FROM golang:1.26.1-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
RUN go build -o /dance-lessons-coach ./cmd/server
FROM alpine:3.18
WORKDIR /app
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /dance-lessons-coach /app/dance-lessons-coach
COPY config.yaml /app/config.yaml
RUN chmod +x /app/dance-lessons-coach
ENV TZ=UTC
EXPOSE 8080
ENTRYPOINT ["/app/dance-lessons-coach"]
```
### File Organization
All Dockerfiles are now organized in the `docker/` directory:
- `docker/Dockerfile` - Development Dockerfile
- `docker/Dockerfile.build` - Build cache Dockerfile
- `docker/Dockerfile.prod` - Production Dockerfile (development only, uses latest)
- `docker/Dockerfile.prod.template` - Template for reference
This organization keeps the root directory clean and makes it clear which files are for development vs production.
## Benefits
### CI/CD Pipeline Benefits
1. **Simplicity**: Traditional approach is easier to understand and debug
2. **Reliability**: Consistent behavior across different environments
3. **Certificate Handling**: Works seamlessly with self-signed certificates
4. **Performance**: Faster execution without Buildx overhead
5. **Compatibility**: Better compatibility with GitHub Actions environment
### Two-Stage Build Benefits
1. **Separation of Concerns**: Clear separation between build environment and production runtime
2. **Optimized Production Image**: Minimal Alpine-based image with only necessary dependencies
3. **Reusable Build Cache**: Build environment can be reused across multiple CI runs
4. **Faster CI Execution**: Pre-built build cache reduces CI execution time
5. **Consistent Builds**: All builds use the same build environment
### Development vs Production Clarity
1. **Development Dockerfile**: Full build environment for local development
2. **Production Dockerfile**: Minimal runtime environment for deployment
3. **Build Cache Dockerfile**: Optimized build environment for CI/CD
4. **Clear Documentation**: Each Dockerfile has a specific purpose
## Trade-offs
### What We Lose
1. **Multi-platform builds**: Cannot build for multiple architectures simultaneously
2. **BuildKit caching**: Less sophisticated caching mechanism
3. **Advanced features**: No secret mounting, SSH agents, etc.
4. **Parallel processing**: Slower builds without Buildx optimizations
### What We Gain
1. **Stability**: More reliable CI/CD pipeline
2. **Simplicity**: Easier to maintain and troubleshoot
3. **Consistency**: Matches proven patterns from working projects
4. **Faster feedback**: Quicker build times in practice
5. **Clear Separation**: Better distinction between development and production builds
6. **Optimized Production**: Smaller, more secure production images
## Rationale
1. **Current Needs**: We don't need multi-platform builds or advanced BuildKit features
2. **Simple Dockerfile**: Our `Dockerfile.build` doesn't require Buildx-specific features
3. **Proven Pattern**: Traditional approach works reliably in production (webapp project)
4. **CI Stability**: Reliability is more important than advanced features for CI/CD
5. **Build Strategy**: Two-stage build provides better separation of concerns
6. **Maintenance**: Simpler approach is easier to maintain and debug
## Critical Bug Fix: Dependency Hash Usage
### Issue Identified
The initial implementation had a critical bug where `Dockerfile.prod` used `latest` tag instead of the specific dependency hash:
```dockerfile
# ❌ WRONG - this would never work
FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:latest AS builder
```
This approach would never work because:
1. The build cache images are tagged with specific dependency hashes
2. No image is ever tagged as `latest`
3. The CI/CD workflow would fail to find the cache image
### Solution Implemented
1. **Dynamic Dockerfile Generation**: The CI/CD workflow now generates `Dockerfile.prod` dynamically with the correct dependency hash
2. **Dependency Hash Calculation**: Added `scripts/calculate-deps-hash.sh` for consistent hash calculation
3. **Template Approach**: Created `Dockerfile.prod.template` for reference
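The hash script itself is not reproduced in this ADR. A minimal sketch of what `scripts/calculate-deps-hash.sh` could look like, assuming the hash is derived from `go.mod` and `go.sum` and truncated to a short tag-friendly form (both assumptions):

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of scripts/calculate-deps-hash.sh: key the cache image
# tag on the dependency manifests so the tag changes only when dependencies do.
calc_deps_hash() {
  # Concatenate the manifests in a fixed order, hash, and truncate to 16 chars.
  cat "$@" | sha256sum | cut -c1-16
}

# Demo on throwaway manifests (the real script would hash ./go.mod ./go.sum).
printf 'module example\n' > /tmp/demo_go.mod
printf 'example v1.0.0 h1:abc\n' > /tmp/demo_go.sum
DEPS_HASH="$(calc_deps_hash /tmp/demo_go.mod /tmp/demo_go.sum)"
echo "DEPS_HASH=$DEPS_HASH"
```

The key property is determinism: the same `go.mod`/`go.sum` pair must always yield the same tag, so the CI cache-hit check and the production build agree on which image to use.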
### CI/CD Workflow Fix
```yaml
# ✅ CORRECT - generate Dockerfile.prod with proper hash
- name: Build and push Docker image
  if: github.ref == 'refs/heads/main'
  run: |
    # Generate Dockerfile.prod with correct dependency hash
    DEPS_HASH="${{ needs.build-cache.outputs.deps_hash }}"
    # Create Dockerfile.prod with the correct cache image tag
    cat > Dockerfile.prod << EOF
    FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:$DEPS_HASH AS builder
    # ... rest of Dockerfile
    EOF
    # Build using the generated Dockerfile
    docker build -t dance-lessons-coach -f Dockerfile.prod .
```
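An alternative to the heredoc is rendering `docker/Dockerfile.prod.template` with the hash. The placeholder name below (`__DEPS_HASH__`) is an assumption; the template's actual placeholder is not shown in this ADR:

```shell
#!/bin/sh
set -eu

# Hypothetical template-rendering approach: substitute a placeholder in
# Dockerfile.prod.template instead of generating the file inline.
DEPS_HASH="abc123def4567890"   # in CI this comes from the build-cache job output
printf 'FROM gitea.arcodange.lab/arcodange/dance-lessons-coach-build-cache:__DEPS_HASH__ AS builder\n' \
  > /tmp/Dockerfile.prod.template
sed "s/__DEPS_HASH__/$DEPS_HASH/" /tmp/Dockerfile.prod.template > /tmp/Dockerfile.prod
cat /tmp/Dockerfile.prod
```

Either way, the essential fix is the same: the `FROM` line must carry the exact dependency hash produced by the build-cache job, never `latest`.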
## CI/CD Pipeline Optimization
### Changes Made
1. **Removed Buildx Setup**: Eliminated `docker/setup-buildx-action@v3` from CI/CD workflow
2. **Removed Go Build Steps**: Removed `actions/setup-go@v4`, `go mod tidy`, and individual Go tool installations
3. **Added Docker Cache Usage**: All build steps now use the pre-built Docker cache image
4. **Updated Production Build**: Production Docker build now generates `Dockerfile.prod` dynamically with correct dependency hash
### CI/CD Workflow Structure
```yaml
# CI Pipeline Job Structure
jobs:
  build-cache:
    # Builds Docker cache image if needed
    # Note: No certificate configuration needed with traditional docker
  ci-pipeline:
    needs: build-cache
    steps:
      - name: Set up build environment
        # Sets CACHE_IMAGE variable with proper tag
        # No Buildx setup, no Go installation, no certificate configuration
      - name: Generate Swagger Docs using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "cd pkg/server && go generate"
      - name: Build all packages using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "go build ./..."
      - name: Run tests with coverage using Docker cache
        # Uses: docker run ${{ env.CACHE_IMAGE }} sh -c "go test ./..."
      - name: Build and push Docker image
        # Uses: docker build -t dance-lessons-coach -f Dockerfile.prod .
        # No Buildx, no certificate issues
```
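The comments above only name the underlying commands. Expanded into a concrete step, one of them might look like this (the checkout volume mount and working directory are assumptions, since the ADR does not show how the repository is exposed to the cache container):

```yaml
# Hypothetical expansion of one cache-backed step: run the Go toolchain from
# the pre-built cache image against the checked-out sources.
- name: Build all packages using Docker cache
  run: |
    docker run --rm \
      -v "$PWD":/workspace \
      -w /workspace \
      "${{ env.CACHE_IMAGE }}" \
      sh -c "go build ./..."
```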
### Key Improvements
1. **Faster Execution**: No need to set up Go environment for each job
2. **Consistent Environment**: All builds use the same Docker cache image
3. **Reduced Complexity**: Simpler workflow with fewer steps
4. **Better Error Handling**: Docker cache handles dependency management
5. **No Certificate Configuration**: Traditional docker works seamlessly with self-signed certificates
6. **Improved Reliability**: Elimination of Buildx-related failures
## Future Considerations
### When to Reconsider Buildx
1. **Multi-platform needs**: If we need ARM/AMD64 builds simultaneously
2. **Complex builds**: If Dockerfile requires BuildKit-specific features
3. **Performance optimization**: If build times become unacceptable
4. **Certificate issues resolved**: If Docker Buildx improves self-signed certificate handling
### Migration Path
If we need to reintroduce Buildx in the future:
1. **Fix certificate issues properly** at the Docker daemon level
2. **Test thoroughly** in staging environment
3. **Monitor performance** impact
4. **Document benefits** clearly for the specific use case
## Alternatives Considered
### Option 1: Keep Buildx with Certificate Workaround
- ❌ Complex setup with questionable reliability
- ❌ Slow performance in GitHub Actions
- ❌ Ongoing maintenance burden
### Option 2: Use Insecure Registry Flag
```bash
docker buildx build --allow security.insecure --push .
```
- ❌ Security concerns
- ❌ Not recommended for production
- ❌ Temporary workaround, not solution
### Option 3: Traditional Docker Build + Push ✅ **CHOSEN**
- ✅ Simple and reliable
- ✅ Proven in production
- ✅ Better performance in practice
- ✅ Easy to maintain
## Decision Outcome
**Chosen Option**: Traditional docker build + push (Option 3)
This decision prioritizes CI/CD reliability and simplicity over advanced features we don't currently need. The traditional approach has been proven to work consistently in our environment and matches the successful pattern from the webapp project.
## Success Metrics
### CI/CD Pipeline Metrics
1. **CI/CD reliability**: No TLS certificate failures
2. **Build consistency**: Predictable build times
3. **Maintenance**: Reduced complexity and debugging time
4. **Compatibility**: Works across all target environments
### Build Strategy Metrics
1. **Cache hit rate**: Percentage of CI runs using existing cache
2. **Build time reduction**: Comparison of build times with vs without cache
3. **Image size**: Production image size vs development image size
4. **CI execution time**: Total CI pipeline duration
### Quality Metrics
1. **Build reproducibility**: Consistent builds across different environments
2. **Error rate**: Reduction in CI/CD failures
3. **Recovery time**: Time to recover from cache misses
4. **Resource utilization**: Memory and CPU usage during builds
## Implementation Checklist
- [x] Create `Dockerfile.prod` for production builds
- [x] Update `Dockerfile.build` for build cache
- [x] Keep `Dockerfile` for development use
- [x] Remove Docker Buildx from CI/CD workflow
- [x] Remove Go build steps from CI/CD workflow
- [x] Remove certificate configuration step (no longer needed)
- [x] Add Docker cache usage to all build steps
- [x] Fix Dockerfile.prod to use proper dependency hash (not latest)
- [x] Create dependency hash calculation script
- [x] Create build cache environment test script
- [x] Update CI/CD workflow to generate Dockerfile.prod dynamically
- [x] Update ADR 0020 with comprehensive documentation
- [x] Test changes locally
- [x] Push changes to trigger CI/CD workflow
- [ ] Monitor workflow execution
- [ ] Verify successful completion
- [ ] Document results and metrics
## Testing and Validation
### Build Cache Environment Testing
A comprehensive test script is provided to validate the build cache environment:
```bash
# Test the build cache environment (simulates Gitea act runner)
./scripts/test-build-cache-environment.sh
```
This script tests:
1. Dependency hash calculation
2. Build cache image creation
3. Go environment inside container
4. Swagger generation
5. Go build and test
6. Binary build
7. Production Dockerfile with cache
8. Production container runtime
### Dependency Hash Calculation
```bash
# Calculate dependency hash (used for cache image tagging)
./scripts/calculate-deps-hash.sh
# Export to file for use in scripts
./scripts/calculate-deps-hash.sh deps_hash.env
source deps_hash.env
echo "Hash: $DEPS_HASH"
```
### Workflow Monitoring
```bash
# Monitor the workflow
./scripts/gitea-client.sh monitor-workflow arcodange dance-lessons-coach 420 30
# Check job status
./scripts/gitea-client.sh job-status arcodange dance-lessons-coach 420
# List workflow jobs
./scripts/gitea-client.sh list-workflow-jobs arcodange dance-lessons-coach 420
```
### Validation Commands
```bash
# Verify CI/CD changes
./scripts/verify-cicd-changes.sh
# Test new CI/CD workflow
./scripts/test-new-cicd.sh
# Check Dockerfile syntax
docker run --rm -i hadolint/hadolint < Dockerfile.prod
```
## Cleanup and Organization
### Files Removed
1. **docker-compose.cicd-test.yml**: Unused Docker Compose file
2. **scripts/cicd/**: Old CI/CD test scripts (replaced by main test scripts)
### Files Organized
All Dockerfiles moved to `docker/` directory:
- `docker/Dockerfile` - Development
- `docker/Dockerfile.build` - Build cache
- `docker/Dockerfile.prod` - Production (dev only)
- `docker/Dockerfile.prod.template` - Template
### Utility Scripts
- `scripts/calculate-deps-hash.sh` - Consistent hash calculation
- `scripts/test-local-ci-cd.sh` - Main local testing
- `scripts/test-build-cache-environment.sh` - Build cache testing
## Expected Outcomes
1. **Successful workflow execution**: Workflow completes without errors
2. **Cache image created**: Build cache image pushed to registry
3. **Production image built**: Final Docker image built using generated `docker/Dockerfile.prod`
4. **Faster CI execution**: Reduced build times compared to previous approach
5. **No certificate errors**: No TLS certificate verification failures
6. **Clean organization**: No clutter in root directory
## References
- [Docker Buildx Documentation](https://docs.docker.com/buildx/working-with-buildx/)
- [Docker Build Documentation](https://docs.docker.com/engine/reference/commandline/build/)
- [GitHub Actions Docker Examples](https://github.com/actions/starter-workflows/tree/main/ci-and-cd)
- [webapp CI/CD Pipeline](https://gitea.arcodange.fr/arcodange-org/webapp/src/branch/main/.gitea/workflows/dockerimage.yaml)
- [Docker Multi-stage Builds](https://docs.docker.com/build/building/multi-stage/)
- [Alpine Linux Docker Images](https://hub.docker.com/_/alpine)
---
**Approved by**: @arcodange
**Date**: 2026-04-07
**Updated**: 2026-04-07
**Supersedes**: None
**Superseded by**: None