factory/vibe/guidebooks/lab-ecosystem
Gabriel Radureau c00c4cdd5c feat(multi-env): Phase B — make factory machinery env-capable (no activation)
ADR-0002 Phase B. Makes postgres/iac, argocd, and the conventions docs
multi-environment-capable WITHOUT activating any sandbox yet — every app
stays prod-only, so this change is behaviour-neutral:
  - postgres/iac `tofu plan` is a no-op (proven: the elision flatten keys
    are bare app names, db=<app>, role=<app>_role — identical addresses)
  - the argocd apps.yaml render is byte-identical (181→181 lines, diff
    empty) since no app declares `envs`

postgres/iac:
- variables.tf: `applications` becomes set(object({name, envs=optional(["prod"])}))
- main.tf: a `local.app_instances` flatten of applications × envs keyed by the
  elided instance id (env=prod → "<app>"); per-app resources iterate it and
  reference each.key / each.value.{database,role}. For prod-only apps every
  resource address + attribute is unchanged. (main.tf also got a full
  `tofu fmt` pass — the pgbouncer function block reindents 4→2 spaces, which
  is cosmetic; the correctness gate is the CI tofu plan, not the text diff.)
- terraform.tfvars: string entries → { name = "..." } objects.
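
The flatten described above can be modelled in a few lines. This is a Python sketch of the elision rule, not the HCL in main.tf. The `AppInstance` type and the non-prod database and role names (the instance id, with hyphens turned into underscores for the role) are assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class AppInstance(NamedTuple):
    app: str
    env: str
    database: str
    role: str


def instance_id(app: str, env: str) -> str:
    """Elide the env for prod; suffix every other env."""
    return app if env == "prod" else f"{app}-{env}"


def app_instances(applications: Iterable[dict]) -> Dict[str, AppInstance]:
    """Flatten applications x envs into a map keyed by the elided instance id."""
    instances: Dict[str, AppInstance] = {}
    for entry in applications:
        # envs defaults to ["prod"], mirroring optional(["prod"]) in variables.tf
        for env in entry.get("envs", ["prod"]):
            key = instance_id(entry["name"], env)
            instances[key] = AppInstance(
                app=entry["name"],
                env=env,
                database=key,  # assumption for non-prod; prod yields db=<app>
                role=key.replace("-", "_") + "_role",  # assumption: snake-case role
            )
    return instances
```

Under these assumptions a prod-only entry such as `{"name": "cms"}` maps to the single key `cms`, which is why existing resource addresses do not move.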

argocd/templates/apps.yaml:
- after the prod Application, a `range $app_attr.envs` loop renders one extra
  Application per non-prod env: name/namespace `<app>-<env>`, shared repoURL,
  helm.valueFiles [values.yaml, values-<env>.yaml], per-env syncPolicy override.
  Renders nothing while no app sets `envs` → prod render unchanged.
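
The render rule above can be sketched the same way. This Python model is not the Helm template itself. The prod entry's namespace and its single `values.yaml` are assumptions; the source only specifies the non-prod names and value files.

```python
from typing import List, Optional


def render_applications(app: str, envs: Optional[List[str]] = None) -> List[dict]:
    """Return the prod Application first, then one per non-prod env."""
    # Assumption: the prod Application uses the bare app name and values.yaml only.
    rendered = [{"name": app, "namespace": app, "valueFiles": ["values.yaml"]}]
    for env in envs or []:
        if env == "prod":
            continue  # prod is already rendered above
        rendered.append({
            "name": f"{app}-{env}",
            "namespace": f"{app}-{env}",
            "valueFiles": ["values.yaml", f"values-{env}.yaml"],
        })
    return rendered
```

With no `envs` the function returns only the prod entry, matching the claim that the render is unchanged until an app opts in.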

docs:
- doc/runbooks/new-web-app/conventions.md (FR, authoritative): new section
  "Plusieurs environnements pour une même app" — elision rule, suffix rule,
  snake-case owner-role exception, erp/erp-sandbox table, ADR-0002 link.
- vibe/guidebooks/lab-ecosystem/naming-conventions.md (EN mirror): the env
  coordinate section + a "Two sandbox models" section reconciling the
  separate-cluster (ADR-0001, names repeat) vs in-cluster sibling (ADR-0002,
  <env> suffix) strategies; Last Updated bumped; ADR-0002 cross-links.

Activation (erp gets envs=["prod","sandbox"] in postgres tfvars + argocd
values + erp/iac) is Phase D, gated by its own plan review.

Refs ADR-0002 (factory#15). Phase A = tools#2 (merged). Phase C = erp#11 (merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 16:28:28 +02:00

Lab ecosystem

Status: Active
Last Updated: 2026-06-23
Related: ADR-0001 (safe prod-like environment) · PRD (safe prod-like environment) · INV-001 (prod blast-radius couplings)

What this is

This guidebook is the end-to-end map of the Arcodange home lab — how the three repos (factory, tools, cms), the three Raspberry Pis, and the cloud edge wire together into one running system. It is a descriptive reference map, not a procedure: it answers "how does this fit together right now?". For "how do I add a new app step by step?" see the new-web-app runbook; for "why was it built this way?" see the factory ADRs.

The lab is run from one control node, a MacBook Pro M4, which drives everything via Ansible (imperative host setup) and OpenTofu (declarative cloud/Gitea/Vault/Postgres state). The three Pis (pi1, pi2, pi3 at 192.168.1.201 to 192.168.1.203) sit behind a home Livebox. pi1 is the k3s server; pi2 and pi3 are agents. Gitea and PostgreSQL run under Docker Compose on pi2, outside k3s and on pi2's local disk; everything else runs inside k3s on Longhorn distributed block storage. The public edge is a Cloudflared Zero-Trust tunnel into the internal Traefik, with Cloudflare DNS and Zoho email fronting arcodange.fr.
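
As a rough illustration of the topology just described, the hosts can be written down as data and sanity-checked. This is a sketch, not the lab's Ansible inventory. The `NODES` structure and the `nodes_outside_lan` helper are hypothetical, and the per-host addresses assume pi1, pi2, pi3 map to .201, .202, .203 in order.

```python
import ipaddress

# Home LAN behind the Livebox.
LAN = ipaddress.ip_network("192.168.1.0/24")

# Hypothetical inventory: one entry per Pi with its k3s role.
NODES = {
    "pi1": {"ip": "192.168.1.201", "k3s": "server"},
    "pi2": {"ip": "192.168.1.202", "k3s": "agent"},  # also hosts Gitea + PostgreSQL
    "pi3": {"ip": "192.168.1.203", "k3s": "agent"},
}


def nodes_outside_lan(nodes=NODES, lan=LAN):
    """Return the names of nodes whose address is not inside the LAN."""
    return [
        name
        for name, node in nodes.items()
        if ipaddress.ip_address(node["ip"]) not in lan
    ]


def k3s_servers(nodes=NODES):
    """Return the names of nodes acting as k3s server."""
    return [name for name, node in nodes.items() if node["k3s"] == "server"]
```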

The whole lab, end to end

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TB
    classDef ctrl fill:#2563eb,stroke:#1e40af,color:#fff
    classDef host fill:#0891b2,stroke:#0e7490,color:#fff
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
    classDef edge fill:#d97706,stroke:#b45309,color:#fff
    classDef dead fill:#6b7280,stroke:#4b5563,color:#fff

    MAC["Control node (MacBook Pro M4)<br>Ansible + OpenTofu"]:::ctrl

    subgraph LAN["Home LAN (Livebox) — 192.168.1.0/24"]
        subgraph PI2["pi2 · 192.168.1.202 (docker-compose, outside k3s)"]
            GITEA["Gitea<br>arcodange-org/*"]:::host
            PG[("PostgreSQL")]:::store
        end
        subgraph K3S["k3s cluster — pi1 server, pi2/pi3 agents"]
            ARGO["ArgoCD app-of-apps<br> /argocd"]:::proc
            LH[("Longhorn<br>block storage")]:::store
            VAULT["Vault + VSO<br>secrets"]:::store
            TRAEFIK["Traefik<br>ingress"]:::proc
            TOOLS["tools namespace<br>(Vault, Grafana, CrowdSec, …)"]:::host
            APPS["app namespaces<br>(webapp, erp, cms, …)"]:::host
        end
        OLLAMA["pi3 · ollama"]:::host
    end

    subgraph CLOUD["Cloud edge"]
        CF["Cloudflare DNS<br>+ Cloudflared tunnel"]:::edge
        ZOHO["Zoho<br>email (arcodange.fr)"]:::edge
        GCS[("GCS gs://arcodange-tf<br>OpenTofu state + Longhorn backup")]:::store
    end

    INTERNET(["Internet"]):::edge

    MAC -- "Ansible: provision hosts, k3s, docker-compose" --> PI2
    MAC -- "Ansible: k3s, Longhorn, Traefik" --> K3S
    MAC -- "OpenTofu: Gitea/Vault/PG/Cloudflare/OVH state" --> GITEA
    MAC -- "OpenTofu state" --> GCS

    GITEA -- "repoURL chart/" --> ARGO
    ARGO -- "Application CRDs (prune+selfHeal)" --> TOOLS
    ARGO -- "Application CRDs (prune+selfHeal)" --> APPS
    VAULT -- "VSO injects secrets into pods" --> TOOLS
    VAULT -- "VSO injects secrets into pods" --> APPS
    APPS -- "dynamic creds" --> PG
    LH -. "PVCs" .- TOOLS
    LH -. "PVCs" .- APPS
    LH -- "backup target" --> GCS

    INTERNET --> CF -- "tunnel" --> TRAEFIK --> APPS
    INTERNET --> ZOHO
```

  1. The control node (MacBook) provisions the three Pis with Ansible (OS, disks, Docker, k3s, Longhorn, Traefik) and manages all SaaS/Gitea/Vault/Postgres state with OpenTofu.
  2. On pi2, Gitea and PostgreSQL run as Docker Compose outside k3s, on the local disk — they are the source-of-truth services the cluster depends on.
  3. OpenTofu keeps its state in GCS (gs://arcodange-tf), and Longhorn pushes volume backups to the same GCS project.
  4. Gitea hosts every app repo; each repo's chart/ directory is the deployable Helm chart.
  5. ArgoCD's app-of-apps turns each Gitea repo into an Application custom resource (automated prune + selfHeal) that deploys either into the tools namespace or into the app's own namespace.
  6. Vault is the single source of truth for secrets; the Vault Secrets Operator (VSO) injects them into pods via Kubernetes auth, and apps draw dynamic PostgreSQL credentials from Vault against pi2.
  7. Longhorn provides the PVCs the in-cluster workloads mount, and backs up to GCS.
  8. The public edge routes Internet traffic through Cloudflare DNS and a Cloudflared Zero-Trust tunnel into the internal Traefik, which fronts the app namespaces; Zoho handles arcodange.fr email.
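
Step 5 can be made concrete with a sketch of the Application object ArgoCD would hold for one app. The field layout follows the ArgoCD `Application` resource, but the concrete values are assumptions, not taken from the factory chart: the `argocd` metadata namespace, the `default` project, the `HEAD` revision, the in-cluster destination server, and the Gitea URL used in the usage note.

```python
def application_manifest(app: str, repo_url: str) -> dict:
    """Build an ArgoCD Application for one app repo (illustrative values)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: ArgoCD and its Applications live in the "argocd" namespace.
        "metadata": {"name": app, "namespace": "argocd"},
        "spec": {
            "project": "default",  # assumption
            "source": {
                "repoURL": repo_url,
                "path": "chart",  # each repo's chart/ directory is the Helm chart
                "targetRevision": "HEAD",  # assumption
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # in-cluster API server
                "namespace": app,
            },
            # Automated sync with prune + selfHeal, as described in step 5.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }
```

Calling `application_manifest("webapp", "https://gitea.example.invalid/arcodange-org/webapp")` (a placeholder URL) yields a dictionary that can be dumped to JSON or YAML for inspection.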

Note

The ArgoCD Helm chart under argocd/ is defined and templated, but ArgoCD itself is not currently deployed in-cluster (its install step is commented out in the 03_cicd provisioning). The app-of-apps wiring documented here is the intended steady state; see 01 · factory for the caveat.

Deploy / secrets / DNS flows

  • Deploy flow. Push to a Gitea repo → CI builds an image into the Gitea registry → ArgoCD (via the app-of-apps and, for some apps, the Image Updater) syncs the chart/ directory into the matching namespace with prune + selfHeal. The whole chain keys off one <app> identifier — see naming-conventions.md.
  • Secrets flow. Vault is the single source of truth (no sops/age). CI authenticates to Vault via Gitea OIDC JWT (role gitea_cicd_<app>); pods receive secrets at runtime via VSO (Kubernetes auth + VaultDynamicSecret CRDs). Detail in secrets-and-vault.md.
  • DNS / edge flow. Internal names resolve under *.arcodange.lab (Pi-hole + Step-CA-issued TLS). Public traffic for arcodange.fr enters through Cloudflare and a Cloudflared tunnel to internal Traefik; public TLS is Let's Encrypt via Traefik's DNS-challenge (DuckDNS). Email runs through Zoho. Edge detail in 03 · cms.
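
Because both the deploy and secrets flows key off the same `<app>` identifier, deriving dependent names is mechanical. The sketch below validates a kebab-case app name and derives the CI Vault role `gitea_cicd_<app>`. The regular expression and both helper names are assumptions; naming-conventions.md remains the authority on the exact rules.

```python
import re

# Assumption: kebab-case means lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_kebab_case(app: str) -> bool:
    """Return True when the whole string is a kebab-case identifier."""
    return KEBAB_CASE.fullmatch(app) is not None


def ci_vault_role(app: str) -> str:
    """Derive the Vault role CI authenticates as, following gitea_cicd_<app>."""
    if not is_kebab_case(app):
        raise ValueError(f"not a kebab-case app name: {app!r}")
    return f"gitea_cicd_{app}"
```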

Master index

| Page | What it maps | Status |
| --- | --- | --- |
| 01 · factory | The cornerstone admin repo: Ansible host/cluster provisioning, ArgoCD app-of-apps, OpenTofu (iac/), and per-app PostgreSQL (postgres/iac/) | Active |
| 02 · tools | The tools namespace: Vault, VSO, Prometheus, Grafana, CrowdSec, poolers, Redis/KeyDB, Plausible + ClickHouse, the tool library chart | Active |
| 03 · cms | The public-facing site: Nuxt static site, Cloudflare zone + tunnel + Turnstile, Zoho email (MX/SPF/DKIM/DMARC/BIMI + aliases) | Active |
| naming-conventions.md | The <app> join key: one kebab-case name reused identically across Gitea, PG, Vault, k8s, ArgoCD, GCS, DNS | Active |
| secrets-and-vault.md | How Vault is the single source of truth: Gitea OIDC JWT for CI, VSO injection for pods, dynamic PostgreSQL creds | Active |
| storage-and-recovery.md | Longhorn block storage, GCS backup target, and the tested power-cut recovery sequence | Active |

Status legend

done · 🟡 beta · 🔴 critical · ⚠️ known issue · disabled · not started.

Maintenance rule

Important

If you alter a component documented here, update its page in the same change. A reference map that drifts from reality sends readers (and agents) confidently down dead paths. The PR that changes the component is the PR that updates its guidebook page — treat the doc edit as part of the diff, not a follow-up.

Cross-references