factory/vibe/guidebooks/lab-ecosystem/01-factory.md
Gabriel Radureau dbe32161dc docs(vibe): add factory-provisioning guidebook (Ansible + OpenTofu)
Deep, code-grounded tree-docs guidebook under vibe/guidebooks/factory-provisioning/,
explored from the actual playbooks/roles and tofu code:

- Hub: the two provisioning engines (operator-run Ansible vs CI-applied OpenTofu),
  a green-field bring-up flow, master index, maintenance rule.
- ansible/ sub-tree: ordered pages 01-system .. 06-recover, an inventory & variables
  concept page, and a Tier-1/Tier-2 roles reference (hashicorp_vault, step_ca,
  crowdsec, pihole, deploy_docker_compose + the gitea_* family and helpers).
- opentofu/ sub-tree: factory-iac (Cloudflare/OVH/GCP/Gitea/Vault edge +
  cloudflare_token module), postgres-iac (per-app DB/role/pgbouncer lookup),
  ci-apply-flow (Gitea OIDC-JWT -> Vault -> auto-approve apply).

Cross-linked bidirectionally with the lab-ecosystem guidebook and the safe-env
ADR/PRD (the sandbox rehearses exactly these engines). 14 mermaid diagrams
MCP-validated; zero dead links. Authored by the Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:11:51 +02:00



01 · factory

Status: Active
Last Updated: 2026-06-23
Downstream: 02 · tools · 03 · cms
Deeper dive: Factory provisioning guidebook (page-by-page walkthrough of the Ansible playbooks/roles and OpenTofu modules summarized here)
Related: naming-conventions.md · secrets-and-vault.md · storage-and-recovery.md

factory is the cornerstone admin repo: it provisions the hosts and the cluster, declares what gets deployed, and owns the platform-level cloud/Gitea/Vault/Postgres state that every app depends on. It has four pillars: Ansible (imperative host and cluster setup), ArgoCD (declarative app-of-apps), iac/ (OpenTofu for the cloud/Gitea/Vault edge), and postgres/iac/ (per-app PostgreSQL provisioning). The tools and cms repos are declared in factory's ArgoCD app-of-apps (designed but not yet live; see the note under Pillar 2) and are mapped in 02 · tools and 03 · cms.

Pillar 1 — Ansible (ansible/)

The collection lives at ansible/arcodange/factory/. The inventory groups the three Pis and pins the service placement; numbered playbooks run an ordered narrative from bare OS to backups; recover/ holds the disaster-recovery playbooks.

Inventory (inventory/hosts.yml)

| Group | Hosts | Purpose |
|---|---|---|
| raspberries | pi1, pi2, pi3 (192.168.1.201-203) | All three Pis; ansible_user: pi |
| postgres | pi2 | The PostgreSQL host (docker-compose, outside k3s) |
| gitea | children of postgres (→ pi2) | Gitea co-located with PG on pi2 |
| pihole | pi1, pi3 | Internal DNS resolvers |
| step_ca | pi1, pi2, pi3 | Step-CA PKI for *.arcodange.lab (primary pi1, replicas pi2/pi3) |
| local | localhost + the Pis | Control-node-local tasks |

Numbered playbooks (playbooks/)

| Playbook | Imports / does | Notes |
|---|---|---|
| 01_system | system/system.yml → rpi base, DNS, SSL, prepare disks, Docker, iSCSI, k3s install (--docker --disable traefik), CoreDNS, cert-issuer, Longhorn/Traefik config | k3s v1.34.3+k3s1 via upstream k3s-ansible; pi1 server, pi2/pi3 agents |
| 02_setup | setup/setup.yml → PostgreSQL + Gitea docker-compose; optional backup-NFS share | Stands up the two out-of-cluster source-of-truth services on pi2 |
| 03_cicd | Gitea act-runner docker-compose on pi1/pi3 (raspberries:&local:!gitea), plus the ArgoCD/Image-Updater install | See the ArgoCD caveat below |
| 04_tools | tools/tools.yml → hashicorp_vault.yml, crowdsec.yml | Platform tooling that bootstraps the cluster's Vault + CrowdSec |
| 05_backup | backup/backup.yml → postgres.yml, gitea.yml, k3s_pvc.yml to /mnt/backups | Scheduled PG/Gitea/PVC backups; cron-report wiring present |
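The host pattern in the 03_cicd row (raspberries:&local:!gitea) can be checked by hand against the inventory table. The sketch below is a simplified stand-in for Ansible's pattern resolution, not Ansible's own code: it models the groups as plain sets, applies unions first, then `&` intersections, then `!` exclusions, and assumes the group memberships exactly as listed in the inventory table. The `resolve_pattern` helper is hypothetical.

```python
# Group membership copied from the inventory table (host names only).
PIS = {"pi1", "pi2", "pi3"}
INVENTORY = {
    "raspberries": set(PIS),
    "postgres": {"pi2"},
    "gitea": {"pi2"},  # children of postgres
    "pihole": {"pi1", "pi3"},
    "step_ca": set(PIS),
    "local": {"localhost"} | PIS,
}


def resolve_pattern(pattern: str, inventory: dict[str, set[str]]) -> set[str]:
    """Resolve a colon-separated group pattern.

    Plain terms are unioned, '&' terms are intersected, '!' terms are
    excluded, in that order regardless of where they appear in the pattern.
    """
    union: set[str] = set()
    intersections: list[set[str]] = []
    exclusions: set[str] = set()
    for term in pattern.split(":"):
        if term.startswith("&"):
            intersections.append(inventory[term[1:]])
        elif term.startswith("!"):
            exclusions |= inventory[term[1:]]
        else:
            union |= inventory[term]
    hosts = set(union)
    for group in intersections:
        hosts &= group
    return hosts - exclusions


runner_hosts = resolve_pattern("raspberries:&local:!gitea", INVENTORY)
print(sorted(runner_hosts))  # ['pi1', 'pi3']
```

The result matches the table: the act-runners land on pi1 and pi3, and pi2 is excluded because it hosts Gitea.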

Recovery playbooks (playbooks/recover/)

| Playbook | When to use |
|---|---|
| longhorn.yml | Recover Longhorn after a power cut when Volume CRDs still exist (CSI driver registration loss) |
| longhorn_data.yml | Recover app data from raw replica .img files when Volume CRDs are gone (block-device level) |

The tested power-cut recovery sequence (Longhorn restore → Vault unseal → VSO re-auth → ERP scaled up last) is documented in CLUSTER_RECOVERY.md at the lab root (outside this repo) and summarized in storage-and-recovery.md. Background on PVC recovery is in the Longhorn PVC recovery ADR.
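The choice between the two recovery playbooks comes down to one question: do the Longhorn Volume CRDs still exist? As a minimal sketch, the decision and the documented recovery sequence can be written down as data. The function name and the way the CRD check result is passed in are hypothetical; how to actually detect the CRDs is left to the operator.

```python
# Recovery order as stated above; ERP is deliberately scaled up last.
RECOVERY_SEQUENCE = [
    "Longhorn restore",
    "Vault unseal",
    "VSO re-auth",
    "ERP scale-up",
]


def choose_recovery_playbook(volume_crds_exist: bool) -> str:
    """Pick the Longhorn recovery playbook under playbooks/recover/.

    volume_crds_exist: True when the Longhorn Volume CRDs survived the
    outage (CSI registration loss only), False when they are gone and
    data must be recovered from raw replica .img files.
    """
    if volume_crds_exist:
        return "playbooks/recover/longhorn.yml"
    return "playbooks/recover/longhorn_data.yml"


print(choose_recovery_playbook(True))
print(choose_recovery_playbook(False))
```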

Key roles

deploy_docker_compose (renders compose stacks), gitea_repo / gitea_token / gitea_secret / gitea_sync (Gitea repo/token/secret/mirror management), traefik_certs, playwright, plus sub-roles step_ca, hashicorp_vault, crowdsec, pihole.

Pillar 2 — ArgoCD app-of-apps (argocd/)

A Helm chart whose templates/apps.yaml loops over values.gitea_applications and emits one Application CRD per app. Each Application derives everything from the app name: repoURL = https://gitea.arcodange.lab/<org>/<app>, path = chart, namespace = <app> (CreateNamespace=true), with syncPolicy.automated prune: true + selfHeal: true by default.

| App | Org override | Image Updater |
|---|---|---|
| url-shortener | | |
| tools | | explicit prune+selfHeal |
| webapp | | digest strategy |
| telegram-gateway | arcodange | digest strategy |
| erp | | |
| cms | | digest strategy |
| dance-lessons-coach | arcodange | digest strategy |

Note

The chart also templates a longhorn_backup_target and the ArgoCD Image Updater config (argocd.arcodange.lab). ArgoCD itself is not currently deployed in-cluster — its install is commented out in 03_cicd. This page documents the intended steady state; treat ArgoCD as "designed, not live" until that step is enabled.

Pillar 3 — OpenTofu (iac/)

Manages the cloud/Gitea/Vault edge. State lives in GCS (backend "gcs", bucket arcodange-tf, prefix factory/main). Tofu authenticates to Vault via Gitea OIDC JWT (mount gitea_jwt, role gitea_cicd).
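For orientation, the JWT login mentioned above corresponds to Vault's JWT/OIDC auth login endpoint, which takes a `role` and a `jwt` and is addressed by mount path. The sketch below only builds that HTTP request so the mount and role names are visible; it does not send it, and it is not how the OpenTofu Vault provider is configured in this repo. The Vault address and the token placeholder are hypothetical.

```python
import json
import urllib.request


def build_vault_jwt_login(vault_addr: str, jwt: str,
                          mount: str = "gitea_jwt",
                          role: str = "gitea_cicd") -> urllib.request.Request:
    """Build (but do not send) a Vault JWT-auth login request."""
    url = f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login"
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address and token placeholder, for illustration only.
request = build_vault_jwt_login("https://vault.example.invalid", "<oidc-jwt-from-ci>")
print(request.get_method(), request.full_url)
# On success Vault returns JSON whose auth.client_token is the Vault token.
```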

| Provider | Used for |
|---|---|
| go-gitea/gitea (0.6.0) | Repos, users, action secrets (e.g. the restricted tofu_module_reader CI user, CMS secrets) |
| vault (4.4.0) | KV secrets + policies + k8s auth roles (e.g. Longhorn GCS-backup creds & policy) |
| google (7.0.1) | GCS backup bucket + service account + HMAC key for Longhorn |
| cloudflare/cloudflare (~> 5) | R2 bucket, API tokens, CMS edge wiring (detailed in 03 · cms) |
| ovh/ovh (2.8.0) | OAuth2 client + IAM policy for the arcodange.fr domain (registrar = OVH) |

modules/cloudflare_token is a reusable scoped-token factory. The whole module reuses the <app> name as the GCS state prefix (<app>/main) — see naming-conventions.md.

Pillar 4 — per-app PostgreSQL (postgres/iac/)

OpenTofu using the cyrilgdn/postgresql provider against PG on 192.168.1.202 (state prefix factory/postgres). It iterates over a var.applications set and, per app, creates:

| Resource | Name pattern | Purpose |
|---|---|---|
| Database | <app> | The app's database (template0, owned by the role) |
| Owner role (non-login) | <app>_role | Database owner; granted to dynamic users by Vault |
| Editor role (login) | credentials_editor | Shared admin role that can grant the per-app roles |
| user_lookup() function | per-<app> db | SECURITY DEFINER lookup for pgbouncer auth (granted to pgbouncer_auth, revoked from public) |

Current applications set: webapp, erp, crowdsec, plausible, dance-lessons-coach. Vault's PostgreSQL secrets engine then issues dynamic credentials on top of these roles — see secrets-and-vault.md. The pooler (pgbouncer) that consumes user_lookup() lives in the tools namespace — see 02 · tools.
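The naming patterns above are mechanical, so they can be previewed without touching the database. This sketch expands the current applications set into the object names the table describes. It is a plain-Python illustration of the naming scheme, not the OpenTofu code; the `objects_for` helper and the dictionary layout are hypothetical.

```python
APPLICATIONS = ["webapp", "erp", "crowdsec", "plausible", "dance-lessons-coach"]
EDITOR_ROLE = "credentials_editor"     # shared login role
PGBOUNCER_AUTH_ROLE = "pgbouncer_auth"  # role allowed to call user_lookup()


def objects_for(app: str) -> dict:
    """Expand one app name into the PostgreSQL object names it implies."""
    return {
        "database": app,
        "owner_role": f"{app}_role",  # non-login owner, granted to Vault's dynamic users
        "editor_role": EDITOR_ROLE,
        "lookup_function": {
            "database": app,
            "name": "user_lookup",
            "execute_granted_to": PGBOUNCER_AUTH_ROLE,
        },
    }


plan = {app: objects_for(app) for app in APPLICATIONS}
for app, objects in plan.items():
    print(f"{app}: db={objects['database']} owner={objects['owner_role']}")
```

Names containing hyphens, such as dance-lessons-coach, need identifier quoting if they are ever used in hand-written SQL.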

Provisioning order

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
    S1["01_system<br>OS + k3s + Longhorn"]:::proc --> S2["02_setup<br>PG + Gitea (pi2)"]:::proc --> S3["03_cicd<br>runners + ArgoCD"]:::proc --> S4["04_tools<br>Vault + CrowdSec"]:::proc --> S5["05_backup<br>PG/Gitea/PVC"]:::proc
    IAC["iac/ + postgres/iac<br>(OpenTofu state in GCS)"]:::store -. "declares cloud/Gitea/Vault/PG" .- S2
```
1. 01_system lays the OS, disks, Docker, and k3s with Longhorn + Traefik onto the three Pis.
2. 02_setup stands up PostgreSQL and Gitea as docker-compose on pi2, the out-of-cluster source-of-truth services.
3. 03_cicd registers the Gitea act-runners (and is where ArgoCD would install, currently commented out).
4. 04_tools bootstraps the cluster's Vault and CrowdSec.
5. 05_backup schedules PostgreSQL, Gitea, and k3s-PVC backups to /mnt/backups.
6. In parallel, OpenTofu (iac/ and postgres/iac/) declares the cloud, Gitea, Vault, and PostgreSQL objects, keeping state in GCS.
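The ordering in steps 1 to 5 is a simple dependency chain, which can be expressed and checked with a topological sort. The sketch below encodes the arrows from the diagram as "depends on" edges and uses Python's standard `graphlib` (3.9+) to recover the run order. The OpenTofu side is left out because it runs in parallel rather than as a step in the chain. The `DEPENDS_ON` mapping is an illustration, not something the repo ships.

```python
from graphlib import TopologicalSorter

# Each playbook maps to the set of playbooks that must have run before it.
DEPENDS_ON = {
    "01_system": set(),
    "02_setup": {"01_system"},
    "03_cicd": {"02_setup"},
    "04_tools": {"03_cicd"},
    "05_backup": {"04_tools"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
# ['01_system', '02_setup', '03_cicd', '04_tools', '05_backup']
```

Because the graph is a single chain, the order is unique; adding an edge that contradicts it would make `static_order()` raise `graphlib.CycleError`.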

Cross-references