factory/vibe/guidebooks/factory-provisioning
Gabriel Radureau 1824a1885d docs(vibe): add maintenance rule to the ansible + opentofu sub-hubs
The two factory-provisioning sub-hubs were the only guidebook index pages without
the "alter a documented component -> update its page in the same PR" reminder that
every sibling hub carries. Add a scoped maintenance rule to each, pointing back to
the factory-provisioning maintenance rule and the guidebooks' Rules to contribute,
so no folder hub silently drifts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:42:24 +02:00


Factory provisioning

Status: Active
Last Updated: 2026-06-23
Upstream: Lab ecosystem guidebook · 01 · factory
Related: safe-prod-like-environment ADR · safe-prod-like-environment PRD

This guidebook is the deep dive into how the factory repo turns three Raspberry Pis + a handful of cloud accounts into the running lab. Where the lab-ecosystem map shows which components exist and how they join, this guidebook drills into the two provisioning engines that build and maintain them: the Ansible collection that the operator runs from the Mac, and the OpenTofu modules that Gitea CI applies. Every page below describes the engine as it is wired right now — playbook imports, role responsibilities, inventory placement, provider versions, state backends, and the CI flow that ties Tofu to Vault.

Two engines, two trigger models

The factory splits provisioning along a hard line: imperative, operator-driven host/cluster build (Ansible) versus declarative, CI-driven forge/cloud/database state (OpenTofu). They never overlap on the same resource, and they run at different moments.

| Engine | Trigger | Runs from | Owns | Lives at |
|---|---|---|---|---|
| Ansible | One-shot, operator-run on demand | The Mac (control node) | The cluster + base layer + stateful services: k3s, Longhorn, Pi-hole, step-ca, PostgreSQL, Gitea, Vault, CrowdSec, plus the disaster-recovery playbooks | ansible/ · sub-hub |
| OpenTofu | CI-applied on Gitea (path-filtered push/pull_request + workflow_dispatch) | Gitea act-runners | Forge/cloud edge state (Cloudflare, OVH, GCP, Gitea, Vault) and per-app PostgreSQL databases | iac/ + postgres/ · sub-hub |

Note

Ansible is imperative and human-gated because it touches bare hosts and one-time bootstrap (disk prep, k3s install, Vault init). OpenTofu is declarative and machine-gated because its targets are reconcilable API objects (a DNS record, a bucket, a database) whose desired state belongs in version control and converges on every merge.

How a green-field lab comes up

%%{init: {'theme': 'base'}}%%
flowchart LR
    classDef op fill:#1e3a8a,stroke:#1e40af,color:#fff
    classDef eng fill:#059669,stroke:#047857,color:#fff
    classDef host fill:#7c3aed,stroke:#6d28d9,color:#fff
    classDef store fill:#b45309,stroke:#92400e,color:#fff

    OP["Operator<br>at the Mac"]:::op -->|"runs playbooks 01→05"| ANS["Ansible collection<br>arcodange.factory"]:::eng
    ANS -->|"OS · k3s · Longhorn · base layer"| PIS["3× Raspberry Pi<br>pi1 / pi2 / pi3"]:::host
    PIS -->|"hosts Gitea + act-runners"| CI["Gitea CI<br>act-runners"]:::store
    CI -->|"path-filtered apply"| TOFU["OpenTofu<br>iac/ + postgres/iac/"]:::eng
    TOFU -->|"forge · cloud · PG state"| EDGE["Cloudflare · OVH · GCP<br>Gitea · Vault · PostgreSQL"]:::store
    TOFU -. "state in GCS gs://arcodange-tf" .- EDGE

  1. The operator, working from the Mac control node, runs the numbered Ansible playbooks 01_system → 05_backup in order.
  2. Ansible lays the OS, k3s (v1.34.3+k3s1), Longhorn, and the base layer (Pi-hole, step-ca, Vault, CrowdSec) plus the stateful out-of-cluster services (PostgreSQL + Gitea) onto the three Raspberry Pis (pi1/pi2/pi3).
  3. Once pi2 is hosting Gitea and pi1/pi3 are running the act-runners (registered by 03_cicd), the forge can run CI.
  4. A push or merge to factory that touches iac/** or postgres/** triggers the corresponding Gitea CI workflow on those runners.
  5. The CI job authenticates to Vault via Gitea OIDC JWT and runs OpenTofu, which reconciles the forge/cloud/database edge — Cloudflare, OVH, GCP, Gitea action-secrets, Vault KV/policies, and the per-app PostgreSQL objects.
  6. All OpenTofu state is kept in GCS under gs://arcodange-tf (prefix factory/main for the cloud edge, factory/postgres for the databases), so each CI run reads and writes the authoritative state remotely.
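
Steps 4 and 6 can be read as a small lookup: a changed path selects a Tofu stack, and each stack has its own state prefix in the shared bucket. The following is a minimal sketch of that mapping, not the actual workflow logic. The stack labels (`cloud-edge`, `postgres`), the helper names, and the example file names are hypothetical; only the path filters, the bucket, and the two prefixes come from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TofuStack:
    name: str          # hypothetical label, not a real workflow name
    path_prefix: str   # repo path whose changes trigger the CI apply
    state_prefix: str  # prefix inside the GCS state bucket


STATE_BUCKET = "arcodange-tf"

STACKS = (
    TofuStack("cloud-edge", "iac/", "factory/main"),
    TofuStack("postgres", "postgres/", "factory/postgres"),
)


def stacks_to_apply(changed_paths):
    """Return the stacks whose path filter matches at least one changed file."""
    return [
        stack
        for stack in STACKS
        if any(path.startswith(stack.path_prefix) for path in changed_paths)
    ]


def state_location(stack):
    """Build the bucket + prefix under which a stack keeps its remote state."""
    return f"gs://{STATE_BUCKET}/{stack.state_prefix}"


if __name__ == "__main__":
    # Hypothetical changed files from a push to factory.
    for stack in stacks_to_apply(["iac/dns.tf", "README.md"]):
        print(stack.name, state_location(stack))
```

Under these assumptions, a push that only touches Ansible files selects no stack, which matches the split between the two engines: Ansible changes are applied by the operator, not by CI.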

Master index

| Sub-hub | What it maps | Status |
|---|---|---|
| Ansible | The arcodange.factory collection: numbered playbooks 01→06, the inventory + group_vars, and the reusable roles that build hosts, the cluster, and the stateful services | Active |
| OpenTofu | The CI-applied IaC: the cloud/forge edge (iac/), the per-app PostgreSQL provisioning (postgres/iac/), and the Gitea-OIDC → Vault apply flow | Active |

All pages

  • Ansible
    • System (01): OS, DNS, SSL, disks, Docker, iSCSI, k3s, CoreDNS, cert-issuer, Longhorn/Traefik config
    • Setup (02): PostgreSQL + Gitea docker-compose on pi2 (and the optional backup-NFS share)
    • CI/CD (03): Gitea act-runner registration on pi1/pi3 and the ArgoCD/Image-Updater install
    • Tools (04): Vault + CrowdSec bootstrap into the cluster
    • Backup (05): scheduled PostgreSQL / Gitea / k3s-PVC backups to /mnt/backups
    • Recover (06): the Longhorn disaster-recovery playbooks (recover/)
    • Inventory & variables: hosts.yml groups and the group_vars tree
    • Roles reference: deploy_docker_compose, the gitea_* family, traefik_certs, playwright, and the service sub-roles
  • OpenTofu
    • factory iac (iac/): Cloudflare/OVH/GCP/Gitea/Vault edge + the cloudflare_token module
    • postgres iac (postgres/iac/): per-app databases, roles, and the pgbouncer user_lookup() function
    • CI apply flow: the Gitea workflows, OIDC-JWT → Vault auth, and the GCS state backend

Maintenance rule

Important

Alter a documented component → update its page in the same change. If you change a playbook, a role, an inventory entry, a provider version, a Tofu resource, or the CI flow, the matching page in this guidebook MUST be edited in the same PR. A provisioning map that drifts from the code sends operators (and agents) down dead paths during a rebuild or a recovery — exactly when the map matters most.
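
One way to make this rule mechanical is a coarse drift check over the files changed in a PR. The sketch below is an illustration only and is not a check that exists in the factory repo. It assumes the guidebook lives under `vibe/guidebooks/factory-provisioning/` and that provisioning code lives under `ansible/`, `iac/`, and `postgres/`. The function name and the example file names are hypothetical.

```python
# Assumed repo layout: provisioning code vs. this guidebook's folder.
CODE_PREFIXES = ("ansible/", "iac/", "postgres/")
GUIDEBOOK_PREFIX = "vibe/guidebooks/factory-provisioning/"


def guidebook_drift(changed_paths):
    """Return True when a change touches provisioning code but no guidebook page."""
    touches_code = any(path.startswith(CODE_PREFIXES) for path in changed_paths)
    touches_docs = any(path.startswith(GUIDEBOOK_PREFIX) for path in changed_paths)
    return touches_code and not touches_docs


if __name__ == "__main__":
    # Hypothetical PR that edits a Tofu file without touching the guidebook.
    print(guidebook_drift(["iac/dns.tf"]))
```

This check is deliberately coarser than the rule itself: it only detects that some guidebook page changed, not that the matching page did, so it can flag a likely omission but cannot confirm compliance.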

Why this guidebook earns its keep

The safe-prod-like-environment work rehearses exactly these playbooks and Tofu modules in a throwaway sandbox before they touch the real lab: the sandbox stands up the same 01→05 narrative and runs the same iac/ + postgres/iac/ apply, so the rehearsal only holds if this guidebook tracks the engines faithfully. See the safe-prod-like-environment ADR for the decision and the PRD (with its QA strategy) for what the sandbox must reproduce.

Cross-references