Deep, code-grounded tree-docs guidebook under vibe/guidebooks/factory-provisioning/, explored from the actual playbooks/roles and tofu code:

- Hub: the two provisioning engines (operator-run Ansible vs CI-applied OpenTofu), a green-field bring-up flow, master index, maintenance rule.
- ansible/ sub-tree: ordered pages 01-system .. 06-recover, an inventory & variables concept page, and a Tier-1/Tier-2 roles reference (hashicorp_vault, step_ca, crowdsec, pihole, deploy_docker_compose + the gitea_* family and helpers).
- opentofu/ sub-tree: factory-iac (Cloudflare/OVH/GCP/Gitea/Vault edge + cloudflare_token module), postgres-iac (per-app DB/role/pgbouncer lookup), ci-apply-flow (Gitea OIDC-JWT -> Vault -> auto-approve apply).

Cross-linked bidirectionally with the lab-ecosystem guidebook and the safe-env ADR/PRD (the sandbox rehearses exactly these engines).

14 mermaid diagrams MCP-validated; zero dead links.

Authored by the Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Factory provisioning
Status: ✅ Active
Last Updated: 2026-06-23
Upstream: Lab ecosystem guidebook · 01 · factory
Related: safe-prod-like-environment ADR · safe-prod-like-environment PRD
This guidebook is the deep dive into how the factory repo turns three Raspberry Pis + a handful of cloud accounts into the running lab. Where the lab-ecosystem map shows which components exist and how they join, this guidebook drills into the two provisioning engines that build and maintain them: the Ansible collection that the operator runs from the Mac, and the OpenTofu modules that Gitea CI applies. Every page below describes the engine as it is wired right now — playbook imports, role responsibilities, inventory placement, provider versions, state backends, and the CI flow that ties Tofu to Vault.
Two engines, two trigger models
The factory splits provisioning along a hard line: imperative, operator-driven host/cluster build (Ansible) versus declarative, CI-driven forge/cloud/database state (OpenTofu). They never overlap on the same resource, and they run at different moments.
| Engine | Trigger | Runs from | Owns | Lives at |
|---|---|---|---|---|
| Ansible | One-shot, operator-run on demand | The Mac (control node) | The cluster + base layer + stateful services: k3s, Longhorn, Pi-hole, step-ca, PostgreSQL, Gitea, Vault, CrowdSec — plus the disaster-recovery playbooks | ansible/ → sub-hub |
| OpenTofu | CI-applied on Gitea (path-filtered push/pull_request + workflow_dispatch) | Gitea act-runners | Forge/cloud edge state (Cloudflare, OVH, GCP, Gitea, Vault) and per-app PostgreSQL databases | iac/ + postgres/ → sub-hub |
Note
Ansible is imperative and human-gated because it touches bare hosts and one-time bootstrap (disk prep, k3s install, Vault init). OpenTofu is declarative and machine-gated because its targets are reconcilable API objects (a DNS record, a bucket, a database) whose desired state belongs in version control and converges on every merge.
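Gitea, Vault, and PostgreSQL appear in both rows of the table, so the "never overlap" claim is easiest to see when ownership is modelled per layer rather than per product. The following is a minimal sketch under that assumption: the layer labels (`install`, `api-objects`, and so on) are invented for illustration and are not identifiers from the factory repo.

```python
# Ownership modelled as (system, layer) pairs.
# The layer labels are assumptions made for this sketch, not repo identifiers.
ANSIBLE_OWNS = {
    ("k3s", "install"),
    ("longhorn", "install"),
    ("pihole", "install"),
    ("step-ca", "install"),
    ("postgresql", "install"),
    ("gitea", "install"),
    ("vault", "install"),
    ("crowdsec", "install"),
}

OPENTOFU_OWNS = {
    ("cloudflare", "api-objects"),
    ("ovh", "api-objects"),
    ("gcp", "api-objects"),
    ("gitea", "action-secrets"),
    ("vault", "kv-and-policies"),
    ("postgresql", "per-app-databases"),
}


def shared_resources(a, b):
    """Return the (system, layer) pairs claimed by both engines."""
    return a & b


def shared_systems(a, b):
    """Return the systems that both engines touch, at any layer."""
    return {system for system, _ in a} & {system for system, _ in b}


if __name__ == "__main__":
    print("shared resources:", shared_resources(ANSIBLE_OWNS, OPENTOFU_OWNS))
    print("shared systems:", sorted(shared_systems(ANSIBLE_OWNS, OPENTOFU_OWNS)))
```

Under this model the two engines share three systems but no resource: Ansible stands the service up, and OpenTofu reconciles the objects that live inside it.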
How a green-field lab comes up
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef op fill:#1e3a8a,stroke:#1e40af,color:#fff
classDef eng fill:#059669,stroke:#047857,color:#fff
classDef host fill:#7c3aed,stroke:#6d28d9,color:#fff
classDef store fill:#b45309,stroke:#92400e,color:#fff
OP["Operator<br>at the Mac"]:::op -->|"runs playbooks 01→05"| ANS["Ansible collection<br>arcodange.factory"]:::eng
ANS -->|"OS · k3s · Longhorn · base layer"| PIS["3× Raspberry Pi<br>pi1 / pi2 / pi3"]:::host
PIS -->|"hosts Gitea + act-runners"| CI["Gitea CI<br>act-runners"]:::store
CI -->|"path-filtered apply"| TOFU["OpenTofu<br>iac/ + postgres/iac/"]:::eng
TOFU -->|"forge · cloud · PG state"| EDGE["Cloudflare · OVH · GCP<br>Gitea · Vault · PostgreSQL"]:::store
TOFU -. "state in GCS gs://arcodange-tf" .- EDGE
- The operator, working from the Mac control node, runs the numbered Ansible playbooks 01_system → 05_backup in order.
- Ansible lays the OS, k3s (v1.34.3+k3s1), Longhorn, and the base layer (Pi-hole, step-ca, Vault, CrowdSec) plus the stateful out-of-cluster services (PostgreSQL + Gitea) onto the three Raspberry Pis (pi1/pi2/pi3).
- Once pi2 is hosting Gitea and pi1/pi3 are running the act-runners (registered by 03_cicd), the forge can run CI.
- A push or merge to factory that touches iac/** or postgres/** triggers the corresponding Gitea CI workflow on those runners.
- The CI job authenticates to Vault via Gitea OIDC JWT and runs OpenTofu, which reconciles the forge/cloud/database edge: Cloudflare, OVH, GCP, Gitea action-secrets, Vault KV/policies, and the per-app PostgreSQL objects.
- All OpenTofu state is kept in GCS under gs://arcodange-tf (prefix factory/main for the cloud edge, factory/postgres for the databases), so each CI run reads and writes the authoritative state remotely.
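The path filters and state prefixes in the steps above can be sketched as a small lookup. This is an illustrative model only: the stack labels (`cloud-edge`, `postgres`) are hypothetical, `fnmatchcase` only approximates the glob semantics of the real CI path filter, and the combined `gs://bucket/prefix` string is a reading aid, not how the backend is configured.

```python
from fnmatch import fnmatchcase

# Bucket, path filters, and prefixes come from the flow described above.
# The stack labels are hypothetical names used only in this sketch.
STATE_BUCKET = "gs://arcodange-tf"
STACKS = {
    "cloud-edge": {"filter": "iac/**", "state_prefix": "factory/main"},
    "postgres": {"filter": "postgres/**", "state_prefix": "factory/postgres"},
}


def stacks_to_apply(changed_paths):
    """Return the stacks whose path filter matches at least one changed file."""
    hits = []
    for name, stack in STACKS.items():
        if any(fnmatchcase(path, stack["filter"]) for path in changed_paths):
            hits.append(name)
    return hits


def state_location(stack_name):
    """Join the bucket and the stack's prefix into one readable location."""
    return f"{STATE_BUCKET}/{STACKS[stack_name]['state_prefix']}"


if __name__ == "__main__":
    changed = ["iac/main.tf", "README.md"]
    for stack in stacks_to_apply(changed):
        print(stack, "->", state_location(stack))
```

A change that touches only Ansible files matches neither filter, which is consistent with the split described earlier: Ansible changes are applied by the operator, not by CI.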
Master index
| Sub-hub | What it maps | Status |
|---|---|---|
| Ansible | The arcodange.factory collection: numbered playbooks 01–06, the inventory + group_vars, and the reusable roles that build hosts, the cluster, and the stateful services | ✅ Active |
| OpenTofu | The CI-applied IaC: the cloud/forge edge (iac/), the per-app PostgreSQL provisioning (postgres/iac/), and the Gitea-OIDC → Vault apply flow | ✅ Active |
All pages
- Ansible
  - System (01): OS, DNS, SSL, disks, Docker, iSCSI, k3s, CoreDNS, cert-issuer, Longhorn/Traefik config
  - Setup (02): PostgreSQL + Gitea docker-compose on pi2 (and the optional backup-NFS share)
  - CI/CD (03): Gitea act-runner registration on pi1/pi3 and the ArgoCD/Image-Updater install
  - Tools (04): Vault + CrowdSec bootstrap into the cluster
  - Backup (05): scheduled PostgreSQL / Gitea / k3s-PVC backups to /mnt/backups
  - Recover (06): the Longhorn disaster-recovery playbooks (recover/)
  - Inventory & variables: hosts.yml groups and the group_vars tree
  - Roles reference: deploy_docker_compose, the gitea_* family, traefik_certs, playwright, and the service sub-roles
- OpenTofu
  - factory iac: iac/, the Cloudflare/OVH/GCP/Gitea/Vault edge + the cloudflare_token module
  - postgres iac: postgres/iac/, the per-app databases, roles, and the pgbouncer user_lookup() function
  - CI apply flow: the Gitea workflows, OIDC-JWT → Vault auth, and the GCS state backend
Maintenance rule
Important
Alter a documented component → update its page in the same change. If you change a playbook, a role, an inventory entry, a provider version, a Tofu resource, or the CI flow, the matching page in this guidebook MUST be edited in the same PR. A provisioning map that drifts from the code sends operators (and agents) down dead paths during a rebuild or a recovery — exactly when the map matters most.
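One way to make the rule checkable is to compare the changed paths of a PR against the guidebook sub-trees. The sketch below is hypothetical, not an existing check in the repo: the mapping from code prefixes to documentation sub-trees is an assumption built from the paths named in this guidebook, and a real check would need finer, per-page granularity.

```python
# Hypothetical mapping from code path prefixes to guidebook sub-trees.
# The prefixes are taken from this guidebook; the pairing is an assumption.
DOCS_ROOT = "vibe/guidebooks/factory-provisioning/"
DOCS_FOR_CODE = {
    "ansible/": DOCS_ROOT + "ansible/",
    "iac/": DOCS_ROOT + "opentofu/",
    "postgres/": DOCS_ROOT + "opentofu/",
}


def missing_doc_updates(changed_paths):
    """Return guidebook sub-trees that should have been edited but were not."""
    changed = list(changed_paths)
    missing = set()
    for path in changed:
        for code_prefix, docs_prefix in DOCS_FOR_CODE.items():
            if not path.startswith(code_prefix):
                continue
            # The rule is satisfied if any page under the sub-tree changed too.
            if not any(p.startswith(docs_prefix) for p in changed):
                missing.add(docs_prefix)
    return sorted(missing)


if __name__ == "__main__":
    for docs in missing_doc_updates(["iac/main.tf"]):
        print("code changed without a matching edit under", docs)
```

A check like this only proves that some page in the sub-tree moved with the code; whether the edit is accurate still needs a reviewer.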
Why this guidebook earns its keep
The safe-prod-like-environment work rehearses exactly these playbooks and Tofu modules in a throwaway sandbox before they touch the real lab: the sandbox stands up the same 01–05 narrative and runs the same iac/ + postgres/iac/ apply, so the rehearsal only holds if this guidebook tracks the engines faithfully. See the safe-prod-like-environment ADR for the decision and the PRD (with its QA strategy) for what the sandbox must reproduce.
Cross-references
- Lab ecosystem guidebook: the higher-altitude whole-lab map; this guidebook is its provisioning deep dive.
- 01 · factory: the four-pillar summary of the factory repo that this guidebook expands.
- secrets-and-vault.md: Gitea OIDC JWT for Tofu/CI and the dynamic PostgreSQL credentials these engines set up.
- storage-and-recovery.md: Longhorn + GCS backup + the power-cut recovery the 06 · recover playbooks serve.
- naming-conventions.md: the <app> join key shared by the OpenTofu state prefixes and per-app PostgreSQL objects.
- safe-prod-like-environment ADR · PRD: the sandbox that rehearses these engines before they touch the real lab.