# Factory provisioning

> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Lab ecosystem guidebook](../lab-ecosystem/README.md) · [01 · factory](../lab-ecosystem/01-factory.md)
> **Related:** [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-prod-like-environment PRD](../../PRD/safe-prod-like-environment/README.md)

This guidebook is the deep dive into **how the `factory` repo turns three Raspberry Pis + a handful of cloud accounts into the running lab.** The [lab-ecosystem](../lab-ecosystem/README.md) map shows *which* components exist and how they join. This guidebook drills into the two provisioning **engines** that build and maintain those components:

- the Ansible collection that the operator runs from the Mac;
- the OpenTofu modules that Gitea CI applies.

Every page below describes the engine *as it is wired right now*. That covers playbook imports, role responsibilities, inventory placement, provider versions, state backends, and the CI flow that ties Tofu to Vault.

## Two engines, two trigger models

The factory splits provisioning along a hard line:

- **imperative, operator-driven host/cluster build** (Ansible);
- **declarative, CI-driven forge/cloud/database state** (OpenTofu).

The two engines never manage the same resource, and they run at different moments.
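The "never manage the same resource" rule is an invariant that a script can check. Below is a minimal Python sketch of such a check. It assumes a hypothetical, hand-maintained registry named `OWNERSHIP`. Its labels are loosely transcribed from the engine table in this section and are illustrative only; they are not read from the repo.

The split works at the level of resources, not products. PostgreSQL, Gitea and Vault appear under both engines: Ansible stands up each service, and OpenTofu manages objects inside it (per-app databases, action-secrets, KV/policies).

```python
"""Sketch: assert that Ansible and OpenTofu never claim the same resource.

Hypothetical: OWNERSHIP is a hand-maintained registry; the labels are
illustrative and are not read from the factory repo.
"""

OWNERSHIP: dict[str, set[str]] = {
    "ansible": {
        "k3s", "longhorn", "pihole", "step-ca",
        "postgresql-server", "gitea-server", "vault-server", "crowdsec",
    },
    "opentofu": {
        "cloudflare", "ovh", "gcp", "gitea-action-secrets",
        "vault-kv-policies", "postgresql-app-databases",
    },
}


def overlaps(registry: dict[str, set[str]]) -> set[str]:
    """Return every resource claimed by more than one engine."""
    seen: dict[str, str] = {}
    clashes: set[str] = set()
    for engine, resources in registry.items():
        for resource in resources:
            if resource in seen and seen[resource] != engine:
                clashes.add(resource)
            seen[resource] = engine
    return clashes


if __name__ == "__main__":
    clashes = overlaps(OWNERSHIP)
    if clashes:
        raise SystemExit(f"resources owned by both engines: {sorted(clashes)}")
    print("ownership is disjoint")
```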
| Engine | Trigger | Runs from | Owns | Lives at |
|---|---|---|---|---|
| **Ansible** | One-shot, operator-run on demand | The Mac (control node) | The cluster + base layer + stateful services: k3s, Longhorn, Pi-hole, step-ca, PostgreSQL, Gitea, Vault, CrowdSec, plus the disaster-recovery playbooks | [`ansible/`](../../../ansible/) → [sub-hub](ansible/README.md) |
| **OpenTofu** | CI-applied on Gitea (path-filtered `push`/`pull_request` + `workflow_dispatch`) | Gitea act-runners | Forge/cloud edge state (Cloudflare, OVH, GCP, Gitea, Vault) and **per-app PostgreSQL databases** | [`iac/`](../../../iac/) + [`postgres/`](../../../postgres/) → [sub-hub](opentofu/README.md) |

> [!NOTE]
> Ansible is **imperative and human-gated** because it touches bare hosts and one-time bootstrap (disk prep, k3s install, Vault init). OpenTofu is **declarative and machine-gated** because its targets are reconcilable API objects (a DNS record, a bucket, a database) whose desired state belongs in version control and converges on every merge.

## How a green-field lab comes up

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
    classDef op fill:#1e3a8a,stroke:#1e40af,color:#fff
    classDef eng fill:#059669,stroke:#047857,color:#fff
    classDef host fill:#7c3aed,stroke:#6d28d9,color:#fff
    classDef store fill:#b45309,stroke:#92400e,color:#fff

    OP["Operator<br/>at the Mac"]:::op -->|"runs playbooks 01→05"| ANS["Ansible collection<br/>arcodange.factory"]:::eng
    ANS -->|"OS · k3s · Longhorn · base layer"| PIS["3× Raspberry Pi<br/>pi1 / pi2 / pi3"]:::host
    PIS -->|"hosts Gitea + act-runners"| CI["Gitea CI<br/>act-runners"]:::store
    CI -->|"path-filtered apply"| TOFU["OpenTofu<br/>iac/ + postgres/iac/"]:::eng
    TOFU -->|"forge · cloud · PG state"| EDGE["Cloudflare · OVH · GCP<br/>Gitea · Vault · PostgreSQL"]:::store
    TOFU -. "state in GCS gs://arcodange-tf" .- EDGE
```

1. The **operator**, working from the **Mac control node**, runs the numbered Ansible playbooks `01_system` → `05_backup` in order.
2. **Ansible** lays down the following onto the **three Raspberry Pis** (`pi1`/`pi2`/`pi3`):
   - the OS;
   - k3s (`v1.34.3+k3s1`);
   - Longhorn;
   - the base layer (Pi-hole, step-ca, Vault, CrowdSec);
   - the stateful out-of-cluster services (PostgreSQL + Gitea).
3. The forge can run CI once two things are true:
   - `pi2` is hosting **Gitea**;
   - `pi1`/`pi3` are running the **act-runners** (registered by `03_cicd`).
4. A push or merge to `factory` that touches `iac/**` or `postgres/**` triggers the corresponding **Gitea CI** workflow on those runners.
5. The CI job authenticates to Vault via Gitea OIDC JWT and runs **OpenTofu**. OpenTofu then reconciles the **forge/cloud/database edge**: Cloudflare, OVH, GCP, Gitea action-secrets, Vault KV/policies, and the per-app PostgreSQL objects.
6. All OpenTofu state is kept in **GCS** under `gs://arcodange-tf`. The cloud edge uses prefix `factory/main` and the databases use `factory/postgres`. Each CI run therefore reads and writes the authoritative state remotely.
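Steps 4 to 6 amount to a routing decision: which changed paths trigger which Tofu stack, and which GCS state object that stack reads and writes. Below is a minimal Python sketch of that routing, under these assumptions:

- The `iac/**` and `postgres/**` filters behave as plain prefix matches.
- OpenTofu's `gcs` backend stores state as `<prefix>/<workspace>.tfstate`, which is `default.tfstate` for the default workspace.
- `ROUTES` and its field names are hypothetical. The real filters live in the Gitea workflow files described on the CI apply flow page.

```python
"""Sketch: route changed paths to an OpenTofu stack and its GCS state object.

Hypothetical: ROUTES mirrors the iac/** and postgres/** path filters as
simple prefix checks; the structure is illustrative only.
"""
from dataclasses import dataclass

BUCKET = "arcodange-tf"


@dataclass(frozen=True)
class Route:
    path_prefix: str   # changes under this prefix trigger the stack
    stack_dir: str     # directory OpenTofu runs in
    state_prefix: str  # `prefix` argument of the gcs backend


ROUTES = (
    Route("iac/", "iac/", "factory/main"),
    Route("postgres/", "postgres/iac/", "factory/postgres"),
)


def state_object(route: Route, workspace: str = "default") -> str:
    """gcs backend layout: gs://<bucket>/<prefix>/<workspace>.tfstate."""
    return f"gs://{BUCKET}/{route.state_prefix}/{workspace}.tfstate"


def triggered(changed_paths: list[str]) -> list[Route]:
    """Return the routes whose filter matches at least one changed path."""
    return [
        route for route in ROUTES
        if any(path.startswith(route.path_prefix) for path in changed_paths)
    ]


if __name__ == "__main__":
    for route in triggered(["postgres/iac/main.tf", "README.md"]):
        print(route.stack_dir, "->", state_object(route))
```

In this sketch each stack has its own state object. A push that touches both `iac/` and `postgres/` fires two independent applies, and neither one writes the other's state. A change under `ansible/` fires none, which matches the operator-run model above.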
## Master index

| Sub-hub | What it maps | Status |
|---|---|---|
| [Ansible](ansible/README.md) | The `arcodange.factory` collection: numbered playbooks `01`–`06`, the inventory + group_vars, and the reusable roles that build hosts, the cluster, and the stateful services | ✅ Active |
| [OpenTofu](opentofu/README.md) | The CI-applied IaC: the cloud/forge edge (`iac/`), the per-app PostgreSQL provisioning (`postgres/iac/`), and the Gitea-OIDC → Vault apply flow | ✅ Active |

### All pages

- **Ansible**
  - [System (`01`)](ansible/01-system.md): OS, DNS, SSL, disks, Docker, iSCSI, k3s, CoreDNS, cert-issuer, Longhorn/Traefik config
  - [Setup (`02`)](ansible/02-setup.md): PostgreSQL + Gitea docker-compose on `pi2` (and the optional backup-NFS share)
  - [CI/CD (`03`)](ansible/03-cicd.md): Gitea act-runner registration on `pi1`/`pi3` and the ArgoCD/Image-Updater install
  - [Tools (`04`)](ansible/04-tools.md): Vault + CrowdSec bootstrap into the cluster
  - [Backup (`05`)](ansible/05-backup.md): scheduled PostgreSQL / Gitea / k3s-PVC backups to `/mnt/backups`
  - [Recover (`06`)](ansible/06-recover.md): the Longhorn disaster-recovery playbooks (`recover/`)
  - [Inventory & variables](ansible/inventory.md): `hosts.yml` groups and the `group_vars` tree
  - [Roles reference](ansible/roles.md): `deploy_docker_compose`, the `gitea_*` family, `traefik_certs`, `playwright`, and the service sub-roles
- **OpenTofu**
  - [factory iac](opentofu/factory-iac.md): the Cloudflare/OVH/GCP/Gitea/Vault edge in `iac/`, plus the `cloudflare_token` module
  - [postgres iac](opentofu/postgres-iac.md): per-app databases, roles, and the pgbouncer `user_lookup()` function in `postgres/iac/`
  - [CI apply flow](opentofu/ci-apply-flow.md): the Gitea workflows, OIDC-JWT → Vault auth, and the GCS state backend

## Maintenance rule

> [!IMPORTANT]
> **Alter a documented component → update its page in the same change.** Documented components include:
>
> - a playbook;
> - a role;
> - an inventory entry;
> - a provider version;
> - a Tofu resource;
> - the CI flow.
>
> If you change any of these, edit the matching page in this guidebook in the same PR. A provisioning map that drifts from the code sends operators (and agents) down dead paths during a rebuild or a recovery, exactly when the map matters most.

## Why this guidebook earns its keep

The safe-prod-like-environment work rehearses **exactly these playbooks and Tofu modules** in a throwaway sandbox before they touch the real lab. The sandbox runs the same `01`–`05` playbook sequence and the same `iac/` + `postgres/iac/` apply, so the rehearsal is only valid if this guidebook tracks the engines faithfully. For background, see:

- the [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md), for the decision;
- the [PRD](../../PRD/safe-prod-like-environment/README.md) and its [QA strategy](../../PRD/safe-prod-like-environment/qa-strategy.md), for what the sandbox must reproduce.

## Cross-references

- [Lab ecosystem guidebook](../lab-ecosystem/README.md): the higher-altitude whole-lab map; this guidebook is its provisioning deep dive.
- [01 · factory](../lab-ecosystem/01-factory.md): the four-pillar summary of the `factory` repo that this guidebook expands.
- [secrets-and-vault.md](../lab-ecosystem/secrets-and-vault.md): Gitea OIDC JWT for Tofu/CI and the dynamic PostgreSQL credentials these engines set up.
- [storage-and-recovery.md](../lab-ecosystem/storage-and-recovery.md): Longhorn + GCS backup + the power-cut recovery that the `06 · recover` playbooks serve.
- [naming-conventions.md](../lab-ecosystem/naming-conventions.md): the `` join key shared by the OpenTofu state prefixes and per-app PostgreSQL objects.
- [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) · [PRD](../../PRD/safe-prod-like-environment/README.md): the sandbox that rehearses these engines before they touch the real lab.
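A pre-merge check can enforce the maintenance rule mechanically. It compares a PR's changed files against the guidebook pages that document them. Below is a minimal Python sketch, under these assumptions:

- `GUIDEBOOK` is an assumed, hypothetical repo-relative location for this guidebook.
- `PAGE_FOR` is a hand-maintained, partial mapping drawn from the page list in the master index.
- The Ansible roles directory and the workflow directory in `PAGE_FOR` are guesses.

```python
"""Sketch: flag code changes whose guidebook page was not edited in the same PR.

Hypothetical: GUIDEBOOK is an assumed repo-relative location for this
guidebook, and PAGE_FOR is a hand-maintained, partial code-to-page map.
"""
from typing import Optional

GUIDEBOOK = "docs/guidebooks/factory-provisioning/"  # assumed path

# Code prefix -> guidebook page. The longest matching prefix wins, so a
# specific entry (ansible/roles/) overrides a broad one (ansible/).
PAGE_FOR: dict[str, str] = {
    "ansible/": "ansible/README.md",                   # assumed fallback page
    "ansible/roles/": "ansible/roles.md",              # assumed roles location
    "iac/": "opentofu/factory-iac.md",
    "postgres/iac/": "opentofu/postgres-iac.md",
    ".gitea/workflows/": "opentofu/ci-apply-flow.md",  # assumed workflow dir
}


def page_for(path: str) -> Optional[str]:
    """Return the guidebook page that documents `path`, or None if unmapped."""
    matches = [prefix for prefix in PAGE_FOR if path.startswith(prefix)]
    if not matches:
        return None
    return GUIDEBOOK + PAGE_FOR[max(matches, key=len)]


def missing_pages(changed: list[str]) -> set[str]:
    """Pages that `changed` requires but that the same change did not edit."""
    required = {page for page in map(page_for, changed) if page is not None}
    return required - set(changed)
```

To use it, feed it the output of `git diff --name-only origin/main...HEAD`. A non-empty result marks a PR that changes a documented component without touching its page. Prefix matching is coarse: it catches a forgotten page, not a page that was edited but is still wrong.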