# Factory provisioning
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Lab ecosystem guidebook](../lab-ecosystem/README.md) · [01 · factory](../lab-ecosystem/01-factory.md)
> **Related:** [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-prod-like-environment PRD](../../PRD/safe-prod-like-environment/README.md)
This guidebook is the deep dive into **how the `factory` repo turns three Raspberry Pis + a handful of cloud accounts into the running lab.** Where the [lab-ecosystem](../lab-ecosystem/README.md) map shows *which* components exist and how they join, this guidebook drills into the two provisioning **engines** that build and maintain them: the Ansible collection that the operator runs from the Mac, and the OpenTofu modules that Gitea CI applies. Every page below describes the engine *as it is wired right now* — playbook imports, role responsibilities, inventory placement, provider versions, state backends, and the CI flow that ties Tofu to Vault.
## Two engines, two trigger models
The factory splits provisioning along a hard line: **imperative, operator-driven host/cluster build** (Ansible) versus **declarative, CI-driven forge/cloud/database state** (OpenTofu). They never overlap on the same resource, and they run at different moments.
| Engine | Trigger | Runs from | Owns | Lives at |
|---|---|---|---|---|
| **Ansible** | One-shot, operator-run on demand | The Mac (control node) | The cluster + base layer + stateful services: k3s, Longhorn, Pi-hole, step-ca, PostgreSQL, Gitea, Vault, CrowdSec — plus the disaster-recovery playbooks | [`ansible/`](../../../ansible/) → [sub-hub](ansible/README.md) |
| **OpenTofu** | CI-applied on Gitea (path-filtered `push`/`pull_request` + `workflow_dispatch`) | Gitea act-runners | Forge/cloud edge state (Cloudflare, OVH, GCP, Gitea, Vault) and **per-app PostgreSQL databases** | [`iac/`](../../../iac/) + [`postgres/`](../../../postgres/) → [sub-hub](opentofu/README.md) |
> [!NOTE]
> Ansible is **imperative and human-gated** because it touches bare hosts and one-time bootstrap (disk prep, k3s install, Vault init). OpenTofu is **declarative and machine-gated** because its targets are reconcilable API objects (a DNS record, a bucket, a database) whose desired state belongs in version control and converges on every merge.
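
The OpenTofu trigger model in the table above can be sketched as a workflow header. This is a minimal sketch, assuming Gitea Actions accepts the GitHub-compatible `on:` syntax with `paths` filters. The file name, job name, and `runs-on` label are hypothetical, and the job body is a placeholder; the real workflows are documented on the CI apply flow page.

```yaml
# Hypothetical file: .gitea/workflows/iac.yaml (name is an assumption).
name: iac

on:
  push:
    paths:
      - "iac/**"        # path filter: only changes under iac/ trigger a run
  pull_request:
    paths:
      - "iac/**"
  workflow_dispatch: {}  # manual trigger from the Gitea UI

jobs:
  tofu:
    runs-on: ubuntu-latest   # assumption: the act-runner label may differ
    steps:
      - uses: actions/checkout@v4
      # Placeholder: the Vault login and tofu plan/apply steps are omitted here.
      - run: echo "authenticate to Vault, then run OpenTofu against iac/"
```

A second workflow with `postgres/**` in both `paths` lists would cover the database side in the same way, which keeps the two state prefixes from being applied by the same run.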
## How a green-field lab comes up
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef op fill:#1e3a8a,stroke:#1e40af,color:#fff
classDef eng fill:#059669,stroke:#047857,color:#fff
classDef host fill:#7c3aed,stroke:#6d28d9,color:#fff
classDef store fill:#b45309,stroke:#92400e,color:#fff
OP["Operator<br/>at the Mac"]:::op -->|"runs playbooks 01→05"| ANS["Ansible collection<br/>arcodange.factory"]:::eng
ANS -->|"OS · k3s · Longhorn · base layer"| PIS["3× Raspberry Pi<br/>pi1 / pi2 / pi3"]:::host
PIS -->|"hosts Gitea + act-runners"| CI["Gitea CI<br/>act-runners"]:::store
CI -->|"path-filtered apply"| TOFU["OpenTofu<br/>iac/ + postgres/iac/"]:::eng
TOFU -->|"forge · cloud · PG state"| EDGE["Cloudflare · OVH · GCP<br/>Gitea · Vault · PostgreSQL"]:::store
TOFU -. "state in GCS gs://arcodange-tf" .- EDGE
```
1. The **operator**, working from the **Mac control node**, runs the numbered Ansible playbooks `01_system` → `05_backup` in order.
2. **Ansible** lays the OS, k3s (`v1.34.3+k3s1`), Longhorn, and the base layer (Pi-hole, step-ca, Vault, CrowdSec) plus the stateful out-of-cluster services (PostgreSQL + Gitea) onto the **three Raspberry Pis** (`pi1`/`pi2`/`pi3`).
3. Once `pi2` is hosting **Gitea** and `pi1`/`pi3` are running the **act-runners** (registered by `03_cicd`), the forge can run CI.
4. A push or pull request on `factory` that touches `iac/**` or `postgres/**`, or a manual `workflow_dispatch`, triggers the corresponding **Gitea CI** workflow on those runners.
5. The CI job authenticates to Vault via Gitea OIDC JWT and runs **OpenTofu**, which reconciles the **forge/cloud/database edge** — Cloudflare, OVH, GCP, Gitea action-secrets, Vault KV/policies, and the per-app PostgreSQL objects.
6. All OpenTofu state is kept in **GCS** under `gs://arcodange-tf` (prefix `factory/main` for the cloud edge, `factory/postgres` for the databases), so each CI run reads and writes the authoritative state remotely.
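
The operator-run order in step 1 could be captured in a wrapper playbook. This is a hedged sketch only: the wrapper file `site.yml` is hypothetical and the repo is not known to contain one, the `.yml` extensions are assumed, and the names `02_setup` and `04_tools` are inferred from the page titles. Only `01_system`, `03_cicd`, and `05_backup` are named in this guidebook.

```yaml
# Hypothetical site.yml: chains the numbered playbooks in their documented order.
# File names and extensions are assumptions; adjust to the real playbook paths.
- name: 01 - OS, disks, k3s, Longhorn, base system
  ansible.builtin.import_playbook: 01_system.yml

- name: 02 - PostgreSQL and Gitea on pi2
  ansible.builtin.import_playbook: 02_setup.yml

- name: 03 - act-runner registration and CD tooling
  ansible.builtin.import_playbook: 03_cicd.yml

- name: 04 - Vault and CrowdSec bootstrap
  ansible.builtin.import_playbook: 04_tools.yml

- name: 05 - scheduled backups
  ansible.builtin.import_playbook: 05_backup.yml
```

Running the playbooks one at a time, as the steps describe, keeps each stage human-gated; a wrapper like this only makes sense where a full unattended rebuild is acceptable, such as a throwaway sandbox.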
## Master index
| Sub-hub | What it maps | Status |
|---|---|---|
| [Ansible](ansible/README.md) | The `arcodange.factory` collection: numbered playbooks `01`–`06`, the inventory + group_vars, and the reusable roles that build hosts, the cluster, and the stateful services | ✅ Active |
| [OpenTofu](opentofu/README.md) | The CI-applied IaC: the cloud/forge edge (`iac/`), the per-app PostgreSQL provisioning (`postgres/iac/`), and the Gitea-OIDC → Vault apply flow | ✅ Active |
### All pages
- **Ansible**
  - [System (`01`)](ansible/01-system.md) — OS, DNS, SSL, disks, Docker, iSCSI, k3s, CoreDNS, cert-issuer, Longhorn/Traefik config
  - [Setup (`02`)](ansible/02-setup.md) — PostgreSQL + Gitea docker-compose on `pi2` (and the optional backup-NFS share)
  - [CI/CD (`03`)](ansible/03-cicd.md) — Gitea act-runner registration on `pi1`/`pi3` and the ArgoCD/Image-Updater install
  - [Tools (`04`)](ansible/04-tools.md) — Vault + CrowdSec bootstrap into the cluster
  - [Backup (`05`)](ansible/05-backup.md) — scheduled PostgreSQL / Gitea / k3s-PVC backups to `/mnt/backups`
  - [Recover (`06`)](ansible/06-recover.md) — the Longhorn disaster-recovery playbooks (`recover/`)
  - [Inventory & variables](ansible/inventory.md) — `hosts.yml` groups and the `group_vars` tree
  - [Roles reference](ansible/roles.md) — `deploy_docker_compose`, the `gitea_*` family, `traefik_certs`, `playwright`, and the service sub-roles
- **OpenTofu**
  - [factory iac](opentofu/factory-iac.md) — `iac/`: Cloudflare/OVH/GCP/Gitea/Vault edge + the `cloudflare_token` module
  - [postgres iac](opentofu/postgres-iac.md) — `postgres/iac/`: per-app databases, roles, and the pgbouncer `user_lookup()` function
  - [CI apply flow](opentofu/ci-apply-flow.md) — the Gitea workflows, OIDC-JWT → Vault auth, and the GCS state backend
## Maintenance rule
> [!IMPORTANT]
> **Alter a documented component → update its page in the same change.** If you change a playbook, a role, an inventory entry, a provider version, a Tofu resource, or the CI flow, the matching page in this guidebook MUST be edited in the same PR. A provisioning map that drifts from the code sends operators (and agents) down dead paths during a rebuild or a recovery — exactly when the map matters most.
## Why this guidebook earns its keep
The safe-prod-like-environment work rehearses **exactly these playbooks and Tofu modules** in a throwaway sandbox before they touch the real lab. The sandbox runs the same `01`–`05` playbook sequence and the same `iac/` + `postgres/iac/` apply, so the rehearsal is only meaningful if this guidebook tracks the engines faithfully. See the [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) for the decision and the [PRD](../../PRD/safe-prod-like-environment/README.md) (with its [QA strategy](../../PRD/safe-prod-like-environment/qa-strategy.md)) for what the sandbox must reproduce.
## Cross-references
- [Lab ecosystem guidebook](../lab-ecosystem/README.md) — the higher-altitude whole-lab map; this guidebook is its provisioning deep dive.
- [01 · factory](../lab-ecosystem/01-factory.md) — the four-pillar summary of the `factory` repo that this guidebook expands.
- [secrets-and-vault.md](../lab-ecosystem/secrets-and-vault.md) — Gitea OIDC JWT for Tofu/CI and the dynamic PostgreSQL credentials these engines set up.
- [storage-and-recovery.md](../lab-ecosystem/storage-and-recovery.md) — Longhorn + GCS backup + the power-cut recovery the `06 · recover` playbooks serve.
- [naming-conventions.md](../lab-ecosystem/naming-conventions.md) — the join key shared by the OpenTofu state prefixes and per-app PostgreSQL objects.
- [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) · [PRD](../../PRD/safe-prod-like-environment/README.md) — the sandbox that rehearses these engines before they touch the real lab.