docs(vibe): add factory-provisioning guidebook (Ansible + OpenTofu)

Deep, code-grounded tree-docs guidebook under vibe/guidebooks/factory-provisioning/, explored from the actual playbooks/roles and tofu code:

- Hub: the two provisioning engines (operator-run Ansible vs CI-applied OpenTofu), a green-field bring-up flow, master index, maintenance rule.
- ansible/ sub-tree: ordered pages 01-system .. 06-recover, an inventory & variables concept page, and a Tier-1/Tier-2 roles reference (hashicorp_vault, step_ca, crowdsec, pihole, deploy_docker_compose + the gitea_* family and helpers).
- opentofu/ sub-tree: factory-iac (Cloudflare/OVH/GCP/Gitea/Vault edge + cloudflare_token module), postgres-iac (per-app DB/role/pgbouncer lookup), ci-apply-flow (Gitea OIDC-JWT -> Vault -> auto-approve apply).

Cross-linked bidirectionally with the lab-ecosystem guidebook and the safe-env ADR/PRD (the sandbox rehearses exactly these engines). 14 mermaid diagrams MCP-validated; zero dead links. Authored by the Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
148
vibe/guidebooks/factory-provisioning/opentofu/factory-iac.md
Normal file
@@ -0,0 +1,148 @@
[vibe](../../../README.md) > [Guidebooks](../../README.md) > [Factory provisioning](../README.md) > [OpenTofu](README.md) > **factory iac**

# factory iac — the `iac/` state root

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Code:** [`iac/`](../../../../iac) · **State backend:** `gs://arcodange-tf/factory/main` ([`iac/backend.tf`](../../../../iac/backend.tf))
> **Upstream:** [OpenTofu hub](README.md) · [Factory provisioning hub](../README.md) · [Lab ecosystem · 01 factory](../../lab-ecosystem/01-factory.md)
> **Related:** [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md) · [CI apply flow](ci-apply-flow.md) · [postgres iac](postgres-iac.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

The `iac/` root provisions everything that lives **outside** the K3s cluster: the Cloudflare R2 bucket that holds the cms repo's OpenTofu state, the per-service Cloudflare and OVH API tokens consumed by the [cms](https://gitea.arcodange.lab/arcodange-org/cms) repo, a restricted Gitea CI user for reading private module repos, and the GCS bucket that backs up Longhorn volumes. Each provisioned credential is written **both** to a Gitea Actions secret (where the consuming workflow expects it) **and** to a Vault path (the durable source of truth — see [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md)).

This root's own state lives at `gs://arcodange-tf/factory/main` and is applied by [`.gitea/workflows/iac.yaml`](../../../../.gitea/workflows/iac.yaml) on any change under `iac/**` — see [CI apply flow](ci-apply-flow.md) for the job-by-job walkthrough.

---

## Providers

Declared in [`iac/providers.tf`](../../../../iac/providers.tf).

| Provider | Source | Version | Endpoint / scope | Auth |
| --- | --- | --- | --- | --- |
| `gitea` | `go-gitea/gitea` | `0.6.0` | `https://gitea.arcodange.lab` | `GITEA_TOKEN` env var |
| `vault` | `vault` | `4.4.0` | `https://vault.arcodange.lab` | JWT login — mount `gitea_jwt`, role `gitea_cicd` |
| `google` | `google` | `7.0.1` | project `arcodange`, region `US-EAST1` | `GOOGLE_CREDENTIALS` env var |
| `cloudflare` | `cloudflare/cloudflare` | `~> 5` | DNS / Pages / R2 / IAM | `CLOUDFLARE_API_TOKEN` env var |
| `ovh` | `ovh/ovh` | `2.8.0` | endpoint `ovh-eu` | `OVH_APPLICATION_KEY` / `OVH_APPLICATION_SECRET` / `OVH_CONSUMER_KEY` |

> [!NOTE]
> The Cloudflare account ID is **not** hard-coded — it is resolved at plan time from `data.cloudflare_account.arcodange` filtered on the account name `arcodange@gmail.com` ([`iac/cloudflare.tf`](../../../../iac/cloudflare.tf)) and exposed as `local.cloudflare_account_id`.
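
As a rough illustration of the `vault` row in the table above, a JWT login could be declared along the following lines. This is a sketch under assumptions, not the contents of `iac/providers.tf`: the variable `vault_jwt` is a hypothetical name for wherever the workflow's OIDC token is passed in, and the `hashicorp/vault` source address is assumed from the short `vault` source in the table. Only the address, mount, role, and version come from this page.

```hcl
terraform {
  required_providers {
    vault = {
      source  = "hashicorp/vault" # assumed full address for the short name "vault"
      version = "4.4.0"
    }
  }
}

# Hypothetical input: the Gitea-issued OIDC JWT handed to the CI job.
variable "vault_jwt" {
  type      = string
  sensitive = true
}

provider "vault" {
  address = "https://vault.arcodange.lab"

  # JWT login against the mount and role named in the providers table.
  auth_login_jwt {
    mount = "gitea_jwt"
    role  = "gitea_cicd"
    jwt   = var.vault_jwt
  }
}
```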

---

## Cloudflare — R2 backend bucket & service tokens

Defined in [`iac/cloudflare.tf`](../../../../iac/cloudflare.tf). Two tokens are minted through the [`modules/cloudflare_token`](#the-cloudflare_token-module) mechanism: one scoped to the R2 state bucket, one broad token handed to the cms repo.

| Resource | Type | Identity / scope | Secret destination |
| --- | --- | --- | --- |
| `cloudflare_r2_bucket.arcodange_tf` | R2 bucket | name `arcodange-tf`, jurisdiction `eu` | — (holds the *cms* repo's own OpenTofu state) |
| `module.cf_r2_arcodange_tf_token` | module → `cloudflare_account_token` | account: `Workers R2 Storage Read`, `Account Settings Read`; bucket: `Workers R2 Storage Bucket Item Write` | `vault_kv_secret.cf_r2_arcodange_tf` → `kvv1/cloudflare/r2/arcodange-tf` (S3 access key, secret, `https://<account_id>.eu.r2.cloudflarestorage.com` endpoint) |
| `vault_policy.cf_r2_arcodange_tf` | Vault policy | name `factory__cf_r2_arcodange_tf` | read on `kvv1/cloudflare/r2/arcodange-tf` **and** `kvv1/zoho/self_client` (the Zoho mail client is created manually) |
| `module.cf_arcodange_cms_token` | module → `cloudflare_account_token` | account-scope: `Pages Write`, `Account DNS Settings Write`, `Account Settings Read`, `Zone Write`, `Zone Settings Write`, `DNS Write`, `Cloudflare Tunnel Write`, `Turnstile Sites Write` | Gitea secrets `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` on the `cms` repo; Vault `kvv1/cloudflare/cms/cf_arcodange_cms_token` |

The `cms` repo (`data.gitea_repo.cms`, owner `arcodange-org`) receives the broad token because it manages the public site end to end: Cloudflare Pages deploys, DNS records, zone settings, the Tunnel, and Turnstile.

> [!CAUTION]
> Both tokens are minted with **`expires_on = null`** — they never expire. A leaked `cf_arcodange_cms_token` grants standing DNS/Pages/Tunnel/Turnstile write on the whole account until manually revoked. There is no automatic rotation; rotation means tainting the module's `cloudflare_account_token` and re-applying.

---

## OVH — OAuth2 client for the cms domain

Defined in [`iac/ovh.tf`](../../../../iac/ovh.tf). A `CLIENT_CREDENTIALS` OAuth2 client lets the cms workflow edit DNS nameservers for `arcodange.fr`, constrained by an IAM policy.

| Resource | Type | Scope |
| --- | --- | --- |
| `ovh_me_api_oauth2_client.cms` | OAuth2 client | name `cms repo`, flow `CLIENT_CREDENTIALS` — "arcodange.fr management" |
| `ovh_iam_policy.cms` | IAM policy | name `cms_manager`; identity = the OAuth2 client; resources = account URN + `urn:v1:eu:resource:domain:arcodange.fr`; allow = a handful of `me/*` reads, all domain **READ** reference-actions (computed via `data.ovh_iam_reference_actions.domain`), plus `domain:apiovh:nameServer/edit` |
| `gitea_repository_actions_secret.ovh_cms_client_id` | Gitea secret | `OVH_CLIENT_ID` on the `cms` repo |
| `gitea_repository_actions_secret.ovh_cms_client_secret` | Gitea secret | `OVH_CLIENT_SECRET` on the `cms` repo |
| `vault_kv_secret.ovh_cms_token` | Vault secret | `kvv1/ovh/cms/app` — `client_id`, `client_secret`, `urn` |

> [!NOTE]
> The write surface is deliberately narrow: the policy grants **only** `nameServer/edit` for writes; everything else is read-only. This lets the cms pipeline point `arcodange.fr` at Cloudflare nameservers without exposing the broader OVH account.
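
The shape of that narrow policy might look roughly like the sketch below. It is an illustration only, not a copy of `iac/ovh.tf`: the attribute names (`identity`, `urn`, `actions`, `categories`) follow the OVH provider documentation as best understood and should be checked against version `2.8.0`, the `data.ovh_me.account` lookup is a hypothetical way to obtain the account URN, the mapping of "arcodange.fr management" to `description` is a guess, and the `me/*` read actions are left out because this page does not list them.

```hcl
# Assumes the ovh/ovh provider is configured (endpoint "ovh-eu").
data "ovh_me" "account" {} # hypothetical lookup for the account URN

data "ovh_iam_reference_actions" "domain" {
  type = "domain"
}

resource "ovh_me_api_oauth2_client" "cms" {
  name        = "cms repo"
  description = "arcodange.fr management" # assumed to be the description field
  flow        = "CLIENT_CREDENTIALS"
}

resource "ovh_iam_policy" "cms" {
  name       = "cms_manager"
  identities = [ovh_me_api_oauth2_client.cms.identity]
  resources = [
    data.ovh_me.account.urn,
    "urn:v1:eu:resource:domain:arcodange.fr",
  ]

  # Every domain action categorised as READ, plus the single write action.
  # The me/* read actions from the real policy are omitted here.
  allow = concat(
    [
      for a in data.ovh_iam_reference_actions.domain.actions :
      a.action if contains(a.categories, "READ")
    ],
    ["domain:apiovh:nameServer/edit"]
  )
}
```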

---

## Gitea — restricted CI module-reader user

Defined in [`iac/gitea_tofu_ci_user.tf`](../../../../iac/gitea_tofu_ci_user.tf). A locked-down Gitea account whose SSH key lets CI clone private Terraform module repos without exposing a privileged token.

| Resource | Type | Notes |
| --- | --- | --- |
| `random_password.tofu` | password | length 32 — the user's login password |
| `gitea_user.tofu` | Gitea user | username `tofu_module_reader`, email `tofu-module-reader@arcodange.fake`, `restricted = true`, `visibility = private`, `prohibit_login = false` |
| `tls_private_key.tofu` | keypair | algorithm **ED25519** |
| `gitea_public_key.tofu` | SSH key | public half attached to `tofu_module_reader` |
| `vault_kv_secret.gitea_admin_token` | Vault secret | `kvv1/gitea/tofu_module_reader` — `ssh_private_key` + `ssh_public_key` |

> [!NOTE]
> Despite the Terraform resource name `gitea_admin_token`, the stored payload is the **SSH keypair**, not an admin token. The user is `restricted`, so it can only read repos it is explicitly granted access to.
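
A minimal sketch of the keypair-to-Vault wiring, assuming the `hashicorp/tls` and `hashicorp/vault` providers are configured. The resource names and the Vault path come from the table above; the choice of the OpenSSH-encoded attributes is an assumption, since this page does not say which encoding the real file stores.

```hcl
resource "tls_private_key" "tofu" {
  algorithm = "ED25519"
}

# Named gitea_admin_token in the real root, although it stores an SSH keypair.
resource "vault_kv_secret" "gitea_admin_token" {
  path = "kvv1/gitea/tofu_module_reader"

  data_json = jsonencode({
    # Assumption: OpenSSH encoding; the PEM attributes would also be possible.
    ssh_private_key = tls_private_key.tofu.private_key_openssh
    ssh_public_key  = tls_private_key.tofu.public_key_openssh
  })
}
```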

---

## Google / GCS — Longhorn backup target

Defined in [`iac/gcs_backup.tf`](../../../../iac/gcs_backup.tf). A GCS bucket plus an HMAC key wired into Vault so the in-cluster Longhorn controller can pull S3-compatible backup credentials. See [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) for how this fits the cluster-recovery story.

| Resource | Type | Value |
| --- | --- | --- |
| `google_storage_bucket.longhorn_backup` | GCS bucket | name `arcodange-backup`, location `NAM4` (dual-region), `force_destroy = true`, `public_access_prevention = enforced` |
| `google_service_account.longhorn_backup` | service account | account_id `longhorn-backup` |
| `google_storage_bucket_iam_member.longhorn_backup` | IAM binding | `roles/storage.admin` on the bucket, member = the SA |
| `google_storage_hmac_key.longhorn_backup` | HMAC key | S3-compatible access_id + secret for that SA |
| `vault_kv_secret_v2.longhorn_gcs_backup` | Vault **KVv2** secret | mount `kvv2`, name `longhorn/gcs-backup`, `cas = 1`, `delete_all_versions = true` — `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINTS = https://storage.googleapis.com` |
| `vault_policy.longhorn_gcs_backup` | Vault policy | name `longhorn-gcs-backup` — read on `kvv2/data/longhorn/gcs-backup` |
| `vault_kubernetes_auth_backend_role.longhorn` | Vault k8s auth role | role `longhorn`, bound SA `longhorn-vault-secret-reader` in namespace `longhorn-system`, audience `vault`, policy `longhorn-gcs-backup` |

The bound service-account name `longhorn-vault-secret-reader` must match the `VaultAuth` manifest in-cluster — that's the handshake that lets Longhorn read the HMAC creds at runtime.

> [!WARNING]
> The HMAC key is an **S3-compatible** credential and is weaker than a native GCS service-account key: it is a long-lived static secret with no key rotation built into this config, and `roles/storage.admin` grants full read/write/delete on the backup bucket. Combined with `force_destroy = true`, a state operation that destroys `arcodange-backup` will delete every Longhorn backup without prompting. Treat this bucket as critical and irreplaceable infrastructure.
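
To make the HMAC-to-Vault handoff concrete, the relevant resources could be wired roughly as follows. This is a sketch assembled from the values in the table above, assuming the `google` and `vault` providers are already configured; the bucket, IAM binding, policy, and Kubernetes auth role are left out, and the real file may differ in detail.

```hcl
resource "google_service_account" "longhorn_backup" {
  account_id = "longhorn-backup"
}

# S3-compatible access_id + secret tied to the service account.
resource "google_storage_hmac_key" "longhorn_backup" {
  service_account_email = google_service_account.longhorn_backup.email
}

# KVv2 secret that Longhorn reads at runtime through Vault.
resource "vault_kv_secret_v2" "longhorn_gcs_backup" {
  mount               = "kvv2"
  name                = "longhorn/gcs-backup"
  cas                 = 1
  delete_all_versions = true

  data_json = jsonencode({
    AWS_ACCESS_KEY_ID     = google_storage_hmac_key.longhorn_backup.access_id
    AWS_SECRET_ACCESS_KEY = google_storage_hmac_key.longhorn_backup.secret
    AWS_ENDPOINTS         = "https://storage.googleapis.com"
  })
}
```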

---

## The `cloudflare_token` module

Source: [`iac/modules/cloudflare_token/`](../../../../iac/modules/cloudflare_token). This local module turns **human-readable permission names** into a working Cloudflare account token, so callers never hard-code permission-group UUIDs.

How it works ([`main.tf`](../../../../iac/modules/cloudflare_token/main.tf)):

1. It reads **all** available permission groups via `data.cloudflare_account_api_token_permission_groups_list`, then builds `local.permission_map`: `"<scope>:<name>" => id` (e.g. `"account:Pages Write" => <uuid>`), keyed by the last dotted segment of the group's scope.
2. Caller-supplied names (`var.permissions.account` / `var.permissions.bucket`) are looked up against that map; any name with no match lands in `local.missing_permissions` and trips a **`precondition`** that fails the apply with a clear "Permissions introuvables" error.
3. Policies are assembled dynamically — an `account` policy targeting `com.cloudflare.api.account.<id>` and, if `var.bucket` is set, a `bucket` policy targeting `com.cloudflare.edge.r2.bucket.<id>_<jurisdiction>_<name>`.
4. The `cloudflare_account_token.token` resource sets `expires_on = null` and **ignores** drift on `expires_on` and `policies` (the upstream permission IDs are unstable). Instead, a `null_resource.cloudflare_account_token_replace` hashes the **sorted permission names** into its triggers, and `replace_triggered_by` forces a fresh token whenever the *names* change — surviving id churn while still rotating on a real permission change.
5. Outputs ([`outputs.tf`](../../../../iac/modules/cloudflare_token/outputs.tf)): `token` (sensitive), `token_id`, `token_sha256`, and — when `var.bucket` is set — `r2_credentials` mapping `access_key_id = token.id` and `secret_access_key = sha256(token.value)` for S3-compatible R2 access.
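
Steps 1, 2, and 4 can be sketched in plain HCL as below. This is an approximation of the technique, not the module's actual code: `local.permission_groups` is a hypothetical stand-in for the data source result (whose exact shape is not shown on this page), the ids are placeholders, and attaching the `precondition` to the `null_resource` is an assumption about where the check lives. The token resource itself is omitted; per step 4 it would list this `null_resource` under `replace_triggered_by`.

```hcl
terraform {
  required_providers {
    null = {
      source = "hashicorp/null"
    }
  }
}

variable "permissions" {
  type = object({
    account = list(string)
    bucket  = optional(list(string), [])
  })
  default = {
    account = ["Pages Write", "Account Settings Read"]
  }
}

locals {
  # Hypothetical stand-in for the permission-groups data source result.
  permission_groups = [
    { id = "placeholder-1", name = "Pages Write", scope = "com.cloudflare.api.account" },
    { id = "placeholder-2", name = "Account Settings Read", scope = "com.cloudflare.api.account" },
    { id = "placeholder-3", name = "Workers R2 Storage Bucket Item Write", scope = "com.cloudflare.edge.r2.bucket" },
  ]

  # "<scope>:<name>" => id, keyed by the last dotted segment of the scope.
  permission_map = {
    for g in local.permission_groups :
    "${reverse(split(".", g.scope))[0]}:${g.name}" => g.id
  }

  # Caller-supplied names, prefixed with the scope they were requested under.
  requested = concat(
    [for n in var.permissions.account : "account:${n}"],
    [for n in var.permissions.bucket : "bucket:${n}"]
  )

  # Requested names that have no entry in the map.
  missing_permissions = [
    for k in local.requested : k if !contains(keys(local.permission_map), k)
  ]

  # Hash of the sorted names: unchanged when only the upstream ids change.
  permissions_hash = sha256(join(",", sort(local.requested)))
}

resource "null_resource" "cloudflare_account_token_replace" {
  triggers = {
    permissions = local.permissions_hash
  }

  lifecycle {
    precondition {
      condition     = length(local.missing_permissions) == 0
      error_message = "Permissions introuvables: ${join(", ", local.missing_permissions)}"
    }
  }
}
```

Because the trigger is derived from names rather than ids, adding or removing a permission name changes the hash and replaces the `null_resource`, while an upstream id change leaves it untouched.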

---

## Vault layout: mixed KVv1 / KVv2

This root writes to **both** KV engines, which is easy to trip over.

| Path | Engine | Written by |
| --- | --- | --- |
| `kvv1/cloudflare/r2/arcodange-tf` | KVv1 (`vault_kv_secret`) | R2 backend token |
| `kvv1/cloudflare/cms/cf_arcodange_cms_token` | KVv1 | cms Cloudflare token |
| `kvv1/ovh/cms/app` | KVv1 | OVH OAuth2 client |
| `kvv1/gitea/tofu_module_reader` | KVv1 | CI user SSH key |
| `kvv2/longhorn/gcs-backup` | KVv2 (`vault_kv_secret_v2`) | Longhorn GCS HMAC |

> [!WARNING]
> Most secrets here use the **KVv1** engine (`vault_kv_secret`), but the Longhorn backup secret uses **KVv2** (`vault_kv_secret_v2`). The policy paths differ accordingly — KVv2 reads target `kvv2/data/longhorn/gcs-backup` (note the `/data/` segment), whereas KVv1 policies read the literal path. Mixing the two engines means a policy copied from one secret to another will silently grant nothing. See [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) for the engine-level design.
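
Side by side, the two policy shapes look like this. The policy names and paths are the ones listed on this page; the heredoc layout is only one way to write them and is not necessarily how the real files are formatted.

```hcl
# KVv1: the policy path is the literal secret path.
resource "vault_policy" "cf_r2_arcodange_tf" {
  name = "factory__cf_r2_arcodange_tf"

  policy = <<-EOT
    path "kvv1/cloudflare/r2/arcodange-tf" {
      capabilities = ["read"]
    }
    path "kvv1/zoho/self_client" {
      capabilities = ["read"]
    }
  EOT
}

# KVv2: reads go through the data/ segment under the mount.
resource "vault_policy" "longhorn_gcs_backup" {
  name = "longhorn-gcs-backup"

  policy = <<-EOT
    path "kvv2/data/longhorn/gcs-backup" {
      capabilities = ["read"]
    }
  EOT
}
```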

---

## Outputs

The root exposes a single top-level `output "token"` (sensitive) = the cms Cloudflare token ([`iac/cloudflare.tf`](../../../../iac/cloudflare.tf)). Everything else is delivered side-effect-style into Gitea secrets and Vault paths rather than as Terraform outputs.

---

## See also

- [CI apply flow](ci-apply-flow.md) — how `iac/**` changes reach `gs://arcodange-tf/factory/main` via the Vault-JWT exchange and auto-approve apply.
- [postgres iac](postgres-iac.md) — the sibling root that provisions in-cluster PostgreSQL.
- [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md).