docs(vibe): add factory-provisioning guidebook (Ansible + OpenTofu)

Deep, code-grounded tree-docs guidebook under vibe/guidebooks/factory-provisioning/, explored from the actual playbooks/roles and tofu code:

- Hub: the two provisioning engines (operator-run Ansible vs CI-applied OpenTofu), a green-field bring-up flow, master index, maintenance rule.
- ansible/ sub-tree: ordered pages 01-system .. 06-recover, an inventory & variables concept page, and a Tier-1/Tier-2 roles reference (hashicorp_vault, step_ca, crowdsec, pihole, deploy_docker_compose + the gitea_* family and helpers).
- opentofu/ sub-tree: factory-iac (Cloudflare/OVH/GCP/Gitea/Vault edge + cloudflare_token module), postgres-iac (per-app DB/role/pgbouncer lookup), ci-apply-flow (Gitea OIDC-JWT -> Vault -> auto-approve apply).

Cross-linked bidirectionally with the lab-ecosystem guidebook and the safe-env ADR/PRD (the sandbox rehearses exactly these engines). 14 mermaid diagrams MCP-validated; zero dead links.

Authored by the Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
95
vibe/guidebooks/factory-provisioning/opentofu/README.md
Normal file
@@ -0,0 +1,95 @@
[vibe](../../../README.md) > [Guidebooks](../../README.md) > [Factory provisioning](../README.md) > **OpenTofu**

# OpenTofu — factory provisioning

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Upstream:** [Factory provisioning hub](../README.md) · [Lab ecosystem · 01 factory](../../lab-ecosystem/01-factory.md)
> **Downstream:** [factory iac](factory-iac.md) · [postgres iac](postgres-iac.md) · [CI apply flow](ci-apply-flow.md)
> **Related:** [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

OpenTofu is the **declarative half** of the factory: it provisions everything that lives *outside* the K3s cluster — Gitea repos & CI users, Vault policies, Cloudflare DNS, OVH domains, a GCS backup bucket, and the in-cluster PostgreSQL roles/databases. The imperative half (the cluster itself) is built by [Ansible](../ansible/README.md).

OpenTofu is pinned to **`1.8.2`** in CI (`OPENTOFU_VERSION`).

---

## Two independent state roots

There are **two separate Terraform/OpenTofu roots**, each with its own `backend.tf`, its own GCS state prefix, its own provider set, and its own CI workflow. They never share state and can be applied independently.

| Root | Code path | State backend (GCS) | Triggered by |
| --- | --- | --- | --- |
| **factory iac** | [`iac/`](../../../../iac) | `gs://arcodange-tf/factory/main` | changes under `iac/**` → [`.gitea/workflows/iac.yaml`](../../../../.gitea/workflows/iac.yaml) |
| **postgres iac** | [`postgres/iac/`](../../../../postgres/iac) | `gs://arcodange-tf/factory/postgres` | changes under `postgres/**` → [`.gitea/workflows/postgres.yaml`](../../../../.gitea/workflows/postgres.yaml) |

> [!NOTE]
> Both roots share the same GCS **bucket** (`arcodange-tf`) but live under **distinct prefixes** (`factory/main` vs `factory/postgres`), so their state objects never collide.

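
The prefix separation can be illustrated with a small Python sketch. It assumes both roots use the default workspace, which the GCS backend stores as `<prefix>/default.tfstate`; the helper name `state_object_url` is hypothetical and exists only for this illustration.

```python
# Sketch: where each root's state object lands in the shared bucket.
# Assumption: default workspace, stored by the GCS backend as
# <prefix>/default.tfstate.

BUCKET = "arcodange-tf"
ROOT_PREFIXES = {
    "factory iac": "factory/main",
    "postgres iac": "factory/postgres",
}


def state_object_url(prefix: str, workspace: str = "default") -> str:
    """Return the gs:// URL of the state object for a backend prefix."""
    return f"gs://{BUCKET}/{prefix}/{workspace}.tfstate"


urls = {root: state_object_url(prefix) for root, prefix in ROOT_PREFIXES.items()}
for root, url in urls.items():
    print(f"{root}: {url}")
```

Because the two prefixes differ, the two object names differ too, so neither root can overwrite or lock the other's state.
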

---

## Providers

| Provider | Version | Endpoint / scope | Auth |
| --- | --- | --- | --- |
| `go-gitea/gitea` | `0.6.0` | `https://gitea.arcodange.lab` | `GITEA_TOKEN` env var |
| `vault` | `4.4.0` | `https://vault.arcodange.lab` | JWT login — mount `gitea_jwt`, role `gitea_cicd` |
| `google` | `7.0.1` | project `arcodange`, region `US-EAST1` | `GOOGLE_CREDENTIALS` (factory) / `GOOGLE_BACKEND_CREDENTIALS` (postgres backend) |
| `cloudflare/cloudflare` | `~> 5` | DNS / IAM | `CLOUDFLARE_API_TOKEN` env var |
| `ovh/ovh` | `2.8.0` | endpoint `ovh-eu` | `OVH_APPLICATION_KEY` / `OVH_APPLICATION_SECRET` / `OVH_CONSUMER_KEY` |
| `cyrilgdn/postgresql` | `1.24.0` | `192.168.1.202` (pi2), `superuser` | `POSTGRES_USERNAME` / `POSTGRES_PASSWORD` (TF vars) |

The first five providers belong to the **factory iac** root ([`iac/providers.tf`](../../../../iac/providers.tf)); the **postgres iac** root ([`postgres/iac/providers.tf`](../../../../postgres/iac/providers.tf)) declares only `postgresql` + `vault`. Both roots configure the `vault` provider identically (JWT, mount `gitea_jwt`, role `gitea_cicd`).

---

## The Vault-JWT auth model

Neither root carries long-lived Vault credentials. Instead CI mints a short-lived Gitea OIDC token and exchanges it for Vault access:

1. A first job decodes the base64 secret **`vault_oauth__sh_b64`** and runs it (`base64 -d | bash`), producing a **Gitea OIDC JWT** as a job output (`gitea_vault_jwt`).
2. That JWT is exported into the apply job as **`TERRAFORM_VAULT_AUTH_JWT`**.
3. The `vault` provider's `auth_login_jwt` block consumes it against mount `gitea_jwt` / role `gitea_cicd`, yielding a scoped Vault token used to read the per-provider secrets (Google creds, Gitea token, Cloudflare token, OVH app keys, Postgres creds).

See [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) for the full Vault policy/mount design and [CI apply flow](ci-apply-flow.md) for the job-by-job walkthrough.

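
As a rough illustration of step 3, the exchange corresponds to a login call against Vault's JWT auth method mounted at `gitea_jwt`. The Python sketch below only builds the request and does not send it; the helper names and the placeholder JWT are assumptions for illustration, not code from the repository.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.arcodange.lab"


def build_jwt_login_request(
    jwt: str, mount: str = "gitea_jwt", role: str = "gitea_cicd"
) -> urllib.request.Request:
    """Build (but do not send) the login request for a JWT auth mount."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token_from_response(payload: dict) -> str:
    """Extract the scoped Vault token from a successful login response."""
    return payload["auth"]["client_token"]


# Placeholder value; in CI the JWT comes from TERRAFORM_VAULT_AUTH_JWT.
request = build_jwt_login_request("<gitea-oidc-jwt>")
print(request.get_method(), request.full_url)
```

The token returned by such a login is scoped by the policies attached to the `gitea_cicd` role, which is why no long-lived Vault credential needs to live in CI.
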

---

## CI apply flow

Both workflows share the same two-job shape: authenticate, then apply. The trigger paths differ (`iac/**` vs `postgres/**`) but the structure is identical.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart TD
    classDef trigger fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
    classDef job fill:#1e4620,stroke:#22c55e,color:#f0fdf4;
    classDef danger fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;

    push["push / PR touching<br/> iac/** or postgres/**"]:::trigger
    auth["job: gitea_vault_auth<br/>decode vault_oauth__sh_b64<br/> mint Gitea OIDC JWT"]:::job
    tofu["job: tofu<br/>read Vault secrets via JWT<br/> set provider env vars"]:::job
    apply["dflook/terraform-apply@v1<br/> auto_approve: true"]:::danger

    push --> auth
    auth -- "gitea_vault_jwt output" --> tofu
    tofu --> apply
```

1. A **push or PR** that touches files under `iac/**` (factory) or `postgres/**` (postgres) starts the matching workflow; `workflow_dispatch` allows a manual run.
2. The **`gitea_vault_auth`** job decodes `vault_oauth__sh_b64` and emits the Gitea OIDC JWT as `gitea_vault_jwt`.
3. The **`tofu`** job (`needs: gitea_vault_auth`) sets `TERRAFORM_VAULT_AUTH_JWT` from that output, reads the provider secrets out of Vault, and prepares the homelab CA cert (`VAULT_CACERT`).
4. The job runs **`dflook/terraform-apply@v1`** against the root's `path` (`iac` or `postgres/iac`) with **`auto_approve: true`**.

> [!CAUTION]
> **Applies are auto-approve.** There is no manual plan-review gate — once a change to `iac/**` or `postgres/**` lands on `main`, CI applies it to the real Gitea, Vault, Cloudflare, OVH, GCS, and PostgreSQL targets without further confirmation. Treat every merge as a production change and review the diff *before* merging, not after. This trade-off is recorded in [ADR-0001 · safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md).

---

## Index

| Page | Covers | State |
| --- | --- | --- |
| [factory iac](factory-iac.md) | `iac/` root — Gitea, Vault, Google/GCS backup, Cloudflare, OVH | ✅ |
| [postgres iac](postgres-iac.md) | `postgres/iac/` root — PostgreSQL roles & databases on pi2 | ✅ |
| [CI apply flow](ci-apply-flow.md) | Both Gitea workflows, the Vault-JWT exchange, auto-approve apply | ✅ |
|
||||
114
vibe/guidebooks/factory-provisioning/opentofu/ci-apply-flow.md
Normal file
@@ -0,0 +1,114 @@
[vibe](../../../README.md) > [Guidebooks](../../README.md) > [Factory provisioning](../README.md) > [OpenTofu](README.md) > **CI apply flow**

# CI apply flow

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Upstream:** [`.gitea/workflows/iac.yaml`](../../../../.gitea/workflows/iac.yaml), [`.gitea/workflows/postgres.yaml`](../../../../.gitea/workflows/postgres.yaml)
> **Downstream:** [factory iac](factory-iac.md), [postgres iac](postgres-iac.md)
> **Related:** [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [ADR-0001 · Safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md) · [QA strategy](../../../PRD/safe-prod-like-environment/qa-strategy.md)

Two Gitea Actions workflows turn every commit that touches the OpenTofu code into a live `apply`. `IAC` ([`.gitea/workflows/iac.yaml`](../../../../.gitea/workflows/iac.yaml)) drives the factory infrastructure under [`iac/`](../../../../iac/); `Postgres` ([`.gitea/workflows/postgres.yaml`](../../../../.gitea/workflows/postgres.yaml)) drives the database stack under [`postgres/iac/`](../../../../postgres/iac). They share the same two-job shape: a short OIDC-auth job feeds a Vault JWT to a `tofu` job that reads secrets and runs `terraform apply`.

> [!CAUTION]
> **`auto_approve: true` means every merge to `main` applies immediately — there is no plan-gate.** The `dflook/terraform-apply@v1` step skips the interactive approval, so any change that lands on `main` (or any matched `push`) rewrites real cloud and homelab state without a human reviewing the plan. Mitigations are entirely upstream of CI: (1) **mandatory code review** on the PR before merge, and (2) **least-privilege Vault policies** on the `gitea_cicd` role so a runaway apply can only touch the resources its token is scoped to. See [ADR-0001](../../../ADR/0001-safe-prod-like-environment.md): the sandbox lane runs the *same* tofu but **plan-only** against a `sandbox/` state prefix and a throwaway DNS zone, so contributors can validate changes without an auto-apply.

## Triggers

Both workflows fire on the same three events; only the watched path globs differ.

| Event | `IAC` (factory) | `Postgres` |
| --- | --- | --- |
| `push` | `iac/*.tf`, `iac/*.tfvars`, `iac/**/*.tf`, `iac/**/*.tfvars` | `postgres/**/*.tf`, `postgres/**/*.tfvars` |
| `pull_request` | same globs (YAML anchor `*tofuPaths`) | same globs (YAML anchor `*postgresTofuPaths`) |
| `workflow_dispatch` | manual, no inputs | manual, no inputs |

> [!IMPORTANT]
> `concurrency` is keyed on `${{ github.ref }}-${{ github.workflow }}` with `cancel-in-progress: true`, so a newer push to the same branch cancels an in-flight run. A `pull_request` event also triggers the workflow, and the `apply` step still runs on it, so the safety contract is "review **before** merge", not "CI only plans on PRs".

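
To reason about which workflow a changed file would start, the path globs from the trigger table can be modelled in a few lines of Python. This is a sketch under an assumption: it treats `*` as matching within one path segment and `**/` as matching zero or more directories, which is the usual Actions-style reading; the exact glob dialect of the Gitea runner is not confirmed here, and the helper names are hypothetical.

```python
import re

WORKFLOW_PATHS = {
    "IAC": ["iac/*.tf", "iac/*.tfvars", "iac/**/*.tf", "iac/**/*.tfvars"],
    "Postgres": ["postgres/**/*.tf", "postgres/**/*.tfvars"],
}


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a regex.

    `**/` spans zero or more directories, `**` spans anything,
    and a single `*` stays inside one path segment.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def triggered_workflows(changed_path: str) -> list[str]:
    """Return the workflows whose path filters match a changed file."""
    return [
        name
        for name, globs in WORKFLOW_PATHS.items()
        if any(glob_to_regex(g).fullmatch(changed_path) for g in globs)
    ]


for path in ["iac/main.tf", "postgres/iac/main.tf", "vibe/README.md"]:
    print(path, "->", triggered_workflows(path))
```

Under this reading, a change to a Markdown file inside `iac/` or `postgres/` starts neither workflow, because only `.tf` and `.tfvars` files are watched.
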

## Job 1 — `gitea_vault_auth`

Mints a Gitea OIDC token that Vault will trust. The whole job is one step:

```bash
echo -n "${{ secrets.vault_oauth__sh_b64 }}" | base64 -d | bash
```

| Field | Value |
| --- | --- |
| Runner | `ubuntu-latest` |
| Secret consumed | `vault_oauth__sh_b64` — a base64-encoded shell script |
| Step id | `gitea_vault_jwt` |
| Output | `gitea_vault_jwt` ← `steps.gitea_vault_jwt.outputs.id_token` |

The decoded script asks Gitea for an OIDC `id_token` and emits it as a step output. The `tofu` job declares `needs: [gitea_vault_auth]` so it receives `needs.gitea_vault_auth.outputs.gitea_vault_jwt`.

## Job 2 — `tofu`

| Field | `IAC` | `Postgres` |
| --- | --- | --- |
| Job name | `Tofu` | `Tofu - Postgres` |
| `needs` | `gitea_vault_auth` | `gitea_vault_auth` |
| `OPENTOFU_VERSION` | `1.8.2` | `1.8.2` |
| `TERRAFORM_VAULT_AUTH_JWT` | `needs.gitea_vault_auth.outputs.gitea_vault_jwt` | same |
| `VAULT_CACERT` | `${{ github.workspace }}/homelab.pem` | same |
| Apply path | `iac` | `postgres/iac` |

Step order inside the job:

1. **read vault secret** — the shared `*vault_step` anchor (see below).
2. **`actions/checkout@v4`** — pull the repo into the workspace.
3. **prepare vault self signed cert** — `echo -n "${{ secrets.HOMELAB_CA_CERT }}" | base64 -d > $VAULT_CACERT`, writing the homelab CA to `homelab.pem` so the runner trusts `https://vault.arcodange.lab`.
4. **terraform apply** — `dflook/terraform-apply@v1` with the path above and `auto_approve: true`.

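
For readers less familiar with the shell one-liner in step 3, the same decode-and-write operation can be sketched in Python. The certificate content below is a placeholder rather than a real CA, and `write_ca_cert` is a hypothetical helper name.

```python
import base64
import tempfile
from pathlib import Path


def write_ca_cert(encoded_cert: str, destination: Path) -> Path:
    """Decode a base64-encoded PEM and write it to the destination file."""
    destination.write_bytes(base64.b64decode(encoded_cert))
    return destination


# Placeholder PEM standing in for the HOMELAB_CA_CERT secret.
placeholder_pem = (
    b"-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n"
)
encoded = base64.b64encode(placeholder_pem).decode("ascii")

workspace = Path(tempfile.mkdtemp())  # stands in for github.workspace
cert_path = write_ca_cert(encoded, workspace / "homelab.pem")
print(cert_path.name)
```

The resulting file is what `VAULT_CACERT` points at, so any Vault client in the job can verify the homelab-issued certificate.
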

### Vault secret reads (`*vault_step`)

The `read vault secret` step uses [`arcodange-org/vault-action`](https://gitea.arcodange.lab/arcodange-org/vault-action), authenticating with `method: jwt`, `path: gitea_jwt`, `role: gitea_cicd`, `url: https://vault.arcodange.lab`, `caCertificate: ${{ secrets.HOMELAB_CA_CERT }}`, and `jwtGiteaOIDC` set to the auth job's output. The secrets it exports into the job env differ per workflow:

| Workflow | Vault path | Selector | Exported as |
| --- | --- | --- | --- |
| `IAC` | `kvv1/google/credentials` | `credentials` | `GOOGLE_CREDENTIALS` |
| `IAC` | `kvv1/admin/gitea` | `token` | `GITEA_TOKEN` |
| `IAC` | `kvv1/admin/cloudflare` | `iam_token` | `CLOUDFLARE_API_TOKEN` |
| `IAC` | `kvv1/admin/ovh/app` | `*` (all keys) | `OVH_*` |
| `Postgres` | `kvv1/google/credentials` | `credentials` | `GOOGLE_BACKEND_CREDENTIALS` |
| `Postgres` | `kvv1/postgres/credentials` | `*` (all keys) | `TF_VAR_postgres_*` |

`GOOGLE_CREDENTIALS` / `GOOGLE_BACKEND_CREDENTIALS` authenticate the GCS state backend; the `TF_VAR_postgres_*` fan-out feeds the Postgres module's input variables directly. See [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) for how the `gitea_cicd` role and KV v1 mounts are provisioned.

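
The wildcard fan-out can be pictured as follows. This Python sketch assumes each key of the Vault secret is appended unchanged to the prefix; the key names `username` and `password` and the sample values are hypothetical, and the real naming rule is whatever `arcodange-org/vault-action` implements. What is standard is the second half: OpenTofu reads any `TF_VAR_<name>` environment variable as the input variable `<name>`.

```python
def fan_out(secret: dict[str, str], prefix: str) -> dict[str, str]:
    """Export every key of a secret as an env var named prefix + key."""
    return {f"{prefix}{key}": value for key, value in secret.items()}


def tofu_input_variables(env: dict[str, str]) -> dict[str, str]:
    """Map TF_VAR_<name> env vars to the input variable names they set."""
    marker = "TF_VAR_"
    return {
        name[len(marker):]: value
        for name, value in env.items()
        if name.startswith(marker)
    }


# Hypothetical secret content standing in for kvv1/postgres/credentials.
postgres_secret = {"username": "example-user", "password": "example-password"}

env = fan_out(postgres_secret, "TF_VAR_postgres_")
variables = tofu_input_variables(env)
print(sorted(env))
print(sorted(variables))
```

With this shape, adding a key to the Vault secret makes a new input variable available to the root without touching the workflow file.
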

## End-to-end flow

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TD
    push["push / PR / workflow_dispatch<br>on iac/** or postgres/** .tf .tfvars"] --> auth["job: gitea_vault_auth<br>base64 -d | bash -> Gitea OIDC id_token"]
    auth -->|"gitea_vault_jwt output"| tofu["job: tofu<br>OPENTOFU_VERSION 1.8.2"]
    tofu --> readvault["read vault secret<br>vault-action jwt role gitea_cicd"]
    readvault -->|"GOOGLE_CREDENTIALS, TF_VAR_postgres_*, ..."| init["tofu init<br>GCS backend, state prefix"]
    init --> apply["dflook/terraform-apply@v1<br>auto_approve: true"]
    apply --> state["state updated in GCS<br>real cloud + homelab mutated"]

    classDef trigger fill:#1f3a5f,stroke:#7fb0ff,color:#eaf2ff;
    classDef job fill:#3a2f5f,stroke:#b39dff,color:#f3eeff;
    classDef secret fill:#5f3a2f,stroke:#ffb38a,color:#fff1e8;
    classDef danger fill:#5f1f2f,stroke:#ff8a9d,color:#ffe8ec;
    class push trigger;
    class auth,tofu,init job;
    class readvault secret;
    class apply,state danger;
```

1. A **push**, **pull_request**, or **workflow_dispatch** event matching the `iac/**` or `postgres/**` path globs starts the workflow.
2. Job **`gitea_vault_auth`** runs `base64 -d | bash` on the `vault_oauth__sh_b64` secret to obtain a Gitea OIDC `id_token`, published as the `gitea_vault_jwt` output.
3. Job **`tofu`** (gated by `needs: gitea_vault_auth`) starts on `ubuntu-latest` with `OPENTOFU_VERSION 1.8.2` and `TERRAFORM_VAULT_AUTH_JWT` set to that output.
4. The **read vault secret** step exchanges the JWT (role `gitea_cicd`, path `gitea_jwt`) for the workflow's secrets and exports them as env vars (`GOOGLE_CREDENTIALS` / `GOOGLE_BACKEND_CREDENTIALS`, `GITEA_TOKEN`, `CLOUDFLARE_API_TOKEN`, `OVH_*`, or `TF_VAR_postgres_*`).
5. **`tofu init`** configures the GCS backend, binding the working dir to its state prefix using the Google credentials just read.
6. **`dflook/terraform-apply@v1`** runs against `iac` (or `postgres/iac`) with `auto_approve: true` — no plan-gate.
7. The **state** in GCS is updated and the real cloud + homelab resources are mutated to match the committed code.

## Related pages

- [factory iac](factory-iac.md) — what the `iac/` stack provisions (the `IAC` workflow's target).
- [postgres iac](postgres-iac.md) — the `postgres/iac/` database stack (the `Postgres` workflow's target).
- [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) — the `gitea_cicd` role, OIDC trust, and KV mounts behind every secret read here.
- [ADR-0001 · Safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md) — the sandbox lane runs the same tofu plan-only against a `sandbox/` state prefix and a throwaway zone.
|
||||
148
vibe/guidebooks/factory-provisioning/opentofu/factory-iac.md
Normal file
@@ -0,0 +1,148 @@
[vibe](../../../README.md) > [Guidebooks](../../README.md) > [Factory provisioning](../README.md) > [OpenTofu](README.md) > **factory iac**

# factory iac — the `iac/` state root

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Code:** [`iac/`](../../../../iac) · **State backend:** `gs://arcodange-tf/factory/main` ([`iac/backend.tf`](../../../../iac/backend.tf))
> **Upstream:** [OpenTofu hub](README.md) · [Factory provisioning hub](../README.md) · [Lab ecosystem · 01 factory](../../lab-ecosystem/01-factory.md)
> **Related:** [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md) · [CI apply flow](ci-apply-flow.md) · [postgres iac](postgres-iac.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

The `iac/` root provisions everything that lives **outside** the K3s cluster: the Cloudflare R2 bucket that holds the [cms](https://gitea.arcodange.lab/arcodange-org/cms) repo's OpenTofu state, the per-service Cloudflare and OVH API tokens consumed by that repo, a restricted Gitea CI user for reading private module repos, and the GCS bucket that backs up Longhorn volumes. Each provisioned credential is written **both** to a Gitea Actions secret (where the consuming workflow expects it) **and** to a Vault path (the durable source of truth; see [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md)).

This root's own state lives in GCS at `gs://arcodange-tf/factory/main`, not in R2, and is applied by [`.gitea/workflows/iac.yaml`](../../../../.gitea/workflows/iac.yaml) on any change under `iac/**`. See [CI apply flow](ci-apply-flow.md) for the job-by-job walkthrough.

---

## Providers

Declared in [`iac/providers.tf`](../../../../iac/providers.tf).

| Provider | Source | Version | Endpoint / scope | Auth |
| --- | --- | --- | --- | --- |
| `gitea` | `go-gitea/gitea` | `0.6.0` | `https://gitea.arcodange.lab` | `GITEA_TOKEN` env var |
| `vault` | `vault` | `4.4.0` | `https://vault.arcodange.lab` | JWT login — mount `gitea_jwt`, role `gitea_cicd` |
| `google` | `google` | `7.0.1` | project `arcodange`, region `US-EAST1` | `GOOGLE_CREDENTIALS` env var |
| `cloudflare` | `cloudflare/cloudflare` | `~> 5` | DNS / Pages / R2 / IAM | `CLOUDFLARE_API_TOKEN` env var |
| `ovh` | `ovh/ovh` | `2.8.0` | endpoint `ovh-eu` | `OVH_APPLICATION_KEY` / `OVH_APPLICATION_SECRET` / `OVH_CONSUMER_KEY` |

> [!NOTE]
> The Cloudflare account ID is **not** hard-coded — it is resolved at plan time from `data.cloudflare_account.arcodange` filtered on the account name `arcodange@gmail.com` ([`iac/cloudflare.tf`](../../../../iac/cloudflare.tf)) and exposed as `local.cloudflare_account_id`.

---

## Cloudflare — R2 backend bucket & service tokens

Defined in [`iac/cloudflare.tf`](../../../../iac/cloudflare.tf). Two tokens are minted through the [`modules/cloudflare_token`](#the-cloudflare_token-module) mechanism: one scoped to the R2 state bucket, one broad token handed to the cms repo.

| Resource | Type | Identity / scope | Secret destination |
| --- | --- | --- | --- |
| `cloudflare_r2_bucket.arcodange_tf` | R2 bucket | name `arcodange-tf`, jurisdiction `eu` | — (holds the *cms* repo's own OpenTofu state) |
| `module.cf_r2_arcodange_tf_token` | module → `cloudflare_account_token` | account: `Workers R2 Storage Read`, `Account Settings Read`; bucket: `Workers R2 Storage Bucket Item Write` | `vault_kv_secret.cf_r2_arcodange_tf` → `kvv1/cloudflare/r2/arcodange-tf` (S3 access key, secret, `https://<account_id>.eu.r2.cloudflarestorage.com` endpoint) |
| `vault_policy.cf_r2_arcodange_tf` | Vault policy | name `factory__cf_r2_arcodange_tf` | read on `kvv1/cloudflare/r2/arcodange-tf` **and** `kvv1/zoho/self_client` (the Zoho mail client is created manually) |
| `module.cf_arcodange_cms_token` | module → `cloudflare_account_token` | account-scope: `Pages Write`, `Account DNS Settings Write`, `Account Settings Read`, `Zone Write`, `Zone Settings Write`, `DNS Write`, `Cloudflare Tunnel Write`, `Turnstile Sites Write` | Gitea secrets `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` on the `cms` repo; Vault `kvv1/cloudflare/cms/cf_arcodange_cms_token` |

The `cms` repo (`data.gitea_repo.cms`, owner `arcodange-org`) receives the broad token because it manages the public site end to end: Cloudflare Pages deploys, DNS records, zone settings, the Tunnel, and Turnstile.

> [!CAUTION]
> Both tokens are minted with **`expires_on = null`** — they never expire. A leaked `cf_arcodange_cms_token` grants standing DNS/Pages/Tunnel/Turnstile write on the whole account until manually revoked. There is no automatic rotation; rotation means tainting the module's `cloudflare_account_token` and re-applying.

---

## OVH — OAuth2 client for the cms domain

Defined in [`iac/ovh.tf`](../../../../iac/ovh.tf). A `CLIENT_CREDENTIALS` OAuth2 client lets the cms workflow edit DNS nameservers for `arcodange.fr`, constrained by an IAM policy.

| Resource | Type | Scope |
| --- | --- | --- |
| `ovh_me_api_oauth2_client.cms` | OAuth2 client | name `cms repo`, flow `CLIENT_CREDENTIALS` — "arcodange.fr management" |
| `ovh_iam_policy.cms` | IAM policy | name `cms_manager`; identity = the OAuth2 client; resources = account URN + `urn:v1:eu:resource:domain:arcodange.fr`; allow = a handful of `me/*` reads, all domain **READ** reference-actions (computed via `data.ovh_iam_reference_actions.domain`), plus `domain:apiovh:nameServer/edit` |
| `gitea_repository_actions_secret.ovh_cms_client_id` | Gitea secret | `OVH_CLIENT_ID` on the `cms` repo |
| `gitea_repository_actions_secret.ovh_cms_client_secret` | Gitea secret | `OVH_CLIENT_SECRET` on the `cms` repo |
| `vault_kv_secret.ovh_cms_token` | Vault secret | `kvv1/ovh/cms/app` — `client_id`, `client_secret`, `urn` |

> [!NOTE]
> The write surface is deliberately narrow: the policy grants **only** `nameServer/edit` for writes; everything else is read-only. This lets the cms pipeline point `arcodange.fr` at Cloudflare nameservers without exposing the broader OVH account.

---

## Gitea — restricted CI module-reader user

Defined in [`iac/gitea_tofu_ci_user.tf`](../../../../iac/gitea_tofu_ci_user.tf). A locked-down Gitea account whose SSH key lets CI clone private Terraform module repos without exposing a privileged token.

| Resource | Type | Notes |
| --- | --- | --- |
| `random_password.tofu` | password | length 32 — the user's login password |
| `gitea_user.tofu` | Gitea user | username `tofu_module_reader`, email `tofu-module-reader@arcodange.fake`, `restricted = true`, `visibility = private`, `prohibit_login = false` |
| `tls_private_key.tofu` | keypair | algorithm **ED25519** |
| `gitea_public_key.tofu` | SSH key | public half attached to `tofu_module_reader` |
| `vault_kv_secret.gitea_admin_token` | Vault secret | `kvv1/gitea/tofu_module_reader` — `ssh_private_key` + `ssh_public_key` |

> [!NOTE]
> Despite the Terraform resource name `gitea_admin_token`, the stored payload is the **SSH keypair**, not an admin token. The user is `restricted`, so it can only read repos it is explicitly granted access to.

---

## Google / GCS — Longhorn backup target

Defined in [`iac/gcs_backup.tf`](../../../../iac/gcs_backup.tf). A GCS bucket plus an HMAC key wired into Vault so the in-cluster Longhorn controller can pull S3-compatible backup credentials. See [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) for how this fits the cluster-recovery story.

| Resource | Type | Value |
| --- | --- | --- |
| `google_storage_bucket.longhorn_backup` | GCS bucket | name `arcodange-backup`, location `NAM4` (dual-region), `force_destroy = true`, `public_access_prevention = enforced` |
| `google_service_account.longhorn_backup` | service account | account_id `longhorn-backup` |
| `google_storage_bucket_iam_member.longhorn_backup` | IAM binding | `roles/storage.admin` on the bucket, member = the SA |
| `google_storage_hmac_key.longhorn_backup` | HMAC key | S3-compatible access_id + secret for that SA |
| `vault_kv_secret_v2.longhorn_gcs_backup` | Vault **KVv2** secret | mount `kvv2`, name `longhorn/gcs-backup`, `cas = 1`, `delete_all_versions = true` — `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ENDPOINTS = https://storage.googleapis.com` |
| `vault_policy.longhorn_gcs_backup` | Vault policy | name `longhorn-gcs-backup` — read on `kvv2/data/longhorn/gcs-backup` |
| `vault_kubernetes_auth_backend_role.longhorn` | Vault k8s auth role | role `longhorn`, bound SA `longhorn-vault-secret-reader` in namespace `longhorn-system`, audience `vault`, policy `longhorn-gcs-backup` |

The bound service-account name `longhorn-vault-secret-reader` must match the `VaultAuth` manifest in-cluster — that's the handshake that lets Longhorn read the HMAC creds at runtime.

> [!WARNING]
> The HMAC key is an **S3-compatible** credential and is weaker than a native GCS service-account key: it is a long-lived static secret with no key rotation built into this config, and `roles/storage.admin` grants full read/write/delete on the backup bucket. Combined with `force_destroy = true`, a state operation that destroys `arcodange-backup` will delete every Longhorn backup without prompting. Treat this bucket as critical and irreplaceable infrastructure.

---

## The `cloudflare_token` module

Source: [`iac/modules/cloudflare_token/`](../../../../iac/modules/cloudflare_token). This local module turns **human-readable permission names** into a working Cloudflare account token, so callers never hard-code permission-group UUIDs.

How it works ([`main.tf`](../../../../iac/modules/cloudflare_token/main.tf)):

1. It reads **all** available permission groups via `data.cloudflare_account_api_token_permission_groups_list`, then builds `local.permission_map`: `"<scope>:<name>" => id` (e.g. `"account:Pages Write" => <uuid>`), keyed by the last dotted segment of the group's scope.
2. Caller-supplied names (`var.permissions.account` / `var.permissions.bucket`) are looked up against that map; any name with no match lands in `local.missing_permissions` and trips a **`precondition`** that fails the apply with a clear "Permissions introuvables" (French for "permissions not found") error.
3. Policies are assembled dynamically — an `account` policy targeting `com.cloudflare.api.account.<id>` and, if `var.bucket` is set, a `bucket` policy targeting `com.cloudflare.edge.r2.bucket.<id>_<jurisdiction>_<name>`.
4. The `cloudflare_account_token.token` resource sets `expires_on = null` and **ignores** drift on `expires_on` and `policies` (the upstream permission IDs are unstable). Instead, a `null_resource.cloudflare_account_token_replace` hashes the **sorted permission names** into its triggers, and `replace_triggered_by` forces a fresh token whenever the *names* change — surviving id churn while still rotating on a real permission change.
5. Outputs ([`outputs.tf`](../../../../iac/modules/cloudflare_token/outputs.tf)): `token` (sensitive), `token_id`, `token_sha256`, and — when `var.bucket` is set — `r2_credentials` mapping `access_key_id = token.id` and `secret_access_key = sha256(token.value)` for S3-compatible R2 access.

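
The module's logic can be modelled outside HCL to make the data flow easier to follow. The Python sketch below mirrors steps 1, 2, 4 and 5 under stated assumptions: the permission-group records, ids and token values are placeholders, the function names are hypothetical, and the exact hash input used by the `null_resource` trigger (here a comma-joined sorted list fed to SHA-256) is an assumption rather than a copy of `main.tf`.

```python
import hashlib


def build_permission_map(groups: list[dict]) -> dict[str, str]:
    """Key each group by "<scope>:<name>", using the last dotted scope segment."""
    permission_map = {}
    for group in groups:
        for scope in group["scopes"]:
            short_scope = scope.rsplit(".", 1)[-1]
            permission_map[f"{short_scope}:{group['name']}"] = group["id"]
    return permission_map


def resolve(permission_map: dict[str, str], scope: str, names: list[str]) -> list[str]:
    """Look up names for a scope; fail loudly if any name has no match."""
    missing = [n for n in names if f"{scope}:{n}" not in permission_map]
    if missing:
        raise ValueError(f"Permissions introuvables: {missing}")
    return [permission_map[f"{scope}:{n}"] for n in names]


def replace_trigger(names: list[str]) -> str:
    """Hash the sorted names so id churn alone never changes the trigger."""
    return hashlib.sha256(",".join(sorted(names)).encode("utf-8")).hexdigest()


def r2_credentials(token_id: str, token_value: str) -> dict[str, str]:
    """Derive S3-compatible R2 credentials from an account token."""
    secret = hashlib.sha256(token_value.encode("utf-8")).hexdigest()
    return {"access_key_id": token_id, "secret_access_key": secret}


# Placeholder permission groups; real ids are UUIDs returned by Cloudflare.
groups = [
    {"id": "id-pages-write", "name": "Pages Write",
     "scopes": ["com.cloudflare.api.account"]},
    {"id": "id-r2-item-write", "name": "Workers R2 Storage Bucket Item Write",
     "scopes": ["com.cloudflare.edge.r2.bucket"]},
]
permission_map = build_permission_map(groups)
print(resolve(permission_map, "account", ["Pages Write"]))
```

Keying the replacement trigger on names rather than ids is the design choice that lets the token survive upstream id changes while still being re-minted when the requested permissions really change.
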

---

## Vault layout: mixed KVv1 / KVv2

This root writes to **both** KV engines, which is easy to trip over.

| Path | Engine | Written by |
| --- | --- | --- |
| `kvv1/cloudflare/r2/arcodange-tf` | KVv1 (`vault_kv_secret`) | R2 backend token |
| `kvv1/cloudflare/cms/cf_arcodange_cms_token` | KVv1 | cms Cloudflare token |
| `kvv1/ovh/cms/app` | KVv1 | OVH OAuth2 client |
| `kvv1/gitea/tofu_module_reader` | KVv1 | CI user SSH key |
| `kvv2/longhorn/gcs-backup` | KVv2 (`vault_kv_secret_v2`) | Longhorn GCS HMAC |

> [!WARNING]
> Most secrets here use the **KVv1** engine (`vault_kv_secret`), but the Longhorn backup secret uses **KVv2** (`vault_kv_secret_v2`). The policy paths differ accordingly — KVv2 reads target `kvv2/data/longhorn/gcs-backup` (note the `/data/` segment), whereas KVv1 policies read the literal path. Mixing the two engines means a policy copied from one secret to another will silently grant nothing. See [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) for the engine-level design.

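
The path difference is small enough to capture in one helper. This Python sketch is illustrative only; `policy_read_path` is a hypothetical name, and the mounts and secret names are the ones listed in the table above.

```python
def policy_read_path(mount: str, name: str, kv_version: int) -> str:
    """Return the path a Vault policy must grant `read` on for a KV secret."""
    if kv_version == 1:
        # KVv1: the policy path is the literal secret path.
        return f"{mount}/{name}"
    if kv_version == 2:
        # KVv2: reads go through the data/ sub-path of the mount.
        return f"{mount}/data/{name}"
    raise ValueError(f"unsupported KV version: {kv_version}")


print(policy_read_path("kvv1", "cloudflare/r2/arcodange-tf", 1))
print(policy_read_path("kvv2", "longhorn/gcs-backup", 2))
```

Deriving the policy path from the engine version, rather than copying it by hand, avoids the silent "grants nothing" failure described in the warning.
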

---

## Outputs

The root exposes a single top-level `output "token"` (sensitive) = the cms Cloudflare token ([`iac/cloudflare.tf`](../../../../iac/cloudflare.tf)). Everything else is delivered side-effect-style into Gitea secrets and Vault paths rather than as Terraform outputs.

---

## See also

- [CI apply flow](ci-apply-flow.md) — how `iac/**` changes reach `gs://arcodange-tf/factory/main` via the Vault-JWT exchange and auto-approve apply.
- [postgres iac](postgres-iac.md) — the sibling root that provisions in-cluster PostgreSQL.
- [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md).
|
||||
116
vibe/guidebooks/factory-provisioning/opentofu/postgres-iac.md
Normal file
@@ -0,0 +1,116 @@
[vibe](../../../README.md) > [Guidebooks](../../README.md) > [Factory provisioning](../README.md) > [OpenTofu](README.md) > **postgres iac**

# postgres iac — the `postgres/iac/` state root

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Code:** [`postgres/iac/`](../../../../postgres/iac) · **State backend:** `gs://arcodange-tf/factory/postgres` ([`postgres/iac/backend.tf`](../../../../postgres/iac/backend.tf))
> **Upstream:** [OpenTofu hub](README.md) · [Factory provisioning hub](../README.md) · [Lab ecosystem · 01 factory](../../lab-ecosystem/01-factory.md)
> **Related:** [Naming conventions](../../lab-ecosystem/naming-conventions.md) · [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [CI apply flow](ci-apply-flow.md) · [factory iac](factory-iac.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

The `postgres/iac/` root provisions **PostgreSQL roles, databases, and the pgbouncer auth function** on the live cluster database — one strand of the per-application `<app>` join key described in [Naming conventions](../../lab-ecosystem/naming-conventions.md). For each application it creates a non-login owner role, an `<app>` database owned by that role, and a `user_lookup()` function that lets PgBouncer authenticate against `pg_shadow`. A single `credentials_editor` login role (whose password is stored in Vault) is granted admin over every per-app role so that downstream tooling can mint application credentials without superuser rights.

|
||||
|
||||
This root's state lives at `gs://arcodange-tf/factory/postgres` and is applied by [`.gitea/workflows/postgres.yaml`](../../../../.gitea/workflows/postgres.yaml) on any change under `postgres/**` — see [CI apply flow](ci-apply-flow.md).
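
For orientation, a state location like that is usually expressed as a `gcs` backend block. The sketch below assumes the URL splits into bucket `arcodange-tf` and prefix `factory/postgres`; the actual `backend.tf` may carry other settings.

```hcl
terraform {
  backend "gcs" {
    # Assumed split of gs://arcodange-tf/factory/postgres into bucket + prefix.
    bucket = "arcodange-tf"
    prefix = "factory/postgres"
  }
}
```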

> [!CAUTION]
> This root runs as a **PostgreSQL superuser** ([`postgres/iac/providers.tf`](../../../../postgres/iac/providers.tf): `superuser = true`) pinned to the live database at **`192.168.1.202`** (pi2) **through PgBouncer**, with `sslmode = disable`. The provider can therefore **drop or alter live application databases**: an errant `tofu destroy` or a renamed `applications` entry will delete real data. And because the only route to Postgres is via PgBouncer on that host, **if PgBouncer is down OpenTofu cannot connect and no apply can run.** Treat every `postgres/**` merge as a production database change ([ADR-0001](../../../ADR/0001-safe-prod-like-environment.md)).

---

## Providers

Declared in [`postgres/iac/providers.tf`](../../../../postgres/iac/providers.tf).

| Provider | Source | Version | Connection | Auth |
| --- | --- | --- | --- | --- |
| `postgresql` | `cyrilgdn/postgresql` | `1.24.0` | host `192.168.1.202` (pi2), via PgBouncer, `sslmode = disable`, `superuser = true` | `var.POSTGRES_USERNAME` / `var.POSTGRES_PASSWORD` (TF vars from `TF_VAR_POSTGRES_*`, sourced from Vault in CI) |
| `vault` | `vault` | `4.4.0` | `https://vault.arcodange.lab` | JWT login: mount `gitea_jwt`, role `gitea_cicd` |

The two `POSTGRES_*` variables are declared `sensitive` in the same file; CI populates them from Vault as `TF_VAR_POSTGRES_USERNAME` / `TF_VAR_POSTGRES_PASSWORD` (see [CI apply flow](ci-apply-flow.md)).
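
Put together, the table and the variable note describe a provider file along these lines. This is a sketch reconstructed from the values on this page, not a copy of `providers.tf`: the port is omitted because the page does not state it, and how the JWT reaches the Vault provider is an assumption.

```hcl
terraform {
  required_providers {
    postgresql = {
      source  = "cyrilgdn/postgresql"
      version = "1.24.0"
    }
    vault = {
      source  = "vault"
      version = "4.4.0"
    }
  }
}

variable "POSTGRES_USERNAME" {
  type      = string
  sensitive = true
}

variable "POSTGRES_PASSWORD" {
  type      = string
  sensitive = true
}

provider "postgresql" {
  host      = "192.168.1.202" # pi2, reached through PgBouncer
  username  = var.POSTGRES_USERNAME
  password  = var.POSTGRES_PASSWORD
  sslmode   = "disable"
  superuser = true
}

provider "vault" {
  address = "https://vault.arcodange.lab"

  auth_login_jwt {
    mount = "gitea_jwt"
    role  = "gitea_cicd"
    # Assumption: the JWT itself is supplied out of band, for example through
    # the TERRAFORM_VAULT_AUTH_JWT environment variable set by the CI job.
  }
}
```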

---

## The application set

Everything in this root fans out over one variable. `var.applications` is a `set(string)` ([`variables.tf`](../../../../postgres/iac/variables.tf)) whose members are listed in [`terraform.tfvars`](../../../../postgres/iac/terraform.tfvars):

| `applications` member |
| --- |
| `webapp` |
| `erp` |
| `crowdsec` |
| `plausible` |
| `dance-lessons-coach` |

Adding an app to that list creates a full role + database + lookup-function bundle on the next apply; **removing** one would `DROP` the live database (see the caution above).
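
As a rough illustration, the variable declaration and its value could look like the following. The two fragments live in different files, and the `description` text is made up for this sketch.

```hcl
# variables.tf (sketch)
variable "applications" {
  description = "Applications that each get a role, a database, and a lookup function."
  type        = set(string)
}

# terraform.tfvars (sketch): a list literal is converted to set(string).
applications = [
  "webapp",
  "erp",
  "crowdsec",
  "plausible",
  "dance-lessons-coach",
]
```

Because every resource is keyed by the set member, renaming an entry is planned as a destroy of the old key plus a create of the new one, not as an in-place rename.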

---

## The `credentials_editor` role

Defined in [`postgres/iac/main.tf`](../../../../postgres/iac/main.tf). A single login role, granted admin over every per-app role, whose credentials downstream tooling uses to provision application logins.

| Resource | Type | Detail |
| --- | --- | --- |
| `random_password.credentials_editor` | password | length 24, `override_special = "-:!+<>"` |
| `postgresql_role.credentials_editor` | role | `login = true`, `create_role = true`; `lifecycle { ignore_changes = [roles] }` so its grant membership isn't reverted |
| `vault_kv_secret.postgres_admin_credentials` | Vault **KVv1** secret | path `kvv1/postgres/credentials_editor/credentials`, keys `username` and `password` |
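
The three resources in the table could be wired together roughly as below. Resource names and attribute values come from the table; the role name string and the way the secret payload is assembled are assumptions.

```hcl
resource "random_password" "credentials_editor" {
  length           = 24
  override_special = "-:!+<>"
}

resource "postgresql_role" "credentials_editor" {
  name        = "credentials_editor" # assumed to match the resource label
  login       = true
  create_role = true
  password    = random_password.credentials_editor.result

  lifecycle {
    # Memberships are managed by postgresql_grant_role, so drift on
    # `roles` is ignored here instead of being reverted.
    ignore_changes = [roles]
  }
}

resource "vault_kv_secret" "postgres_admin_credentials" {
  path = "kvv1/postgres/credentials_editor/credentials"

  # Assumed payload shape: the two keys named in the table.
  data_json = jsonencode({
    username = postgresql_role.credentials_editor.name
    password = random_password.credentials_editor.result
  })
}
```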

---

## Per-application resources

For each member of `var.applications`, `main.tf` creates the following (all `for_each` over the set):

| Resource | Type | What it creates |
| --- | --- | --- |
| `postgresql_role.app_role["<app>"]` | role | non-login role `<app>_role` (`login = false`), which owns the database |
| `postgresql_grant_role.credentials_editor_app_role["<app>"]` | grant | `credentials_editor` → `<app>_role` **WITH ADMIN OPTION** |
| `postgresql_database.app_db["<app>"]` | database | database `<app>`, owner `<app>_role`, `template = template0`, `alter_object_ownership = true` |
| `postgresql_function.pgbouncer_user_lookup["<app>"]` | function | `user_lookup(i_username text)` in db `<app>` (see below) |
| `postgresql_grant.pgbouncer_user_lookup_public_revoke["<app>"]` | grant | revokes all privileges on `user_lookup` from role `public` in schema `public` (expressed as an empty `privileges` list) |
| `postgresql_grant.pgbouncer_user_lookup["<app>"]` | grant | `EXECUTE` on `user_lookup` to role `pgbouncer_auth`; `depends_on` the public-revoke (the two grants can't run in parallel) |

So `webapp` yields role `webapp_role`, database `webapp`, function `webapp.user_lookup`, and the matching grants; likewise for `erp`, `crowdsec`, `plausible`, and `dance-lessons-coach`.
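
The first three rows of the table can be sketched as follows. Resource names and attribute values are taken from the table; the exact expressions used to derive names from `each.key` are assumptions.

```hcl
resource "postgresql_role" "app_role" {
  for_each = var.applications

  name  = "${each.key}_role"
  login = false
}

resource "postgresql_grant_role" "credentials_editor_app_role" {
  for_each = var.applications

  # credentials_editor becomes a member of <app>_role, with ADMIN OPTION.
  role              = postgresql_role.credentials_editor.name
  grant_role        = postgresql_role.app_role[each.key].name
  with_admin_option = true
}

resource "postgresql_database" "app_db" {
  for_each = var.applications

  name                   = each.key
  owner                  = postgresql_role.app_role[each.key].name
  template               = "template0"
  alter_object_ownership = true
}
```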

### The pgbouncer `user_lookup()` function

`postgresql_function.pgbouncer_user_lookup` defines a `plpgsql` function with **`security_definer = true`** and `parallel = "SAFE"`. It takes `i_username` (IN, text) and returns a record of `uname` + `phash`:

```sql
BEGIN
  -- Fetch the stored password hash for the user PgBouncer is authenticating.
  SELECT usename, passwd FROM pg_catalog.pg_shadow
    WHERE usename = i_username INTO uname, phash;
  RETURN;  -- returns the (uname, phash) record
END;
```

PgBouncer's `auth_query` calls this to fetch the stored password hash. Because reading `pg_shadow` is privileged, the function is `SECURITY DEFINER` (runs as its owner). Access is locked down in two steps: first **revoke** the default `public` execute grant, then **grant** `EXECUTE` only to the `pgbouncer_auth` role. The `pgbouncer_auth` role itself is expected to already exist on the server (it is not created by this root).

> [!NOTE]
> The two grants are ordered with an explicit `depends_on`: `postgresql_grant.pgbouncer_user_lookup` waits for `postgresql_grant.pgbouncer_user_lookup_public_revoke` because the provider can't apply both grants on the same object concurrently.
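
A sketch of how the function and its two ordered grants might be declared with the `cyrilgdn/postgresql` provider follows. Resource names, the function body, and the flags come from this page; declaring `uname` and `phash` as `OUT` arguments with `returns = "record"`, and identifying the function in `objects` by its bare name, are assumptions about the real `main.tf`.

```hcl
resource "postgresql_function" "pgbouncer_user_lookup" {
  for_each = var.applications

  name     = "user_lookup"
  database = postgresql_database.app_db[each.key].name
  schema   = "public"
  language = "plpgsql"
  returns  = "record" # assumed, given the two OUT arguments below

  security_definer = true
  parallel         = "SAFE"

  arg {
    mode = "IN"
    name = "i_username"
    type = "text"
  }
  arg {
    mode = "OUT" # assumption
    name = "uname"
    type = "text"
  }
  arg {
    mode = "OUT" # assumption
    name = "phash"
    type = "text"
  }

  body = <<-EOF
    BEGIN
      SELECT usename, passwd FROM pg_catalog.pg_shadow
        WHERE usename = i_username INTO uname, phash;
      RETURN;
    END;
  EOF
}

# Step 1: remove the default EXECUTE that PUBLIC gets on new functions.
resource "postgresql_grant" "pgbouncer_user_lookup_public_revoke" {
  for_each = var.applications

  database    = postgresql_database.app_db[each.key].name
  schema      = "public"
  role        = "public"
  object_type = "function"
  objects     = ["user_lookup"] # assumed identifier form
  privileges  = []

  depends_on = [postgresql_function.pgbouncer_user_lookup]
}

# Step 2: allow only pgbouncer_auth, after the revoke has been applied.
resource "postgresql_grant" "pgbouncer_user_lookup" {
  for_each = var.applications

  database    = postgresql_database.app_db[each.key].name
  schema      = "public"
  role        = "pgbouncer_auth"
  object_type = "function"
  objects     = ["user_lookup"] # assumed identifier form
  privileges  = ["EXECUTE"]

  depends_on = [postgresql_grant.pgbouncer_user_lookup_public_revoke]
}
```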

---

## Vault layout

This root writes a single KVv1 secret.

| Path | Engine | Contents |
| --- | --- | --- |
| `kvv1/postgres/credentials_editor/credentials` | KVv1 (`vault_kv_secret`) | `username`, `password` of the `credentials_editor` login role |

---

## No outputs

There is **no `outputs.tf`** in this root. Nothing is exported as a Terraform output: the `credentials_editor` credentials are delivered into Vault, and the per-app roles/databases/functions are side effects on the live server. Consumers read the credentials from `kvv1/postgres/credentials_editor/credentials`, not from state outputs.
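
For a consumer that is itself an OpenTofu root, reading the secret could look like the sketch below. This assumes the consumer has a Vault provider configured with read access to that path; the data source label and local names are hypothetical.

```hcl
# Hypothetical consumer-side lookup of the credentials written by this root.
data "vault_kv_secret" "credentials_editor" {
  path = "kvv1/postgres/credentials_editor/credentials"
}

locals {
  credentials_editor_username = data.vault_kv_secret.credentials_editor.data["username"]
  credentials_editor_password = data.vault_kv_secret.credentials_editor.data["password"]
}
```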

---

## See also

- [Naming conventions](../../lab-ecosystem/naming-conventions.md): the `<app>` databases here are one strand of the per-application `<app>` join key (alongside namespaces, Vault paths, and repos).
- [CI apply flow](ci-apply-flow.md): how `postgres/**` changes reach `gs://arcodange-tf/factory/postgres` and where `TF_VAR_POSTGRES_*` come from.
- [factory iac](factory-iac.md): the sibling root for everything outside the cluster.
- [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md).