Gabriel Radureau 548dacfc44 docs(vibe): add tools/ and cms/ guidebooks

Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the
lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/  : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec,
  pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) +
  secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules =
  the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/    : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md
  (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) +
  zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional.
Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart
VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub).
6 mermaid diagrams MCP-validated; zero dead links. Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-23 21:41:15 +02:00


Tools — Secrets & VSO

Status: ✅ Active
Last Updated: 2026-06-23
Upstream: Tools · Components
Downstream: consumed by every tools-namespace pod and by every app's CI/CD
Related: secrets-and-vault concept · naming-conventions concept · storage-and-recovery · tofu CI apply flow · postgres IaC · safe-env ADR

This page maps how secrets live in HashiCorp Vault (engines, auth backends) and how they reach Kubernetes pods via the Vault Secrets Operator (VSO). The keystone is the app_policy + app_roles module pair: the machinery that turns a single <app> name into a matched set of Vault policies, roles, and CI identities — the same <app> join key documented in the naming-conventions concept.

Vault itself runs as a component in the tools namespace; see the Components page for its deploy shape. The admin/bootstrap layer (the kvv1 engine, the gitea_jwt auth backend, the base gitea_cicd role, the Kubernetes auth backend mount) is created by factory's Ansible-managed Vault Terraform in hashicorp_vault.tf; everything on this page that is per-app is created by the IaC under hashicorp-vault/iac.

Caution

Vault runs standalone with file/raft storage and starts sealed after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config — pods that depend on a VaultDynamicSecret will not start. Unseal procedure and key custody live in storage-and-recovery.
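The seal gate described above can be sketched as a small readiness check. This assumes Vault's `GET /v1/sys/seal-status` endpoint, whose JSON response carries a boolean `sealed` field; the function name and the trimmed sample payloads are hypothetical, and the sketch parses a response body rather than calling a live Vault.

```python
import json

def vault_is_usable(seal_status_json):
    """Return True only when the seal-status payload reports an unsealed Vault."""
    status = json.loads(seal_status_json)
    # Treat a missing field as sealed: fail closed rather than assume readiness.
    return status.get("sealed", True) is False

# Hypothetical payloads, trimmed to the single field the check reads.
after_reboot = '{"sealed": true}'
after_unseal = '{"sealed": false}'

print(vault_is_usable(after_reboot), vault_is_usable(after_unseal))
```

A check like this could gate a deploy or an alert, so that a sealed Vault is reported directly instead of surfacing as pods that never start.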

1) Vault engines & auth backends

All engines below are mounted by hashicorp-vault/iac/main.tf except kvv1, which is bootstrapped by factory's Ansible Vault Terraform.

Mount	Type	Holds	Defined in
`kvv1`	KV v1	Admin / cloud secrets: `kvv1/google/credentials`, `kvv1/gitea/`, `kvv1/cloudflare/`, `kvv1/ovh/`, `kvv1/postgres/credentials`, `kvv1/admin/`	factory `hashicorp_vault.tf`
`kvv2`	KV v2 (versioned)	Per-app config secrets under `kvv2/<app>/*`	`main.tf`
`transit`	transit	The VSO client-cache encryption key `vso-client-cache` — lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms	`main.tf`
`postgres`	database	Dynamic Postgres creds at `postgres/creds/<app>`; connects to the DB through `pgbouncer.tools:5432` using the `credentials_editor` root account	`main.tf`

The postgres connection is configured with allowed_roles = ["*"] and a root-rotation statement (ALTER USER … WITH PASSWORD); the editor username/password come from the sensitive POSTGRES_CREDENTIALS_EDITOR_* variables.

Auth backends

Backend	Mount	Who uses it	Role(s)
`kubernetes`	`kubernetes`	VSO controller + every app pod's ServiceAccount	`vault-secret-operator` (VSO itself), `<app>` (one per app), `factory_crowdsec_conf`
`gitea_jwt`	`gitea_jwt`	CI/OpenTofu jobs running in Gitea Actions	`gitea_cicd` (base, factory-bootstrapped) + per-app `gitea_cicd_<app>`

- kubernetes auth (main.tf) is configured against https://kubernetes.default.svc:443. The VSO role vault-secret-operator binds SA hashicorp-vault-vault-secrets-operator-controller-manager in ns tools, audience = vault, and carries the edit-vso-client-cache policy (encrypt/decrypt on transit/.../vso-client-cache).
- gitea_jwt is the OIDC/JWT backend for CI. Its backend, default_role = gitea_cicd, and the base gitea_cicd role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_<app>]" } using the TERRAFORM_VAULT_AUTH_JWT env var. See the tofu CI apply flow for how the token is minted in the pipeline.
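To make the CI login concrete, here is a minimal sketch of the request a JWT login amounts to. It assumes Vault's JWT auth login endpoint (`POST /v1/auth/<mount>/login` with a `role` and a `jwt` field); the helper name and the placeholder token are hypothetical, and nothing is sent over the network.

```python
import os

def jwt_login_request(app=None, mount="gitea_jwt", env=None):
    """Build the path and body of a Vault JWT login for the base or per-app CI role."""
    env = os.environ if env is None else env
    # Base role when no app is given, otherwise the per-app role gitea_cicd_<app>.
    role = "gitea_cicd" if app is None else f"gitea_cicd_{app}"
    return {
        "path": f"/v1/auth/{mount}/login",
        "body": {"role": role, "jwt": env["TERRAFORM_VAULT_AUTH_JWT"]},
    }

# Placeholder token; in CI the pipeline mints the real one.
fake_env = {"TERRAFORM_VAULT_AUTH_JWT": "<token minted by the pipeline>"}
per_app = jwt_login_request("webapp", env=fake_env)
base = jwt_login_request(env=fake_env)
```

The only per-app difference is the role name, which is what lets one backend serve every IaC project.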

Terraform state

Each IaC project keeps its state in the arcodange-tf GCS bucket under a distinct prefix:

Project	GCS prefix
Vault admin/app machinery	`tools/hashicorp_vault/main`
Plausible	`tools/plausible/main`
CrowdSec	`tools/crowdsec/main`

2) The `app_policy` + `app_roles` modules — the `<app>` join-key machinery

Important

These two modules are the heart of the secrets layer. Given a single <app> name they emit a matched, name-derived set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide naming convention: the same <app> string also names the Kubernetes namespace, the ServiceAccount, the Postgres <app>_role, and the Gitea repo.

The two modules live on opposite sides of the trust boundary:

- modules/app_policy is declared once, centrally, in the Vault admin project (main.tf, for_each over var.applications). It creates the policies and the CI identity (the privileged bits), so the app's own repo never holds them.
- modules/app_roles is declared by the subordinate app project (pulled over SSH as a Git module), running under the <app>-ops policy. It creates the roles the app needs.
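As an illustration of the join key, the sketch below derives the Vault object names from a single `<app>` string, grouped by the module that owns them. The names are the ones listed in the module tables on this page; the function and the dictionary layout are hypothetical and only model the naming, not the Terraform resources.

```python
def derived_names(app):
    """Map one <app> name to the Vault objects each module derives from it."""
    return {
        # Declared centrally in the Vault admin project.
        "app_policy": {
            "app_policy": app,
            "ops_policy": f"{app}-ops",
            "identity_group": f"{app}-ops",
            "jwt_role": f"gitea_cicd_{app}",
        },
        # Declared by the app repo, running under the <app>-ops policy.
        "app_roles": {
            "kubernetes_role": f"auth/kubernetes/role/{app}",
            "postgres_role": f"postgres/roles/{app}",
        },
    }

names = derived_names("erp")
```

Because every name is a pure function of `<app>`, a typo in the app name shows up as a consistent mismatch everywhere rather than as one stray object.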

`app_roles` — runtime roles (declared by the app repo)

For <app>, app_roles/main.tf creates:

Resource	Path	Key settings
Kubernetes auth role	`auth/kubernetes/role/<app>`	`bound_service_account_names = [<app>] + extras`, `bound_service_account_namespaces = [<app>] + extras`, `token_ttl = 3600` (1h), `token_policies = [default, <app>]`, `audience = vault`
Postgres dynamic role	`postgres/roles/<app>`	`db_name = postgres`; creation SQL: `CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL …` then `GRANT <app>_role TO "{{name}}"`; revocation: `REASSIGN OWNED BY "{{name}}" TO <app>_role` then `REVOKE ALL ON DATABASE <app> FROM "{{name}}"`

Important

The Postgres dynamic role's creation SQL does GRANT <app>_role TO {{name}} and its revocation does REASSIGN OWNED BY {{name}} TO <app>_role. The non-login <app>_role must already exist in Postgres — it is created by factory's postgres IaC (postgresql_role.app_role["<app>"], owner of the <app> database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: factory postgres/iac before tools app_roles.
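The ordering dependency can be sketched as a pre-flight check plus the statements that depend on it. This is a minimal model, not the module's code: the helper names are hypothetical, the `CREATE ROLE` statement is left out because its password and expiry clauses are elided on this page, and `existing_pg_roles` stands in for whatever query would list the roles in Postgres.

```python
NAME = '"{{name}}"'  # Vault substitutes the generated ephemeral username here

def role_statements(app):
    """Statements that reference the non-login <app>_role group."""
    group = f"{app}_role"
    return {
        "creation_grant": f"GRANT {group} TO {NAME}",
        "revocation": [
            f"REASSIGN OWNED BY {NAME} TO {group}",
            f"REVOKE ALL ON DATABASE {app} FROM {NAME}",
        ],
    }

def missing_group_roles(apps, existing_pg_roles):
    """List the <app>_role groups that must exist before app_roles is applied."""
    return [f"{a}_role" for a in apps if f"{a}_role" not in existing_pg_roles]

stmts = role_statements("plausible")
gaps = missing_group_roles(["plausible", "crowdsec"], {"plausible_role"})
```

A non-empty result from `missing_group_roles` means factory's postgres IaC has not yet run for that app.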

Note

The Kubernetes auth role binds both SA names and namespaces — the check is an AND. A token presenting SA <app> from the wrong namespace (or any other SA from ns <app>) is rejected. The default binding is SA <app> in ns <app>; the service_account_names / service_account_namespaces inputs widen it (e.g. CrowdSec/Plausible run in ns tools, not a namespace named after the app).
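The AND semantics can be modelled in a few lines. This is a simplified sketch with hypothetical helper names: it ignores the audience check and any wildcard bindings, and only shows that the ServiceAccount name and its namespace are tested independently and must both match.

```python
def k8s_role(app, extra_names=(), extra_namespaces=()):
    """Bindings as the module builds them: [<app>] plus any extras."""
    return {
        "names": [app, *extra_names],
        "namespaces": [app, *extra_namespaces],
    }

def role_accepts(role, sa_name, sa_namespace):
    """Both lists must match; either one alone is not enough."""
    return sa_name in role["names"] and sa_namespace in role["namespaces"]

webapp = k8s_role("webapp")
crowdsec = k8s_role("crowdsec", extra_namespaces=["tools"])

print(role_accepts(webapp, "webapp", "webapp"))      # default binding
print(role_accepts(crowdsec, "crowdsec", "tools"))   # widened namespace
print(role_accepts(crowdsec, "plausible", "tools"))  # other SA, same ns
```

Note that widening one list widens the cross product: adding `tools` as a namespace accepts SA `crowdsec` in both `crowdsec` and `tools`.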

The Postgres role can be skipped with disable_database = true; the DB name defaults to <app> but can be overridden via database.

`app_policy` — policies + CI identity (declared centrally)

For <app>, app_policy/main.tf creates:

Resource	Name	Grants
App policy	`<app>`	`read,list` on `kvv2/data/<app>/`; `read` on `postgres/creds/<app>` — what the runtime pod can do
Ops policy	`<app>-ops`	The CI bundle (below)
JWT role	`gitea_cicd_<app>` (mount `gitea_jwt`)	`token_policies = [default] + <app>'s ops_policies`, `bound_audiences = [gitea_app_id]`, `user_claim = email`, `role_type = jwt`
Identity group	`<app>-ops`	Internal group carrying the `<app>-ops` policy, so Vault users mapped to their Gitea entity inherit ops rights

The <app>-ops policy is the privilege set a CI job needs to manage the app's own corner of Vault and the clouds:

- create/update on auth/token/create; read on sys/mounts/auth/* (so the Vault provider works);
- full CRUD on postgres/roles/<app>* and on auth/kubernetes/role/<app>* (so app_roles can apply). The k8s-role rule is parameter-constrained: it may only set bound_service_account_names/bound_service_account_namespaces to the whitelisted [<app>] + extras lists and token_policies to ["default","<app>"], preventing a CI job from minting a role with broader bindings;
- full CRUD on the app's KV-v2 data, delete/undelete/destroy, and metadata (kvv2/data|delete|undelete|destroy|metadata/<app>/*);
- read on kvv1/google/credentials (the GCS backend SA) and kvv1/gitea/tofu_module_reader (the bot SSH key that lets CI pull the app_roles Git module);
- CRUD on kvv1/cloudflare/<app>* and kvv1/ovh/<app>* (cloud DNS/edge secrets scoped to the app).
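The `kvv2/data|delete|undelete|destroy|metadata/<app>/*` shorthand in the list above stands for five separate policy paths. A minimal sketch of the expansion, with a hypothetical helper name:

```python
KV2_SUBPATHS = ("data", "delete", "undelete", "destroy", "metadata")

def ops_kv_paths(app, mount="kvv2"):
    """Expand the shorthand into one policy path per KV-v2 sub-path."""
    return [f"{mount}/{sub}/{app}/*" for sub in KV2_SUBPATHS]

for path in ops_kv_paths("webapp"):
    print(path)
```

Each sub-path is its own rule in a Vault policy, which is why the ops bundle needs all five to manage versioned secrets end to end.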

Note

The policy document is post-processed with two replace() calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string ("["webapp"]"); the replaces strip the outer quotes so Vault receives a real list. If you change those allowed_parameter blocks, keep the replaces in sync.
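The effect of the two replaces can be reproduced in isolation. This sketch takes the quoted-list form exactly as written in the note; the search strings `"[` and `]"` are an assumption about what the `replace()` calls match, not a copy of the module's code.

```python
import json

def strip_outer_quotes(document):
    """Apply the two replacements that turn a quoted list back into a list."""
    return document.replace('"[', "[").replace(']"', "]")

# The list as the note describes it arriving: wrapped in an extra pair of quotes.
serialized = '"["webapp"]"'
fixed = strip_outer_quotes(serialized)

print(fixed)
print(json.loads(fixed))
```

The replacement is purely textual, so it also rewrites any other `"[` or `]"` sequence in the document; that bluntness is the reason the allowed_parameter blocks and the replaces have to be changed together.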

Apps wired in `terraform.tfvars`

terraform.tfvars declares the applications set the central app_policy for_each walks:

`<app>`	Extra SA	Extra ns	Extra ops policy	Notes
`webapp`	—	—	—	defaults: SA `webapp` / ns `webapp`
`erp`	—	—	—	defaults
`cms`	`cloudflared`	—	`factory__cf_r2_arcodange_tf`	extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket
`crowdsec`	—	`tools`	—	runs in ns `tools`
`plausible`	—	`tools`	—	runs in ns `tools`
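The table above can be read as input to the `[<app>] + extras` rule. The sketch below models it in Python; the dictionary keys (`extra_sa`, `extra_ns`) mirror the column headings and are hypothetical, not the actual attribute names in terraform.tfvars.

```python
# Hypothetical in-memory mirror of the table; keys follow the column headings.
APPLICATIONS = {
    "webapp": {},
    "erp": {},
    "cms": {"extra_sa": ["cloudflared"]},
    "crowdsec": {"extra_ns": ["tools"]},
    "plausible": {"extra_ns": ["tools"]},
}

def effective_bindings(app):
    """Compute the Kubernetes auth role bindings for one app."""
    cfg = APPLICATIONS[app]
    return {
        "bound_service_account_names": [app, *cfg.get("extra_sa", [])],
        "bound_service_account_namespaces": [app, *cfg.get("extra_ns", [])],
    }

for app in APPLICATIONS:
    print(app, effective_bindings(app))
```

An app with no extras ends up with the default binding: SA `<app>` in namespace `<app>`.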

Note

terraform.tfvars uses the key ops_policies for the CMS extra policy while variables.tf declares the optional attribute as policies; the central main.tf passes each.value.policies into the module's ops_policies input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role.

3) VSO CRDs — how a secret becomes a Kubernetes Secret

The Vault Secrets Operator watches three custom resources and writes plain Kubernetes Secret objects that pods consume normally (env / volume). The app repo ships the CRDs; the operator does the Vault round-trips.

CRD	What it does	Refresh / rotation
`VaultAuth`	Picks the auth method (`kubernetes`), the `mount`, the Vault `role` (= `<app>`), and the pod ServiceAccount (= `<app>`) used to log in; references a `VaultConnection` (here the in-cluster `default` → `http://hashicorp-vault.tools.svc.cluster.local:8200`)	n/a — used by the other two CRDs via `vaultAuthRef`
`VaultStaticSecret`	Reads a KV-v2 path → writes a k8s `Secret`	`refreshAfter` (the lab uses `30s`)
`VaultDynamicSecret`	Reads `postgres/creds/<app>` (a dynamic lease) → writes a k8s `Secret`; `rolloutRestartTargets` lists Deployments to restart when creds rotate	follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets
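To show how the three CRDs fit together, the sketch below builds them as plain Python dictionaries that could be dumped to YAML or JSON. The field names follow the VSO `secrets.hashicorp.com/v1beta1` API as understood here and should be checked against the chart in use; the helper functions and the `<app>-config` / `<app>-db-credentials` destination names are illustrative, taken from the Plausible and CrowdSec examples on this page.

```python
API_VERSION = "secrets.hashicorp.com/v1beta1"

def vault_auth(app, namespace):
    """VaultAuth: log in via the kubernetes backend as role <app> with SA <app>."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VaultAuth",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "method": "kubernetes",
            "mount": "kubernetes",
            "kubernetes": {"role": app, "serviceAccount": app, "audiences": ["vault"]},
        },
    }

def static_secret(app, namespace):
    """VaultStaticSecret: copy kvv2/<app>/config into a k8s Secret every 30s."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VaultStaticSecret",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "vaultAuthRef": app,
            "type": "kv-v2",
            "mount": "kvv2",
            "path": f"{app}/config",
            "refreshAfter": "30s",
            "destination": {"name": f"{app}-config", "create": True},
        },
    }

def dynamic_secret(app, namespace, deployments):
    """VaultDynamicSecret: lease postgres/creds/<app> and restart targets on rotation."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VaultDynamicSecret",
        "metadata": {"name": f"{app}-db-credentials", "namespace": namespace},
        "spec": {
            "vaultAuthRef": app,
            "mount": "postgres",
            "path": f"creds/{app}",
            "destination": {"name": f"{app}-db-credentials", "create": True},
            "rolloutRestartTargets": [
                {"kind": "Deployment", "name": d} for d in deployments
            ],
        },
    }

manifests = [
    vault_auth("plausible", "tools"),
    static_secret("plausible", "tools"),
    dynamic_secret("plausible", "tools", ["plausible"]),
]
```

Both secret CRDs point at the same `VaultAuth` through `vaultAuthRef`, so one login identity serves the static and the dynamic path.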

Worked example — Plausible (`tools` namespace)

Files under plausible/resources:

- VaultAuth plausible (vaultauth.yaml) sets method: kubernetes, role: plausible, serviceAccount: plausible, audiences: [vault]. The role is the Vault role that app_roles creates from plausible/iac/main.tf.
- VaultStaticSecret plausible (vaultsecret.yaml) reads kvv2 path plausible/config into Secret plausible-config (refreshAfter: 30s). The config payload holds SECRET_KEY_BASE and TOTP_VAULT_KEY, both generated by Terraform (random_password, base64-encoded) and written to kvv2/plausible/config via vault_kv_secret_v2 in the plausible IaC.
- VaultStaticSecret plausible-geoip (geoipsecret.yaml) reads kvv2 path plausible/geoip into Secret plausible-geoip, exposing LICENSE_KEY (the MaxMind GeoIP license, an admin-seeded value, fed to the geoipupdate sidecar via env GEOIPUPDATE_LICENSE_KEY).
- VaultDynamicSecret plausible-db-credentials (vaultdynamicsecret.yaml) reads postgres/creds/plausible into Secret plausible-db-credentials; rolloutRestartTargets restarts Deployment plausible. An init container (add-initcontainer.yaml) reads username/password from that Secret and writes DATABASE_URL (postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}) into a shared generated-secrets volume the app reads.
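The DATABASE_URL assembly done by the init container can be sketched as follows. The percent-encoding of the username and password is an addition made here for safety, not something this page says the init container does, and the host, port and credential values are hypothetical placeholders.

```python
from urllib.parse import quote

def database_url(user, password, host, port, name):
    """Assemble a postgres:// URL, percent-encoding the credential parts."""
    # Encoding is an assumption added in this sketch: a raw '@' or '/' in a
    # generated password would otherwise break URL parsing.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )

# Hypothetical values standing in for the Secret's username/password keys.
url = database_url("v-user", "p@ss/word", "db.example.internal", 5432, "plausible")
print(url)
```

Since the credentials rotate with the lease, the URL has to be regenerated on every pod start, which is why it lives in an init container rather than in static config.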

Worked example — CrowdSec (`tools` namespace)

Templates under crowdsec/templates:

- VaultAuth crowdsec (vaultauth.yaml) sets role: crowdsec, serviceAccount: crowdsec.
- VaultDynamicSecret crowdsec-db-credentials (vaultdynamicsecret.yaml) reads postgres/creds/crowdsec into Secret crowdsec-db-credentials; rolloutRestartTargets restarts Deployment crowdsec-lapi (the Local API that owns the DB connection).

`factory_auth.tf` — the Ansible CrowdSec/Traefik plugin reader

Separately from the per-app machinery, factory_auth.tf wires a Kubernetes auth role factory_crowdsec_conf for SA factory-ansible-tool-crowdsec-traefik-plugin in ns kube-system (token_ttl = 3600). It carries policy factory_crowdsec_conf, which grants read,list on kvv2/data/cms/factory/*. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the Turnstile configuration that the cms repo writes into kvv2/cms/factory/* — a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the vault_kv_secret_v2 write) is documented on the CMS Cloudflare page.
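The scope of the `factory_crowdsec_conf` policy can be checked with a small path matcher. This sketch models only the trailing `*` glob of Vault policy paths (a prefix match) and ignores other policy features; the function name and the second sample path are hypothetical.

```python
def path_matches(pattern, path):
    """Match a request path against a policy path with an optional trailing '*'."""
    if pattern.endswith("*"):
        # A trailing glob is a prefix match on everything before the '*'.
        return path.startswith(pattern[:-1])
    return path == pattern

POLICY_PATH = "kvv2/data/cms/factory/*"

print(path_matches(POLICY_PATH, "kvv2/data/cms/factory/turnstile"))
print(path_matches(POLICY_PATH, "kvv2/data/cms/cloudflared"))
```

The plugin's role can read what the cms repo places under `cms/factory/` and nothing else in the cms tree, which keeps the handoff narrow.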

4) Secret-paths inventory

Path	Engine	Holds	Producer	Consumer
`kvv2/<app>/config`	KV v2	App runtime config	app CI (KV CRUD via `<app>-ops`)	`VaultStaticSecret` → pod
`kvv2/plausible/config`	KV v2	`SECRET_KEY_BASE`, `TOTP_VAULT_KEY`	Plausible IaC (`random_password` → `vault_kv_secret_v2`)	`VaultStaticSecret plausible` → `plausible-config`
`kvv2/plausible/geoip`	KV v2	`LICENSE_KEY` (MaxMind)	admin-seeded	`VaultStaticSecret plausible-geoip` → `geoipupdate` sidecar
`kvv2/cms/factory/turnstile`	KV v2	Cloudflare Turnstile config	`cms` repo IaC	`factory_crowdsec_conf` k8s role → Ansible CrowdSec/Traefik plugin
`postgres/creds/<app>`	database	Ephemeral DB user (`username`/`password`, 1h lease)	Vault on demand (role `<app>`, `GRANT <app>_role`)	`VaultDynamicSecret` → pod (e.g. `plausible-db-credentials`, `crowdsec-db-credentials`)
`transit/.../vso-client-cache`	transit	VSO client-cache encryption key	Vault admin IaC	VSO controller (encrypt/decrypt its cache)
`kvv1/cloudflare/<app>*`	KV v1	Cloudflare DNS/edge secrets	admin	app CI (`<app>-ops` CRUD)
`kvv1/ovh/<app>*`	KV v1	OVH secrets	admin	app CI (`<app>-ops` CRUD)
`kvv1/gitea/tofu_module_reader`	KV v1	Bot SSH key to pull the `app_roles` Git module	admin	app CI (`<app>-ops` read)
`kvv1/google/credentials`	KV v1	GCS Terraform-backend SA key	admin	every IaC CI job (read)
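The inventory lists KV-v2 secrets by their logical path (`kvv2/<app>/config`), while the policies earlier on this page are written against `kvv2/data/<app>/`. The difference is the KV-v2 API prefix inserted after the mount. A minimal sketch of the mapping, assuming the mount name is a single path segment; the helper name is hypothetical.

```python
def kv2_api_path(logical, kind="data"):
    """Turn a logical KV-v2 path into the API/policy path for the given sub-path."""
    mount, _, rest = logical.partition("/")
    return f"{mount}/{kind}/{rest}"

print(kv2_api_path("kvv2/plausible/config"))
print(kv2_api_path("kvv2/plausible/config", kind="metadata"))
```

KV v1 paths such as `kvv1/google/credentials` have no such prefix, so they appear identically in the inventory and in policies.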

5) Secrets flow

%%{init: {'theme': 'base'}}%%
flowchart TB
    classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
    classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
    classDef crd fill:#059669,stroke:#047857,color:#ffffff
    classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
    classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff

    subgraph VAULT["Vault (tools ns)"]
        KV2["kvv2 engine<br>kvv2/&lt;app&gt;/*"]:::eng
        PG["postgres engine<br>postgres/creds/&lt;app&gt;"]:::eng
        TR["transit<br>vso-client-cache"]:::eng
        KKUB["kubernetes auth<br>role &lt;app&gt; (SA AND ns)"]:::auth
        KJWT["gitea_jwt auth<br>gitea_cicd_&lt;app&gt;"]:::auth
    end

    subgraph RUNTIME["Runtime path"]
        VA["VaultAuth<br>role &lt;app&gt;, SA &lt;app&gt;"]:::crd
        VSS["VaultStaticSecret<br>kvv2/&lt;app&gt;/config"]:::crd
        VDS["VaultDynamicSecret<br>postgres/creds/&lt;app&gt;"]:::crd
        SEC["k8s Secret<br>&lt;app&gt;-config / -db-credentials"]:::k8s
        POD["App pod<br>(SA &lt;app&gt;)"]:::k8s
    end

    subgraph CICD["CI path"]
        GHA["Gitea Actions<br>OpenTofu job"]:::ci
        TOFU["apply app_roles<br>(under &lt;app&gt;-ops)"]:::ci
    end

    KKUB --> VA
    VA --> VSS
    VA --> VDS
    KV2 --> VSS
    PG --> VDS
    VSS --> SEC
    VDS -- "rolloutRestart on rotation" --> SEC
    SEC --> POD
    TR -. "encrypts client cache" .-> VA

    GHA -- "JWT login" --> KJWT
    KJWT --> TOFU
    TOFU -- "creates" --> KKUB
    TOFU -- "creates" --> PG

1. Vault mounts the engines (kvv2, postgres, transit) and the two auth backends (kubernetes, gitea_jwt), all in the tools namespace.
2. A pod's VaultAuth logs in through the kubernetes backend with SA <app> against role <app>; the role accepts only when both the SA name and its namespace match (AND).
3. VaultStaticSecret reads kvv2/<app>/config and VaultDynamicSecret reads postgres/creds/<app> using that auth; VSO writes the values into ordinary k8s Secret objects.
4. The pod consumes the Secret (env or volume); on a dynamic-cred rotation VSO restarts the rolloutRestartTargets Deployment so it picks up the new credentials.
5. The transit key vso-client-cache encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
6. On the CI side, a Gitea Actions OpenTofu job logs into the gitea_jwt backend as gitea_cicd_<app> (audience = the Gitea OAuth app id, identity from the email claim).
7. Running under the <app>-ops policy, that job applies the app_roles module, creating/updating the Kubernetes auth role and the Postgres dynamic role for <app>, which closes the loop so the runtime path in steps 2-4 works.

Gotchas

- Vault must be unsealed after every restart. Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See storage-and-recovery.
- The Kubernetes auth role binds SA and namespace (AND). The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns tools (CrowdSec, Plausible) widen the binding via service_account_namespaces.
- The Postgres dynamic role depends on <app>_role existing. GRANT <app>_role TO {{name}} (create) and REASSIGN OWNED BY {{name}} TO <app>_role (revoke) both fail if factory's postgres IaC hasn't created the <app>_role non-login role first. Order: factory postgres/iac → tools app_roles.
- The ops_policies vs policies key mismatch. terraform.tfvars and variables.tf name the extra-policy attribute differently (see §2); read both when adding an app's extra ops policy.
- The sandbox uses a separate Vault. Per the safe-env ADR, the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.
