factory/vibe/guidebooks/tools/secrets-and-vso.md
Gabriel Radureau 548dacfc44 docs(vibe): add tools/ and cms/ guidebooks
Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the
lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/  : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec,
  pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) +
  secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules =
  the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/    : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md
  (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) +
  zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional.
Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart
VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub).
6 mermaid diagrams MCP-validated; zero dead links. Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:41:15 +02:00


Tools — Secrets & VSO

Status: Active
Last Updated: 2026-06-23
Upstream: Tools · Components
Downstream: consumed by every tools-namespace pod and by every app's CI/CD
Related: secrets-and-vault concept · naming-conventions concept · storage-and-recovery · tofu CI apply flow · postgres IaC · safe-env ADR

This page maps how secrets live in HashiCorp Vault (engines, auth backends) and how they reach Kubernetes pods via the Vault Secrets Operator (VSO). The keystone is the app_policy + app_roles module pair: the machinery that turns a single <app> name into a matched set of Vault policies, roles, and CI identities — the same <app> join key documented in the naming-conventions concept.

Vault itself runs as a component in the tools namespace; see the Components page for its deploy shape. The admin/bootstrap layer (the kvv1 engine, the gitea_jwt auth backend, the base gitea_cicd role, the Kubernetes auth backend mount) is created by factory's Ansible-managed Vault Terraform in hashicorp_vault.tf; everything per-app on this page is created by the IaC under hashicorp-vault/iac.

Caution

Vault runs standalone with file/raft storage and starts sealed after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config — pods that depend on a VaultDynamicSecret will not start. Unseal procedure and key custody live in storage-and-recovery.
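Before debugging a failing VSO read, it can help to confirm whether Vault is sealed. The sketch below uses Vault's unauthenticated sys/seal-status endpoint; it is an illustration rather than part of the lab's tooling. The helper names are hypothetical, and the address is the in-cluster default quoted in the VSO section of this page.

```python
import json
from urllib.request import urlopen

# In-cluster Vault address as quoted in the VSO section of this page.
VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def is_sealed(seal_status_body: str) -> bool:
    """Return True when a sys/seal-status JSON response reports a sealed Vault."""
    return bool(json.loads(seal_status_body)["sealed"])


def fetch_seal_status(addr: str = VAULT_ADDR, timeout: float = 5.0) -> str:
    """GET /v1/sys/seal-status (no token required) and return the raw body."""
    with urlopen(f"{addr}/v1/sys/seal-status", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

From a pod inside the cluster, `is_sealed(fetch_seal_status())` returning True means the unseal procedure in storage-and-recovery has to run before any VSO read can succeed.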


1) Vault engines & auth backends

All engines below are mounted by hashicorp-vault/iac/main.tf except kvv1, which is bootstrapped by factory's Ansible Vault Terraform.

| Mount | Type | Holds | Defined in |
|---|---|---|---|
| kvv1 | KV v1 | Admin / cloud secrets: kvv1/google/credentials, kvv1/gitea/*, kvv1/cloudflare/*, kvv1/ovh/*, kvv1/postgres/credentials, kvv1/admin/* | factory hashicorp_vault.tf |
| kvv2 | KV v2 (versioned) | Per-app config secrets under kvv2/<app>/* | main.tf |
| transit | transit | The VSO client-cache encryption key vso-client-cache; it lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms | main.tf |
| postgres | database | Dynamic Postgres creds at postgres/creds/<app>; connects to the DB through pgbouncer.tools:5432 using the credentials_editor root account | main.tf |

The postgres connection is configured with allowed_roles = ["*"] and a root-rotation statement (ALTER USER … WITH PASSWORD); the editor username/password come from the sensitive POSTGRES_CREDENTIALS_EDITOR_* variables.

Auth backends

| Backend | Mount | Who uses it | Role(s) |
|---|---|---|---|
| kubernetes | kubernetes | VSO controller + every app pod's ServiceAccount | vault-secret-operator (VSO itself), <app> (one per app), factory_crowdsec_conf |
| gitea_jwt | gitea_jwt | CI/OpenTofu jobs running in Gitea Actions | gitea_cicd (base, factory-bootstrapped) + per-app gitea_cicd_<app> |
  • kubernetes auth (main.tf) is configured against https://kubernetes.default.svc:443. The VSO role vault-secret-operator binds SA hashicorp-vault-vault-secrets-operator-controller-manager in ns tools, audience = vault, and carries the edit-vso-client-cache policy (encrypt/decrypt on transit/.../vso-client-cache).
  • gitea_jwt is the OIDC/JWT backend for CI. Its backend, default_role = gitea_cicd, and the base gitea_cicd role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_<app>]" } using the TERRAFORM_VAULT_AUTH_JWT env var. See the tofu CI apply flow for how the token is minted in the pipeline.
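To make the JWT login concrete, here is a minimal sketch of the request that a login against the gitea_jwt mount amounts to. The Vault provider performs this itself; the function name is hypothetical, and only the path and body are built here, with no network call.

```python
import json
from typing import Optional, Tuple


def jwt_login_request(jwt: str, app: Optional[str] = None,
                      mount: str = "gitea_jwt") -> Tuple[str, str]:
    """Build the API path and JSON body for a Vault JWT login.

    app=None selects the base gitea_cicd role; otherwise gitea_cicd_<app>.
    """
    role = "gitea_cicd" if app is None else f"gitea_cicd_{app}"
    return f"/v1/auth/{mount}/login", json.dumps({"role": role, "jwt": jwt})
```

In a pipeline the `jwt` argument would be the value of the TERRAFORM_VAULT_AUTH_JWT environment variable, and the body would be POSTed to the returned path on the Vault address.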

Terraform state

Each IaC project keeps its state in the arcodange-tf GCS bucket under a distinct prefix:

| Project | GCS prefix |
|---|---|
| Vault admin/app machinery | tools/hashicorp_vault/main |
| Plausible | tools/plausible/main |
| CrowdSec | tools/crowdsec/main |

2) The app_policy + app_roles modules — the <app> join-key machinery

Important

These two modules are the heart of the secrets layer. Given a single <app> name they emit a matched, name-derived set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide naming convention: the same <app> string also names the Kubernetes namespace, the ServiceAccount, the Postgres <app>_role, and the Gitea repo.

The two modules live on opposite sides of the trust boundary:

  • modules/app_policy is declared once, centrally, in the Vault admin project (main.tf, for_each over var.applications). It creates the policies and the CI identity — the privileged bits — so the app's own repo never holds them.
  • modules/app_roles is declared by the subordinate app project (pulled over SSH as a Git module), running under the <app>-ops policy. It creates the roles the app needs.
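The name derivation performed by the two modules can be summarised in a few lines. This is an illustrative model, not the modules' code: the function and dictionary keys are hypothetical, while the values follow the naming described on this page (default bindings, before any extras).

```python
from typing import Dict


def vault_names(app: str) -> Dict[str, str]:
    """Names and paths that hang off a single <app> join key."""
    return {
        # Defaults shared with Kubernetes and Postgres
        "namespace": app,
        "service_account": app,
        "pg_owner_role": f"{app}_role",
        # Created centrally by app_policy
        "app_policy": app,
        "ops_policy": f"{app}-ops",
        "identity_group": f"{app}-ops",
        "jwt_role": f"gitea_cicd_{app}",
        # Created by the app repo through app_roles
        "k8s_auth_role": f"auth/kubernetes/role/{app}",
        "postgres_role": f"postgres/roles/{app}",
        # Paths the runtime reads
        "postgres_creds": f"postgres/creds/{app}",
        "kv_prefix": f"kvv2/{app}/",
    }
```

For example, `vault_names("webapp")["jwt_role"]` is `gitea_cicd_webapp`. Apps such as CrowdSec and Plausible override the namespace default, as described further down.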

app_roles — runtime roles (declared by the app repo)

For <app>, app_roles/main.tf creates:

| Resource | Path | Key settings |
|---|---|---|
| Kubernetes auth role | auth/kubernetes/role/<app> | bound_service_account_names = [<app>] + extras, bound_service_account_namespaces = [<app>] + extras, token_ttl = 3600 (1h), token_policies = [default, <app>], audience = vault |
| Postgres dynamic role | postgres/roles/<app> | db_name = postgres; creation SQL: CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL … then GRANT <app>_role TO "{{name}}"; revocation: REASSIGN OWNED BY "{{name}}" TO <app>_role then REVOKE ALL ON DATABASE <app> FROM "{{name}}" |

Important

The Postgres dynamic role's creation SQL does GRANT <app>_role TO {{name}} and its revocation does REASSIGN OWNED BY {{name}} TO <app>_role. The non-login <app>_role must already exist in Postgres — it is created by factory's postgres IaC (postgresql_role.app_role["<app>"], owner of the <app> database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: factory postgres/iac before tools app_roles.

Note

The Kubernetes auth role binds both SA names and namespaces — the check is an AND. A token presenting SA <app> from the wrong namespace (or any other SA from ns <app>) is rejected. The default binding is SA <app> in ns <app>; the service_account_names / service_account_namespaces inputs widen it (e.g. CrowdSec/Plausible run in ns tools, not a namespace named after the app).
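The AND check can be modelled as two independent membership tests. The sketch below is a simplification under stated assumptions: the function names are hypothetical, and Vault's wildcard bindings are ignored.

```python
from typing import List, Sequence, Tuple


def bindings(app: str, extra_names: Sequence[str] = (),
             extra_namespaces: Sequence[str] = ()) -> Tuple[List[str], List[str]]:
    """Default binding is SA <app> in ns <app>; extras widen each list."""
    return [app, *extra_names], [app, *extra_namespaces]


def role_accepts(sa_name: str, sa_namespace: str,
                 bound_names: Sequence[str],
                 bound_namespaces: Sequence[str]) -> bool:
    """Both the ServiceAccount name and its namespace must be bound."""
    return sa_name in bound_names and sa_namespace in bound_namespaces
```

With `bindings("crowdsec", extra_namespaces=["tools"])`, SA crowdsec in ns tools is accepted. Because the two lists are checked independently, widening both at once accepts every name/namespace combination across them, so extras are best kept minimal.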

The Postgres role can be skipped with disable_database = true; the DB name defaults to <app> but can be overridden via database.

app_policy — policies + CI identity (declared centrally)

For <app>, app_policy/main.tf creates:

| Resource | Name | Grants |
|---|---|---|
| App policy | <app> | read,list on kvv2/data/<app>/*; read on postgres/creds/<app>* (what the runtime pod can do) |
| Ops policy | <app>-ops | The CI bundle (below) |
| JWT role | gitea_cicd_<app> (mount gitea_jwt) | token_policies = [default] + <app>'s ops_policies, bound_audiences = [gitea_app_id], user_claim = email, role_type = jwt |
| Identity group | <app>-ops | Internal group carrying the <app>-ops policy, so Vault users mapped to their Gitea entity inherit ops rights |
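For orientation, the runtime app policy in the first row corresponds to a Vault policy document along these lines. The renderer is a hypothetical helper written for illustration; the real policy is produced by the app_policy module and its exact formatting may differ.

```python
def render_app_policy(app: str) -> str:
    """Render the two runtime grants for <app> as Vault policy text."""
    rules = {
        f"kvv2/data/{app}/*": ["read", "list"],   # static config via KV v2
        f"postgres/creds/{app}*": ["read"],       # dynamic DB credentials
    }
    blocks = []
    for path, caps in rules.items():
        cap_list = ", ".join(f'"{c}"' for c in caps)
        blocks.append(f'path "{path}" {{\n  capabilities = [{cap_list}]\n}}')
    return "\n\n".join(blocks)
```

Calling `print(render_app_policy("webapp"))` emits one `path` block per grant, which is a convenient shape to compare against `vault policy read webapp`.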

The <app>-ops policy is the privilege set a CI job needs to manage the app's own corner of Vault and the clouds:

  • create/update on auth/token/create; read on sys/mounts/auth/* (so the Vault provider works);
  • full CRUD on postgres/roles/<app>* and on auth/kubernetes/role/<app>* (so app_roles can apply) — the k8s-role rule is parameter-constrained: it may only set bound_service_account_names/bound_service_account_namespaces to the whitelisted [<app>] + extras lists and token_policies to ["default","<app>"], preventing a CI job from minting a role with broader bindings;
  • full CRUD on the app's KV-v2 data, delete/undelete/destroy, and metadata (kvv2/data|delete|undelete|destroy|metadata/<app>/*);
  • read on kvv1/google/credentials (the GCS backend SA), kvv1/gitea/tofu_module_reader (the bot SSH key that lets CI pull the app_roles Git module);
  • CRUD on kvv1/cloudflare/<app>* and kvv1/ovh/<app>* (cloud DNS/edge secrets scoped to the app).
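The parameter constraint on the Kubernetes-role rule can be pictured as an exact-match check on three parameters. This is a simplified model with hypothetical names: it compares only the parameters listed above and does not reproduce the rest of Vault's policy evaluation.

```python
from typing import Dict, List, Mapping, Sequence


def whitelisted_params(app: str, extra_sa: Sequence[str] = (),
                       extra_ns: Sequence[str] = ()) -> Dict[str, List[str]]:
    """The only values the <app>-ops policy lets a CI job write."""
    return {
        "bound_service_account_names": [app, *extra_sa],
        "bound_service_account_namespaces": [app, *extra_ns],
        "token_policies": ["default", app],
    }


def ops_may_write_role(request: Mapping[str, List[str]],
                       allowed: Mapping[str, List[str]]) -> bool:
    """Accept the write only if every constrained parameter matches exactly."""
    return all(request.get(key) == value for key, value in allowed.items())
```

A request that adds an extra entry to `token_policies`, or binds a ServiceAccount outside the whitelist, fails the comparison, which is the escalation path the constraint is meant to close.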

Note

The policy document is post-processed with two replace() calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string ("["webapp"]"); the replaces strip the outer quotes so Vault receives a real list. If you change those allowed_parameter blocks, keep the replaces in sync.
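The effect of the two replaces can be shown on a toy string. The module does this with Terraform's replace(); the Python below only mirrors the idea, and both the sample line and the exact search patterns are assumptions, not copied from the module.

```python
def strip_list_quotes(policy_text: str) -> str:
    """Drop the quotes wrapping a JSON-encoded list so it reads as a real list."""
    return policy_text.replace('"[', "[").replace(']"', "]")


# Hypothetical rendered fragment: the list arrives wrapped in an extra pair of quotes.
raw = 'names = "["webapp","cloudflared"]"'
fixed = strip_list_quotes(raw)
```

Here `fixed` is `names = ["webapp","cloudflared"]`. Text that already contains a plain list, such as `capabilities = ["read"]`, passes through unchanged because neither pattern occurs in it.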

Apps wired in terraform.tfvars

terraform.tfvars declares the applications set the central app_policy for_each walks:

| <app> | Extra SA | Extra ns | Extra ops policy | Notes |
|---|---|---|---|---|
| webapp | | | | defaults: SA webapp / ns webapp |
| erp | | | | defaults |
| cms | cloudflared | | factory__cf_r2_arcodange_tf | extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket |
| crowdsec | | tools | | runs in ns tools |
| plausible | | tools | | runs in ns tools |

Note

terraform.tfvars uses the key ops_policies for the CMS extra policy while variables.tf declares the optional attribute as policies; the central main.tf passes each.value.policies into the module's ops_policies input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role.


3) VSO CRDs — how a secret becomes a Kubernetes Secret

The Vault Secrets Operator watches three custom resources and writes plain Kubernetes Secret objects that pods consume normally (env / volume). The app repo ships the CRDs; the operator does the Vault round-trips.

| CRD | What it does | Refresh / rotation |
|---|---|---|
| VaultAuth | Picks the auth method (kubernetes), the mount, the Vault role (= <app>), and the pod ServiceAccount (= <app>) used to log in; references a VaultConnection (here the in-cluster default, http://hashicorp-vault.tools.svc.cluster.local:8200) | n/a (used by the other two CRDs via vaultAuthRef) |
| VaultStaticSecret | Reads a KV-v2 path → writes a k8s Secret | refreshAfter (the lab uses 30s) |
| VaultDynamicSecret | Reads postgres/creds/<app> (a dynamic lease) → writes a k8s Secret; rolloutRestartTargets lists Deployments to restart when creds rotate | follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets |
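As a rough picture of the last row, the sketch below builds a VaultDynamicSecret manifest as a plain dictionary. It is not a copy of the repos' YAML: the function name is hypothetical, and the assumption that the VaultAuth object is named after the app follows the worked examples on this page.

```python
from typing import Any, Dict


def vault_dynamic_secret(app: str, namespace: str, deployment: str) -> Dict[str, Any]:
    """Manifest asking VSO to sync postgres/creds/<app> into a k8s Secret."""
    secret_name = f"{app}-db-credentials"
    return {
        "apiVersion": "secrets.hashicorp.com/v1beta1",
        "kind": "VaultDynamicSecret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "spec": {
            "vaultAuthRef": app,              # assumed: VaultAuth named after the app
            "mount": "postgres",
            "path": f"creds/{app}",
            "destination": {"name": secret_name, "create": True},
            # Deployments VSO restarts when the credentials rotate
            "rolloutRestartTargets": [{"kind": "Deployment", "name": deployment}],
        },
    }
```

Serialising the result with `json.dumps(..., indent=2)` gives a document that `kubectl apply -f` accepts, since JSON is valid YAML.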

Worked example — Plausible (tools namespace)

Files under plausible/resources:

  1. VaultAuth plausible (vaultauth.yaml) — method: kubernetes, role: plausible, serviceAccount: plausible, audiences: [vault]. This is the Vault role app_roles created in plausible/iac/main.tf.
  2. VaultStaticSecret plausible (vaultsecret.yaml) — kvv2 path plausible/config → Secret plausible-config (refreshAfter: 30s). The config payload holds SECRET_KEY_BASE and TOTP_VAULT_KEY, both generated by Terraform (random_password, base64-encoded) and written to kvv2/plausible/config via vault_kv_secret_v2 in the plausible IaC.
  3. VaultStaticSecret plausible-geoip (geoipsecret.yaml) — kvv2 path plausible/geoip → Secret plausible-geoip exposing LICENSE_KEY (the MaxMind GeoIP license, an admin-seeded value, fed to the geoipupdate sidecar via env GEOIPUPDATE_LICENSE_KEY).
  4. VaultDynamicSecret plausible-db-credentials (vaultdynamicsecret.yaml) — postgres/creds/plausible → Secret plausible-db-credentials; rolloutRestartTargets restarts Deployment plausible. An init container (add-initcontainer.yaml) reads username/password from that Secret and writes DATABASE_URL (postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}) into a shared generated-secrets volume the app reads.
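The DATABASE_URL assembly in step 4 can be sketched as follows. The init container uses plain shell interpolation; the percent-encoding of the user and password here is an addition of this sketch (it guards against reserved characters in generated credentials), and the function name and sample values are hypothetical.

```python
from urllib.parse import quote


def database_url(user: str, password: str, host: str, port: int, name: str) -> str:
    """Assemble postgres://user:pass@host:port/name with escaped credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


# Hypothetical values standing in for the Secret's username/password keys.
example = database_url("v-user", "p@ss", "pgbouncer.tools", 5432, "plausible")
```

With these inputs `example` is `postgres://v-user:p%40ss@pgbouncer.tools:5432/plausible`. Credentials made only of letters, digits, and hyphens come out identical to the unescaped form.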

Worked example — CrowdSec (tools namespace)

Templates under crowdsec/templates:

  1. VaultAuth crowdsec (vaultauth.yaml) — role: crowdsec, serviceAccount: crowdsec.
  2. VaultDynamicSecret crowdsec-db-credentials (vaultdynamicsecret.yaml) — postgres/creds/crowdsec → Secret crowdsec-db-credentials; rolloutRestartTargets restarts Deployment crowdsec-lapi (the Local API that owns the DB connection).

factory_auth.tf — the Ansible CrowdSec/Traefik plugin reader

Separately from the per-app machinery, factory_auth.tf wires a Kubernetes auth role factory_crowdsec_conf for SA factory-ansible-tool-crowdsec-traefik-plugin in ns kube-system (token_ttl = 3600). It carries policy factory_crowdsec_conf, which grants read,list on kvv2/data/cms/factory/*. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the Turnstile configuration that the cms repo writes into kvv2/cms/factory/* — a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the vault_kv_secret_v2 write) is documented on the CMS Cloudflare page.


4) Secret-paths inventory

| Path | Engine | Holds | Producer | Consumer |
|---|---|---|---|---|
| kvv2/<app>/config | KV v2 | App runtime config | app CI (KV CRUD via <app>-ops) | VaultStaticSecret → pod |
| kvv2/plausible/config | KV v2 | SECRET_KEY_BASE, TOTP_VAULT_KEY | Plausible IaC (random_password → vault_kv_secret_v2) | VaultStaticSecret plausible → plausible-config |
| kvv2/plausible/geoip | KV v2 | LICENSE_KEY (MaxMind) | admin-seeded | VaultStaticSecret plausible-geoip → geoipupdate sidecar |
| kvv2/cms/factory/turnstile | KV v2 | Cloudflare Turnstile config | cms repo IaC | factory_crowdsec_conf k8s role → Ansible CrowdSec/Traefik plugin |
| postgres/creds/<app> | database | Ephemeral DB user (username/password, 1h lease) | Vault on demand (role <app>, GRANT <app>_role) | VaultDynamicSecret → pod (e.g. plausible-db-credentials, crowdsec-db-credentials) |
| transit/.../vso-client-cache | transit | VSO client-cache encryption key | Vault admin IaC | VSO controller (encrypt/decrypt its cache) |
| kvv1/cloudflare/<app>* | KV v1 | Cloudflare DNS/edge secrets | admin | app CI (<app>-ops CRUD) |
| kvv1/ovh/<app>* | KV v1 | OVH secrets | admin | app CI (<app>-ops CRUD) |
| kvv1/gitea/tofu_module_reader | KV v1 | Bot SSH key to pull the app_roles Git module | admin | app CI (<app>-ops read) |
| kvv1/google/credentials | KV v1 | GCS Terraform-backend SA key | admin | every IaC CI job (read) |

5) Secrets flow

%%{init: {'theme': 'base'}}%%
flowchart TB
    classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
    classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
    classDef crd fill:#059669,stroke:#047857,color:#ffffff
    classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
    classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff

    subgraph VAULT["Vault (tools ns)"]
        KV2["kvv2 engine<br>kvv2/&lt;app&gt;/*"]:::eng
        PG["postgres engine<br>postgres/creds/&lt;app&gt;"]:::eng
        TR["transit<br>vso-client-cache"]:::eng
        KKUB["kubernetes auth<br>role &lt;app&gt; (SA AND ns)"]:::auth
        KJWT["gitea_jwt auth<br>gitea_cicd_&lt;app&gt;"]:::auth
    end

    subgraph RUNTIME["Runtime path"]
        VA["VaultAuth<br>role &lt;app&gt;, SA &lt;app&gt;"]:::crd
        VSS["VaultStaticSecret<br>kvv2/&lt;app&gt;/config"]:::crd
        VDS["VaultDynamicSecret<br>postgres/creds/&lt;app&gt;"]:::crd
        SEC["k8s Secret<br>&lt;app&gt;-config / -db-credentials"]:::k8s
        POD["App pod<br>(SA &lt;app&gt;)"]:::k8s
    end

    subgraph CICD["CI path"]
        GHA["Gitea Actions<br>OpenTofu job"]:::ci
        TOFU["apply app_roles<br>(under &lt;app&gt;-ops)"]:::ci
    end

    KKUB --> VA
    VA --> VSS
    VA --> VDS
    KV2 --> VSS
    PG --> VDS
    VSS --> SEC
    VDS -- "rolloutRestart on rotation" --> SEC
    SEC --> POD
    TR -. "encrypts client cache" .-> VA

    GHA -- "JWT login" --> KJWT
    KJWT --> TOFU
    TOFU -- "creates" --> KKUB
    TOFU -- "creates" --> PG

  1. Vault mounts the engines (kvv2, postgres, transit) and the two auth backends (kubernetes, gitea_jwt), all in the tools namespace.
  2. A pod's VaultAuth logs in through the kubernetes backend with SA <app> against role <app>; the role accepts only when both the SA name and its namespace match (AND).
  3. VaultStaticSecret reads kvv2/<app>/config and VaultDynamicSecret reads postgres/creds/<app> using that auth; VSO writes the values into ordinary k8s Secret objects.
  4. The pod consumes the Secret (env or volume); on a dynamic-cred rotation VSO restarts the rolloutRestartTargets Deployment so it picks up the new credentials.
  5. The transit key vso-client-cache encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
  6. On the CI side, a Gitea Actions OpenTofu job logs into the gitea_jwt backend as gitea_cicd_<app> (audience = the Gitea OAuth app id, identity from the email claim).
  7. Running under the <app>-ops policy, that job applies the app_roles module, creating/updating the Kubernetes auth role and the Postgres dynamic role for <app> — closing the loop so the runtime path in steps 2-4 works.

Gotchas

  • Vault must be unsealed after every restart. Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See storage-and-recovery.
  • The Kubernetes auth role binds SA and namespace (AND). The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns tools (CrowdSec, Plausible) widen the binding via service_account_namespaces.
  • The Postgres dynamic role depends on <app>_role existing. GRANT <app>_role TO {{name}} (create) and REASSIGN OWNED BY {{name}} TO <app>_role (revoke) both fail if factory's postgres IaC hasn't created the <app>_role non-login role first. Order: factory postgres/iac → tools app_roles.
  • The ops_policies vs policies key mismatch in terraform.tfvars / variables.tf (see §2) — read both when adding an app's extra ops policy.
  • The sandbox uses a separate Vault. Per the safe-env ADR, the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.