Tools — Secrets & VSO
Status: ✅ Active
Last Updated: 2026-06-23
Upstream: Tools · Components
Downstream: consumed by every tools-namespace pod and by every app's CI/CD
Related: secrets-and-vault concept · naming-conventions concept · storage-and-recovery · tofu CI apply flow · postgres IaC · safe-env ADR
This page maps how secrets live in HashiCorp Vault (engines, auth backends) and how they reach Kubernetes pods via the Vault Secrets Operator (VSO). The keystone is the app_policy + app_roles module pair: the machinery that turns a single <app> name into a matched set of Vault policies, roles, and CI identities — the same <app> join key documented in the naming-conventions concept.
Vault itself runs as a component in the tools namespace; see the Components page for its deploy shape. The admin/bootstrap layer (the kvv1 engine, the gitea_jwt auth backend, the base gitea_cicd role, the Kubernetes auth backend mount) is created by factory's Ansible-managed Vault Terraform in hashicorp_vault.tf; everything in this page that is per-app is created by the IaC under hashicorp-vault/iac.
Caution
Vault runs standalone with file/raft storage and starts sealed after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config; pods that depend on a VaultDynamicSecret will not start. Unseal procedure and key custody live in storage-and-recovery.
1) Vault engines & auth backends
All engines below are mounted by hashicorp-vault/iac/main.tf except kvv1, which is bootstrapped by factory's Ansible Vault Terraform.
| Mount | Type | Holds | Defined in |
|---|---|---|---|
| kvv1 | KV v1 | Admin / cloud secrets: kvv1/google/credentials, kvv1/gitea/*, kvv1/cloudflare/*, kvv1/ovh/*, kvv1/postgres/credentials, kvv1/admin/* | factory hashicorp_vault.tf |
| kvv2 | KV v2 (versioned) | Per-app config secrets under kvv2/<app>/* | main.tf |
| transit | transit | The VSO client-cache encryption key vso-client-cache; it lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms | main.tf |
| postgres | database | Dynamic Postgres creds at postgres/creds/<app>; connects to the DB through pgbouncer.tools:5432 using the credentials_editor root account | main.tf |
The postgres connection is configured with allowed_roles = ["*"] and a root-rotation statement (ALTER USER … WITH PASSWORD); the editor username/password come from the sensitive POSTGRES_CREDENTIALS_EDITOR_* variables.
Auth backends
| Backend | Mount | Who uses it | Role(s) |
|---|---|---|---|
| kubernetes | kubernetes | VSO controller + every app pod's ServiceAccount | vault-secret-operator (VSO itself), <app> (one per app), factory_crowdsec_conf |
| gitea_jwt | gitea_jwt | CI/OpenTofu jobs running in Gitea Actions | gitea_cicd (base, factory-bootstrapped) + per-app gitea_cicd_<app> |
- kubernetes auth (main.tf) is configured against https://kubernetes.default.svc:443. The VSO role vault-secret-operator binds SA hashicorp-vault-vault-secrets-operator-controller-manager in ns tools, audience = vault, and carries the edit-vso-client-cache policy (encrypt/decrypt on transit/.../vso-client-cache).
- gitea_jwt is the OIDC/JWT backend for CI. Its backend, default_role = gitea_cicd, and the base gitea_cicd role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_<app>]" } using the TERRAFORM_VAULT_AUTH_JWT env var. See the tofu CI apply flow for how the token is minted in the pipeline.
Terraform state
Each IaC project keeps its state in the arcodange-tf GCS bucket under a distinct prefix:
| Project | GCS prefix |
|---|---|
| Vault admin/app machinery | tools/hashicorp_vault/main |
| Plausible | tools/plausible/main |
| CrowdSec | tools/crowdsec/main |
2) The app_policy + app_roles modules — the <app> join-key machinery
Important
These two modules are the heart of the secrets layer. Given a single <app> name they emit a matched, name-derived set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide naming convention: the same <app> string also names the Kubernetes namespace, the ServiceAccount, the Postgres <app>_role, and the Gitea repo.
The two modules live on opposite sides of the trust boundary:
- modules/app_policy is declared once, centrally, in the Vault admin project (main.tf, for_each over var.applications). It creates the policies and the CI identity (the privileged bits), so the app's own repo never holds them.
- modules/app_roles is declared by the subordinate app project (pulled over SSH as a Git module), running under the <app>-ops policy. It creates the roles the app needs.
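As a minimal sketch of the join-key idea, the snippet below derives the object names this page lists from a single `<app>` string. The `AppNames` class and `derive_names` function are hypothetical helpers written for illustration; they are not part of either module, and only the naming patterns themselves come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppNames:
    """Vault and Postgres object names derived from one <app> key."""
    app_policy: str        # runtime policy
    ops_policy: str        # CI policy and identity group
    k8s_auth_role: str     # Kubernetes auth role path
    jwt_role: str          # role on the gitea_jwt mount
    pg_dynamic_role: str   # database-engine role path
    pg_creds_path: str     # where ephemeral DB creds are read
    pg_owner_role: str     # non-login Postgres role created by factory
    kv_prefix: str         # per-app KV v2 config prefix


def derive_names(app: str) -> AppNames:
    # Every pattern below is a naming rule stated on this page;
    # the function itself is illustrative only.
    return AppNames(
        app_policy=app,
        ops_policy=f"{app}-ops",
        k8s_auth_role=f"auth/kubernetes/role/{app}",
        jwt_role=f"gitea_cicd_{app}",
        pg_dynamic_role=f"postgres/roles/{app}",
        pg_creds_path=f"postgres/creds/{app}",
        pg_owner_role=f"{app}_role",
        kv_prefix=f"kvv2/{app}/",
    )


names = derive_names("webapp")
print(names.ops_policy)  # webapp-ops
print(names.jwt_role)    # gitea_cicd_webapp
```

Because every name is a pure function of `<app>`, a typo in the key shows up as a missing role or policy rather than a silently mismatched one.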
app_roles — runtime roles (declared by the app repo)
For <app>, app_roles/main.tf creates:
| Resource | Path | Key settings |
|---|---|---|
| Kubernetes auth role | auth/kubernetes/role/<app> | bound_service_account_names = [<app>] + extras, bound_service_account_namespaces = [<app>] + extras, token_ttl = 3600 (1h), token_policies = [default, <app>], audience = vault |
| Postgres dynamic role | postgres/roles/<app> | db_name = postgres; creation SQL: CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL … then GRANT <app>_role TO "{{name}}"; revocation: REASSIGN OWNED BY "{{name}}" TO <app>_role then REVOKE ALL ON DATABASE <app> FROM "{{name}}" |
Important
The Postgres dynamic role's creation SQL does GRANT <app>_role TO {{name}} and its revocation does REASSIGN OWNED BY {{name}} TO <app>_role. The non-login <app>_role must already exist in Postgres; it is created by factory's postgres IaC (postgresql_role.app_role["<app>"], owner of the <app> database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: factory postgres/iac before tools app_roles.
Note
The Kubernetes auth role binds both SA names and namespaces; the check is an AND. A token presenting SA <app> from the wrong namespace (or any other SA from ns <app>) is rejected. The default binding is SA <app> in ns <app>; the service_account_names / service_account_namespaces inputs widen it (e.g. CrowdSec/Plausible run in ns tools, not a namespace named after the app).
The Postgres role can be skipped with disable_database = true; the DB name defaults to <app> but can be overridden via database.
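The SA-and-namespace binding described in the note above can be modelled in a few lines. This is a simplified sketch, not Vault's implementation: `K8sAuthRole`, `role_for` and `login_allowed` are hypothetical names, and wildcard bindings are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class K8sAuthRole:
    """Simplified model of a Kubernetes auth role's bindings."""
    bound_sa_names: set[str] = field(default_factory=set)
    bound_sa_namespaces: set[str] = field(default_factory=set)


def role_for(app: str, extra_sa=(), extra_ns=()) -> K8sAuthRole:
    # Default binding is SA <app> in ns <app>; extras widen each list.
    return K8sAuthRole({app, *extra_sa}, {app, *extra_ns})


def login_allowed(role: K8sAuthRole, sa_name: str, namespace: str) -> bool:
    # Both conditions must hold: the check is an AND, not an OR.
    return sa_name in role.bound_sa_names and namespace in role.bound_sa_namespaces


webapp = role_for("webapp")
print(login_allowed(webapp, "webapp", "webapp"))  # True
print(login_allowed(webapp, "webapp", "tools"))   # False: wrong namespace
print(login_allowed(webapp, "other", "webapp"))   # False: wrong SA

plausible = role_for("plausible", extra_ns=("tools",))
print(login_allowed(plausible, "plausible", "tools"))  # True: widened binding
```

Note that the two lists are checked independently, so widening one list accepts every combination of the listed names and namespaces, not only the pair that was intended.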
app_policy — policies + CI identity (declared centrally)
For <app>, app_policy/main.tf creates:
| Resource | Name | Grants |
|---|---|---|
| App policy | <app> | read,list on kvv2/data/<app>/*; read on postgres/creds/<app>* (what the runtime pod can do) |
| Ops policy | <app>-ops | The CI bundle (below) |
| JWT role | gitea_cicd_<app> (mount gitea_jwt) | token_policies = [default] + <app>'s ops_policies, bound_audiences = [gitea_app_id], user_claim = email, role_type = jwt |
| Identity group | <app>-ops | Internal group carrying the <app>-ops policy, so Vault users mapped to their Gitea entity inherit ops rights |
The <app>-ops policy is the privilege set a CI job needs to manage the app's own corner of Vault and the clouds:
- create/update on auth/token/create; read on sys/mounts/auth/* (so the Vault provider works);
- full CRUD on postgres/roles/<app>* and on auth/kubernetes/role/<app>* (so app_roles can apply). The k8s-role rule is parameter-constrained: it may only set bound_service_account_names / bound_service_account_namespaces to the whitelisted [<app>] + extras lists and token_policies to ["default","<app>"], preventing a CI job from minting a role with broader bindings;
- full CRUD on the app's KV-v2 data, delete/undelete/destroy, and metadata (kvv2/data|delete|undelete|destroy|metadata/<app>/*);
- read on kvv1/google/credentials (the GCS backend SA) and kvv1/gitea/tofu_module_reader (the bot SSH key that lets CI pull the app_roles Git module);
- CRUD on kvv1/cloudflare/<app>* and kvv1/ovh/<app>* (cloud DNS/edge secrets scoped to the app).
Note
The policy document is post-processed with two replace() calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string ("["webapp"]"); the replaces strip the outer quotes so Vault receives a real list. If you change those allowed_parameter blocks, keep the replaces in sync.
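To make the quote-stripping concrete, here is a rough Python analogue of the two replacements. It is an illustration under assumptions: the sample line and the `strip_list_quotes` helper are invented for the example, and the exact serialized form and replace arguments in the real module may differ.

```python
def strip_list_quotes(policy_text: str) -> str:
    """Turn a quoted, JSON-encoded list into a bare list literal.

    Two passes, mirroring the idea of two replace() calls:
    drop the quote before "[" and the quote after "]".
    """
    return policy_text.replace('"[', "[").replace(']"', "]")


# Hypothetical fragment: the list value arrives wrapped in an extra pair of quotes.
serialized = 'bound_service_account_names = "["webapp"]"'
print(strip_list_quotes(serialized))
# bound_service_account_names = ["webapp"]
```

The two passes are position-blind, so any other value in the policy that happens to contain `"[` or `]"` would be rewritten as well; that is one reason to review the replacements whenever the whitelisted parameters change.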
Apps wired in terraform.tfvars
terraform.tfvars declares the applications set the central app_policy for_each walks:
| <app> | Extra SA | Extra ns | Extra ops policy | Notes |
|---|---|---|---|---|
| webapp | - | - | - | defaults: SA webapp / ns webapp |
| erp | - | - | - | defaults |
| cms | cloudflared | - | factory__cf_r2_arcodange_tf | extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket |
| crowdsec | - | tools | - | runs in ns tools |
| plausible | - | tools | - | runs in ns tools |
Note
terraform.tfvars uses the key ops_policies for the CMS extra policy while variables.tf declares the optional attribute as policies; the central main.tf passes each.value.policies into the module's ops_policies input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role.
3) VSO CRDs — how a secret becomes a Kubernetes Secret
The Vault Secrets Operator watches three custom resources and writes plain Kubernetes Secret objects that pods consume normally (env / volume). The app repo ships the CRDs; the operator does the Vault round-trips.
| CRD | What it does | Refresh / rotation |
|---|---|---|
| VaultAuth | Picks the auth method (kubernetes), the mount, the Vault role (= <app>), and the pod ServiceAccount (= <app>) used to log in; references a VaultConnection (here the in-cluster default → http://hashicorp-vault.tools.svc.cluster.local:8200) | n/a (used by the other two CRDs via vaultAuthRef) |
| VaultStaticSecret | Reads a KV-v2 path → writes a k8s Secret | refreshAfter (the lab uses 30s) |
| VaultDynamicSecret | Reads postgres/creds/<app> (a dynamic lease) → writes a k8s Secret; rolloutRestartTargets lists Deployments to restart when creds rotate | follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets |
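The three resources in the table can be sketched as plain data. The helper below builds them as Python dictionaries and prints them as JSON. The field layout follows the upstream VSO CRD schema as commonly documented (`secrets.hashicorp.com/v1beta1`); `vso_manifests` is a hypothetical helper, the `destination.create` flag and the `<app>-config` / `<app>-db-credentials` Secret names are assumptions patterned on the worked examples, and the lab's actual YAML files may set more or different fields.

```python
import json

API_VERSION = "secrets.hashicorp.com/v1beta1"  # upstream VSO API group (assumed version)


def vso_manifests(app: str, namespace: str, deployment: str) -> list[dict]:
    """Build VaultAuth, VaultStaticSecret and VaultDynamicSecret for one app."""

    def meta(name: str) -> dict:
        return {"name": name, "namespace": namespace}

    auth = {
        "apiVersion": API_VERSION,
        "kind": "VaultAuth",
        "metadata": meta(app),
        "spec": {
            "method": "kubernetes",
            "mount": "kubernetes",
            "kubernetes": {"role": app, "serviceAccount": app, "audiences": ["vault"]},
        },
    }
    static = {
        "apiVersion": API_VERSION,
        "kind": "VaultStaticSecret",
        "metadata": meta(app),
        "spec": {
            "vaultAuthRef": app,
            "mount": "kvv2",
            "type": "kv-v2",
            "path": f"{app}/config",
            "refreshAfter": "30s",
            "destination": {"name": f"{app}-config", "create": True},  # assumed
        },
    }
    dynamic = {
        "apiVersion": API_VERSION,
        "kind": "VaultDynamicSecret",
        "metadata": meta(f"{app}-db-credentials"),
        "spec": {
            "vaultAuthRef": app,
            "mount": "postgres",
            "path": f"creds/{app}",
            "destination": {"name": f"{app}-db-credentials", "create": True},  # assumed
            "rolloutRestartTargets": [{"kind": "Deployment", "name": deployment}],
        },
    }
    return [auth, static, dynamic]


for manifest in vso_manifests("plausible", "tools", "plausible"):
    print(json.dumps(manifest, indent=2))
```

Both secret resources point at the same VaultAuth through `vaultAuthRef`, which is why a broken Kubernetes auth role takes down static and dynamic secrets together.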
Worked example — Plausible (tools namespace)
Files under plausible/resources:
- VaultAuth plausible (vaultauth.yaml): method: kubernetes, role: plausible, serviceAccount: plausible, audiences: [vault]. This is the Vault role app_roles created in plausible/iac/main.tf.
- VaultStaticSecret plausible (vaultsecret.yaml): kvv2 path plausible/config → Secret plausible-config (refreshAfter: 30s). The config payload holds SECRET_KEY_BASE and TOTP_VAULT_KEY, both generated by Terraform (random_password, base64-encoded) and written to kvv2/plausible/config via vault_kv_secret_v2 in the plausible IaC.
- VaultStaticSecret plausible-geoip (geoipsecret.yaml): kvv2 path plausible/geoip → Secret plausible-geoip exposing LICENSE_KEY (the MaxMind GeoIP license, an admin-seeded value, fed to the geoipupdate sidecar via env GEOIPUPDATE_LICENSE_KEY).
- VaultDynamicSecret plausible-db-credentials (vaultdynamicsecret.yaml): postgres/creds/plausible → Secret plausible-db-credentials; rolloutRestartTargets restarts Deployment plausible. An init container (add-initcontainer.yaml) reads username / password from that Secret and writes DATABASE_URL (postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}) into a shared generated-secrets volume the app reads.
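The DATABASE_URL assembly done by the init container can be sketched as follows. This is an illustration, not the init container's actual script: `database_url` is a hypothetical helper, the host and credential values are made up, and the percent-encoding step is an addition that the shell template shown above does not perform.

```python
from urllib.parse import quote


def database_url(user: str, password: str, host: str, port: int, name: str) -> str:
    """Compose postgres://user:pass@host:port/db from the Secret's fields."""
    # Percent-encode the credentials so characters such as "@" or "/"
    # cannot be mistaken for URL delimiters (added here for safety).
    safe_user = quote(user, safe="")
    safe_pass = quote(password, safe="")
    return f"postgres://{safe_user}:{safe_pass}@{host}:{port}/{name}"


# Hypothetical values standing in for the username/password keys of
# the plausible-db-credentials Secret and the DB_* environment variables.
print(database_url("v-plausible-abc123", "p@ss/w0rd", "db.example.internal", 5432, "plausible"))
# postgres://v-plausible-abc123:p%40ss%2Fw0rd@db.example.internal:5432/plausible
```

Because the URL is written once at pod start, the Deployment has to restart to pick up rotated credentials, which is what the rolloutRestartTargets entry arranges.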
Worked example — CrowdSec (tools namespace)
Templates under crowdsec/templates:
- VaultAuth crowdsec (vaultauth.yaml): role: crowdsec, serviceAccount: crowdsec.
- VaultDynamicSecret crowdsec-db-credentials (vaultdynamicsecret.yaml): postgres/creds/crowdsec → Secret crowdsec-db-credentials; rolloutRestartTargets restarts Deployment crowdsec-lapi (the Local API that owns the DB connection).
factory_auth.tf — the Ansible CrowdSec/Traefik plugin reader
Separately from the per-app machinery, factory_auth.tf wires a Kubernetes auth role factory_crowdsec_conf for SA factory-ansible-tool-crowdsec-traefik-plugin in ns kube-system (token_ttl = 3600). It carries policy factory_crowdsec_conf, which grants read,list on kvv2/data/cms/factory/*. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the Turnstile configuration that the cms repo writes into kvv2/cms/factory/* — a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the vault_kv_secret_v2 write) is documented on the CMS Cloudflare page.
4) Secret-paths inventory
| Path | Engine | Holds | Producer | Consumer |
|---|---|---|---|---|
| kvv2/<app>/config | KV v2 | App runtime config | app CI (KV CRUD via <app>-ops) | VaultStaticSecret → pod |
| kvv2/plausible/config | KV v2 | SECRET_KEY_BASE, TOTP_VAULT_KEY | Plausible IaC (random_password → vault_kv_secret_v2) | VaultStaticSecret plausible → plausible-config |
| kvv2/plausible/geoip | KV v2 | LICENSE_KEY (MaxMind) | admin-seeded | VaultStaticSecret plausible-geoip → geoipupdate sidecar |
| kvv2/cms/factory/turnstile | KV v2 | Cloudflare Turnstile config | cms repo IaC | factory_crowdsec_conf k8s role → Ansible CrowdSec/Traefik plugin |
| postgres/creds/<app> | database | Ephemeral DB user (username / password, 1h lease) | Vault on demand (role <app>, GRANT <app>_role) | VaultDynamicSecret → pod (e.g. plausible-db-credentials, crowdsec-db-credentials) |
| transit/.../vso-client-cache | transit | VSO client-cache encryption key | Vault admin IaC | VSO controller (encrypt/decrypt its cache) |
| kvv1/cloudflare/<app>* | KV v1 | Cloudflare DNS/edge secrets | admin | app CI (<app>-ops CRUD) |
| kvv1/ovh/<app>* | KV v1 | OVH secrets | admin | app CI (<app>-ops CRUD) |
| kvv1/gitea/tofu_module_reader | KV v1 | Bot SSH key to pull the app_roles Git module | admin | app CI (<app>-ops read) |
| kvv1/google/credentials | KV v1 | GCS Terraform-backend SA key | admin | every IaC CI job (read) |
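The inventory lists KV v2 entries by their logical path (kvv2/<app>/config), while the policies in §2 are written against API paths such as kvv2/data/<app>/*. The sketch below shows that mapping using the standard KV v2 sub-paths; `kv2_api_path` is a hypothetical helper, and it assumes a single-segment mount name such as kvv2.

```python
# Sub-paths the KV v2 engine exposes under its mount.
KV2_OPERATIONS = ("data", "metadata", "delete", "undelete", "destroy")


def kv2_api_path(logical_path: str, operation: str = "data") -> str:
    """Map 'mount/rest/of/path' to 'mount/<operation>/rest/of/path'."""
    if operation not in KV2_OPERATIONS:
        raise ValueError(f"unknown KV v2 operation: {operation}")
    # Assumes the mount is the first path segment (true for "kvv2").
    mount, _, rest = logical_path.partition("/")
    if not rest:
        raise ValueError("expected a path of the form mount/key")
    return f"{mount}/{operation}/{rest}"


print(kv2_api_path("kvv2/plausible/config"))              # kvv2/data/plausible/config
print(kv2_api_path("kvv2/plausible/config", "metadata"))  # kvv2/metadata/plausible/config
```

A policy written against the logical path alone would not match any KV v2 request, which is why the app and ops policies name the data, metadata, delete, undelete and destroy prefixes explicitly.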
5) Secrets flow
%%{init: {'theme': 'base'}}%%
flowchart TB
classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
classDef crd fill:#059669,stroke:#047857,color:#ffffff
classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff
subgraph VAULT["Vault (tools ns)"]
KV2["kvv2 engine<br>kvv2/<app>/*"]:::eng
PG["postgres engine<br>postgres/creds/<app>"]:::eng
TR["transit<br>vso-client-cache"]:::eng
KKUB["kubernetes auth<br>role <app> (SA AND ns)"]:::auth
KJWT["gitea_jwt auth<br>gitea_cicd_<app>"]:::auth
end
subgraph RUNTIME["Runtime path"]
VA["VaultAuth<br>role <app>, SA <app>"]:::crd
VSS["VaultStaticSecret<br>kvv2/<app>/config"]:::crd
VDS["VaultDynamicSecret<br>postgres/creds/<app>"]:::crd
SEC["k8s Secret<br><app>-config / -db-credentials"]:::k8s
POD["App pod<br>(SA <app>)"]:::k8s
end
subgraph CICD["CI path"]
GHA["Gitea Actions<br>OpenTofu job"]:::ci
TOFU["apply app_roles<br>(under <app>-ops)"]:::ci
end
KKUB --> VA
VA --> VSS
VA --> VDS
KV2 --> VSS
PG --> VDS
VSS --> SEC
VDS -- "rolloutRestart on rotation" --> SEC
SEC --> POD
TR -. "encrypts client cache" .-> VA
GHA -- "JWT login" --> KJWT
KJWT --> TOFU
TOFU -- "creates" --> KKUB
TOFU -- "creates" --> PG
1. Vault mounts the engines (kvv2, postgres, transit) and the two auth backends (kubernetes, gitea_jwt), all in the tools namespace.
2. A pod's VaultAuth logs in through the kubernetes backend with SA <app> against role <app>; the role accepts only when both the SA name and its namespace match (AND).
3. VaultStaticSecret reads kvv2/<app>/config and VaultDynamicSecret reads postgres/creds/<app> using that auth; VSO writes the values into ordinary k8s Secret objects.
4. The pod consumes the Secret (env or volume); on a dynamic-cred rotation VSO restarts the rolloutRestartTargets Deployment so it picks up the new credentials.
5. The transit key vso-client-cache encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
6. On the CI side, a Gitea Actions OpenTofu job logs into the gitea_jwt backend as gitea_cicd_<app> (audience = the Gitea OAuth app id, identity from the email claim).
7. Running under the <app>-ops policy, that job applies the app_roles module, creating/updating the Kubernetes auth role and the Postgres dynamic role for <app>, closing the loop so the runtime path in steps 2-4 works.
Gotchas
- Vault must be unsealed after every restart. Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See storage-and-recovery.
- The Kubernetes auth role binds SA and namespace (AND). The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns tools (CrowdSec, Plausible) widen the binding via service_account_namespaces.
- The Postgres dynamic role depends on <app>_role existing. GRANT <app>_role TO {{name}} (create) and REASSIGN OWNED BY {{name}} TO <app>_role (revoke) both fail if factory's postgres IaC hasn't created the <app>_role non-login role first. Order: factory postgres/iac → tools app_roles.
- The ops_policies vs policies key mismatch in terraform.tfvars / variables.tf (see §2): read both when adding an app's extra ops policy.
- The sandbox uses a separate Vault. Per the safe-env ADR, the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.