[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Secrets & VSO**

# Tools — Secrets & VSO

> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools](README.md) · [Components](components.md)
> **Downstream:** consumed by every `tools`-namespace pod and by every app's CI/CD
> **Related:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming-conventions concept](../lab-ecosystem/naming-conventions.md) · [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md) · [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md)

This page maps how secrets live in **HashiCorp Vault** (engines, auth backends) and how they reach **Kubernetes pods** via the **Vault Secrets Operator (VSO)**. The keystone is the **`app_policy` + `app_roles` module pair**: the machinery that turns a single `<app>` name into a matched set of Vault policies, roles, and CI identities — the same `<app>` join key documented in the [naming-conventions concept](../lab-ecosystem/naming-conventions.md).

Vault itself runs as a component in the `tools` namespace; see the [Components](components.md) page for its deploy shape. The admin/bootstrap layer (the `kvv1` engine, the `gitea_jwt` auth backend, the base `gitea_cicd` role, the Kubernetes auth backend mount) is created **by factory's Ansible-managed Vault Terraform** in [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf); everything in this page that is *per-app* is created by the IaC under [`hashicorp-vault/iac`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac).

> [!CAUTION]
> Vault runs **standalone** with file/raft storage and starts **sealed** after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config — pods that depend on a `VaultDynamicSecret` will not start. Unseal procedure and key custody live in [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).

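
The sealed state described in the caution above can be probed before blaming VSO or an app. The following is a minimal sketch, not part of the lab's tooling: it assumes Vault's standard unauthenticated `GET /v1/sys/seal-status` endpoint and the in-cluster address listed in §3, and the helper names `is_sealed` and `fetch_seal_status` are hypothetical. Calling `is_sealed(fetch_seal_status())` from inside the cluster returns `True` while Vault still needs unsealing.

```python
import json
import urllib.request

# Assumption: the in-cluster Vault address used by the lab's VaultConnection.
VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def is_sealed(seal_status: dict) -> bool:
    """Interpret a sys/seal-status payload; a missing field is treated as sealed."""
    return bool(seal_status.get("sealed", True))


def fetch_seal_status(addr: str = VAULT_ADDR, timeout: float = 5.0) -> dict:
    """Query Vault's seal-status endpoint (no token required)."""
    url = f"{addr}/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```
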
---

## 1) Vault engines & auth backends

All engines below are mounted by [`hashicorp-vault/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) except `kvv1`, which is bootstrapped by factory's Ansible Vault Terraform.

| Mount | Type | Holds | Defined in |
|---|---|---|---|
| `kvv1` | KV **v1** | Admin / cloud secrets: `kvv1/google/credentials`, `kvv1/gitea/*`, `kvv1/cloudflare/*`, `kvv1/ovh/*`, `kvv1/postgres/credentials`, `kvv1/admin/*` | factory [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf) |
| `kvv2` | KV **v2** (versioned) | Per-app config secrets under `kvv2/<app>/*` | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |
| `transit` | transit | The **VSO client-cache encryption key** `vso-client-cache` — lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |
| `postgres` | database | **Dynamic** Postgres creds at `postgres/creds/<app>`; connects to the DB through `pgbouncer.tools:5432` using the `credentials_editor` root account | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |

The `postgres` connection is configured with `allowed_roles = ["*"]` and a root-rotation statement (`ALTER USER … WITH PASSWORD`); the editor username/password come from the sensitive `POSTGRES_CREDENTIALS_EDITOR_*` variables.

### Auth backends

| Backend | Mount | Who uses it | Role(s) |
|---|---|---|---|
| `kubernetes` | `kubernetes` | VSO controller + every app pod's ServiceAccount | `vault-secret-operator` (VSO itself), `<app>` (one per app), `factory_crowdsec_conf` |
| `gitea_jwt` | `gitea_jwt` | CI/OpenTofu jobs running in Gitea Actions | `gitea_cicd` (base, factory-bootstrapped) + per-app `gitea_cicd_<app>` |

- **`kubernetes`** auth ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf)) is configured against `https://kubernetes.default.svc:443`. The VSO role `vault-secret-operator` binds SA `hashicorp-vault-vault-secrets-operator-controller-manager` in ns `tools`, `audience = vault`, and carries the `edit-vso-client-cache` policy (encrypt/decrypt on `transit/.../vso-client-cache`).
- **`gitea_jwt`** is the OIDC/JWT backend for CI. Its backend, `default_role = gitea_cicd`, and the base `gitea_cicd` role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via `auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_<app>]" }` using the `TERRAFORM_VAULT_AUTH_JWT` env var. See the [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) for how the token is minted in the pipeline.

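
To make the `gitea_jwt` login concrete, here is a hedged sketch of the HTTP request that a JWT login amounts to, using Vault's standard `POST /v1/auth/<mount>/login` endpoint with a `role` and a `jwt` field. The Vault address and the helper names are assumptions for illustration; the real pipeline goes through the Vault provider's `auth_login_jwt` block rather than hand-built requests.

```python
import json
import urllib.request
from typing import Optional

# Assumption: Vault address as reachable from the job; CI may use another URL.
VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def cicd_role(app: Optional[str] = None) -> str:
    """Base role when no app is given, otherwise the per-app role."""
    return f"gitea_cicd_{app}" if app else "gitea_cicd"


def build_jwt_login(jwt: str, app: Optional[str] = None,
                    addr: str = VAULT_ADDR) -> urllib.request.Request:
    """Build (but do not send) the login request for the gitea_jwt mount."""
    body = json.dumps({"role": cicd_role(app), "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{addr}/v1/auth/gitea_jwt/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

In a job, the token would come from the environment, for example `build_jwt_login(os.environ["TERRAFORM_VAULT_AUTH_JWT"], "webapp")`, and the response's `auth.client_token` is what subsequent calls present.
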
### Terraform state

Each IaC project keeps its state in the **`arcodange-tf` GCS bucket** under a distinct prefix:

| Project | GCS prefix |
|---|---|
| Vault admin/app machinery | `tools/hashicorp_vault/main` |
| Plausible | `tools/plausible/main` |
| CrowdSec | `tools/crowdsec/main` |

---

## 2) The `app_policy` + `app_roles` modules — the `<app>` join-key machinery

> [!IMPORTANT]
> These two modules are the heart of the secrets layer. Given a single `<app>` name they emit a **matched, name-derived** set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide [naming convention](../lab-ecosystem/naming-conventions.md): the same `<app>` string also names the Kubernetes namespace, the ServiceAccount, the Postgres `<app>_role`, and the Gitea repo.

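
The join key can be read as a pure function from `<app>` to names. The sketch below is illustrative only: the dictionary keys and the `derive_names` helper are hypothetical, while the values follow the naming patterns stated on this page. For instance, `derive_names("webapp")["jwt_role"]` yields `gitea_cicd_webapp`.

```python
def derive_names(app: str) -> dict:
    """Map one <app> string to the names this page derives from it."""
    return {
        # Kubernetes side (defaults, before any extras are added)
        "namespace": app,
        "service_account": app,
        # Vault policies and roles
        "app_policy": app,
        "ops_policy": f"{app}-ops",
        "jwt_role": f"gitea_cicd_{app}",
        "k8s_auth_role": f"auth/kubernetes/role/{app}",
        # Secret locations
        "kv_prefix": f"kvv2/{app}/",
        "db_creds_path": f"postgres/creds/{app}",
        # Postgres non-login role owned by factory's postgres IaC
        "pg_role": f"{app}_role",
    }
```
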
The two modules live on **opposite sides of the trust boundary**:

- [`modules/app_policy`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy) is declared **once, centrally**, in the Vault admin project ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf), `for_each` over `var.applications`). It creates the **policies and the CI identity** — the privileged bits — so the app's own repo never holds them.
- [`modules/app_roles`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles) is declared **by the subordinate app project** (pulled over SSH as a Git module), running under the `<app>`-ops policy. It creates the **roles** the app needs.

### `app_roles` — runtime roles (declared by the app repo)

For `<app>`, [`app_roles/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles/main.tf) creates:

| Resource | Path | Key settings |
|---|---|---|
| Kubernetes auth role | `auth/kubernetes/role/<app>` | `bound_service_account_names = [<app>] + extras`, `bound_service_account_namespaces = [<app>] + extras`, `token_ttl = 3600` (1h), `token_policies = [default, <app>]`, `audience = vault` |
| Postgres dynamic role | `postgres/roles/<app>` | `db_name = postgres`; creation SQL: `CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL …` then `GRANT <app>_role TO "{{name}}"`; revocation: `REASSIGN OWNED BY "{{name}}" TO <app>_role` then `REVOKE ALL ON DATABASE <app> FROM "{{name}}"` |

> [!IMPORTANT]
> The Postgres dynamic role's creation SQL does `GRANT <app>_role TO {{name}}` and its revocation does `REASSIGN OWNED BY {{name}} TO <app>_role`. **The non-login `<app>_role` must already exist in Postgres** — it is created by factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) (`postgresql_role.app_role["<app>"]`, owner of the `<app>` database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: **factory postgres/iac before tools app_roles**.

> [!NOTE]
> The Kubernetes auth role binds **both** SA names **and** namespaces — the check is an **AND**. A token presenting SA `<app>` from the wrong namespace (or any other SA from ns `<app>`) is rejected. The default binding is SA `<app>` in ns `<app>`; the `service_account_names` / `service_account_namespaces` inputs widen it (e.g. CrowdSec/Plausible run in ns `tools`, not a namespace named after the app).

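
The AND semantics of the binding can be modelled in a few lines. This is a simplified sketch of the check as described above, not Vault's implementation: it ignores wildcard bindings, and `bindings` and `role_accepts` are hypothetical helper names. With `bindings("plausible", extra_namespaces=["tools"])`, SA `plausible` in ns `tools` is accepted while SA `crowdsec` in ns `tools` is not.

```python
from typing import Iterable, List, Tuple


def bindings(app: str,
             extra_names: Iterable[str] = (),
             extra_namespaces: Iterable[str] = ()) -> Tuple[List[str], List[str]]:
    """Default binding is SA <app> in ns <app>; extras widen each list."""
    return [app, *extra_names], [app, *extra_namespaces]


def role_accepts(sa_name: str, sa_namespace: str,
                 bound_names: List[str], bound_namespaces: List[str]) -> bool:
    """Both the SA name and its namespace must be in their bound lists."""
    return sa_name in bound_names and sa_namespace in bound_namespaces
```
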
The Postgres role can be skipped with `disable_database = true`; the DB name defaults to `<app>` but can be overridden via `database`.

### `app_policy` — policies + CI identity (declared centrally)

For `<app>`, [`app_policy/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy/main.tf) creates:

| Resource | Name | Grants |
|---|---|---|
| **App policy** | `<app>` | `read,list` on `kvv2/data/<app>/*`; `read` on `postgres/creds/<app>*` — what the runtime pod can do |
| **Ops policy** | `<app>-ops` | The CI bundle (below) |
| **JWT role** | `gitea_cicd_<app>` (mount `gitea_jwt`) | `token_policies = [default] + <app>'s ops_policies`, `bound_audiences = [gitea_app_id]`, `user_claim = email`, `role_type = jwt` |
| **Identity group** | `<app>-ops` | Internal group carrying the `<app>-ops` policy, so Vault users mapped to their Gitea entity inherit ops rights |

The **`<app>-ops` policy** is the privilege set a CI job needs to *manage* the app's own corner of Vault and the clouds:

- `create/update` on `auth/token/create`; `read` on `sys/mounts/auth/*` (so the Vault provider works);
- full CRUD on `postgres/roles/<app>*` and on `auth/kubernetes/role/<app>*` (so `app_roles` can apply) — the k8s-role rule is **parameter-constrained**: it may only set `bound_service_account_names`/`bound_service_account_namespaces` to the whitelisted `[<app>] + extras` lists and `token_policies` to `["default","<app>"]`, preventing a CI job from minting a role with broader bindings;
- full CRUD on the app's KV-v2 data, delete/undelete/destroy, and `metadata` (`kvv2/data|delete|undelete|destroy|metadata/<app>/*`);
- `read` on `kvv1/google/credentials` (the GCS backend SA) and on `kvv1/gitea/tofu_module_reader` (the bot SSH key that lets CI pull the `app_roles` Git module);
- CRUD on `kvv1/cloudflare/<app>*` and `kvv1/ovh/<app>*` (cloud DNS/edge secrets scoped to the app).

> [!NOTE]
> The policy document is post-processed with two `replace()` calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string (`"["webapp"]"`); the replaces strip the outer quotes so Vault receives a real list. If you change those `allowed_parameter` blocks, keep the replaces in sync.

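
The effect of that post-processing can be illustrated with plain string replacement. This sketch mirrors the idea only: the exact search strings used by the module's two `replace()` calls are not shown on this page, so the `"[` and `]"` patterns below are assumptions, as is the helper name.

```python
def unquote_list_params(rendered: str) -> str:
    """Strip the quotes wrapping a serialized list so it reads as a real list.

    Assumption: the wrapped form looks like "[ ... ]" as quoted in the note above.
    """
    return rendered.replace('"[', "[").replace(']"', "]")
```

Applied to a hypothetical rendered fragment such as `x = "["webapp"]"`, the helper returns `x = ["webapp"]`. Any new `allowed_parameter` value whose serialized form does not start with `"[` and end with `]"` would pass through unchanged, which is why the replaces and the blocks have to evolve together.
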
### Apps wired in `terraform.tfvars`

[`terraform.tfvars`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/terraform.tfvars) declares the `applications` set the central `app_policy` `for_each` walks:

| `<app>` | Extra SA | Extra ns | Extra ops policy | Notes |
|---|---|---|---|---|
| `webapp` | — | — | — | defaults: SA `webapp` / ns `webapp` |
| `erp` | — | — | — | defaults |
| `cms` | `cloudflared` | — | `factory__cf_r2_arcodange_tf` | extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket |
| `crowdsec` | — | `tools` | — | runs in ns `tools` |
| `plausible` | — | `tools` | — | runs in ns `tools` |

> [!NOTE]
> `terraform.tfvars` uses the key `ops_policies` for the CMS extra policy while `variables.tf` declares the optional attribute as `policies`; the central `main.tf` passes `each.value.policies` into the module's `ops_policies` input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role.

---

## 3) VSO CRDs — how a secret becomes a Kubernetes Secret

The [Vault Secrets Operator](https://developer.hashicorp.com/vault/docs/platform/k8s/vso) watches three custom resources and writes plain Kubernetes `Secret` objects that pods consume normally (env / volume). The app repo ships the CRDs; the operator does the Vault round-trips.

| CRD | What it does | Refresh / rotation |
|---|---|---|
| `VaultAuth` | Picks the auth method (`kubernetes`), the `mount`, the Vault `role` (= `<app>`), and the pod **ServiceAccount** (= `<app>`) used to log in; references a `VaultConnection` (here the in-cluster `default` → `http://hashicorp-vault.tools.svc.cluster.local:8200`) | n/a — used by the other two CRDs via `vaultAuthRef` |
| `VaultStaticSecret` | Reads a **KV-v2** path → writes a k8s `Secret` | `refreshAfter` (the lab uses `30s`) |
| `VaultDynamicSecret` | Reads `postgres/creds/<app>` (a **dynamic** lease) → writes a k8s `Secret`; `rolloutRestartTargets` lists Deployments to restart when creds rotate | follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets |

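
The three resources in the table can be sketched as plain manifests built from `<app>`. This is an illustration rather than the lab's actual files: it assumes the VSO `secrets.hashicorp.com/v1beta1` API group, the builder functions are hypothetical, and the real manifests are YAML in each app's repo. Kubernetes accepts JSON manifests, so `json.dumps(vault_auth("plausible", "tools"))` would produce an applyable document under those assumptions.

```python
API_VERSION = "secrets.hashicorp.com/v1beta1"  # assumption: VSO v1beta1 CRDs


def _base(kind: str, name: str, namespace: str, spec: dict) -> dict:
    return {"apiVersion": API_VERSION, "kind": kind,
            "metadata": {"name": name, "namespace": namespace}, "spec": spec}


def vault_auth(app: str, namespace: str) -> dict:
    """Log in through the kubernetes mount as role <app> with SA <app>."""
    return _base("VaultAuth", app, namespace, {
        "method": "kubernetes",
        "mount": "kubernetes",
        "kubernetes": {"role": app, "serviceAccount": app,
                       "audiences": ["vault"]},
    })


def static_secret(app: str, namespace: str, path: str, dest: str) -> dict:
    """Sync a KV-v2 path (relative to the kvv2 mount) into Secret `dest`."""
    return _base("VaultStaticSecret", dest, namespace, {
        "vaultAuthRef": app, "type": "kv-v2", "mount": "kvv2", "path": path,
        "refreshAfter": "30s", "destination": {"name": dest, "create": True},
    })


def dynamic_secret(app: str, namespace: str, dest: str, targets: list) -> dict:
    """Sync postgres/creds/<app> and restart the listed Deployments on rotation."""
    return _base("VaultDynamicSecret", dest, namespace, {
        "vaultAuthRef": app, "mount": "postgres", "path": f"creds/{app}",
        "destination": {"name": dest, "create": True},
        "rolloutRestartTargets": [{"kind": "Deployment", "name": t}
                                  for t in targets],
    })
```
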
### Worked example — Plausible (`tools` namespace)

Files under [`plausible/resources`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources):

1. **`VaultAuth` `plausible`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultauth.yaml)) — `method: kubernetes`, `role: plausible`, `serviceAccount: plausible`, `audiences: [vault]`. This is the Vault role `app_roles` created in [`plausible/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/iac/main.tf).
2. **`VaultStaticSecret` `plausible`** ([`vaultsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultsecret.yaml)) — `kvv2` path `plausible/config` → Secret `plausible-config` (`refreshAfter: 30s`). The config payload holds **`SECRET_KEY_BASE`** and **`TOTP_VAULT_KEY`**, both **generated by Terraform** (`random_password`, base64-encoded) and written to `kvv2/plausible/config` via `vault_kv_secret_v2` in the plausible IaC.
3. **`VaultStaticSecret` `plausible-geoip`** ([`geoipsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/geoipsecret.yaml)) — `kvv2` path `plausible/geoip` → Secret `plausible-geoip` exposing **`LICENSE_KEY`** (the MaxMind GeoIP license, an admin-seeded value, fed to the `geoipupdate` sidecar via env `GEOIPUPDATE_LICENSE_KEY`).
4. **`VaultDynamicSecret` `plausible-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultdynamicsecret.yaml)) — `postgres/creds/plausible` → Secret `plausible-db-credentials`; `rolloutRestartTargets` restarts Deployment `plausible`. An **init container** ([`add-initcontainer.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/add-initcontainer.yaml)) reads `username`/`password` from that Secret and writes `DATABASE_URL` (`postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}`) into a shared `generated-secrets` volume the app reads.

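
The init container's job in step 4 boils down to string assembly. The sketch below reproduces the `DATABASE_URL` template in Python; it is not the init container's actual script, and the percent-encoding of the user and password is an addition of this sketch (the template above interpolates the values as-is). The function name and the sample host are hypothetical.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str,
                       host: str, port: int, name: str) -> str:
    """Assemble postgres://user:pass@host:port/name from the Secret's fields.

    Assumption: credentials are percent-encoded so characters such as '@'
    or '/' in a generated password cannot break the URL.
    """
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{name}")
```

For example, `build_database_url("v-user", "p@ss", "db.example", 5432, "plausible")` returns `postgres://v-user:p%40ss@db.example:5432/plausible`. Because the ephemeral user changes on every lease, the URL has to be rebuilt after each rotation, which is what the `rolloutRestartTargets` restart achieves by re-running the init container.
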
### Worked example — CrowdSec (`tools` namespace)

Templates under [`crowdsec/templates`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates):

1. **`VaultAuth` `crowdsec`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultauth.yaml)) — `role: crowdsec`, `serviceAccount: crowdsec`.
2. **`VaultDynamicSecret` `crowdsec-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultdynamicsecret.yaml)) — `postgres/creds/crowdsec` → Secret `crowdsec-db-credentials`; `rolloutRestartTargets` restarts Deployment **`crowdsec-lapi`** (the Local API that owns the DB connection).

### `factory_auth.tf` — the Ansible CrowdSec/Traefik plugin reader

Separately from the per-app machinery, [`factory_auth.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/factory_auth.tf) wires a Kubernetes auth role **`factory_crowdsec_conf`** for SA **`factory-ansible-tool-crowdsec-traefik-plugin`** in ns **`kube-system`** (`token_ttl = 3600`). It carries policy `factory_crowdsec_conf`, which grants `read,list` on **`kvv2/data/cms/factory/*`**. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the **Turnstile** configuration that the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) writes into `kvv2/cms/factory/*` — a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the `vault_kv_secret_v2` write) is documented on the [CMS Cloudflare page](../cms/cloudflare.md).

---

## 4) Secret-paths inventory

| Path | Engine | Holds | Producer | Consumer |
|---|---|---|---|---|
| `kvv2/<app>/config` | KV v2 | App runtime config | app CI (KV CRUD via `<app>-ops`) | `VaultStaticSecret` → pod |
| `kvv2/plausible/config` | KV v2 | `SECRET_KEY_BASE`, `TOTP_VAULT_KEY` | Plausible IaC (`random_password` → `vault_kv_secret_v2`) | `VaultStaticSecret plausible` → `plausible-config` |
| `kvv2/plausible/geoip` | KV v2 | `LICENSE_KEY` (MaxMind) | admin-seeded | `VaultStaticSecret plausible-geoip` → `geoipupdate` sidecar |
| `kvv2/cms/factory/turnstile` | KV v2 | Cloudflare Turnstile config | `cms` repo IaC | `factory_crowdsec_conf` k8s role → Ansible CrowdSec/Traefik plugin |
| `postgres/creds/<app>` | database | Ephemeral DB user (`username`/`password`, 1h lease) | Vault on demand (role `<app>`, `GRANT <app>_role`) | `VaultDynamicSecret` → pod (e.g. `plausible-db-credentials`, `crowdsec-db-credentials`) |
| `transit/.../vso-client-cache` | transit | VSO client-cache encryption key | Vault admin IaC | VSO controller (encrypt/decrypt its cache) |
| `kvv1/cloudflare/<app>*` | KV v1 | Cloudflare DNS/edge secrets | admin | app CI (`<app>-ops` CRUD) |
| `kvv1/ovh/<app>*` | KV v1 | OVH secrets | admin | app CI (`<app>-ops` CRUD) |
| `kvv1/gitea/tofu_module_reader` | KV v1 | Bot SSH key to pull the `app_roles` Git module | admin | app CI (`<app>-ops` read) |
| `kvv1/google/credentials` | KV v1 | GCS Terraform-backend SA key | admin | every IaC CI job (read) |

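
The inventory lists KV v2 secrets by their logical path (`kvv2/<app>/config`), whereas the policies in §2 name the API paths (`kvv2/data/<app>/*`, `kvv2/metadata/<app>/*`). KV v2 inserts that extra segment after the mount name. A small sketch of the mapping, with a hypothetical helper name and assuming the mount is a single path segment as it is for `kvv2`: `kv2_api_path("kvv2/plausible/config")` gives `kvv2/data/plausible/config`.

```python
def kv2_api_path(logical: str, kind: str = "data") -> str:
    """Turn `<mount>/<path>` into the KV v2 API path `<mount>/<kind>/<path>`.

    `kind` is one of the KV v2 sub-paths, e.g. data, metadata, delete,
    undelete, destroy.
    """
    mount, _, rest = logical.partition("/")
    return f"{mount}/{kind}/{rest}"
```
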
---

## 5) Secrets flow

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TB
    classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
    classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
    classDef crd fill:#059669,stroke:#047857,color:#ffffff
    classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
    classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff

    subgraph VAULT["Vault (tools ns)"]
        KV2["kvv2 engine<br>kvv2/<app>/*"]:::eng
        PG["postgres engine<br>postgres/creds/<app>"]:::eng
        TR["transit<br>vso-client-cache"]:::eng
        KKUB["kubernetes auth<br>role <app> (SA AND ns)"]:::auth
        KJWT["gitea_jwt auth<br>gitea_cicd_<app>"]:::auth
    end

    subgraph RUNTIME["Runtime path"]
        VA["VaultAuth<br>role <app>, SA <app>"]:::crd
        VSS["VaultStaticSecret<br>kvv2/<app>/config"]:::crd
        VDS["VaultDynamicSecret<br>postgres/creds/<app>"]:::crd
        SEC["k8s Secret<br><app>-config / -db-credentials"]:::k8s
        POD["App pod<br>(SA <app>)"]:::k8s
    end

    subgraph CICD["CI path"]
        GHA["Gitea Actions<br>OpenTofu job"]:::ci
        TOFU["apply app_roles<br>(under <app>-ops)"]:::ci
    end

    KKUB --> VA
    VA --> VSS
    VA --> VDS
    KV2 --> VSS
    PG --> VDS
    VSS --> SEC
    VDS -- "rolloutRestart on rotation" --> SEC
    SEC --> POD
    TR -. "encrypts client cache" .-> VA

    GHA -- "JWT login" --> KJWT
    KJWT --> TOFU
    TOFU -- "creates" --> KKUB
    TOFU -- "creates" --> PG
```

1. **Vault**, running in the `tools` namespace, mounts the engines (`kvv2`, `postgres`, `transit`) and the two auth backends (`kubernetes`, `gitea_jwt`).
2. A pod's `VaultAuth` logs in through the **`kubernetes`** backend with SA `<app>` against role `<app>`; the role accepts only when **both** the SA name **and** its namespace match (AND).
3. `VaultStaticSecret` reads `kvv2/<app>/config` and `VaultDynamicSecret` reads `postgres/creds/<app>` using that auth; VSO writes the values into ordinary k8s `Secret` objects.
4. The pod consumes the Secret (env or volume); on a dynamic-cred **rotation** VSO restarts the `rolloutRestartTargets` Deployment so it picks up the new credentials.
5. The **`transit`** key `vso-client-cache` encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
6. On the CI side, a **Gitea Actions** OpenTofu job logs into the **`gitea_jwt`** backend as `gitea_cicd_<app>` (audience = the Gitea OAuth app id, identity from the `email` claim).
7. Running under the `<app>-ops` policy, that job **applies the `app_roles` module**, creating/updating the Kubernetes auth role and the Postgres dynamic role for `<app>` — closing the loop so the runtime path in steps 2-4 works.

---

## Gotchas

- **Vault must be unsealed after every restart.** Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).
- **The Kubernetes auth role binds SA *and* namespace (AND).** The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns `tools` (CrowdSec, Plausible) widen the binding via `service_account_namespaces`.
- **The Postgres dynamic role depends on `<app>_role` existing.** `GRANT <app>_role TO {{name}}` (create) and `REASSIGN OWNED BY {{name}} TO <app>_role` (revoke) both fail if factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) hasn't created the `<app>_role` non-login role first. Order: **factory postgres/iac → tools app_roles**.
- **The `ops_policies` vs `policies` key mismatch** in `terraform.tfvars` / `variables.tf` (see §2) — read both when adding an app's extra ops policy.
- **The sandbox uses a separate Vault.** Per the [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md), the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.
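
For the `<app>_role` ordering gotcha, a preflight check before applying `app_roles` can catch the missing role early. The sketch below only generates SQL text: the `pg_roles` catalog query is standard Postgres, the GRANT/REASSIGN/REVOKE statements are the ones quoted in §2, and the function names are hypothetical. It assumes `<app>` is a trusted identifier from `terraform.tfvars`, since the value is interpolated directly.

```python
def preflight_query(app: str) -> str:
    """Returns a row only if the non-login <app>_role already exists."""
    return f"SELECT 1 FROM pg_roles WHERE rolname = '{app}_role';"


def grant_statement(app: str) -> str:
    """Creation-time grant; {{name}} is left for Vault to substitute."""
    return f'GRANT {app}_role TO "{{{{name}}}}";'


def revocation_statements(app: str) -> list:
    """Revocation-time statements, in the order given in section 2."""
    return [
        f'REASSIGN OWNED BY "{{{{name}}}}" TO {app}_role;',
        f'REVOKE ALL ON DATABASE {app} FROM "{{{{name}}}}";',
    ]
```

If `preflight_query("plausible")` returns no row, factory's postgres IaC has not run for that app yet, and the statements from `grant_statement` and `revocation_statements` would fail when Vault executes them.
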