factory/vibe/guidebooks/lab-ecosystem/secrets-and-vault.md
Gabriel Radureau 7647a68cdc docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md
Add a root AGENTS.md (ecosystem map of factory/tools/cms + agent operating
rules + the persona cohort & workflow) and a new vibe/ knowledge base for LLM
agents, modeled on tree-docs conventions and the factory house style.

vibe/ folders (each with a README hub + contribution rules):
- ADR/      optimized MADR-lite; canonical home going forward (doc/adr stays historical)
- PRD/      one subfolder per PRD, mandatory STATUS.md, QA strategy for big ones
- investigations/  single INV-NNN-slug.md, or stub + folder w/ notebooks
- guidebooks/      tree-docs maps; lab-ecosystem guidebook of factory+tools+cms
- runbooks/        [AGENT]/[HUMAN] step procedures (EN; doc/runbooks stays FR)
- shareouts/       dated FR handouts (decks/mp4)

Seed content (first ADR + PRD): a safe, production-like environment to rehearse
risky changes and recovery without touching real prod — local-only sandbox
(k3d + arm64 VMs) with a hard prod/sandbox isolation boundary. Includes
INV-001 (prod blast-radius couplings), the ecosystem guidebook, and a FR shareout.

Conventions enforced: no-tombstone rule, breadcrumb spine, bidirectional
cross-links, theme:base mermaid (MCP-validated) + ordered-list-after-diagram.
Built with a Workflow + persona cohort; 24 files, zero dead links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:52:37 +02:00


vibe > Guidebooks > Lab ecosystem > Secrets & Vault

Secrets & Vault

Status: 🟢 Active
Last Updated: 2026-06-23
Related: Lab ecosystem · Tools brick · Storage & recovery · Naming conventions
Decision: ADR 0001 — Safe, production-like environment

TL;DR

HashiCorp Vault is the single source of truth for every secret in the lab. There is no sops, no age, and there are no secret files in git: if a credential exists, Vault either stores it or mints it on demand. Two parties consume secrets, and each authenticates differently: pods use the Kubernetes auth backend (via the Vault Secrets Operator), while CI and OpenTofu use Gitea OIDC JWT (one role, gitea_cicd_<app>, per app). Vault holds static config in KV and encryption keys in the transit engine, and it issues short-lived, dynamic PostgreSQL credentials so that no long-lived DB password is ever written down. The trade-off: Vault is sealed on every restart and must be unsealed manually (1 key, threshold 1) before anything that needs a secret can come back.

Why Vault, and only Vault

The lab made a deliberate choice: one secret store, accessed over the network, rather than encrypted secret files scattered through the repos. That choice shapes everything else:

  • No secret material in git. Charts and OpenTofu reference Vault paths, never values. A leaked repo leaks no credentials.
  • One revocation point. Rotating or revoking a credential happens in Vault; consumers pick up the change on their next read or lease renewal.
  • Dynamic over static. Where a backend supports it (Postgres), Vault issues a fresh, time-boxed credential per consumer instead of a shared static password.

Vault itself runs as the hashicorp-vault chart in the tools namespace. Its full configuration — engines, auth backends, policies, the per-app role/policy modules — lives in the tools repo; see the Tools brick for the deployment context.

What Vault mounts

| Mount | Type | Purpose |
|---|---|---|
| kvv2/ | KV v2 (versioned) | Application static config, e.g. kvv2/<app>/config. Versioned so a bad write can be rolled back. |
| KV v1 | KV v1 (unversioned) | Flat secrets that don't need history. |
| transit/ | Transit | Encryption-as-a-service: encrypt/decrypt and sign without exposing the key. |
| postgres/ | Database (dynamic) | Issues short-lived PostgreSQL credentials on demand: postgres/creds/<app> hands out a fresh login user, granted <app>_role, with a lease that expires. |
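KV v2 versioning is what makes the rollback in the table possible. The sketch below is a toy in-memory model, not Vault code: every write appends a version, and a rollback re-writes an older version's data as the newest version, which is how `vault kv rollback` behaves (history is never deleted). It also shows that over the HTTP API a KV v2 path such as kvv2/<app>/config is addressed with data/ inserted after the mount. The class, the function names and the slug myapp are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


def kv2_api_path(mount: str, logical_path: str) -> str:
    """Map a logical KV v2 path (kvv2/<app>/config) to its HTTP API path."""
    return f"/v1/{mount}/data/{logical_path}"  # KV v2 inserts data/ after the mount


@dataclass
class VersionedSecret:
    """Toy in-memory model of one KV v2 secret: an append-only list of versions."""
    versions: list = field(default_factory=list)

    def write(self, data: dict) -> int:
        self.versions.append(dict(data))
        return len(self.versions)  # versions are numbered from 1

    def read(self, version: Optional[int] = None) -> dict:
        number = version if version is not None else len(self.versions)
        return dict(self.versions[number - 1])

    def rollback(self, version: int) -> int:
        # Rollback never rewrites history: old data is re-written as the newest version.
        return self.write(self.read(version))


config = VersionedSecret()
config.write({"log_level": "info"})       # v1: known-good
config.write({"log_level": "verbose!!"})  # v2: the bad write
current = config.rollback(1)              # v3 carries v1's data
print(kv2_api_path("kvv2", "myapp/config"), current, config.read())
```

Because the bad write stays in history as v2, the rollback itself can be undone the same way.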

The <app> slug threads through every one of these paths — kvv2/<app>/config, postgres/creds/<app> — exactly as described in Naming conventions.
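Because every name on this page is derived from the one <app> slug, the derivation can be written as a single pure function. This is a minimal sketch assuming the slug is used verbatim (no prefix, no case change); the function name, the slash check and the example slug myapp are hypothetical, while the name patterns are the ones this page uses.

```python
def vault_names(app: str) -> dict:
    """Derive every Vault-side name this page attaches to one <app> slug."""
    # Hypothetical sanity check: a slug containing "/" would break every path below.
    if not app or "/" in app:
        raise ValueError(f"invalid app slug: {app!r}")
    return {
        "kv_config_path": f"kvv2/{app}/config",   # static config (KV v2)
        "db_creds_path": f"postgres/creds/{app}",  # dynamic Postgres credentials
        "db_granted_role": f"{app}_role",          # Postgres role granted to each login
        "k8s_auth_role": app,                      # Kubernetes auth role (pods)
        "runtime_policy": app,                     # policy carried by that role
        "ci_jwt_role": f"gitea_cicd_{app}",        # Gitea OIDC/JWT role (CI, OpenTofu)
        "ops_policy": f"{app}-ops",                # policy carried by the JWT role
    }


for key, value in vault_names("myapp").items():
    print(f"{key:16} {value}")
```

Keeping the derivation in one place is also what lets production and the sandbox share identical names while pointing at different Vault instances.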

The two auth backends

Vault doesn't trust callers by static token. Each class of consumer proves its identity through a backend matched to where it runs:

  • Kubernetes auth — for pods. The Vault Secrets Operator (VSO) and workloads present their Kubernetes ServiceAccount token; Vault validates it against the cluster's API and maps the SA to the Vault role <app>, which carries the runtime policy <app>.
  • Gitea OIDC / JWT auth — for CI and OpenTofu. A Gitea Actions workflow obtains an OIDC token; Vault validates it and maps it to the JWT role gitea_cicd_<app>, which carries the CI/ops policy <app>-ops. This is how tofu apply in CI reads and writes the secrets it manages without any pre-shared Vault token.
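Both backends take the same shape of login call: a role name plus a JWT (the ServiceAccount token for Kubernetes auth, the Gitea-issued OIDC token for JWT auth), posted to /v1/auth/<mount>/login; a successful reply carries a short-lived Vault token in auth.client_token. The sketch builds those two requests with the standard library but does not send them. The mount names kubernetes and jwt are Vault's defaults and are assumed here (the lab may mount them elsewhere); the Vault address, the slug myapp and the token strings are placeholders.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address


def login_request(mount: str, role: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a login request for a JWT-bearing auth backend."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


app = "myapp"  # hypothetical slug
# Runtime path (performed by VSO on the pod's behalf): SA token under role <app>.
pod_login = login_request("kubernetes", app, "<serviceaccount-token>")
# CI path: the Gitea Actions OIDC token under role gitea_cicd_<app>.
ci_login = login_request("jwt", f"gitea_cicd_{app}", "<gitea-oidc-token>")

for req in (pod_login, ci_login):
    print(req.get_method(), req.full_url, json.loads(req.data)["role"])
```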

The split matters: pods get only what they need at runtime (the <app> policy), while CI gets the broader provisioning rights (<app>-ops) needed to create the very secrets the pods will later read.
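The runtime/provisioning split can be checked mechanically. This sketch uses a simplified model of Vault's policy path matching (a trailing * is a prefix glob, anything else must match exactly; Vault's + segment wildcard is not modelled) and evaluates two hypothetical policies. The real capabilities of the lab's <app> and <app>-ops policies live in the tools repo and are not reproduced here; the rules below are illustrative only.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Simplified Vault matching: trailing '*' is a prefix glob, else exact match."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def allowed(policy: dict, path: str, capability: str) -> bool:
    """True if any rule in the policy grants `capability` on `path`."""
    return any(
        path_matches(pattern, path) and capability in caps
        for pattern, caps in policy.items()
    )


app = "myapp"  # hypothetical slug
# Hypothetical runtime policy <app>: read-only on what the pod consumes.
runtime = {
    f"kvv2/data/{app}/*": {"read"},
    f"postgres/creds/{app}": {"read"},
}
# Hypothetical CI/ops policy <app>-ops: may also create and update the app's config.
ops = {
    f"kvv2/data/{app}/*": {"create", "read", "update"},
    f"kvv2/metadata/{app}/*": {"read", "list"},
}

print(allowed(runtime, f"kvv2/data/{app}/config", "update"))  # False: pods cannot write config
print(allowed(ops, f"kvv2/data/{app}/config", "update"))      # True: CI can
```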

How VSO delivers secrets to pods

Inside the cluster, the Vault Secrets Operator is the bridge between Vault and Kubernetes. It watches two CRDs:

  • VaultAuth — declares how to authenticate to Vault (the Kubernetes auth mount + the <app> role).
  • VaultDynamicSecret (and VaultStaticSecret) — declares what to fetch (e.g. postgres/creds/<app>) and which Kubernetes Secret to materialise it into. For dynamic secrets, VSO also renews the lease and rotates the Secret before it expires.
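For one app, the two CRDs might look like the manifests below. They are built as Python dicts and printed as JSON (which kubectl apply accepts) only so that every example on this page uses one language; in the repo they would normally be YAML in a chart. Field names follow the VSO secrets.hashicorp.com/v1beta1 API to the best of our knowledge and should be checked against the CRD version installed in the cluster. The namespace, ServiceAccount, VaultAuth and destination Secret names, and the kubernetes auth mount path, are hypothetical.

```python
import json

APP = "myapp"        # hypothetical <app> slug
NAMESPACE = "myapp"  # hypothetical namespace

vault_auth = {
    "apiVersion": "secrets.hashicorp.com/v1beta1",
    "kind": "VaultAuth",
    "metadata": {"name": f"{APP}-vault-auth", "namespace": NAMESPACE},
    "spec": {
        "method": "kubernetes",
        "mount": "kubernetes",      # assumed auth mount path
        "kubernetes": {
            "role": APP,            # Vault role <app>, carrying policy <app>
            "serviceAccount": APP,  # hypothetical ServiceAccount name
        },
    },
}

vault_dynamic_secret = {
    "apiVersion": "secrets.hashicorp.com/v1beta1",
    "kind": "VaultDynamicSecret",
    "metadata": {"name": f"{APP}-db-creds", "namespace": NAMESPACE},
    "spec": {
        "vaultAuthRef": vault_auth["metadata"]["name"],
        "mount": "postgres",
        "path": f"creds/{APP}",     # i.e. postgres/creds/<app>
        "destination": {"name": f"{APP}-db-creds", "create": True},
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [vault_auth, vault_dynamic_secret]}
print(json.dumps(manifest, indent=2))
```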

The pod then mounts the resulting Kubernetes Secret as it would any other — it never speaks to Vault directly, and never sees a static DB password.
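On the pod side, the database secrets engine returns the credential as username and password fields, and VSO writes them as keys of the destination Secret. A minimal sketch of an app reading them from a Secret volume mount follows; the mount directory and the function names are hypothetical. Reading files rather than environment variables is a deliberate choice: Kubernetes refreshes a mounted Secret's files after VSO rotates it, whereas environment variables are fixed at container start (VSO can instead restart the workload through its rolloutRestartTargets setting).

```python
from pathlib import Path
from urllib.parse import quote

SECRET_DIR = Path("/etc/secrets/db")  # hypothetical volumeMount path for the VSO-managed Secret


def read_db_credentials(secret_dir: Path = SECRET_DIR) -> tuple[str, str]:
    """Read the current dynamic credential; re-read on reconnect instead of caching it."""
    username = (secret_dir / "username").read_text().strip()
    password = (secret_dir / "password").read_text().strip()
    return username, password


def postgres_dsn(host: str, dbname: str, secret_dir: Path = SECRET_DIR) -> str:
    """Build a libpq-style URI from the mounted credential, URL-escaping both fields."""
    user, password = read_db_credentials(secret_dir)
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{dbname}"
```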

The secret flow, end to end

%%{init: {'theme':'base'}}%%
flowchart LR
    subgraph CI["CI / Provisioning path"]
        GHA["Gitea Actions<br/>workflow"]:::src
        TOFU["OpenTofu<br/>tofu apply"]:::proc
    end

    subgraph RT["Runtime path (in-cluster)"]
        VSO["Vault Secrets<br/>Operator (VSO)"]:::proc
        POD["App pod<br/>(ServiceAccount &lt;app&gt;)"]:::proc
    end

    VAULT["Vault<br/>KV v1/v2 · transit · postgres dynamic"]:::store

    GHA -->|"OIDC JWT<br/>role gitea_cicd_&lt;app&gt;"| VAULT
    VAULT -->|"policy &lt;app&gt;-ops<br/>read/write secrets"| TOFU
    TOFU -->|"writes config to<br/>kvv2/&lt;app&gt;/config"| VAULT

    VSO -->|"k8s auth<br/>role &lt;app&gt; (SA token)"| VAULT
    VAULT -->|"dynamic creds<br/>postgres/creds/&lt;app&gt;"| VSO
    VSO -->|"materialises +<br/>renews K8s Secret"| POD

    classDef src fill:#2563eb,stroke:#1e40af,color:#fff
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
  1. CI path: a Gitea Actions workflow requests an OIDC JWT and presents it to Vault under the role gitea_cicd_<app>. Vault validates the token and grants the <app>-ops policy.
  2. With that policy, OpenTofu (tofu apply, running in CI) reads the secrets it needs and writes the app's static config back to kvv2/<app>/config. No pre-shared Vault token is ever stored — the trust is established per-run via OIDC.
  3. Runtime path: in the cluster, the Vault Secrets Operator authenticates with the Kubernetes auth backend, presenting the app's ServiceAccount token mapped to the Vault role <app>.
  4. Vault issues a short-lived, dynamic PostgreSQL credential from postgres/creds/<app> back to VSO.
  5. VSO materialises that credential into a Kubernetes Secret in the app's namespace, then renews the lease and rotates the Secret before it expires.
  6. The app pod mounts the Kubernetes Secret like any other — it never talks to Vault, and never holds a long-lived database password.
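Step 5 hinges on two lease limits set on the database role: a TTL that each renewal extends, and a max TTL that no renewal can exceed. The sketch below simulates that timeline for one credential: act once a fixed fraction of the remaining lease has elapsed, renew while the max TTL allows it, and mint a fresh credential (rotation) once it no longer does. The 67% default echoes the idea behind VSO's renewalPercent field, but the loop is a simplified model, not VSO's actual algorithm, and all durations are hypothetical.

```python
def lease_timeline(ttl: int, max_ttl: int, horizon: int, renew_pct: int = 67) -> list:
    """Simulate renew/rotate events (in seconds) for one dynamic credential up to `horizon`."""
    if not 0 < renew_pct < 100 or not 0 < ttl <= max_ttl:
        raise ValueError("need 0 < renew_pct < 100 and 0 < ttl <= max_ttl")
    events = []
    now, issued_at, expires_at = 0, 0, ttl  # credential minted at t=0 with a ttl-long lease
    while True:
        remaining = expires_at - now
        now += max(1, remaining * renew_pct // 100)  # act once renew_pct of the lease has elapsed
        if now >= horizon:
            return events
        hard_limit = issued_at + max_ttl  # no renewal can push expiry past this point
        if expires_at >= hard_limit:
            # The lease is already at its max TTL: mint a fresh credential (rotation).
            events.append((now, "rotate"))
            issued_at, expires_at = now, now + ttl
        else:
            events.append((now, "renew"))
            expires_at = min(now + ttl, hard_limit)


for t, kind in lease_timeline(ttl=100, max_ttl=250, horizon=400):
    print(f"t={t:>3}s  {kind}")
```

With ttl=100 and max_ttl=250, the credential is renewed at t=67, 134 and 201, then rotated at t=233, while the old lease is still valid until t=250: the Secret changes before anything expires.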

The unseal model

Vault encrypts its storage with a master key that is never persisted in usable form. On every start — a fresh deploy, a pod reschedule, or a full cluster recovery — Vault comes up sealed and refuses every request until it is unsealed.

  • Shamir config: 1 unseal key, threshold 1 (a single-operator lab, so no key-splitting ceremony).
  • Where the key lives: on the control node (the MacBook), at ~/.arcodange/cluster-keys.json. It is not in git, not in Kubernetes, not in Vault.
  • Operational consequence: nothing that needs a secret recovers until a human unseals Vault. This is the chokepoint baked into the recovery order — VSO cannot re-auth, dynamic DB creds cannot be issued, and dependent apps cannot start, until the unseal happens. See Storage & recovery for where unseal sits in the tested startup sequence.

Caution

If ~/.arcodange/cluster-keys.json is lost, Vault's data is unrecoverable — there is no second copy of the unseal key and no key-recovery path. Treat that file as the most critical secret in the lab.
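The manual unseal itself is one API call per key share, up to the threshold. The sketch below reads the key file and builds the POST /v1/sys/unseal requests without sending them, and it never prints the key. It assumes the file has the JSON layout produced by vault operator init -format=json (unseal_keys_b64, unseal_threshold, root_token); this page does not say how cluster-keys.json was created, so that layout is an assumption. The Vault address is hypothetical.

```python
import json
import urllib.request
from pathlib import Path

VAULT_ADDR = "https://vault.example.internal:8200"  # hypothetical address
KEYS_FILE = Path.home() / ".arcodange" / "cluster-keys.json"


def unseal_requests(keys_file: Path = KEYS_FILE) -> list:
    """Build one POST /v1/sys/unseal request per key share needed to reach the threshold."""
    keys = json.loads(keys_file.read_text())
    shares = keys["unseal_keys_b64"]      # assumed `vault operator init -format=json` layout
    threshold = keys["unseal_threshold"]  # 1 in this lab
    if len(shares) < threshold:
        raise RuntimeError("not enough key shares to reach the unseal threshold")
    return [
        urllib.request.Request(
            f"{VAULT_ADDR}/v1/sys/unseal",
            data=json.dumps({"key": share}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for share in shares[:threshold]
    ]
```

Sending each request with urllib.request.urlopen and checking that the JSON reply reports sealed: false completes the unseal; GET /v1/sys/seal-status reports the same state without changing anything.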

Sandbox implications

A production-like sandbox does not share the production Vault. It runs its own Vault instance with its own unseal key and its own policies, so that exercising secret flows, rotating credentials, or testing a broken unseal cannot touch production secrets. Because the <app> join key is environment-relative (see Naming conventions), the sandbox can keep identical role and policy names — gitea_cicd_<app>, <app>, <app>-ops — while remaining fully isolated. The rationale for that separate-Vault, separate-unseal posture is recorded in ADR 0001 — Safe, production-like environment.
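Since role and policy names are identical in both environments, the only thing that decides which Vault a command talks to is its address. A cheap guard, sketched below under the assumption that sandbox tooling reads VAULT_ADDR (the variable the Vault CLI uses), is to refuse to run when that address points at production. The production hostname is a placeholder, not the lab's real one, and the function name is hypothetical.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

PROD_VAULT_HOSTS = {"vault.prod.example.internal"}  # placeholder for the real prod host(s)


def assert_sandbox_vault(env: Mapping = os.environ) -> str:
    """Return VAULT_ADDR if it is safe for sandbox work; refuse anything pointing at prod."""
    addr = env.get("VAULT_ADDR", "")
    host = urlparse(addr).hostname  # lowercased by urlparse, so case tricks don't bypass it
    if not host:
        raise RuntimeError("VAULT_ADDR is unset or malformed; refusing to guess")
    if host in PROD_VAULT_HOSTS:
        raise RuntimeError(f"VAULT_ADDR points at production Vault ({host}); aborting")
    return addr
```

A denylist like this fails open for any prod alias it does not know; an allowlist of sandbox hosts fails closed and may fit the isolation boundary better.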

See also