factory/vibe/guidebooks/lab-ecosystem/secrets-and-vault.md
Gabriel Radureau 7647a68cdc docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:52:37 +02:00


[vibe](../../README.md) > [Guidebooks](../README.md) > [Lab ecosystem](README.md) > **Secrets & Vault**
# Secrets & Vault
> **Status**: 🟢 Active
> **Last Updated**: 2026-06-23
> **Related**: [Lab ecosystem](README.md) · [Tools brick](02-tools.md) · [Storage & recovery](storage-and-recovery.md) · [Naming conventions](naming-conventions.md)
> **Decision**: [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
## TL;DR
**HashiCorp Vault is the single source of truth for every secret in the lab.** There is no sops, no age, no secret files in git: if a credential exists, Vault either stores it or mints it on demand. Two kinds of consumer read secrets, and each authenticates a different way: **pods** use the Kubernetes auth backend (via the Vault Secrets Operator), and **CI / OpenTofu** use Gitea OIDC JWT (one role `gitea_cicd_<app>` per app). Vault holds static config in KV, keeps encryption keys in the Transit engine, and issues **short-lived, dynamic** PostgreSQL credentials so no long-lived DB password is ever written down. The trade-off: Vault comes up sealed on every restart and must be **manually unsealed** (1 key, threshold 1) before anything that needs a secret can come back.
## Why Vault, and only Vault
The lab made a deliberate choice: **one** secret store, accessed over the network, rather than encrypted secret files scattered through the repos. That choice shapes everything downstream:
- **No secret material in git.** Charts and OpenTofu reference Vault *paths*, never values. A leaked repo leaks no credentials.
- **One revocation point.** Rotating or revoking a credential happens in Vault; consumers pick up the change on their next read or lease renewal.
- **Dynamic over static.** Where a backend supports it (Postgres), Vault issues a fresh, time-boxed credential per consumer instead of a shared static password.

Vault itself runs as the `hashicorp-vault` chart in the **tools** namespace. Its full configuration (engines, auth backends, policies, the per-app role/policy modules) lives in the tools repo; see the [Tools brick](02-tools.md) for the deployment context.
## What Vault mounts
| Mount | Type | Purpose |
| --- | --- | --- |
| `kvv2/` | KV v2 (versioned) | Application static config, e.g. `kvv2/<app>/config`. Versioned so a bad write can be rolled back. |
| KV v1 | KV v1 (unversioned) | Flat secrets that don't need history. |
| `transit/` | Transit | Encryption-as-a-service: encrypt/decrypt and sign without exposing the key. |
| `postgres/` | Database (dynamic) | Issues **short-lived** PostgreSQL credentials on demand: `postgres/creds/<app>` hands out a fresh login user, granted `<app>_role`, with a lease that expires. |

The `<app>` slug threads through every one of these paths (`kvv2/<app>/config`, `postgres/creds/<app>`), exactly as described in [Naming conventions](naming-conventions.md).
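To make the naming pattern concrete, here is a minimal Python sketch that derives the identifiers mentioned on this page from a single slug. The class name `VaultNames` and the slug `webapp` are illustrative assumptions, not names from the lab. The `kv_config_api_path` property reflects how the KV v2 HTTP API inserts `data/` after the mount name; the other values simply restate the patterns given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultNames:
    """Identifiers derived from one <app> slug (hypothetical helper)."""

    app: str

    @property
    def kv_config_path(self) -> str:
        # Logical KV v2 path as written in this guidebook.
        return f"kvv2/{self.app}/config"

    @property
    def kv_config_api_path(self) -> str:
        # Over the HTTP API, KV v2 reads and writes go through <mount>/data/<path>.
        return f"kvv2/data/{self.app}/config"

    @property
    def db_creds_path(self) -> str:
        # Reading this path asks the database engine for a fresh credential.
        return f"postgres/creds/{self.app}"

    @property
    def db_role(self) -> str:
        # PostgreSQL role granted to each dynamically created login user.
        return f"{self.app}_role"


names = VaultNames("webapp")  # "webapp" is a made-up slug
print(names.kv_config_path)
print(names.db_creds_path)
```

Keeping the derivation in one place means a renamed slug changes every path together, which is the property the join key is meant to provide.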
## The two auth backends
Vault doesn't trust callers by static token. Each class of consumer proves its identity through a backend matched to where it runs:
- **Kubernetes auth** (for **pods**). The Vault Secrets Operator (VSO) and workloads present their Kubernetes ServiceAccount token; Vault validates it against the cluster's TokenReview API and maps the SA to the Vault role `<app>`, which carries the runtime policy `<app>`.
- **Gitea OIDC / JWT auth** (for **CI and OpenTofu**). A Gitea Actions workflow obtains an OIDC token; Vault validates it and maps it to the JWT role `gitea_cicd_<app>`, which carries the CI/ops policy `<app>-ops`. This is how `tofu apply` in CI reads and writes the secrets it manages without any pre-shared Vault token.

The split matters: pods get only what they need at runtime (the `<app>` policy), while CI gets the broader provisioning rights (`<app>-ops`) needed to *create* the very secrets the pods will later read.
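Both backends expose a login endpoint that takes a role name and a signed token, so the two paths differ only in the mount, the role, and where the token comes from. The sketch below builds (but does not send) the two login requests. The mount names `kubernetes` and `jwt` are Vault's defaults and are assumed here, since this page does not state the lab's actual mount paths; the Vault address, slug, and token strings are placeholders.

```python
import json
from urllib.request import Request

VAULT_ADDR = "https://vault.example.internal:8200"  # placeholder address


def login_request(auth_mount: str, role: str, jwt: str) -> Request:
    """Build a POST to auth/<mount>/login with the role and the caller's token."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return Request(
        f"{VAULT_ADDR}/v1/auth/{auth_mount}/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


app = "webapp"  # made-up slug

# Runtime path: the pod's ServiceAccount token, mapped to role <app>.
pod_login = login_request("kubernetes", app, "<serviceaccount-token>")

# CI path: the Gitea Actions OIDC token, mapped to role gitea_cicd_<app>.
ci_login = login_request("jwt", f"gitea_cicd_{app}", "<gitea-oidc-token>")

print(pod_login.full_url)
print(ci_login.full_url)
```

In both cases a successful response carries a short-lived Vault token bound to the role's policy, which is why no long-lived Vault token has to be stored on either side.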
## How VSO delivers secrets to pods
Inside the cluster, the **Vault Secrets Operator** is the bridge between Vault and Kubernetes. It watches two CRDs:
- **`VaultAuth`**: declares *how* to authenticate to Vault (the Kubernetes auth mount + the `<app>` role).
- **`VaultDynamicSecret`** (and `VaultStaticSecret`): declares *what* to fetch (e.g. `postgres/creds/<app>`) and which Kubernetes Secret to materialise it into. For dynamic secrets, VSO also **renews the lease** and rotates the Secret before it expires.

The pod then mounts the resulting Kubernetes Secret as it would any other: it never speaks to Vault directly, and never sees a static DB password.
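As an illustration of what the two resources might look like, the sketch below builds them as Python dictionaries and prints them as JSON, which `kubectl apply -f -` accepts as well as YAML. The field names follow the VSO `secrets.hashicorp.com/v1beta1` API; the auth mount name `kubernetes`, the namespace, and the destination Secret name `<app>-db` are assumptions made for the example, not values taken from the lab's charts.

```python
import json

API_VERSION = "secrets.hashicorp.com/v1beta1"


def vault_auth(app: str, namespace: str) -> dict:
    """How to authenticate: Kubernetes auth with role <app>."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VaultAuth",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "method": "kubernetes",
            "mount": "kubernetes",  # assumed auth mount name
            "kubernetes": {"role": app, "serviceAccount": app},
        },
    }


def vault_dynamic_secret(app: str, namespace: str) -> dict:
    """What to fetch: postgres/creds/<app>, written into a Kubernetes Secret."""
    return {
        "apiVersion": API_VERSION,
        "kind": "VaultDynamicSecret",
        "metadata": {"name": f"{app}-db", "namespace": namespace},
        "spec": {
            "vaultAuthRef": app,
            "mount": "postgres",
            "path": f"creds/{app}",
            # Hypothetical Secret name; create=True lets VSO own the Secret.
            "destination": {"name": f"{app}-db", "create": True},
        },
    }


manifests = [vault_auth("webapp", "webapp"), vault_dynamic_secret("webapp", "webapp")]
print(json.dumps(manifests, indent=2))
```

Note that the dynamic secret splits the logical path `postgres/creds/<app>` into `mount` and `path`, which is how the CRD expects it.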
## The secret flow, end to end
```mermaid
%%{init: {'theme':'base'}}%%
flowchart LR
subgraph CI["CI / Provisioning path"]
GHA["Gitea Actions<br/>workflow"]:::src
TOFU["OpenTofu<br/>tofu apply"]:::proc
end
subgraph RT["Runtime path (in-cluster)"]
VSO["Vault Secrets<br/>Operator (VSO)"]:::proc
POD["App pod<br/>(ServiceAccount &lt;app&gt;)"]:::proc
end
VAULT["Vault<br/>KV v1/v2 · transit · postgres dynamic"]:::store
GHA -->|"OIDC JWT<br/>role gitea_cicd_&lt;app&gt;"| VAULT
VAULT -->|"policy &lt;app&gt;-ops<br/>read/write secrets"| TOFU
TOFU -->|"writes config to<br/>kvv2/&lt;app&gt;/config"| VAULT
VSO -->|"k8s auth<br/>role &lt;app&gt; (SA token)"| VAULT
VAULT -->|"dynamic creds<br/>postgres/creds/&lt;app&gt;"| VSO
VSO -->|"materialises +<br/>renews K8s Secret"| POD
classDef src fill:#2563eb,stroke:#1e40af,color:#fff
classDef proc fill:#059669,stroke:#047857,color:#fff
classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
```
1. **CI path:** a Gitea Actions workflow requests an OIDC JWT and presents it to Vault under the role `gitea_cicd_<app>`. Vault validates the token and grants the `<app>-ops` policy.
2. With that policy, OpenTofu (`tofu apply`, running in CI) reads the secrets it needs and writes the app's static config back to `kvv2/<app>/config`. No pre-shared Vault token is ever stored — the trust is established per-run via OIDC.
3. **Runtime path:** in the cluster, the Vault Secrets Operator authenticates with the Kubernetes auth backend, presenting the app's ServiceAccount token mapped to the Vault role `<app>`.
4. Vault issues a **short-lived, dynamic** PostgreSQL credential from `postgres/creds/<app>` back to VSO.
5. VSO materialises that credential into a Kubernetes Secret in the app's namespace, then **renews the lease** and rotates the Secret before it expires.
6. The app pod mounts the Kubernetes Secret like any other — it never talks to Vault, and never holds a long-lived database password.
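Steps 4 to 6 can be pictured with a toy model. The sketch below is not VSO or Vault code: `ToyVault` and `ToyOperator` are invented stand-ins, time is passed in explicitly, and lease renewal is simplified to issuing a replacement credential once a fixed fraction of the TTL has elapsed. The fraction 0.67 and the one-hour TTL are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    username: str
    issued_at: float
    ttl: float

    @property
    def expires_at(self) -> float:
        return self.issued_at + self.ttl


class ToyVault:
    """Hands out a new, uniquely named credential on every request."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._serial = 0

    def issue(self, app: str, now: float) -> Lease:
        self._serial += 1
        return Lease(f"v-{app}-{self._serial}", now, self.ttl)


class ToyOperator:
    """Keeps a dict standing in for the Kubernetes Secret fresh."""

    def __init__(self, vault: ToyVault, app: str, rotate_at: float = 0.67) -> None:
        self.vault, self.app, self.rotate_at = vault, app, rotate_at
        self.lease: Optional[Lease] = None
        self.k8s_secret: dict = {}

    def reconcile(self, now: float) -> dict:
        # Rotate on first run, or once rotate_at of the TTL has elapsed.
        due = self.lease is None or (
            now >= self.lease.issued_at + self.rotate_at * self.lease.ttl
        )
        if due:
            self.lease = self.vault.issue(self.app, now)
            self.k8s_secret = {"username": self.lease.username}
        return self.k8s_secret


op = ToyOperator(ToyVault(ttl=3600), "webapp")
for t in (0, 1000, 2500):
    print(t, op.reconcile(now=t)["username"])
```

The pod only ever reads `k8s_secret`; because rotation happens before `expires_at`, the credential it sees is always still valid.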
## The unseal model
Vault encrypts its storage with a master key that is **never persisted in usable form**. On every start — a fresh deploy, a pod reschedule, or a full cluster recovery — Vault comes up **sealed** and refuses every request until it is unsealed.
- **Shamir config:** 1 unseal key, threshold 1 (a single-operator lab, so no key-splitting ceremony).
- **Where the key lives:** on the control node (the MacBook), at `~/.arcodange/cluster-keys.json`. It is *not* in git, *not* in Kubernetes, *not* in Vault.
- **Operational consequence:** **nothing that needs a secret recovers until a human unseals Vault.** This is the chokepoint baked into the recovery order — VSO cannot re-auth, dynamic DB creds cannot be issued, and dependent apps cannot start, until the unseal happens. See [Storage & recovery](storage-and-recovery.md) for where unseal sits in the tested startup sequence.
> [!CAUTION]
> If `~/.arcodange/cluster-keys.json` is lost, Vault's data is **unrecoverable** — there is no second copy of the unseal key and no key-recovery path. Treat that file as the most critical secret in the lab.
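The unseal step itself is a single API call. The sketch below assumes that `cluster-keys.json` is the JSON output of `vault operator init -format=json` and therefore holds the key under `unseal_keys_b64`; this page does not confirm the file's layout, so treat that as an assumption. The Vault address is a placeholder, and the request is built but not sent.

```python
import json
from pathlib import Path
from urllib.request import Request

VAULT_ADDR = "https://vault.example.internal:8200"  # placeholder address


def load_unseal_key(path: Path) -> str:
    """Read the single unseal key (1 share, threshold 1) from the key file."""
    keys = json.loads(path.read_text())["unseal_keys_b64"]  # assumed layout
    if len(keys) != 1:
        raise ValueError(f"expected exactly 1 unseal key, found {len(keys)}")
    return keys[0]


def unseal_request(key: str) -> Request:
    """Build the call to sys/unseal; with threshold 1, one call is enough."""
    return Request(
        f"{VAULT_ADDR}/v1/sys/unseal",
        data=json.dumps({"key": key}).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def is_unsealed(seal_status: dict) -> bool:
    """Interpret the seal-status body returned by sys/unseal or sys/seal-status."""
    return not seal_status["sealed"]


key_file = Path.home() / ".arcodange" / "cluster-keys.json"
if key_file.exists():
    req = unseal_request(load_unseal_key(key_file))
    print("would send:", req.get_method(), req.full_url)
else:
    print("no key file on this machine; nothing to unseal with")
```

The explicit length check mirrors the Shamir configuration above: if the file ever held more than one share, a single call would no longer be sufficient and the script should stop rather than guess.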
## Sandbox implications
A production-like sandbox does **not** share the production Vault. It runs its **own** Vault instance with its **own** unseal key and its **own** policies, so that exercising secret flows, rotating credentials, or testing a broken unseal cannot touch production secrets. Because the `<app>` join key is environment-relative (see [Naming conventions](naming-conventions.md)), the sandbox can keep identical role and policy names — `gitea_cicd_<app>`, `<app>`, `<app>-ops` — while remaining fully isolated. The rationale for that separate-Vault, separate-unseal posture is recorded in [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).
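One way to picture "same names, separate instances" is a small guard that compares the two environments before any sandbox exercise runs. Everything in this sketch is hypothetical: the `VaultEnv` type, both addresses, and the sandbox key-file path are invented for illustration. Only the role and policy name patterns and the production key-file path come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultEnv:
    name: str
    addr: str
    unseal_key_file: str

    def role_names(self, app: str) -> dict:
        # Identical in every environment: the names are environment-relative.
        return {
            "jwt_role": f"gitea_cicd_{app}",
            "k8s_role": app,
            "ops_policy": f"{app}-ops",
        }


def shared_fields(a: VaultEnv, b: VaultEnv) -> list:
    """Return the fields two environments have in common; must be empty."""
    return [f for f in ("addr", "unseal_key_file") if getattr(a, f) == getattr(b, f)]


prod = VaultEnv("prod", "https://vault.prod.example", "~/.arcodange/cluster-keys.json")
sandbox = VaultEnv("sandbox", "https://vault.sandbox.example", "~/.sandbox/cluster-keys.json")

overlap = shared_fields(prod, sandbox)
if overlap:
    raise SystemExit(f"sandbox is not isolated from prod: shares {overlap}")
print("isolated; role names match:", prod.role_names("webapp") == sandbox.role_names("webapp"))
```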
## See also
- [Tools brick](02-tools.md) — where the `hashicorp-vault` chart, VSO, and the per-app Vault IaC modules are deployed.
- [Storage & recovery](storage-and-recovery.md) — Vault unseal as a step in the tested power-cut recovery order.
- [Naming conventions](naming-conventions.md) — how `gitea_cicd_<app>`, `<app>`, and `<app>-ops` derive from the join key.
- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md) — the sandbox's separate-Vault decision.