docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md

Add a root AGENTS.md (ecosystem map of factory/tools/cms + agent operating rules + the persona cohort & workflow) and a new vibe/ knowledge base for LLM agents, modeled on tree-docs conventions and the factory house style.

vibe/ folders (each with a README hub + contribution rules):
- ADR/ optimized MADR-lite; canonical home going forward (doc/adr stays historical)
- PRD/ one subfolder per PRD, mandatory STATUS.md, QA strategy for big ones
- investigations/ single INV-NNN-slug.md, or stub + folder w/ notebooks
- guidebooks/ tree-docs maps; lab-ecosystem guidebook of factory+tools+cms
- runbooks/ [AGENT]/[HUMAN] step procedures (EN; doc/runbooks stays FR)
- shareouts/ dated FR handouts (decks/mp4)

Seed content (first ADR + PRD): a safe, production-like environment to rehearse risky changes and recovery without touching real prod: local-only sandbox (k3d + arm64 VMs) with a hard prod/sandbox isolation boundary. Includes INV-001 (prod blast-radius couplings), the ecosystem guidebook, and a FR shareout.

Conventions enforced: no-tombstone rule, breadcrumb spine, bidirectional cross-links, theme:base mermaid (MCP-validated) + ordered-list-after-diagram.

Built with a Workflow + persona cohort; 24 files, zero dead links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:52:37 +02:00
parent 827af6b392
commit 7647a68cdc
25 changed files with 1878 additions and 0 deletions
--- a/vibe/guidebooks/lab-ecosystem/secrets-and-vault.md
+++ b/vibe/guidebooks/lab-ecosystem/secrets-and-vault.md
@@ -0,0 +1,110 @@
+[vibe](../../README.md) > [Guidebooks](../README.md) > [Lab ecosystem](README.md) > **Secrets & Vault**
+
+# Secrets & Vault
+
+> **Status**: 🟢 Active
+> **Last Updated**: 2026-06-23
+> **Related**: [Lab ecosystem](README.md) · [Tools brick](02-tools.md) · [Storage & recovery](storage-and-recovery.md) · [Naming conventions](naming-conventions.md)
+> **Decision**: [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
+
+## TL;DR
+
+**HashiCorp Vault is the single source of truth for every secret in the lab.** There is no sops, no age, no secret files in git: if a credential exists, Vault either stores it or mints it on demand. Two kinds of consumer read secrets, and each authenticates differently: **pods** use the Kubernetes auth backend (via the Vault Secrets Operator), and **CI / OpenTofu** use Gitea OIDC JWT (one role `gitea_cicd_<app>` per app). Vault holds static config in KV, keeps encryption keys inside the `transit/` engine, and issues **short-lived, dynamic** PostgreSQL credentials so no long-lived DB password is ever written down. The trade-off: Vault comes up sealed on every restart and must be **manually unsealed** (1 key, threshold 1) before anything that needs a secret can come back.
+
+## Why Vault, and only Vault
+
+The lab made a deliberate choice: **one** secret store, accessed over the network, rather than encrypted secret files scattered through the repos. That choice has three structural consequences:
+
+- **No secret material in git.** Charts and OpenTofu reference Vault *paths*, never values. A leaked repo leaks no credentials.
+- **One revocation point.** Rotating or revoking a credential happens in Vault; consumers pick up the change on their next read or lease renewal.
+- **Dynamic over static.** Where a backend supports it (Postgres), Vault issues a fresh, time-boxed credential per consumer instead of a shared static password.
+
+Vault itself runs as the `hashicorp-vault` chart in the **tools** namespace. Its full configuration — engines, auth backends, policies, the per-app role/policy modules — lives in the tools repo; see the [Tools brick](02-tools.md) for the deployment context.
+
+## What Vault mounts
+
+| Mount | Type | Purpose |
+| --- | --- | --- |
+| `kvv2/` | KV v2 (versioned) | Application static config, e.g. `kvv2/<app>/config`. Versioned so a bad write can be rolled back. |
+| KV v1 | KV v1 (unversioned) | Flat secrets that don't need history. |
+| `transit/` | Transit | Encryption-as-a-service: encrypt/decrypt and sign without exposing the key. |
+| `postgres/` | Database (dynamic) | Issues **short-lived** PostgreSQL credentials on demand: `postgres/creds/<app>` hands out a fresh login user, granted `<app>_role`, with a lease that expires. |
+
+The `<app>` slug threads through every one of these paths — `kvv2/<app>/config`, `postgres/creds/<app>` — exactly as described in [Naming conventions](naming-conventions.md).
+
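To make the `postgres/` row concrete: a read of `postgres/creds/<app>` returns a lease envelope wrapped around a generated username and password. The following is a minimal Python sketch of handling that response, assuming the standard response shape of Vault's database secrets engine. The `billing` slug, the sample values, and the helper names are illustrative assumptions, not taken from the lab.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DynamicCredential:
    username: str
    password: str
    lease_id: str
    expires_at: datetime


def creds_path(app: str) -> str:
    """Path of the dynamic-credential endpoint for one app slug."""
    return f"postgres/creds/{app}"


def parse_creds_response(body: dict, issued_at: datetime) -> DynamicCredential:
    """Turn the JSON body of a read on postgres/creds/<app> into a typed value."""
    return DynamicCredential(
        username=body["data"]["username"],
        password=body["data"]["password"],
        lease_id=body["lease_id"],
        # lease_duration is expressed in seconds
        expires_at=issued_at + timedelta(seconds=body["lease_duration"]),
    )


# Illustrative response body (made-up values, not real lab output).
SAMPLE = {
    "lease_id": "postgres/creds/billing/abc123",
    "lease_duration": 3600,
    "renewable": True,
    "data": {"username": "v-billing-xyz", "password": "example-only"},
}

issued = datetime(2026, 6, 23, 12, 0, tzinfo=timezone.utc)
cred = parse_creds_response(SAMPLE, issued)
print(creds_path("billing"), cred.expires_at.isoformat())
```

The point of the sketch is the expiry: every credential read from this mount carries its own deadline, which is what lets the lab avoid a shared static password.
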
+## The two auth backends
+
+No consumer authenticates to Vault with a static, pre-shared token. Each class of consumer proves its identity through a backend matched to where it runs:
+
+- **Kubernetes auth** — for **pods**. The Vault Secrets Operator (VSO) and workloads present their Kubernetes ServiceAccount token; Vault validates it against the cluster's API and maps the SA to the Vault role `<app>`, which carries the runtime policy `<app>`.
+- **Gitea OIDC / JWT auth** — for **CI and OpenTofu**. A Gitea Actions workflow obtains an OIDC token; Vault validates it and maps it to the JWT role `gitea_cicd_<app>`, which carries the CI/ops policy `<app>-ops`. This is how `tofu apply` in CI reads and writes the secrets it manages without any pre-shared Vault token.
+
+The split matters: pods get only what they need at runtime (the `<app>` policy), while CI gets the broader provisioning rights (`<app>-ops`) needed to *create* the very secrets the pods will later read.
+
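The role and policy names above all derive from the one `<app>` slug, and both backends accept a role name plus a signed token at their login endpoint. Here is a hedged Python sketch of that derivation. The mount names `kubernetes` and `jwt` are Vault's defaults and are an assumption here (the lab's actual mount names are not stated), and `billing` is a hypothetical app slug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultNames:
    """Names derived from one <app> slug, following the conventions above."""

    app: str

    @property
    def k8s_role(self) -> str:
        # Kubernetes auth role used by pods and VSO
        return self.app

    @property
    def runtime_policy(self) -> str:
        return self.app

    @property
    def ci_role(self) -> str:
        # JWT role used by Gitea Actions / OpenTofu
        return f"gitea_cicd_{self.app}"

    @property
    def ci_policy(self) -> str:
        return f"{self.app}-ops"


def login_request(mount: str, role: str, jwt: str) -> tuple[str, dict]:
    """Build (URL path, JSON body) for a login on a token-based auth mount.

    Nothing is sent; this only shows the shape of the request.
    """
    return f"/v1/auth/{mount}/login", {"role": role, "jwt": jwt}


names = VaultNames("billing")  # hypothetical app slug
pod_login = login_request("kubernetes", names.k8s_role, "<service-account-token>")
ci_login = login_request("jwt", names.ci_role, "<gitea-oidc-token>")
print(pod_login[0], ci_login[1]["role"])
```

Both requests have the same shape; what differs is who signed the token and which role, and therefore which policy, Vault attaches to the resulting session.
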
+## How VSO delivers secrets to pods
+
+Inside the cluster, the **Vault Secrets Operator** is the bridge between Vault and Kubernetes. It watches two CRDs:
+
+- **`VaultAuth`**: declares *how* to authenticate to Vault (the Kubernetes auth mount + the `<app>` role).
+- **`VaultDynamicSecret`** and **`VaultStaticSecret`**: declare *what* to fetch and which Kubernetes Secret to materialise it into. `VaultDynamicSecret` covers leased credentials such as `postgres/creds/<app>`; `VaultStaticSecret` covers KV data. For dynamic secrets, VSO also **renews the lease** and rotates the Secret before it expires.
+
+The pod then mounts the resulting Kubernetes Secret as it would any other — it never speaks to Vault directly, and never sees a static DB password.
+
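"Before it expires" can be made concrete with a little arithmetic. The sketch below assumes renewal is scheduled at a fixed percentage of the lease duration, similar in spirit to the `renewalPercent` setting on `VaultDynamicSecret`. It is an illustration of the idea, not VSO's actual scheduling code, and the one-hour lease is a made-up value.

```python
from datetime import datetime, timedelta, timezone


def renew_at(issued_at: datetime, lease_seconds: int, renewal_percent: int = 67) -> datetime:
    """Moment at which a lease should be renewed, as a percentage of its TTL."""
    if not 0 < renewal_percent <= 100:
        raise ValueError("renewal_percent must be in (0, 100]")
    return issued_at + timedelta(seconds=lease_seconds * renewal_percent / 100)


issued = datetime(2026, 6, 23, 12, 0, tzinfo=timezone.utc)
# A hypothetical one-hour lease renewed at 67% of its lifetime.
print(renew_at(issued, 3600).isoformat())
```

Renewing well before the deadline leaves a margin in which a failed renewal can be retried while the pod's current credential is still valid.
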
+## The secret flow, end to end
+
+```mermaid
+%%{init: {'theme':'base'}}%%
+flowchart LR
+    subgraph CI["CI / Provisioning path"]
+        GHA["Gitea Actions<br/>workflow"]:::src
+        TOFU["OpenTofu<br/>tofu apply"]:::proc
+    end
+
+    subgraph RT["Runtime path (in-cluster)"]
+        VSO["Vault Secrets<br/>Operator (VSO)"]:::proc
+        POD["App pod<br/>(ServiceAccount &lt;app&gt;)"]:::proc
+    end
+
+    VAULT["Vault<br/>KV v1/v2 · transit · postgres dynamic"]:::store
+
+    GHA -->|"OIDC JWT<br/>role gitea_cicd_&lt;app&gt;"| VAULT
+    VAULT -->|"policy &lt;app&gt;-ops<br/>read/write secrets"| TOFU
+    TOFU -->|"writes config to<br/>kvv2/&lt;app&gt;/config"| VAULT
+
+    VSO -->|"k8s auth<br/>role &lt;app&gt; (SA token)"| VAULT
+    VAULT -->|"dynamic creds<br/>postgres/creds/&lt;app&gt;"| VSO
+    VSO -->|"materialises +<br/>renews K8s Secret"| POD
+
+    classDef src fill:#2563eb,stroke:#1e40af,color:#fff
+    classDef proc fill:#059669,stroke:#047857,color:#fff
+    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
+```
+
+1. **CI path:** a Gitea Actions workflow requests an OIDC JWT and presents it to Vault under the role `gitea_cicd_<app>`. Vault validates the token and grants the `<app>-ops` policy.
+2. With that policy, OpenTofu (`tofu apply`, running in CI) reads the secrets it needs and writes the app's static config back to `kvv2/<app>/config`. No pre-shared Vault token is ever stored — the trust is established per-run via OIDC.
+3. **Runtime path:** in the cluster, the Vault Secrets Operator authenticates with the Kubernetes auth backend by presenting the app's ServiceAccount token, which Vault maps to the role `<app>`.
+4. Vault issues a **short-lived, dynamic** PostgreSQL credential from `postgres/creds/<app>` back to VSO.
+5. VSO materialises that credential into a Kubernetes Secret in the app's namespace, then **renews the lease** and rotates the Secret before it expires.
+6. The app pod mounts the Kubernetes Secret like any other — it never talks to Vault, and never holds a long-lived database password.
+
+## The unseal model
+
+Vault encrypts its storage with a master key that is **never persisted in usable form**. On every start — a fresh deploy, a pod reschedule, or a full cluster recovery — Vault comes up **sealed** and refuses every request until it is unsealed.
+
+- **Shamir config:** 1 unseal key, threshold 1 (a single-operator lab, so no key-splitting ceremony).
+- **Where the key lives:** on the control node (the MacBook), at `~/.arcodange/cluster-keys.json`. It is *not* in git, *not* in Kubernetes, *not* in Vault.
+- **Operational consequence:** **nothing that needs a secret recovers until a human unseals Vault.** This is the chokepoint baked into the recovery order — VSO cannot re-auth, dynamic DB creds cannot be issued, and dependent apps cannot start, until the unseal happens. See [Storage & recovery](storage-and-recovery.md) for where unseal sits in the tested startup sequence.
+
+> [!CAUTION]
+> If `~/.arcodange/cluster-keys.json` is lost, Vault's data becomes **unrecoverable** the next time Vault seals: there is no second copy of the unseal key and no key-recovery path. Treat that file as the most critical secret in the lab.
+
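A recovery script can tell whether the human unseal step is still pending by reading Vault's `sys/seal-status` endpoint, which reports the sealed flag, the threshold `t`, the share count `n`, and the unseal `progress`. The following is a minimal sketch of interpreting that body, assuming those standard field names. The sample dictionaries are illustrative, and nothing here reads the key file, whose internal layout is not described in this page.

```python
def keys_still_needed(status: dict) -> int:
    """Number of unseal key shares still required, given a seal-status body."""
    if not status["sealed"]:
        return 0
    return status["t"] - status["progress"]


def can_serve_secrets(status: dict) -> bool:
    """Vault answers secret requests only once it is unsealed."""
    return not status["sealed"]


# Illustrative bodies for the lab's 1-of-1 Shamir configuration.
SEALED = {"sealed": True, "t": 1, "n": 1, "progress": 0}
UNSEALED = {"sealed": False, "t": 1, "n": 1, "progress": 0}

print(keys_still_needed(SEALED), can_serve_secrets(SEALED))
print(keys_still_needed(UNSEALED), can_serve_secrets(UNSEALED))
```

With a threshold of 1, a sealed Vault is always exactly one key away from serving again, which is why the single key file is both the convenience and the single point of failure.
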
+## Sandbox implications
+
+A production-like sandbox does **not** share the production Vault. It runs its **own** Vault instance with its **own** unseal key and its **own** policies, so that exercising secret flows, rotating credentials, or testing a broken unseal cannot touch production secrets. Because the `<app>` join key is environment-relative (see [Naming conventions](naming-conventions.md)), the sandbox can keep identical role and policy names — `gitea_cicd_<app>`, `<app>`, `<app>-ops` — while remaining fully isolated. The rationale for that separate-Vault, separate-unseal posture is recorded in [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).
+
+## See also
+
+- [Tools brick](02-tools.md) — where the `hashicorp-vault` chart, VSO, and the per-app Vault IaC modules are deployed.
+- [Storage & recovery](storage-and-recovery.md) — Vault unseal as a step in the tested power-cut recovery order.
+- [Naming conventions](naming-conventions.md) — how `gitea_cicd_<app>`, `<app>`, and `<app>-ops` derive from the join key.
+- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md) — the sandbox's separate-Vault decision.