docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md
Add a root AGENTS.md (ecosystem map of factory/tools/cms + agent operating rules + the persona cohort & workflow) and a new vibe/ knowledge base for LLM agents, modeled on tree-docs conventions and the factory house style.

vibe/ folders (each with a README hub + contribution rules):
- ADR/ optimized MADR-lite; canonical home going forward (doc/adr stays historical)
- PRD/ one subfolder per PRD, mandatory STATUS.md, QA strategy for big ones
- investigations/ single INV-NNN-slug.md, or stub + folder w/ notebooks
- guidebooks/ tree-docs maps; lab-ecosystem guidebook of factory+tools+cms
- runbooks/ [AGENT]/[HUMAN] step procedures (EN; doc/runbooks stays FR)
- shareouts/ dated FR handouts (decks/mp4)

Seed content (first ADR + PRD): a safe, production-like environment to rehearse risky changes and recovery without touching real prod, i.e. a local-only sandbox (k3d + arm64 VMs) with a hard prod/sandbox isolation boundary. Includes INV-001 (prod blast-radius couplings), the ecosystem guidebook, and a FR shareout.

Conventions enforced: no-tombstone rule, breadcrumb spine, bidirectional cross-links, theme:base mermaid (MCP-validated) + ordered-list-after-diagram. Built with a Workflow + persona cohort; 24 files, zero dead links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
vibe/guidebooks/lab-ecosystem/secrets-and-vault.md
@@ -0,0 +1,110 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [Lab ecosystem](README.md) > **Secrets & Vault**

# Secrets & Vault

> **Status**: 🟢 Active
> **Last Updated**: 2026-06-23
> **Related**: [Lab ecosystem](README.md) · [Tools brick](02-tools.md) · [Storage & recovery](storage-and-recovery.md) · [Naming conventions](naming-conventions.md)
> **Decision**: [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)

## TL;DR

**HashiCorp Vault is the single source of truth for every secret in the lab.** There is no sops, no age, and no secret file in git: if a credential exists, Vault either stores it or mints it on demand. Two kinds of consumer read secrets, and each authenticates differently: **pods** use the Kubernetes auth backend (via the Vault Secrets Operator), and **CI / OpenTofu** use Gitea OIDC JWT (one role `gitea_cicd_<app>` per app). Vault holds static config in KV, keeps encryption keys in the transit engine, and issues **short-lived, dynamic** PostgreSQL credentials so that no long-lived DB password is ever written down. The trade-off: Vault comes up sealed on every restart and must be **manually unsealed** (1 key, threshold 1) before anything that needs a secret can come back.

## Why Vault, and only Vault

The lab made a deliberate choice: **one** secret store, accessed over the network, rather than encrypted secret files scattered through the repos. That choice has three structural consequences:

- **No secret material in git.** Charts and OpenTofu reference Vault *paths*, never values. A leaked repo leaks no credentials.
- **One revocation point.** Rotating or revoking a credential happens in Vault; consumers pick up the change on their next read or lease renewal.
- **Dynamic over static.** Where a backend supports it (Postgres), Vault issues a fresh, time-boxed credential per consumer instead of a shared static password.

Vault itself runs as the `hashicorp-vault` chart in the **tools** namespace. Its full configuration (engines, auth backends, policies, and the per-app role/policy modules) lives in the tools repo; see the [Tools brick](02-tools.md) for the deployment context.

## What Vault mounts

| Mount | Type | Purpose |
| --- | --- | --- |
| `kvv2/` | KV v2 (versioned) | Application static config, e.g. `kvv2/<app>/config`. Versioned so a bad write can be rolled back. |
| KV v1 | KV v1 (unversioned) | Flat secrets that don't need history. |
| `transit/` | Transit | Encryption-as-a-service: encrypt/decrypt and sign without exposing the key. |
| `postgres/` | Database (dynamic) | Issues **short-lived** PostgreSQL credentials on demand: `postgres/creds/<app>` hands out a fresh login user, granted `<app>_role`, with a lease that expires. |

The `<app>` slug threads through every one of these paths (`kvv2/<app>/config`, `postgres/creds/<app>`), exactly as described in [Naming conventions](naming-conventions.md).
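
To make the path convention concrete, here is a minimal Python sketch that derives the per-app Vault paths from a slug. The helper `vault_paths`, the `VaultPaths` type, the slug check, and the example slug `webapp` are all hypothetical illustrations; only the path patterns themselves come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultPaths:
    """Vault paths derived from a single app slug."""
    kv_config: str   # static config in the KV v2 mount
    db_creds: str    # dynamic PostgreSQL credentials
    db_role: str     # Postgres role granted to each issued login user


def vault_paths(app: str) -> VaultPaths:
    # Assumption: a slug is a non-empty string without path separators.
    # The lab's real validation rules are not stated in this guidebook.
    if not app or "/" in app:
        raise ValueError(f"invalid app slug: {app!r}")
    return VaultPaths(
        kv_config=f"kvv2/{app}/config",
        db_creds=f"postgres/creds/{app}",
        db_role=f"{app}_role",
    )


paths = vault_paths("webapp")  # "webapp" is a made-up slug
print(paths.kv_config)  # kvv2/webapp/config
print(paths.db_creds)   # postgres/creds/webapp
```

Deriving every path from one function keeps the slug as the only free variable, which is the property the naming conventions rely on.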

## The two auth backends

Vault doesn't trust callers by static token. Each class of consumer proves its identity through a backend matched to where it runs:

- **Kubernetes auth**, for **pods**. The Vault Secrets Operator (VSO) and workloads present their Kubernetes ServiceAccount token; Vault validates it against the cluster's API (a TokenReview call) and maps the SA to the Vault role `<app>`, which carries the runtime policy `<app>`.
- **Gitea OIDC / JWT auth**, for **CI and OpenTofu**. A Gitea Actions workflow obtains an OIDC token; Vault validates it and maps it to the JWT role `gitea_cicd_<app>`, which carries the CI/ops policy `<app>-ops`. This is how `tofu apply` in CI reads and writes the secrets it manages without any pre-shared Vault token.

The split matters: pods get only what they need at runtime (the `<app>` policy), while CI gets the broader provisioning rights (`<app>-ops`) needed to *create* the very secrets the pods will later read.
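
The role and policy mapping above can be sketched as a small Python helper that builds the login request each consumer would send. This is an illustration under assumptions, not the lab's tooling: it assumes both backends sit at Vault's default mount paths (`auth/kubernetes` and `auth/jwt`), and the function names and the `webapp` slug are hypothetical.

```python
import json


def vault_identity(backend: str, app: str) -> tuple[str, str]:
    """Return (role, policy) for a consumer class, per the split above."""
    if backend == "kubernetes":   # pods and VSO
        return app, app
    if backend == "jwt":          # Gitea Actions / OpenTofu
        return f"gitea_cicd_{app}", f"{app}-ops"
    raise ValueError(f"unknown auth backend: {backend!r}")


def login_request(backend: str, app: str, token: str) -> tuple[str, str]:
    """Return (HTTP path, JSON body) for a Vault login call."""
    role, _policy = vault_identity(backend, app)
    # Assumption: the mount name equals the backend name (Vault's default).
    path = f"/v1/auth/{backend}/login"
    return path, json.dumps({"role": role, "jwt": token})


path, body = login_request("jwt", "webapp", "<oidc-token>")
print(path)  # /v1/auth/jwt/login
print(body)
```

Both backends take the same request shape (a role name plus a signed token); what differs is who issued the token and which policy the resulting Vault token carries.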

## How VSO delivers secrets to pods

Inside the cluster, the **Vault Secrets Operator** is the bridge between Vault and Kubernetes. It watches two kinds of CRD:

- **`VaultAuth`** declares *how* to authenticate to Vault (the Kubernetes auth mount + the `<app>` role).
- **`VaultDynamicSecret`** (and `VaultStaticSecret`) declares *what* to fetch (e.g. `postgres/creds/<app>`) and which Kubernetes Secret to materialise it into. For dynamic secrets, VSO also **renews the lease** and rotates the Secret before it expires.

The pod then mounts the resulting Kubernetes Secret as it would any other: it never speaks to Vault directly, and never sees a static DB password.
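
As a rough illustration of the two CRDs, the Python sketch below builds a `VaultAuth` and a `VaultDynamicSecret` manifest as plain dictionaries and prints them as JSON, which `kubectl apply -f` also accepts. The object names (`<app>-auth`, `<app>-db`, `<app>-db-creds`), the `kubernetes` auth mount name, and the namespace are assumptions made for the example; the lab's actual manifests live in its charts and may differ.

```python
import json

API_VERSION = "secrets.hashicorp.com/v1beta1"  # VSO CRD API group/version


def vso_manifests(app: str, namespace: str) -> list[dict]:
    """Build the VaultAuth + VaultDynamicSecret pair for one app."""
    auth = {
        "apiVersion": API_VERSION,
        "kind": "VaultAuth",
        "metadata": {"name": f"{app}-auth", "namespace": namespace},
        "spec": {
            "method": "kubernetes",
            "mount": "kubernetes",  # assumed auth mount name
            "kubernetes": {"role": app, "serviceAccount": app},
        },
    }
    dynamic = {
        "apiVersion": API_VERSION,
        "kind": "VaultDynamicSecret",
        "metadata": {"name": f"{app}-db", "namespace": namespace},
        "spec": {
            "vaultAuthRef": f"{app}-auth",   # points at the VaultAuth above
            "mount": "postgres",
            "path": f"creds/{app}",          # i.e. postgres/creds/<app>
            # Kubernetes Secret that VSO creates and keeps rotated.
            "destination": {"name": f"{app}-db-creds", "create": True},
        },
    }
    return [auth, dynamic]


for manifest in vso_manifests("webapp", "webapp"):
    print(json.dumps(manifest, indent=2))
```

The pod side needs nothing Vault-specific: it references the destination Secret in an ordinary `secretKeyRef` or volume mount.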

## The secret flow, end to end

```mermaid
%%{init: {'theme':'base'}}%%
flowchart LR
    subgraph CI["CI / Provisioning path"]
        GHA["Gitea Actions<br/>workflow"]:::src
        TOFU["OpenTofu<br/>tofu apply"]:::proc
    end

    subgraph RT["Runtime path (in-cluster)"]
        VSO["Vault Secrets<br/>Operator (VSO)"]:::proc
        POD["App pod<br/>(ServiceAccount <app>)"]:::proc
    end

    VAULT["Vault<br/>KV v1/v2 · transit · postgres dynamic"]:::store

    GHA -->|"OIDC JWT<br/>role gitea_cicd_<app>"| VAULT
    VAULT -->|"policy <app>-ops<br/>read/write secrets"| TOFU
    TOFU -->|"writes config to<br/>kvv2/<app>/config"| VAULT

    VSO -->|"k8s auth<br/>role <app> (SA token)"| VAULT
    VAULT -->|"dynamic creds<br/>postgres/creds/<app>"| VSO
    VSO -->|"materialises +<br/>renews K8s Secret"| POD

    classDef src fill:#2563eb,stroke:#1e40af,color:#fff
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
```

1. **CI path:** a Gitea Actions workflow requests an OIDC JWT and presents it to Vault under the role `gitea_cicd_<app>`. Vault validates the token and grants the `<app>-ops` policy.
2. With that policy, OpenTofu (`tofu apply`, running in CI) reads the secrets it needs and writes the app's static config back to `kvv2/<app>/config`. No pre-shared Vault token is ever stored; the trust is established per run via OIDC.
3. **Runtime path:** in the cluster, the Vault Secrets Operator authenticates with the Kubernetes auth backend, presenting the app's ServiceAccount token mapped to the Vault role `<app>`.
4. Vault issues a **short-lived, dynamic** PostgreSQL credential from `postgres/creds/<app>` back to VSO.
5. VSO materialises that credential into a Kubernetes Secret in the app's namespace, then **renews the lease** and rotates the Secret before it expires.
6. The app pod mounts the Kubernetes Secret like any other: it never talks to Vault, and never holds a long-lived database password.
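
Step 5 hinges on timing: the renewal has to happen some margin before the lease runs out. The Python sketch below shows that arithmetic only. The one-hour TTL is a made-up value, and the 67 percent figure is an assumption modelled on VSO's `renewalPercent` setting; the real operator also adds jitter, which is left out here.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(issued_at: datetime, lease_ttl: timedelta,
                 renewal_percent: int = 67) -> datetime:
    """Return the moment a lease should be renewed, before it expires."""
    if not 0 < renewal_percent < 100:
        raise ValueError("renewal_percent must be between 1 and 99")
    # Integer multiply then divide keeps the timedelta arithmetic exact.
    return issued_at + lease_ttl * renewal_percent / 100


issued = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)  # hypothetical lease duration
print(renewal_time(issued, ttl))                 # 2026-01-01 00:40:12+00:00
print(renewal_time(issued, ttl) < issued + ttl)  # True
```

Whatever the real values are, the invariant is the last line: the renewal instant must fall strictly before the lease expiry, or the pod would briefly hold a dead credential.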

## The unseal model

Vault encrypts its storage with a master key that is **never persisted in usable form**. On every start (a fresh deploy, a pod reschedule, or a full cluster recovery) Vault comes up **sealed** and refuses every request until it is unsealed.

- **Shamir config:** 1 unseal key, threshold 1 (a single-operator lab, so no key-splitting ceremony).
- **Where the key lives:** on the control node (the MacBook), at `~/.arcodange/cluster-keys.json`. It is *not* in git, *not* in Kubernetes, *not* in Vault.
- **Operational consequence:** **nothing that needs a secret recovers until a human unseals Vault.** This is the chokepoint baked into the recovery order: VSO cannot re-auth, dynamic DB creds cannot be issued, and dependent apps cannot start until the unseal happens. See [Storage & recovery](storage-and-recovery.md) for where unseal sits in the tested startup sequence.

> [!CAUTION]
> If `~/.arcodange/cluster-keys.json` is lost, Vault's data is **unrecoverable**: there is no second copy of the unseal key and no key-recovery path. Treat that file as the most critical secret in the lab.
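
The unseal gate can be sketched in a few lines of Python. The sketch assumes the key file has the JSON shape produced by `vault operator init -format=json` (fields such as `unseal_keys_b64` and `unseal_threshold`); this guidebook does not state the file's format, so treat that as an assumption. The sample values are fake, and no network call is made: the functions only build and interpret the JSON that Vault's `sys/unseal` and `sys/seal-status` endpoints exchange.

```python
import json


def unseal_payload(keys_json: str) -> str:
    """Build the JSON body for a sys/unseal request from the key file."""
    data = json.loads(keys_json)
    keys = data["unseal_keys_b64"]
    # The lab runs Shamir with 1 key, threshold 1; anything else is a surprise.
    if data.get("unseal_threshold") != 1 or len(keys) != 1:
        raise ValueError("expected a single-key, threshold-1 setup")
    return json.dumps({"key": keys[0]})


def dependents_may_start(seal_status_json: str) -> bool:
    """True only once Vault reports itself unsealed."""
    return json.loads(seal_status_json)["sealed"] is False


# Fake file content, for illustration only.
sample = json.dumps({
    "unseal_keys_b64": ["ZmFrZS1rZXk="],
    "unseal_shares": 1,
    "unseal_threshold": 1,
})
print(unseal_payload(sample))                     # {"key": "ZmFrZS1rZXk="}
print(dependents_may_start('{"sealed": true}'))   # False
print(dependents_may_start('{"sealed": false}'))  # True
```

The second function is the recovery chokepoint in miniature: every dependent start-up step is conditional on `sealed` being false.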

## Sandbox implications

A production-like sandbox does **not** share the production Vault. It runs its **own** Vault instance with its **own** unseal key and its **own** policies, so that exercising secret flows, rotating credentials, or testing a broken unseal cannot touch production secrets. Because the `<app>` join key is environment-relative (see [Naming conventions](naming-conventions.md)), the sandbox can keep identical role and policy names (`gitea_cicd_<app>`, `<app>`, `<app>-ops`) while remaining fully isolated. The rationale for that separate-Vault, separate-unseal posture is recorded in [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).

## See also

- [Tools brick](02-tools.md): where the `hashicorp-vault` chart, VSO, and the per-app Vault IaC modules are deployed.
- [Storage & recovery](storage-and-recovery.md): Vault unseal as a step in the tested power-cut recovery order.
- [Naming conventions](naming-conventions.md): how `gitea_cicd_<app>`, `<app>`, and `<app>-ops` derive from the join key.
- [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md): the sandbox's separate-Vault decision.