docs(vibe): add tools/ and cms/ guidebooks

Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the
lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/  : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec,
  pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) +
  secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules =
  the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/    : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md
  (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) +
  zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional.
Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart
VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub).
6 mermaid diagrams MCP-validated; zero dead links. Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:41:15 +02:00
parent dbe32161dc
commit 548dacfc44
10 changed files with 1110 additions and 0 deletions

@@ -36,6 +36,8 @@ flowchart LR
|---|---|---|
| [Lab ecosystem](lab-ecosystem/README.md) | End-to-end map of `factory` + `tools` + `cms`: repos, the `<app>` join key, secrets via Vault, CI/CD, ArgoCD, and the data/control flows that connect them | ✅ Active |
| [Factory provisioning](factory-provisioning/README.md) | Deep dive into how factory provisions everything: Ansible playbooks + roles and OpenTofu | ✅ Active |
| [Tools](tools/README.md) | Deep dive into the lab platform services in the `tools` namespace (Vault+VSO, Prometheus, Grafana, CrowdSec, poolers, Redis, Plausible, ClickHouse) | ✅ Active |
| [CMS](cms/README.md) | Deep dive into the public Nuxt site arcodange.fr + its Cloudflare DNS/tunnel/Turnstile and Zoho email IaC | ✅ Active |
## Rules to contribute

@@ -0,0 +1,75 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > **CMS**
# CMS
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md)
> **Downstream:** [Site (Nuxt)](site.md) · [Cloudflare](cloudflare.md) · [Zoho email](zoho-email.md)
> **Related:** [tools CrowdSec](../tools/components.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md)
This guidebook maps the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) — the one app in the lab whose primary audience is the open Internet. It serves the public site **arcodange.fr** and owns the OpenTofu that wires its Cloudflare edge, its Cloudflared tunnel into the cluster, its Turnstile CAPTCHA, and its Zoho email.
## Two faces of one repo
The `cms` repo holds two distinct concerns that share a domain but live in different directories.
| Face | Where | What it is |
|---|---|---|
| **The SITE** | repo root ([`app/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/app), [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content), [`chart/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart)) | A **Nuxt 4** application (Nuxt Content + Nuxt Studio) built to static output and deployed **two ways**: to **Cloudflare Pages** (public `arcodange.fr` / `www`) and into **k3s** via a Helm chart (ArgoCD app **`cms`**) reachable through the Cloudflared tunnel (e.g. `cms-rec.arcodange.fr`, `www.arcodange.lab`) |
| **The IaC** | [`cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare) | **OpenTofu** managing the `arcodange.fr` zone (registered at OVH, DNS delegated to Cloudflare), Cloudflare **Pages**, the **Cloudflared** Zero-Trust tunnel into internal Traefik, a **Turnstile** CAPTCHA feeding CrowdSec, and **Zoho** email |
The site is *what visitors see*; the IaC is *how they reach it and how mail flows*. Both deploy from the same Gitea repo through Gitea Actions.
## Public request + email flow
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef edge fill:#d97706,stroke:#b45309,color:#fff
classDef proc fill:#059669,stroke:#047857,color:#fff
classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
USER(["Visitor"]):::edge
CFDNS["Cloudflare DNS<br>arcodange.fr zone"]:::edge
PAGES["Cloudflare Pages<br>(static Nuxt build)"]:::proc
TUN["Cloudflared tunnel"]:::edge
TRAEFIK["internal Traefik"]:::proc
CS["CrowdSec bouncer<br>(Turnstile-backed)"]:::proc
CMS["cms pod (Nuxt)<br>cms-rec.arcodange.fr"]:::proc
MAIL(["Sender"]):::edge
ZOHO["Zoho<br>MX / SPF / DKIM / DMARC / BIMI"]:::store
USER --> CFDNS
CFDNS -- "arcodange.fr / www" --> PAGES
CFDNS -- "*.arcodange.fr" --> TUN
TUN --> TRAEFIK --> CS --> CMS
MAIL -- "MX lookup arcodange.fr" --> ZOHO
```
1. A **visitor** resolves a hostname under `arcodange.fr` through **Cloudflare DNS** (the zone OpenTofu manages).
2. The apex and `www` records (proxied CNAMEs) land on **Cloudflare Pages**, which serves the static Nuxt build directly from the edge.
3. Wildcard `*.arcodange.fr` hostnames route through the **Cloudflared** Zero-Trust tunnel — no home-LAN ports are opened — onto **internal Traefik**, which passes the request through the **CrowdSec** bouncer (its CAPTCHA challenge backed by Turnstile) to the in-cluster **`cms`** Nuxt pod (e.g. `cms-rec.arcodange.fr`).
4. Separately, **email** to `arcodange.fr` follows the **MX** record to **Zoho**, with **SPF/DKIM/DMARC/BIMI** authenticating and presenting the mail.
## Index
| Page | What it maps | Status |
|---|---|---|
| [Site (Nuxt)](site.md) | The Nuxt 4 app: Nuxt Content + Studio, static build, the dual deploy to Cloudflare Pages and to k3s via the Helm chart / ArgoCD app `cms` | ✅ Active |
| [Cloudflare](cloudflare.md) | The `cloudflare/` OpenTofu: zone (OVH-registered, CF-delegated), Pages, the Cloudflared tunnel into Traefik, and the Turnstile CAPTCHA for CrowdSec | ✅ Active |
| [Zoho email](zoho-email.md) | Zoho mail IaC: domain verification, MX/SPF/DKIM/DMARC/BIMI records, and the public aliases | ✅ Active |
## Maintenance rule
> [!IMPORTANT]
> **If any component documented in this guidebook is altered, update the page describing it in the same change.** A reference map that drifts from the real `cms` repo sends readers and agents down dead paths. The PR that changes a component is the PR that updates its CMS guidebook page.
## Cross-references
- [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) — the whole-lab view of where `cms` sits among `factory` + `tools`.
- [tools CrowdSec](../tools/components.md) — the Traefik bouncer the Turnstile challenge feeds for public-edge decisioning.
- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the Cloudflared tunnel token, Turnstile secret, and Cloudflare/Zoho/OVH credentials live in Vault.
- [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) — the OpenTofu apply pipeline pattern the `cloudflare/` IaC follows in Gitea Actions.
- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why public-facing surfaces like this one are isolated from a safe prod-like environment.
- Repo: [arcodange-org/cms](https://gitea.arcodange.lab/arcodange-org/cms).

@@ -0,0 +1,182 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Cloudflare**
# Cloudflare
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [CMS](README.md) · [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md)
> **Downstream:** [tools CrowdSec](../tools/components.md) (consumes the Turnstile widget)
> **Related:** [Zoho email](zoho-email.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md)
This page maps [`cms/cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare), the OpenTofu root that owns the **`arcodange.fr`** edge. One `tofu apply` tracks the domain's OVH registration (imported into state, not created by OpenTofu), creates the Cloudflare zone and **delegates the domain's DNS to it**, publishes the public site on **Cloudflare Pages**, configures a **Cloudflared** Zero-Trust tunnel into the in-cluster Traefik, mints the **Turnstile** widget that the [tools CrowdSec bouncer](../tools/components.md) serves as its CAPTCHA challenge, and (via a sibling module) wires **Zoho** mail. The Nuxt site itself is not built here; see [Site (Nuxt)](site.md).
## Providers
Declared in [`providers.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/providers.tf). Versions pinned in [`.terraform.lock.hcl`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/.terraform.lock.hcl).
| Provider | Source | Version | Auth | Purpose |
|---|---|---|---|---|
| `cloudflare` | `cloudflare/cloudflare` | `~> 5` | `CLOUDFLARE_API_TOKEN` env | Zone, Pages, DNS records, Zero-Trust tunnel, Turnstile, zone settings |
| `ovh` | `ovh/ovh` | `~> 2.8` | `OVH_*` env (`ovh-eu` endpoint) | Domain registration + nameserver delegation |
| `vault` | `vault` | `5.5.0` | `auth_login_jwt` (mount `gitea_jwt`, role `gitea_cicd_cms`) at `https://vault.arcodange.lab` | Persists the Turnstile secret/sitekey; reads tunnel token |
> [!NOTE]
> The Vault provider authenticates with a **Gitea-issued OIDC JWT** (`TERRAFORM_VAULT_AUTH_JWT`), the same OIDC→Vault pattern the [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) documents lab-wide.
## State backend — S3 on Cloudflare R2
[`backend.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/backend.tf) keeps state in an **S3-compatible bucket on Cloudflare R2**, not AWS. The `skip_*` flags and `use_path_style` are what let the AWS S3 backend talk to R2.
| Setting | Value |
|---|---|
| `bucket` | `arcodange-tf` |
| `key` | `cms/terraform.tfstate` |
| `region` | `auto` |
| `endpoints.s3` | `var.CLOUDFLARE_S3_ENDPOINT` (R2 S3 API URL) |
| `access_key` / `secret_key` | `var.CLOUDFLARE_S3_ACCESS_KEY` / `var.CLOUDFLARE_S3_SECRET_ACCESS_KEY` |
| Flags | `skip_credentials_validation`, `skip_metadata_api_check`, `skip_region_validation`, `skip_requesting_account_id`, `skip_s3_checksum`, `use_path_style` |
> [!WARNING]
> The R2 backend credentials are **Terraform variables**, so they must be present in the environment *before* `tofu init` can read state. CI injects them from Vault path `kvv1/cloudflare/r2/arcodange-tf` (mapped to `TF_VAR_CLOUDFLARE_*` — see [CI](#ci--cloudflareyaml) below). Without those creds nothing — not even a read-only plan — can run.
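
A preflight check can surface missing backend credentials before `tofu init` fails on them. The sketch below is a hypothetical helper, not part of the repo. It assumes the three backend variables named in the table above reach the job as `TF_VAR_`-prefixed environment variables, which is how the CI mapping is described.

```shell
#!/bin/sh
# Hypothetical preflight helper (not in the cms repo).
# Variable names are derived from the backend.tf settings listed above;
# the TF_VAR_ prefix is assumed from the documented CI mapping.
require_r2_backend_env() {
  missing=0
  for name in CLOUDFLARE_S3_ENDPOINT CLOUDFLARE_S3_ACCESS_KEY CLOUDFLARE_S3_SECRET_ACCESS_KEY; do
    var="TF_VAR_${name}"
    # printenv prints nothing when the variable is not exported
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  # 0 when all three are present, 1 otherwise
  return "$missing"
}
```

Called as `require_r2_backend_env && tofu init`, it would stop early with the list of missing names instead of an opaque backend error.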
## Resource graph
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TD
classDef ovh fill:#1e3a8a,stroke:#1e40af,color:#fff
classDef cf fill:#d97706,stroke:#b45309,color:#fff
classDef mod fill:#059669,stroke:#047857,color:#fff
classDef vault fill:#7c3aed,stroke:#6d28d9,color:#fff
OVHDOM["ovh_domain_name<br>arcodange.fr"]:::ovh
OVHNS["ovh_domain_name_servers<br>delegate NS"]:::ovh
ZONE["cloudflare_zone<br>arcodange.fr"]:::cf
PAGES["cloudflare_pages_project<br>arcodange-cms (branch main)"]:::cf
PDOM["cloudflare_pages_domain<br>arcodange.fr + www"]:::cf
DNS["cloudflare_dns_record<br>@ + www CNAME (proxied)"]:::cf
TUN["module.cf_tunnel<br>Zero-Trust tunnel 'lab'"]:::mod
CAP["module.cf_captcha_for_crowdsec<br>Turnstile widget"]:::mod
ZOHO["module.zoho<br>mail records"]:::mod
VBACK["module.vault_backend<br>cms app role (cloudflared)"]:::vault
OVHDOM --> ZONE
ZONE -- "name_servers" --> OVHNS
ZONE --> PAGES --> PDOM
PAGES -- "subdomain target" --> DNS
ZONE --> DNS
ZONE --> TUN
ZONE --> ZOHO
OVHDOM --> CAP
```
1. **`ovh_domain_name "arcodange.fr"`** anchors the registration (imported into state, not created by OpenTofu).
2. **`cloudflare_zone`** creates the Cloudflare zone for that domain under the `arcodange@gmail.com` account.
3. **`ovh_domain_name_servers`** writes Cloudflare's assigned nameservers back at OVH, **delegating DNS to Cloudflare**.
4. **`cloudflare_pages_project "arcodange-cms"`** (production branch `main`) plus two **`cloudflare_pages_domain`** resources attach `arcodange.fr` and `www.arcodange.fr` to Pages.
5. **`cloudflare_dns_record`** publishes apex (`@`) and `www` as **proxied CNAMEs** pointing at the Pages project's `.pages.dev` subdomain.
6. The three **modules** (`cf_tunnel`, `cf_captcha_for_crowdsec`, `zoho`) and `vault_backend` hang off the same zone/domain/account.
### DNS & zone resources
| Resource | Name | Detail |
|---|---|---|
| `ovh_domain_name.arcodange_fr` | `arcodange.fr` | Registration; `# was terraform imported into state` |
| `cloudflare_zone.arcodange_fr` | `arcodange.fr` | Zone under account resolved from `arcodange@gmail.com` |
| `ovh_domain_name_servers.arcodange_fr` | — | Delegates NS to `cloudflare_zone…name_servers` (or `original_name_servers` when rolling back) |
| `terraform_data.arcodange_fr_initial_conf` | — | Snapshot of OVH's pre-Cloudflare config, kept for rollback inspection (`ignore_changes`) |
| `cloudflare_pages_project.arcodange_fr` | `arcodange-cms` | `production_branch = "main"` |
| `cloudflare_pages_domain.arcodange_fr` | `arcodange.fr` | Custom domain on Pages |
| `cloudflare_pages_domain.www_arcodange_fr` | `www.arcodange.fr` | Custom domain on Pages |
| `cloudflare_dns_record.root_cname` | `@` | CNAME → Pages `subdomain`, `proxied = true`, `ttl = 1` |
| `cloudflare_dns_record.www_cname` | `www` | CNAME → Pages `subdomain`, `proxied = true`, `ttl = 1` |
All wiring lives in [`iac.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/iac.tf). The account id is resolved at plan time via `data.cloudflare_account` filtered on the `arcodange@gmail.com` account name.
## Module: `cloudflared_tunnel`
[`modules/cloudflared_tunnel/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/modules/cloudflared_tunnel). A **Zero-Trust Cloudflared tunnel** that lets public hostnames reach in-cluster services **without opening any home-LAN port**: the `cloudflared` connector running inside the cluster dials out to Cloudflare's edge, and Cloudflare sends requests back over that outbound connection. Instantiated as `module.cf_tunnel` with `tunnel_name = "lab"`.
| Resource | Role |
|---|---|
| `cloudflare_zero_trust_tunnel_cloudflared.tunnel` | The tunnel named **`lab`** under the account |
| `cloudflare_zero_trust_tunnel_cloudflared_config.tunnel_config` | Ingress rules from `hostname_mappings`, terminating in a catch-all `http_status:404` |
| `data.cloudflare_zone.arcodange` | Looks up the zone (created by the root module) |
| `cloudflare_zone_setting.setting` | Sets **`always_use_https = on`** |
| `cloudflare_dns_record.dns` | One **proxied CNAME** per mapping → `<tunnel_id>.cfargotunnel.com` |
The single ingress mapping passed from the root is:
| Hostname | Service |
|---|---|
| `*.arcodange.fr` | `http://traefik.kube-system.svc.cluster.local:80` |
So every wildcard subdomain under `arcodange.fr` lands on the cluster's **internal Traefik** (`origin_request.no_tls_verify = true`), which then routes to the right in-cluster app (e.g. the `cms` Nuxt pod, Grafana, etc.). Pairs with the apex/`www` Pages records above, which are *not* tunneled.
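The effect of the ingress rules can be approximated with a shell `case` statement: the wildcard rule is tried first and the catch-all answers everything else. This is only an illustrative model with a hypothetical function name; it assumes the shell glob `*.arcodange.fr` behaves like the tunnel's wildcard match for the hostnames shown, and it is not how `cloudflared` is implemented.

```shell
#!/bin/sh
# Hypothetical model of the tunnel's ingress table (illustration only).
# Rule order matters: the wildcard mapping first, the catch-all last.
resolve_ingress() {
  case "$1" in
    # the single mapping passed from the root module
    *.arcodange.fr) echo "http://traefik.kube-system.svc.cluster.local:80" ;;
    # terminating catch-all rule
    *)              echo "http_status:404" ;;
  esac
}
```

In this model `resolve_ingress cms-rec.arcodange.fr` prints the Traefik service URL, while the bare apex `arcodange.fr` falls through to the catch-all, consistent with the apex being served by Pages rather than the tunnel.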
> [!CAUTION]
> **The tunnel token is created by hand and rotation is not automated.** Cloudflare only issues a connector token from the web console, so it is **manually stored in Vault** under the KV-v2 mount `kvv2` at path `cms/cloudflared` (the in-repo `vault_kv_secret` resource is commented out for exactly this reason). The cluster-side `cloudflared` Deployment reads it via a `VaultStaticSecret` (Vault Secrets Operator), role `cms`, refreshed hourly. If the token is rotated in the console, the Vault entry must be updated **manually** — nothing in this IaC will do it. `module.vault_backend` provisions the `cms` Vault app role (service account `cloudflared`) that grants that read; see [secrets-and-vault](../lab-ecosystem/secrets-and-vault.md).
## Module: `cloudflared_captcha_for_crowdsec`
[`modules/cloudflared_captcha_for_crowdsec/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/modules/cloudflared_captcha_for_crowdsec). Mints a **Cloudflare Turnstile widget** and stores its keys in Vault for the [tools CrowdSec bouncer](../tools/components.md) to serve as a CAPTCHA challenge on remediated requests.
| Resource | Detail |
|---|---|
| `cloudflare_turnstile_widget.turnstile` | `name = "crowdsec captcha"`, `mode = "invisible"`, `clearance_level = "interactive"`, `region = "world"`; `bot_fight_mode`/`ephemeral_id`/`offlabel` all `false` |
| `vault_kv_secret_v2.turnstile` | Writes `{ sitekey, secret }` to KV-v2 (`cas = 1`) |
Instantiated as `module.cf_captcha_for_crowdsec` with `domain_names = [arcodange.fr, arcodange.lab, arcodange.duckdns.org]` and `vault_path = "cms/factory/turnstile"`.
| What | Where |
|---|---|
| **Turnstile mode** | Invisible widget, interactive clearance — challenges only when CrowdSec flags a request |
| **Vault destination** | `kvv2/cms/factory/turnstile` → keys `sitekey` + `secret` |
| **Consumer** | The [CrowdSec Traefik bouncer in `tools`](../tools/components.md) reads sitekey + secret to render and verify the challenge |
This is the one knot that ties the **`cms`** edge to the **`tools`** security stack: `cms` produces the Turnstile keys; `tools` consumes them.
## Sibling module: Zoho mail
`module.zoho` (source `./zoho`) lives in **this same OpenTofu root** and writes mail records into the same `cloudflare_zone`. It is documented separately on [Zoho email](zoho-email.md). Because it shares the root, every `cms/cloudflare` plan and apply also covers the mail DNS records, so mail-record changes showing up in that plan output are expected.
## CI — `cloudflare.yaml`
[`.gitea/workflows/cloudflare.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/cloudflare.yaml). Manual-only (`workflow_dispatch`), same Gitea-OIDC→Vault→`tofu apply` shape as the [tofu CI flow concept](../factory-provisioning/opentofu/ci-apply-flow.md).
1. **`gitea_vault_auth`** — mints a Gitea OIDC id-token (decodes `vault_oauth__sh_b64` and runs it), exported as `gitea_vault_jwt`.
2. **`tofu`** — depends on the auth job; a shared `*vault_step` reads all secrets from Vault (role `gitea_cicd_cms`, mount `gitea_jwt`), prepares the homelab CA cert, then runs **`dflook/terraform-apply@v1`** on `path: cloudflare/` with **`auto_approve: true`** at **OpenTofu `1.8.2`**.
### Vault secrets read by the workflow
| Vault path | Mapped to | Used for |
|---|---|---|
| `kvv1/cloudflare/cms/cf_arcodange_cms_token` (`token`) | `CLOUDFLARE_API_TOKEN` | Cloudflare provider auth |
| `kvv1/cloudflare/r2/arcodange-tf` (`*`) | `TF_VAR_CLOUDFLARE_*` | R2/S3 state backend creds + endpoint |
| `kvv1/gitea/tofu_module_reader` (`ssh_private_key`) | `TERRAFORM_SSH_KEY` | SSH key to clone the `tools` git module (`vault_backend`) |
| `kvv1/ovh/cms/app` (`*`) | `OVH_*` | OVH provider auth |
| `kvv1/zoho/self_client` (`*`) | `ZOHO_*` **and** `TF_VAR_ZOHO_*` | Zoho API auth for `module.zoho` |
> [!CAUTION]
> **`auto_approve: true` applies without a human gate.** Any dispatch of this workflow on any ref runs `tofu apply` straight against the live `arcodange.fr` edge and Vault. There is no plan-review step; review happens in the PR before merge, not in the apply. Treat a dispatch as a production change.
## Gotchas
> [!CAUTION]
> **Cloudflared tunnel token — manual, unrotated.** Created in the Cloudflare console and hand-placed in Vault under `kvv2` at path `cms/cloudflared`. No IaC rotates it. (Repeated here because it is the most common surprise.)
> [!WARNING]
> **OVH → Cloudflare nameserver delegation is the live cutover.** `ovh_domain_name_servers` points OVH at Cloudflare's nameservers. The `use_ovh_initial_name_servers` variable (default `false`) is meant to flip delegation back to OVH's `original_name_servers`, but that **rollback path is untested** — `terraform_data.arcodange_fr_initial_conf` only *snapshots* the pre-Cloudflare config for inspection. Do not assume a clean revert.
> [!WARNING]
> **R2-backed state creds gate everything.** State lives on Cloudflare R2 and the access/secret keys are `TF_VAR_` inputs (from `kvv1/cloudflare/r2/arcodange-tf`). If those creds are missing or rotated out from under the workflow, even `tofu init` fails — there is no fallback backend.
## Cross-references
- [CMS](README.md) — the guidebook hub; the public-request + email flow diagram.
- [Site (Nuxt)](site.md) — the Nuxt app served by the Pages project and the in-cluster pod this tunnel fronts.
- [Zoho email](zoho-email.md) — `module.zoho` lives in this same OpenTofu root.
- [tools CrowdSec](../tools/components.md) — consumer of the Turnstile widget minted here.
- [tofu CI flow concept](../factory-provisioning/opentofu/ci-apply-flow.md) — the shared Gitea-OIDC→Vault→apply pattern.
- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the tunnel token, Turnstile keys, and provider creds live.
- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why this Internet-facing surface is isolated from the safe prod-like environment.
- Code: [`cms/cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare).

vibe/guidebooks/cms/site.md

@@ -0,0 +1,165 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Site (Nuxt)**
# Site (Nuxt)
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [CMS](README.md) · [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md)
> **Downstream:** [Cloudflare](cloudflare.md)
> **Related:** [Zoho email](zoho-email.md) · [tools CrowdSec](../tools/components.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md)
The public site face of the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms): a **Nuxt 4** application built to **static HTML** and shipped two ways from one image — to **Cloudflare Pages** (the live public `arcodange.fr`) and into **k3s** via a Helm chart behind the Cloudflared tunnel. This page maps the Nuxt app, its Docker build, the Helm chart, and the Gitea Actions that drive both deploys.
## The Nuxt 4 application
Configured in [`nuxt.config.ts`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/nuxt.config.ts). It runs `ssr: true` for dev but is shipped via **`nuxt generate`** — a full static prerender — so production is plain HTML served by a static file server, no Node runtime.
| Concern | Setting | Notes |
|---|---|---|
| Rendering | `ssr: true`, shipped via `nuxt generate` | Static prerender to `.output/public`; Nitro `prerender.autoSubfolderIndex: false` |
| Site identity | `site.url: https://arcodange.fr`, `site.name: Arcodange`, `trailingSlash: true` | Drives canonical URLs, sitemap, robots via `@nuxtjs/seo` |
| Content | `@nuxt/content` collections | Markdown under [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content); mermaid highlight enabled |
| Editing | **Nuxt Studio** at route **`/admin`** | `nuxt-studio` module; repo `arcodange-org/cms`, commits to `main` |
| Sitemap / robots | `@nuxtjs/sitemap` (`zeroRuntime: true`), `@nuxtjs/seo` | No runtime sitemap server — fully prerendered |
| Analytics | `@nuxtjs/plausible` | `apiHost: https://analytics.arcodange.fr`, `hashMode: true`, outbound tracking on, `localhost` ignored |
| i18n | `@nuxtjs/i18n` | Single locale **`fr`** (default `fr`); `htmlAttrs.lang: fr` |
| Images | `@nuxt/image` | `webp`/`jpeg`, quality 80 |
| Fonts | `@nuxt/fonts` | Local **Noto Emoji** preloaded |
| UI | `@nuxt/ui` | Plus `@nuxt/scripts`, `@nuxtjs/device`, `nuxt-booster`, `@compodium/nuxt` |
### Content collections
Declared in [`content.config.ts`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content.config.ts). Every collection is wrapped with `asSeoCollection()` (from `@nuxtjs/seo`) and sourced from a folder of Markdown under [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content).
| Collection | Source glob | Type | Schema extras |
|---|---|---|---|
| `parcours` | `parcours/*.md` | `page` | — |
| `site` | `site/*.md` | `page` | — |
| `tech` | `tech/*.md` | `page` | `date` (required), `image` (media), `featured` (default `false`) |
| `experiences` | `experiences/*.md` | `page` | `date`, `enddate`, `icon` (default `i-lucide-rocket`), `image`, `secondaryImage`, `descriptionHTML` |
A content build transformer `~~/content/transformers/description-md` runs at build time, and Markdown highlighting registers the `mermaid` language.
## Docker build: one image, two static trees
[`Dockerfile`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/Dockerfile) is a multi-stage build that produces **two** static outputs from the same source and packs them into a static web server image.
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef base fill:#7c3aed,stroke:#6d28d9,color:#fff
classDef build fill:#059669,stroke:#047857,color:#fff
classDef out fill:#d97706,stroke:#b45309,color:#fff
DEPS["cms-deps:TAG<br>(Dockerfile.deps base)"]:::base
BUILD["build stage<br>npm ci"]:::build
PROD["nuxt generate<br>→ /app/prod"]:::out
STG["NUXT_SITE_ENV=staging<br>nuxt generate → /app/.output/public"]:::out
SWS["static-web-server:2<br>serves /public"]:::build
DEPS --> BUILD
BUILD --> PROD
BUILD --> STG
PROD --> SWS
STG --> SWS
```
1. The **build stage** starts `FROM gitea.arcodange.lab/arcodange-org/cms-deps:${BASE_IMAGE_TAG}` — a prebuilt base ([`Dockerfile.deps`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/Dockerfile.deps), `node:24-slim` + `python3`/`make`/`g++`/`sqlite3`/`libvips` for `better-sqlite3`/`libvips`) — copies the source and runs `npm ci`.
2. The **prod** build: `npx nuxt generate`, then the output is moved to **`/app/prod`**.
3. The **staging** build: `NUXT_SITE_ENV="staging" npx nuxt generate`, leaving its output at **`/app/.output/public`**.
4. The **server stage** is `FROM joseluisq/static-web-server:2`; it copies the staging tree to **`/public`** and the prod tree to **`/prod`**, plus [`webserver.config.toml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/webserver.config.toml) as `/sws.toml`, and serves on port 80.
> [!NOTE]
> **`/public` is staging, `/prod` is production.** The static-web-server serves `root = "./public"` (the **staging** build) by default — that is what the in-cluster k3s deploy exposes (e.g. `cms-rec.arcodange.fr`). The **prod** build at `/prod` is the tree extracted and pushed to Cloudflare Pages by the `arcodange_fr` workflow. One image therefore carries both faces.
The final image is pushed to **`gitea.arcodange.lab/arcodange-org/cms`** (tags `latest` and the branch ref).
## Helm chart
[`chart/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart) deploys the in-cluster face. The pod is just the static-web-server image above, fronted by Traefik with a CrowdSec middleware and reached either over the lab ingress (`www.arcodange.lab`) or through a sidecar Cloudflared tunnel (`cms-rec.arcodange.fr`).
| Key | Value | Source |
|---|---|---|
| Chart name / version | `arcodange-cms` / `0.1.0`, `appVersion: latest` | [`Chart.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/Chart.yaml) |
| Image | `gitea.arcodange.lab/arcodange-org/cms:latest`, `pullPolicy: Always` | [`values.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/values.yaml) |
| Replicas | `1` (autoscaling disabled) | `replicaCount: 1`, `autoscaling.enabled: false` |
| Service | `ClusterIP`, port **80** (named `http`) | `service.port: 80` |
| Probes | liveness + readiness `httpGet /` on `http` | — |
| ServiceAccount | created, name **`cms`**, automount on | `serviceAccount.name: cms` |
| Lab ingress | `www.arcodange.lab`, path `/` Prefix | Traefik `websecure`, TLS via `letsencrypt` resolver (`arcodange.lab` + SAN `www.arcodange.lab`) |
| Edge middleware | `kube-system-crowdsec@kubernetescrd` | applied on both ingresses |
| Tunnel ingress | `cms-rec.arcodange.fr`, Traefik `web` entrypoint | `ingress.cloudflared.host` |
| Cloudflared sidecar | enabled, `Deployment`, `1` replica, image `cloudflare/cloudflared:latest` | `cloudflared.*` |
| Tunnel token | Vault KV-v2 `kvv2` path `cms/cloudflared`, role `cms`, refresh `1h` | `cloudflared.vault.*` |
### Chart templates
The chart renders these objects (in [`chart/templates/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates)):
| Template | Renders |
|---|---|
| [`deployment.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/deployment.yaml) | the `cms` static-web-server pod, port `http`/80 |
| [`service.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/service.yaml) | ClusterIP service on 80 |
| [`ingress.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress.yaml) | lab Traefik ingress for `www.arcodange.lab` + CrowdSec middleware |
| [`ingress_cloudflared.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress_cloudflared.yaml) | `<fullname>-cloudflared` ingress for `cms-rec.arcodange.fr` (web entrypoint) |
| [`cloudflared_tunnel.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/cloudflared_tunnel.yaml) | `cloudflared` SA, `VaultAuth`, `VaultStaticSecret`, and the cloudflared `Deployment`/`DaemonSet` |
| [`serviceaccount.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/serviceaccount.yaml) | the `cms` ServiceAccount |
| [`ingress_gitea.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress_gitea.yaml), [`hpa.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/hpa.yaml) | optional Gitea ingress; HPA (disabled) |
### Cloudflared tunnel template
The cloudflared connector (a separate `Deployment`, not a container inside the `cms` pod) pulls its tunnel token from Vault through the [Vault Secrets Operator (VSO)](../tools/components.md), never from a static manifest:
1. A `ServiceAccount` **`cloudflared`** is created with a `VaultAuth` (Kubernetes auth, mount `kubernetes`, role from `cloudflared.vault.role` = `cms`, audience `vault`).
2. A **`VaultStaticSecret`** named `cloudflared-tunnel-token` reads **KV-v2** mount **`kvv2`** at path **`cms/cloudflared`** (refresh `1h`) and materialises a `cloudflared-tunnel-token` Secret.
3. The cloudflared `Deployment` (1 replica, pinned to a `control-plane` node via affinity) runs `cloudflared tunnel --no-autoupdate run --token $(TUNNEL_TOKEN) --no-tls-verify`, with `TUNNEL_TOKEN` injected from that Secret's `token` key.
This connects Cloudflare's edge to internal Traefik so `cms-rec.arcodange.fr` reaches the in-cluster `cms` service without opening any home-LAN port — the cluster side of the tunnel whose Cloudflare side lives in the [Cloudflare IaC](cloudflare.md).
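When debugging this read path, note that KV-v2 inserts a `data/` segment between the mount and the secret path at the HTTP API and policy level, while the `VaultStaticSecret` and the `vault kv` CLI take the logical path. The helper below is a hypothetical sketch of that mapping; only the mount `kvv2` and path `cms/cloudflared` come from this page.

```shell
#!/bin/sh
# Hypothetical helper: build the KV-v2 API/policy path for a logical secret path.
# KV-v2 serves secret data under <mount>/data/<path>.
kv2_data_path() {
  mount="$1"
  path="$2"
  printf '%s/data/%s\n' "$mount" "$path"
}
```

For example, `kv2_data_path kvv2 cms/cloudflared` prints `kvv2/data/cms/cloudflared`, the path a Vault policy for role `cms` would need to allow. With a logged-in CLI, `vault kv get -mount=kvv2 -field=token cms/cloudflared` reads the same secret by its logical path, assuming the Vault key is named `token` like the key in the materialised Secret.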
## CI: building and deploying
Three Gitea Actions workflows under [`.gitea/workflows/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows) cover the site. (A fourth, `cloudflare.yaml`, drives the OpenTofu — see [Cloudflare](cloudflare.md).)
| Workflow | Triggers | What it does |
|---|---|---|
| [`docker-dependencies.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/docker-dependencies.yaml) | `workflow_dispatch`; push to `main` touching `package.json`, `package-lock.json`, `Dockerfile.deps` | Builds the **deps** base image, pushes `gitea.arcodange.lab/<repo>-deps:{latest,YYYYMMDD-SHA8}`, then creates+pushes a **git tag `deps-YYYYMMDD-SHA8`** (with retry, up to 30 attempts) |
| [`docker-content.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/docker-content.yaml) | `workflow_dispatch`; push to `main` touching `nuxt.config.ts`, `app/**`, `content.config.ts`, `content/**`, `public/**`, `package*.json`, `Dockerfile` | Finds the latest `deps-*` git tag, strips `deps-` to get `BASE_TAG`, builds the **full image** with `--build-arg BASE_IMAGE_TAG=$BASE_TAG`, pushes `gitea.arcodange.lab/<repo>:{latest,<ref>}` |
| [`arcodange_fr.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/arcodange_fr.yaml) | `workflow_dispatch` (input `image_tag`, default `main`) | Pulls `cms:<image_tag>`, `docker create` + `docker cp` to extract **`/prod`** to `./public`, writes a minimal `wrangler.toml`, then **`wrangler pages deploy`** to project `arcodange-cms`, branch `main` |
> [!IMPORTANT]
> **The deps tag is the contract between the two Docker workflows.** `docker-dependencies` publishes both the `-deps` image and a matching **git tag** `deps-YYYYMMDD-SHA8`; `docker-content` discovers that tag (`git tag --list "deps-*" | sort -V | tail -n1`) to pin its `BASE_IMAGE_TAG`. Touch `package.json`/lockfile/`Dockerfile.deps` and the deps build must land first, or the content build pins a stale base.
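The tag-discovery step can be sketched in shell. This is a minimal illustration rather than the workflow's actual script: the function names `latest_deps_tag` and `base_tag_from` are hypothetical, and only the `sort -V | tail -n1` selection and the `deps-` prefix strip come from the description above.

```shell
# Hypothetical helpers illustrating how docker-content could resolve BASE_TAG.

# Read tag names on stdin (one per line) and print the newest deps-* tag.
# `sort -V` orders the YYYYMMDD part numerically, so the latest date wins.
latest_deps_tag() {
  grep '^deps-' | sort -V | tail -n1
}

# Strip the leading "deps-" to obtain the value passed as BASE_IMAGE_TAG.
base_tag_from() {
  printf '%s\n' "${1#deps-}"
}
```

In a checkout, `git tag --list "deps-*" | latest_deps_tag` would feed the first helper. Two deps builds on the same day differ only in their `SHA8` suffix, so in that case the ordering no longer reflects build time.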
### From image to Cloudflare Pages
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef ci fill:#059669,stroke:#047857,color:#fff
classDef reg fill:#7c3aed,stroke:#6d28d9,color:#fff
classDef edge fill:#d97706,stroke:#b45309,color:#fff
DEP["docker-dependencies<br>→ -deps image + git tag deps-*"]:::ci
CON["docker-content<br>pins BASE_IMAGE_TAG"]:::ci
REG["registry<br>gitea.arcodange.lab/arcodange-org/cms"]:::reg
FR["arcodange_fr<br>extract /prod"]:::ci
PAGES["Cloudflare Pages<br>project arcodange-cms"]:::edge
K3S["k3s Helm chart<br>serves /public (staging)"]:::edge
DEP --> CON --> REG
REG --> FR --> PAGES
REG --> K3S
```
1. **`docker-dependencies`** publishes the `-deps` base image and a `deps-YYYYMMDD-SHA8` git tag whenever dependencies change.
2. **`docker-content`** resolves that tag, builds the full dual-tree image, and pushes it to **`gitea.arcodange.lab/arcodange-org/cms`**.
3. **`arcodange_fr`** (manual) pulls that image, extracts the **`/prod`** tree, and deploys it to **Cloudflare Pages** project `arcodange-cms` on branch `main` — this is the live public `arcodange.fr`.
4. In parallel, the k3s **Helm chart** runs the same image and serves the **`/public`** (staging) tree behind Traefik + CrowdSec and the Cloudflared tunnel (`cms-rec.arcodange.fr`, `www.arcodange.lab`).
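The extract-and-deploy half of this flow (step 3) might look roughly like the sketch below. It is an assumption-laden outline, not the contents of `arcodange_fr.yaml`: the `run` and `deploy_pages` helpers and the `DRY_RUN` switch are hypothetical, the `wrangler.toml` step is omitted, and only the image name, the `/prod` tree, the `./public` target, the project `arcodange-cms`, and the branch `main` are taken from the text.

```shell
# Hypothetical dry-run wrapper: print the command when DRY_RUN=1, else run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Pull the image, copy /prod out of a stopped container, deploy it to Pages.
deploy_pages() {
  image_tag="${1:-main}"
  image="gitea.arcodange.lab/arcodange-org/cms:${image_tag}"

  run docker pull "$image"

  # `docker create` prints the new container id; no process is started.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    cid="<container-id>"
    printf '+ docker create %s\n' "$image"
  else
    cid="$(docker create "$image")"
  fi

  # ./public must not exist yet, otherwise docker cp nests it as ./public/prod.
  run docker cp "${cid}:/prod" ./public
  run docker rm "$cid"

  run wrangler pages deploy ./public --project-name arcodange-cms --branch main
}
```

Running `DRY_RUN=1 deploy_pages main` prints the plan without touching Docker or Cloudflare, which is a cheap way to review the sequence before a manual dispatch.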
## Cross-references
- [CMS](README.md) — the guidebook hub: the two faces of the repo and the public request/email flow.
- [Cloudflare](cloudflare.md) — the Cloudflare side of the tunnel, the Pages project, and the zone the deploys publish into.
- [Zoho email](zoho-email.md) — mail for the same `arcodange.fr` domain.
- [tools CrowdSec](../tools/components.md) — the Traefik bouncer middleware fronting both chart ingresses.
- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the Cloudflared tunnel token (`kvv2` `cms/cloudflared`) and registry/CF credentials live.
- Repo: [arcodange-org/cms](https://gitea.arcodange.lab/arcodange-org/cms).


@@ -0,0 +1,116 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Zoho email**
# Zoho email
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [CMS](README.md) · [Cloudflare](cloudflare.md)
> **Downstream:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md)
> **Related:** [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-env PRD](../../PRD/safe-prod-like-environment/README.md)
Email for **arcodange.fr** is hosted at **Zoho Mail (EU region)** and provisioned *entirely from OpenTofu*. There is no Zoho web-console click-ops in the steady state: the same `tofu apply` that owns the Cloudflare zone also drives the Zoho REST API to read the organization, publish the DNS records mail delivery depends on, and create one mailbox alias + one Inbox sub-folder per address. The module documented here lives under [`cms/cloudflare/zoho/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho), a sub-module of the [Cloudflare](cloudflare.md) tofu root.
> [!CAUTION]
> **DNS/email changes here are high-stakes and slow to fail.** A wrong MX, SPF, DKIM, or DMARC record silently degrades or breaks `arcodange.fr` deliverability for **days** — receivers cache TTLs, reputation decays, and there is no synchronous error to catch in CI. DMARC is published as **`p=reject`**, so a broken SPF/DKIM alignment means conforming receivers *drop* legitimate mail outright rather than quarantine it. This is a prime motivation for the **safe environment**: changes to this module must be validated **plan-only against a throwaway/clone zone**, never iterated directly against the live `arcodange.fr` zone. See the [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) and the [safe-env PRD](../../PRD/safe-prod-like-environment/README.md).
## How the module is wired
The Cloudflare root ([`cloudflare/iac.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/iac.tf)) instantiates `module "zoho"`, passing it the live zone and domain plus the OAuth client credentials:
| Input | Source | Purpose |
|---|---|---|
| `domain_name` | `ovh_domain_name.arcodange_fr.domain_name` | the domain to manage (`arcodange.fr`) |
| `dns_zone_id` | `cloudflare_zone.arcodange_fr.id` | Cloudflare zone the DNS records land in |
| `zoho_client_id` | `var.ZOHO_CLIENT_ID` (Vault `kvv1/zoho/self_client`) | OAuth2 self-client id |
| `zoho_client_secret` | `var.ZOHO_CLIENT_SECRET` (Vault `kvv1/zoho/self_client`) | OAuth2 self-client secret |
In CI the secrets are injected by the `vault-action` step in [`.gitea/workflows/cloudflare.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/cloudflare.yaml), which maps the whole `kvv1/zoho/self_client` KV-v1 secret into **both** the shell env (`ZOHO_*`, consumed by the helper scripts) and the tofu vars (`TF_VAR_ZOHO_*`, consumed by `config.tf`):
```
kvv1/zoho/self_client * | ZOHO_ ;
kvv1/zoho/self_client * | TF_VAR_ZOHO_ ;
```
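To make the effect of the two wildcard mappings concrete, the sketch below derives the environment variable name for one secret key. It assumes the wildcard selector upper-cases key names before applying the prefix, which is how `vault-action` documents `*`; the helper `env_name_for` and the example key names `client_id` / `client_secret` are hypothetical.

```shell
# Hypothetical helper: prefix + upper-cased key, mirroring a "* | PREFIX_" mapping.
env_name_for() {
  prefix="$1"
  key="$2"
  printf '%s%s\n' "$prefix" "$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
}
```

Under that assumption, a key `client_id` would surface once as `ZOHO_CLIENT_ID` for the helper scripts and once as `TF_VAR_ZOHO_CLIENT_ID` for tofu.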
## OAuth2: client-credentials flow
Zoho is a self-client (machine-to-machine) integration on the **EU** datacenter — every host is `*.zoho.eu` / `accounts.zoho.eu`. Authentication uses the OAuth2 **`client_credentials`** grant; there is no interactive user consent in the running flow (a commented device-code flow remains in [`.env`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/.env) as historical bootstrap).
The token is minted in [`config.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/config.tf) via a `data "http"` POST to `https://accounts.zoho.eu/oauth/v2/token` with `grant_type=client_credentials` and the comma-joined scope list. The bearer is then folded into an `Authorization: Zoho-oauthtoken <token>` header (`local.auth_headers`) reused by every subsequent read.
| Scope | Access | Why it is needed |
|---|---|---|
| `ZohoMail.partner.organization.READ` | READ (org) | resolve the org **ZOID** |
| `ZohoMail.organization.accounts.READ` | READ (accounts) | find the super-admin **account id / zuid** |
| `ZohoMail.organization.accounts.UPDATE` | UPDATE (accounts) | add / remove email aliases |
| `ZohoMail.organization.domains.READ` | READ (domains) | fetch the domain verification code + DKIM public key |
| `ZohoMail.folders.ALL` | ALL (folders) | list and create per-alias Inbox sub-folders |
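For readers who want to reproduce the token request outside tofu, here is a rough shell equivalent. It is a sketch under assumptions: the helper names are hypothetical, the standard OAuth2 field names (`client_id`, `client_secret`, `access_token`) are assumed, and any extra parameters the real `config.tf` sends are not shown.

```shell
# Join scopes with commas, as the token endpoint expects a single scope string.
join_scopes() {
  (IFS=,; printf '%s\n' "$*")
}

# Request a client_credentials token and print the access token (assumed field).
# Requires ZOHO_CLIENT_ID and ZOHO_CLIENT_SECRET in the environment, plus curl and jq.
mint_token() {
  curl -sS -X POST "https://accounts.zoho.eu/oauth/v2/token" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=${ZOHO_CLIENT_ID}" \
    --data-urlencode "client_secret=${ZOHO_CLIENT_SECRET}" \
    --data-urlencode "scope=$1" | jq -r '.access_token'
}

# Build the header reused by every subsequent read.
auth_header() {
  printf 'Authorization: Zoho-oauthtoken %s\n' "$1"
}
```

A call such as `mint_token "$(join_scopes ZohoMail.partner.organization.READ ZohoMail.organization.domains.READ)"` would request a token limited to those two scopes.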
Lookup chain (each step feeds the next):
1. `GET https://mail.zoho.eu/api/organization` → `local.org`, from which `zoid` builds `local.api_prefix = https://mail.zoho.eu/api/organization/<zoid>`.
2. `GET {api_prefix}/domains/{domain_name}` ([`dns.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/dns.tf)) → `local.domain`, exposing `CNAMEVerificationCode` and `dkimDetailList[0].publicKey`.
3. `GET {api_prefix}/accounts` ([`email_aliases.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/email_aliases.tf)) → the single `iamUserRole == "super_admin"` account, giving its `accountId` and `zuid`.
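The way each URL in the chain derives from the previous step can be written out as small shell helpers. This is only an illustration of the URL shapes listed above; the function names and the `ZOHO_MAIL_API` variable are hypothetical, and the response parsing is deliberately left out because the JSON envelope is not described here.

```shell
ZOHO_MAIL_API="https://mail.zoho.eu/api"   # EU datacenter base URL

# Step 1: the organization lookup that yields the zoid.
org_url() {
  printf '%s/organization\n' "$ZOHO_MAIL_API"
}

# The prefix every later call hangs off, built from the zoid.
api_prefix() {
  printf '%s/organization/%s\n' "$ZOHO_MAIL_API" "$1"
}

# Step 2: domain details (verification code, DKIM key) for zoid + domain.
domain_url() {
  printf '%s/domains/%s\n' "$(api_prefix "$1")" "$2"
}

# Step 3: account list, filtered afterwards for the super_admin account.
accounts_url() {
  printf '%s/accounts\n' "$(api_prefix "$1")"
}
```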
## DNS records published on the Cloudflare zone
[`modules/zoho_mail_dns`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/modules/zoho_mail_dns) materialises every `cloudflare_dns_record` Zoho mail needs onto the live zone. The DKIM key and verification code are read live from the Zoho domain API (step 2 above) and passed in as module inputs, so the records always track what Zoho actually expects. All records use **TTL 3600** and apply to the apex (`@`) unless noted.
| Name | Type | Value | Purpose |
|---|---|---|---|
| `@` | TXT | `"zoho-verification=<CNAMEVerificationCode>.zmverify.zoho.eu"` | proves domain ownership to Zoho |
| `@` | MX | `mx.zoho.eu` (priority **10**) | primary inbound mail exchanger |
| `@` | MX | `mx2.zoho.eu` (priority **20**) | secondary mail exchanger |
| `@` | MX | `mx3.zoho.eu` (priority **50**) | tertiary mail exchanger |
| `@` | TXT | `"v=spf1 include:zohomail.eu ~all"` | SPF: authorise Zoho to send for the domain |
| `zmail._domainkey` | TXT | `"<dkim_public_key>"` (from `dkimDetailList[0].publicKey`) | DKIM public key for outbound signing |
| `_dmarc` | TXT | `"v=DMARC1; p=reject; rua=mailto:arcodange@gmail.com; ruf=mailto:arcodange@gmail.com; sp=reject; adkim=r; aspf=r; pct=100"` | DMARC policy: **reject** non-aligned mail, 100% coverage, aggregate+forensic reports to `arcodange@gmail.com` |
| `default._bimi` | TXT | `"v=BIMI1; l=https://arcodange.fr/.well-known/logo.svg; avp=brand;"` | BIMI: display the brand logo beside authenticated mail (created only when `bimi_logo_url != null`) |
> [!WARNING]
> The DMARC policy is the strictest tier: `p=reject` **and** `sp=reject` (subdomains) with relaxed alignment (`adkim=r`, `aspf=r`) and `pct=100`. There is no `quarantine` grace band — any message that fails both SPF *and* DKIM alignment is rejected by conforming receivers. Validate SPF/DKIM correctness in the safe environment before touching the live `_dmarc` or apex records.
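When validating a clone zone, it can help to assert individual DMARC tags rather than eyeball the whole record. The helper below is a hypothetical sketch (`dmarc_tag` is not part of the repo); it splits a record on `;` and prints the value of one tag, and it assumes tag values contain no `=` character.

```shell
# Print the value of one tag from a DMARC record string.
#   $1 = record text without surrounding quotes, $2 = tag name (e.g. p, sp, pct)
dmarc_tag() {
  printf '%s\n' "$1" \
    | tr ';' '\n' \
    | sed 's/^[[:space:]]*//' \
    | awk -F= -v tag="$2" '$1 == tag { print $2 }'
}
```

A check against a throwaway zone could feed it the output of `dig +short TXT _dmarc.<clone-zone>` with the quotes removed via `tr -d '"'`, then fail the run unless `dmarc_tag "$record" p` prints `reject`.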
## Email aliases
Seven addresses are defined as a single map in [`email_aliases.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/email_aliases.tf) (`local.email_aliases`). Each is provisioned **twice** against the super-admin mailbox: as an **email alias** on the account, and as a matching **Inbox sub-folder** so mail to that address can be filtered into its own folder.
| Alias (`@arcodange.fr`) | Display name | Purpose |
|---|---|---|
| `bonjour` | `Service Bonjour` | commercial / sales |
| `bureaux` | `Bureaux Arcodange` | official bodies (URSSAF, administration) |
| `contact` | `Premier Contact` | website contact form |
| `helloworld` | `✅ Arcodange 🏹💻🪽` | social networks, newsletter |
| `analytics` | `Analytics 📊🔍` | social networks, newsletter |
| `books` | `Accounting 📒🧮` | accounting / bookkeeping |
| `abonnements` | `Abonnements 📱🤖` | subscriptions (phone, AI, services) |
Provisioning is *imperative-inside-declarative*: each alias is a `terraform_data` resource whose `triggers_replace` watches whether the alias/folder is already present, and whose `local-exec` provisioners shell out to [`zoho_api_call.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_api_call.sh) on **create** and **destroy**:
1. **Alias create**: `PUT {api_prefix}/accounts/{zuid}` with `mode=addEmailAlias`, scope `ZohoMail.organization.accounts.UPDATE`; fails fast if the response contains `OPERATION_NOT_PERMITTED`.
2. **Alias destroy**: same endpoint with `mode=deleteEmailAlias` (the bare local-part, split off the `alias:display` key).
3. **Folder create**: `POST /api/accounts/{accountId}/folders` with `parentFolderId` = the resolved **Inbox** folder id, scope `ZohoMail.folders.ALL`.
4. **Folder destroy**: looks the folder id up by name, `DELETE`s it, then also sweeps the corresponding `/Trash/<name>` (or `/Trash/Inbox_<name>`) folder Zoho leaves behind.
> [!NOTE]
> `terraform_data` + `local-exec` is used because aliases and folders are Zoho-side mutations with no first-class Terraform provider. The `triggers_replace = { missing = !contains(...) }` guard makes the apply idempotent: the provisioner only re-runs when the alias/folder is genuinely absent, so a clean plan is a no-op rather than a re-create.
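The two small pieces of logic in this section, splitting the `alias:display` key and deciding whether an alias is missing, can be expressed in shell as follows. This mirrors the behaviour described above but is not the module's code: both function names are hypothetical, and the example key `bonjour:Service Bonjour` is an assumed rendering of the map key.

```shell
# Return the bare local-part from an "alias:display" key.
alias_local_part() {
  printf '%s\n' "${1%%:*}"
}

# Succeed (exit 0) when the wanted alias is absent from the existing ones.
#   $1 = wanted alias, remaining args = aliases already on the account
alias_missing() {
  wanted="$1"
  shift
  for existing in "$@"; do
    if [ "$existing" = "$wanted" ]; then
      return 1
    fi
  done
  return 0
}
```

A provisioner guarded this way only calls the API when `alias_missing` succeeds, which is the same idea as the `missing = !contains(...)` trigger.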
## Helper scripts
Both scripts live beside the tofu and are invoked from `local-exec`. They share the OAuth client env vars (`ZOHO_CLIENT_ID`, `ZOHO_CLIENT_SECRET`, `ZOHO_TOKEN_ENDPOINT`) injected from Vault.
| Script | Role |
|---|---|
| [`zoho_api_call.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_api_call.sh) | Thin HTTP wrapper. Parses `--endpoint`, `-x=<METHOD>`, `--scope`, `--data_json` / `--data_url`, and `--fail_if_str_in_resp`; sources `zoho_gen_token.sh`, attaches the bearer header, `curl`s the call, fails if a sentinel string (e.g. `OPERATION_NOT_PERMITTED`) appears, and emits compact JSON via `jq`. |
| [`zoho_gen_token.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_gen_token.sh) | OAuth token cache. `gen_zoho_token <scope>` returns a cached token from `/tmp/zoho_oauth_tokens.cache` when fresh, otherwise mints a new `client_credentials` token and stores it. |
`zoho_gen_token.sh` is **lock-based and TTL-bounded**:
- A mutex is taken by `mkdir /tmp/zoho_oauth_tokens.lock` (atomic dir creation), with up to 10 one-second retries, so concurrent `local-exec` provisioners don't corrupt the cache. The lock is released on every function exit via `trap`.
- Tokens are keyed by scope in `/tmp/zoho_oauth_tokens.cache` (file mode `600`). A token is reused only while younger than **3600 s (1 h)**; `cleanup_cache` prunes expired entries on each call.
- The wrapper runs `cleanup_cache` before each request and re-traps it on `INT TERM EXIT`, so stale tokens never leak past their TTL.
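The locking and freshness rules can be illustrated with a compact shell sketch. It is not a copy of `zoho_gen_token.sh`: the function names, the `LOCK_DIR` / `LOCK_RETRIES` / `TOKEN_TTL` variables, and the example lock path are hypothetical, and the lock is released explicitly here rather than through `trap`.

```shell
LOCK_DIR="${LOCK_DIR:-/tmp/example_oauth_tokens.lock}"   # hypothetical path
LOCK_RETRIES="${LOCK_RETRIES:-10}"
TOKEN_TTL="${TOKEN_TTL:-3600}"

# mkdir is atomic: exactly one caller can create the directory.
acquire_lock() {
  attempt=1
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    if [ "$attempt" -ge "$LOCK_RETRIES" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 0
}

release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null || true
}

# Run a command while holding the lock; always release afterwards.
with_lock() {
  acquire_lock || { echo "could not take lock: $LOCK_DIR" >&2; return 1; }
  status=0
  "$@" || status=$?
  release_lock
  return "$status"
}

# A token minted at $1 is fresh at time $2 while its age is below TOKEN_TTL.
token_is_fresh() {
  [ $(( $2 - $1 )) -lt "$TOKEN_TTL" ]
}
```

Using a directory as the mutex avoids the check-then-create race that a plain lock file would have, which matters when several `local-exec` provisioners start at once.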
## Cross-references
- **Parent tofu / zone & Pages:** [Cloudflare](cloudflare.md) — owns `cloudflare_zone.arcodange_fr` that this module writes records into, and the `vault-action` CI step that supplies the credentials.
- **Where these secrets come from:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) (`kvv1/zoho/self_client`).
- **How apply runs:** [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md).
- **Why a safe environment exists:** [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-env PRD](../../PRD/safe-prod-like-environment/README.md).


@@ -5,6 +5,7 @@
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [01 · factory](01-factory.md)
> **Deeper dive:** [Tools guidebook](../tools/README.md) — deploy model, component inventory, and per-component internals
> **Related:** [secrets-and-vault.md](secrets-and-vault.md) · [storage-and-recovery.md](storage-and-recovery.md)
The [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) is deployed by factory's ArgoCD into the **`tools` namespace**. It is the platform layer that every app namespace depends on: secrets (Vault + VSO), observability (Prometheus + Grafana), edge security (CrowdSec), database pooling (pgbouncer / pgcat), caching (Redis/KeyDB), and analytics (Plausible + ClickHouse). Each component ships its own Helm chart or Kustomize overlay, and most carry an `iac/` directory of OpenTofu that declares the Vault config (roles, policies, dynamic-secret backends) that wires the component to secrets — see [secrets-and-vault.md](secrets-and-vault.md).
@@ -69,6 +70,7 @@ flowchart TB
## Cross-references
- [Tools guidebook](../tools/README.md) — the deeper dive: deploy model (one ArgoCD app → meta-chart → per-component Applications), full component inventory, and per-component internals.
- [Lab ecosystem hub](README.md) — the whole-lab map.
- [01 · factory](01-factory.md) — the ArgoCD that deploys this namespace, and the `postgres/iac/` roles + `user_lookup()` that pgbouncer consumes.
- [03 · cms](03-cms.md) — the public edge protected by **CrowdSec** (Turnstile → CrowdSec wiring).


@@ -6,6 +6,7 @@
> **Last Updated:** 2026-06-23
> **Upstream:** [01 · factory](01-factory.md)
> **Related:** [02 · tools](02-tools.md) · [secrets-and-vault.md](secrets-and-vault.md)
> **Deeper dive:** [CMS guidebook](../cms/README.md)
The [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) is the **public-facing site** of the lab: a Nuxt static site served at **`arcodange.fr`**, plus the OpenTofu that owns its Cloudflare edge and its Zoho email. It is the one app whose primary audience is the open Internet, so it ties together the public-DNS, tunnel, CAPTCHA, and email plumbing.
@@ -75,6 +76,7 @@ flowchart LR
## Cross-references
- [CMS guidebook](../cms/README.md) — the deeper-dive map of the `cms` repo: the Nuxt site, the Cloudflare edge, and Zoho email.
- [Lab ecosystem hub](README.md) — the whole-lab map.
- [01 · factory](01-factory.md) — the ArgoCD app `cms`, and `iac/cloudflare.tf` / `iac/ovh.tf` that grant the CMS its Cloudflare token and OVH nameserver-edit rights.
- [02 · tools](02-tools.md) — **CrowdSec** (the Traefik bouncer the Turnstile challenge feeds).


@@ -0,0 +1,114 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > **Tools**
# Tools
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Guidebooks index](../README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md)
> **Downstream:** [Components](components.md) · [Secrets & VSO](secrets-and-vso.md)
> **Related:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md)
The [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) is the lab's **platform layer**: the cluster-wide services every app namespace leans on — secrets (Vault + VSO), observability (Prometheus + Grafana), edge security (CrowdSec), database pooling (pgbouncer), caching (Redis/KeyDB), and analytics (Plausible + ClickHouse). Everything in this repo lands in the single **`tools` namespace**.
This hub explains the **deploy model** — how one factory-owned ArgoCD Application fans out into one Application per component — and gives a **component inventory**. For per-component internals see [Components](components.md); for how secrets reach the pods see [Secrets & VSO](secrets-and-vso.md).
## Deploy model
The whole repo is wired into the cluster through a single **meta-chart** that factory's ArgoCD points at:
1. Factory's ArgoCD declares **one** Application named `tools` whose source is this repo's [`chart/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart) meta-chart.
2. That meta-chart renders two kinds of object from [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml):
- an **AppProject** named `tools` ([`chart/templates/project.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/templates/project.yaml)) that pins every child Application to `sourceRepos: tools` and `destinations: tools` namespace only;
- one ArgoCD **Application per component** ([`chart/templates/apps.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/templates/apps.yaml) — a `range` over `.Values.tools`), each pointing `path:` at the matching **top-level directory** of the repo (`path: pgbouncer`, `path: grafana`, …).
3. Each child Application targets `namespace: tools`, with `automated` sync (`prune: true`, `selfHeal: true`) and `CreateNamespace=true`.
4. A component directory is **either** a Helm chart (`Chart.yaml` whose `dependencies:` pull the upstream chart + the `tool` library) **or** a Kustomize overlay (`kustomization.yaml` using a `helmCharts:` inflation generator).
5. [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) is a Helm **library chart** (`type: library`): it ships shared templates/helpers consumed by the component charts via `dependencies:` and is **not deployable** on its own.
> [!NOTE]
> A component is deployed **only if it appears as a key under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml)**. `pgcat` is present in the repo but commented out there, so no Application is rendered for it.
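The Helm-or-Kustomize rule from step 4 is easy to check from a checkout of the repo. The helper below is a hypothetical sketch (`component_kind` is not part of `tools`); it classifies a directory purely by which marker file it contains.

```shell
# Classify a component directory by its marker file.
component_kind() {
  dir="$1"
  if [ -f "$dir/Chart.yaml" ]; then
    echo helm
  elif [ -f "$dir/kustomization.yaml" ]; then
    echo kustomize
  else
    echo unknown
  fi
}
```

The library chart `tool/` also carries a `Chart.yaml`, so it would report `helm` even though it is not deployable; whether a directory is actually deployed still depends on its key under `tools:` in `chart/values.yaml`.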
## Component inventory
| Component | How declared (chart + version OR Kustomize) | Ingress host | Persistence | Purpose |
|---|---|---|---|---|
| **hashicorp-vault** | Helm — `hashicorp/vault` `0.28.1` (+ `tool` lib) | `vault.arcodange.lab` (Traefik, Let's Encrypt) | `storage "file"` at `/vault/data` + audit storage (PVC) | Secrets engine: KV, transit, PostgreSQL dynamic creds; auth `kubernetes` + Gitea OIDC/JWT |
| **vault-secrets-operator (VSO)** | Helm — `hashicorp/vault-secrets-operator` `0.9.0`, a dependency of the `hashicorp-vault` chart | — | — | Injects Vault secrets into pods via `VaultAuth` / `VaultDynamicSecret` CRDs; client-cache `direct-encrypted` via transit |
| **prometheus** | Helm — `prometheus-community/prometheus` `28.13.0` (app `v3.10.0`) | none (in-cluster) | `persistentVolume` enabled, `8Gi` | Metrics scraping + TSDB storage |
| **grafana** | Helm — `grafana/grafana` `10.3.0` (+ `tool` lib) | `grafana.arcodange.lab` (Traefik, Let's Encrypt) | `persistence.enabled: false` (ephemeral; dashboards provisioned) | Dashboards; datasources Prometheus + ClickHouse |
| **crowdsec** | Helm — `crowdsecurity/crowdsec` `0.20.1` (+ `tool` lib) | none (Traefik bouncer + AppSec on the edge) | LAPI state in external PostgreSQL (via pgbouncer) | Behavioural detection; agent parses Traefik logs, AppSec virtual-patching |
| **pgbouncer** | Helm — `icoretech/pgbouncer` `2.3.1` (+ `tool` lib) | none (cluster service `pgbouncer.tools`) | stateless (config only) | Connection pooler to the **external** PostgreSQL on `pi2` (`192.168.1.202`), pinned via `kubernetes.io/hostname: pi2` |
| **redis / KeyDB** | Helm — `pascaliske/redis` `2.1.0` (+ `tool` lib) | none (cluster service) | PVC `create: true`, `1Gi` at `/data` | In-memory cache; KeyDB master + replica, Redis-compatible |
| **plausible** | **Kustomize** — inflates `pascaliske/plausible` `2.0.0` | `analytics.arcodange.lab` (Traefik `IngressRoute`, Let's Encrypt) | stateless app; data lives in ClickHouse | Privacy-friendly web analytics; `DB_HOST: pgbouncer.tools` |
| **clickhouse** | **Kustomize** — inflates `pascaliske/clickhouse` `0.4.0` + local `databases` chart | none (cluster service) | PVC `16Gi` (StatefulSet) | OLAP column store backing Plausible |
| **pgcat** *(disabled)* | Helm — `improwised/pgcat` `0.1.0`, **commented out** in `chart/values.yaml` | — | — | Alternative pooler; not rendered (too constraining: must list every db/user, md5-only auth) |
| **tool** *(library)* | Helm **library chart** (`type: library`), not deployable | — | — | Shared templates/helpers consumed by the component charts |
## How tools fit together
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TB
classDef ext fill:#7c3aed,stroke:#6d28d9,color:#fff
classDef proc fill:#059669,stroke:#047857,color:#fff
classDef edge fill:#d97706,stroke:#b45309,color:#fff
classDef meta fill:#2563eb,stroke:#1e40af,color:#fff
ARGOCD["factory ArgoCD<br>Application: tools"]:::meta
META["tools meta-chart<br>chart/ (apps.yaml + project.yaml)"]:::meta
PROJ["AppProject: tools"]:::meta
subgraph NS["tools namespace"]
VAULT[("hashicorp-vault<br>+ VSO")]:::ext
PROM["prometheus"]:::proc
GRAF["grafana"]:::proc
CS["crowdsec<br>Traefik bouncer + AppSec"]:::edge
PGB["pgbouncer"]:::proc
REDIS[("redis / KeyDB")]:::ext
PLA["plausible"]:::proc
CH[("clickhouse")]:::ext
PODS["app + tool pods"]:::proc
end
PG[("external PostgreSQL<br>pi2 · 192.168.1.202")]:::ext
TRAEFIK["Traefik ingress<br>vault / grafana / analytics .arcodange.lab"]:::edge
ARGOCD --> META
META --> PROJ
META -- "one Application per component" --> NS
VAULT -- "inject secrets (VSO)" --> PODS
PGB -- "pools to" --> PG
PLA -- "writes analytics" --> CH
PROM --> GRAF
CH --> GRAF
TRAEFIK --> VAULT
TRAEFIK --> GRAF
TRAEFIK --> PLA
CS -- "fronts the edge" --> TRAEFIK
```
1. **Factory's ArgoCD** owns a single Application named `tools` pointed at this repo's `chart/` meta-chart.
2. The **meta-chart** renders the `tools` **AppProject** (which scopes every child to the `tools` repo + `tools` namespace) and **one Application per component** listed under `tools:` in `chart/values.yaml`.
3. Every child Application deploys into the **`tools` namespace** — Vault+VSO, Prometheus, Grafana, CrowdSec, pgbouncer, Redis/KeyDB, Plausible, ClickHouse.
4. **Vault + VSO** inject secrets into app and tool pods via the `VaultAuth` / `VaultDynamicSecret` CRDs.
5. **pgbouncer** pools connections out to the **external PostgreSQL** on `pi2` (`192.168.1.202`), the same database CrowdSec's LAPI and Plausible use through it.
6. **Plausible** writes analytics into **ClickHouse**; both **Prometheus** and **ClickHouse** are wired as **Grafana** datasources.
7. **Traefik** publishes `vault.arcodange.lab`, `grafana.arcodange.lab`, and `analytics.arcodange.lab` over Let's Encrypt, with **CrowdSec** running as the bouncer/AppSec layer fronting that edge.
## Pages in this guidebook
| Page | What it covers | Status |
|---|---|---|
| [Components](components.md) | Per-component internals: chart values, ingress, persistence, how each gets its secrets | ✅ Active |
| [Secrets & VSO](secrets-and-vso.md) | How Vault + the Vault Secrets Operator deliver static and dynamic secrets into `tools` pods | ✅ Active |
## Maintenance rule
> [!IMPORTANT]
> **If a component in the `tools` repo changes, update this guidebook in the same change.** Adding or removing a key under `tools:` in `chart/values.yaml`, bumping an upstream chart version, switching a component between Helm and Kustomize, or changing an ingress host or persistence size all alter the inventory above — keep the table and the diagram in sync as part of the same PR. A reference map that drifts from reality sends readers (and agents) confidently down dead paths.
## Cross-references
- [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md) — the parent whole-lab view of this namespace.
- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — the lab-wide Vault model these services depend on.
- [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) — how each component's `iac/` (Vault config) is applied.
- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why a safe, prod-like environment shapes how these platform services are run.


@@ -0,0 +1,218 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Components**
# Components
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools hub](README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md)
> **Downstream:** [Secrets & VSO](secrets-and-vso.md)
> **Related:** [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md)
This is the **per-component reference** for the `tools` platform layer: pinned chart/app versions, the values that actually matter (replicas, storage, ports, auth), and the cross-service wiring. Every component lands in the single **`tools` namespace**. For the deploy model (how one ArgoCD Application fans out into one per component) see the [Tools hub](README.md); for how Vault secrets reach the pods see [Secrets & VSO](secrets-and-vso.md).
Components split into two **tiers**:
- **Tier 1** — the load-bearing services, each with its own subsection and value tables below.
- **Tier 2** — supporting / inactive pieces, summarised in a single table.
Severity legend (GitHub alerts): `[!NOTE]` informational · `[!TIP]` good-to-know · `[!WARNING]` operational hazard · `[!CAUTION]` live risk.
---
## Tier 1 — load-bearing services
### hashicorp-vault
[`hashicorp-vault/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault) — the lab's secrets brain. The chart bundles **three** dependencies: the upstream `vault` server, the `vault-secrets-operator` (VSO) that injects secrets into pods, and the shared `tool` library chart.
| Key | Value |
|---|---|
| Chart deps | `vault` `0.28.1`, `vault-secrets-operator` `0.9.0`, `tool` `0.1.0` |
| Mode | `standalone` (single instance, **not** HA / raft) |
| Storage | `storage "file"` at `/vault/data` + audit storage enabled |
| Listener | TLS **off** (`tls_disable = 1`) on `[::]:8200` — terminated at the edge |
| Ingress | `vault.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| UI | enabled (`ui = true`) |
| Log level | `trace` |
**Mounts (secret engines) exposed:**
| Mount | Type | Purpose |
|---|---|---|
| `kvv1` | KV v1 | Static secrets (legacy / v1 layout) |
| `kvv2` | KV v2 | Versioned static secrets (primary store) |
| `transit` | transit | Encryption-as-a-service; backs VSO client-cache (`vso-client-cache` key) |
| `postgres` | database | Dynamic PostgreSQL credentials (connection via `pgbouncer.tools:5432`) |
**Auth methods enabled:**
| Method | Used by |
|---|---|
| `kubernetes` | In-cluster workloads (VSO, app ServiceAccounts) authenticate by SA token |
| `gitea_jwt` | Gitea Actions / OIDC-JWT pipelines authenticate from CI |
> [!NOTE]
> The full secret-engine layout, VSO `VaultAuth` / `VaultConnection` / `VaultDynamicSecret` wiring, and the `kvv2/data/...` path conventions are documented in [Secrets & VSO](secrets-and-vso.md) — this page only inventories what the chart stands up.
The VSO sub-chart ships a `defaultVaultConnection` pointing at `http://hashicorp-vault.tools.svc.cluster.local:8200` and a client cache with `persistenceModel: direct-encrypted`, encrypted through the `transit` mount.
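Because the `kvv1` and `kvv2` mounts use different KV engine versions, the HTTP API path for the same logical secret differs: KV v2 inserts a `data/` segment after the mount. The hypothetical helper below (`kv_api_path` is not part of the repo) shows that difference using paths mentioned in these guidebooks.

```shell
# Build the Vault HTTP API read path for a KV secret.
#   $1 = mount name, $2 = KV engine version (1 or 2), $3 = logical secret path
kv_api_path() {
  mount="$1"
  version="$2"
  path="$3"
  case "$version" in
    1) printf '/v1/%s/%s\n' "$mount" "$path" ;;
    2) printf '/v1/%s/data/%s\n' "$mount" "$path" ;;
    *) echo "unsupported KV version: $version" >&2; return 1 ;;
  esac
}
```

From inside the cluster, such a path would be appended to `http://hashicorp-vault.tools.svc.cluster.local:8200` and read with an `X-Vault-Token` header, assuming the caller's policy allows it.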
### prometheus
[`prometheus/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) — metrics collection and TSDB, via the `kube-prometheus`-style community chart.
| Key | Value |
|---|---|
| Chart deps | `prometheus` `28.13.0` (app `v3.10.0`), `tool` `0.1.0` |
| Server replicas | `1` (Deployment, `strategy: Recreate`) |
| Server storage | `persistentVolume` enabled, **8Gi** at `/data` (`ReadWriteOnce`) |
| Retention | `15d` |
| Alertmanager | enabled, persistence **2Gi** (`ReadWriteOnce`) |
| node-exporter | enabled (DaemonSet, `prometheus-node-exporter` sub-chart) |
| kube-state-metrics | enabled |
| pushgateway | enabled (`prometheus.io/probe: pushgateway`) |
| Scrape / eval interval | `1m` (scrape timeout `10s`) |
| Ingress | none — **internal only** |
**Scrape targets** (default `scrapeConfigs`, all enabled): the Prometheus server itself, the Kubernetes API servers, nodes + kubelet cadvisor, plus **annotation-based** service-endpoint and pod discovery (`prometheus.io/scrape`, `prometheus.io/port`, `prometheus.io/path`, `prometheus.io/scheme`), with `*-slow` (5m) variants for cheaper targets.
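Opting a workload into the annotation-based discovery means setting those four annotations on its Service or Pod. As a hedged illustration, the hypothetical helper below (`scrape_annotations` is not part of the repo) prints them as `key=value` arguments in the form `kubectl annotate` accepts; the default path and scheme are assumptions.

```shell
# Print the discovery annotations as key=value arguments.
#   $1 = metrics port, $2 = metrics path (default /metrics), $3 = scheme (default http)
scrape_annotations() {
  port="$1"
  path="${2:-/metrics}"
  scheme="${3:-http}"
  printf 'prometheus.io/scrape=true prometheus.io/port=%s prometheus.io/path=%s prometheus.io/scheme=%s\n' \
    "$port" "$path" "$scheme"
}
```

One possible use is `kubectl -n <namespace> annotate service <name> $(scrape_annotations 8080)`, with the substitution left unquoted so each annotation becomes its own argument.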
### grafana
[`grafana/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana) — dashboards over Prometheus and ClickHouse.
| Key | Value |
|---|---|
| Chart deps | `grafana` `10.3.0` (app `latest`), `tool` `0.1.0` |
| Replicas | `1` (Deployment, `RollingUpdate`) |
| Persistence | **disabled** — ephemeral; dashboards/datasources are provisioned at boot |
| Ingress | `grafana.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| Plugin | `grafana-clickhouse-datasource` |
| Resources | requests `100m` / `128Mi`, limits `100m` / `512Mi` |
| Timezone | `Europe/Paris` |
**Datasources (provisioned):**
| Name | Type | Target | Default |
|---|---|---|---|
| Prometheus | `prometheus` | `http://prometheus-server.tools.svc.cluster.local` | ✅ yes |
| clickhouse | `grafana-clickhouse-datasource` | `clickhouse.tools.svc.cluster.local:9000` (native, `tlsSkipVerify`) | no |
> [!WARNING]
> The Grafana **admin password is static and committed** in [`grafana/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana/values.yaml) (`adminUser: admin`). The provisioned ClickHouse datasource password is committed there too (`secureJsonData.password`). Treat these as lab-only credentials; do not reuse them outside the homelab.
### crowdsec
[`crowdsec/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) — behavioural edge security that feeds a Traefik blocklist.
| Key | Value |
|---|---|
| Chart deps | `crowdsec` `0.20.1`, `tool` `0.1.0` |
| LAPI | Deployment (`RollingUpdate`, `maxUnavailable: 0`) — the local API + decision store |
| Agent | DaemonSet pinned to control-plane nodes (`node-role.kubernetes.io/control-plane`) |
| Log source | parses **Traefik** pod logs in `kube-system` (`podName: traefik-*`, `program: traefik`) |
| Collections | `crowdsecurity/traefik`, `crowdsecurity/http-cve` (+ AppSec rules below) |
| AppSec (WAF) | **enabled**: `crowdsecurity/appsec-default` on `0.0.0.0:7422`; collections `appsec-virtual-patching` + `appsec-generic-rules` |
| Database | external PostgreSQL `crowdsec` via **pgbouncer** (`host: pgbouncer.tools:5432`, `type: postgresql`) |
| DB credentials | dynamic, from secret `crowdsec-db-credentials` (`DB_USER` / `DB_PASSWORD`, sourced via VSO) |
| Console | enrolled as instance `homelab` |
The decisions CrowdSec produces are surfaced as a **Traefik middleware blocklist applied at the edge**, so malicious IPs are dropped before they reach app namespaces. `server_reset_query: DEALLOCATE ALL` on pgbouncer (below) exists specifically to keep CrowdSec's prepared statements happy through the pooler. The CAPTCHA challenge CrowdSec serves on remediated requests is a **Cloudflare Turnstile widget minted by the `cms` repo** — see the [CMS Cloudflare page](../cms/cloudflare.md), which produces the sitekey/secret this bouncer consumes from Vault.
### pgbouncer
[`pgbouncer/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgbouncer) — the connection pooler in front of the **external** PostgreSQL.
| Key | Value |
|---|---|
| Chart deps | `pgbouncer` `2.3.1` (`icoretech/pgbouncer`), `tool` `0.1.0` |
| Scheduling | `nodeSelector: kubernetes.io/hostname: pi2` (co-located with PostgreSQL) |
| Upstream DB | external PostgreSQL at `192.168.1.202:5432` (the `pi2` host), wildcard database `"*"` |
| Auth type | `scram-sha-256` |
| `auth_query` | `SELECT uname, phash FROM user_lookup($1)` |
| `server_reset_query` | `DEALLOCATE ALL` (clears prepared statements — fixes CrowdSec re-use) |
| `server_idle_timeout` | `7200` (2h) |
| `ignore_startup_parameters` | `extra_float_digits` (unsupported JDBC arg) |
| Exporter | disabled |
| Service | `pgbouncer.tools:5432` (cluster-internal) |
> [!NOTE]
> pgbouncer is the single front door to the lab's PostgreSQL: CrowdSec, Plausible, and Vault's `postgres` dynamic-secret backend all connect through `pgbouncer.tools:5432`, never to `192.168.1.202` directly.
### redis (KeyDB)
[`redis/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis) — the in-memory cache / session store. The chart targets **KeyDB** (EqAlpha, Redis-compatible), tuned for the 2× Raspberry Pi 5 nodes.
| Key | Value |
|---|---|
| Chart deps | `redis` `2.1.0` (`pascaliske/redis`), `tool` `0.1.0` |
| Workload | **StatefulSet** (master at index 0; any additional replica runs `replicaof` the master) |
| Storage | PVC `create: true`, **1Gi** at `/data` (`ReadWriteOnce`) |
| Tuning | `server-threads 4` (ARM-tuned for the Pi 5 cores) |
| Port | `6379` (`ClusterIP`) |
| Security | `runAsUser/Group/fsGroup: 999`, non-root |
| Timezone | `Europe/Paris` |
> [!NOTE]
> To inspect the instance, run `kubectl port-forward -n tools svc/redis 6379:6379` and connect with Redis Insight (per the [chart README](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis/README.md)).
### plausible
[`plausible/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) — privacy-friendly web analytics. Deployed via a **Kustomize** overlay that inflates the upstream Helm chart (not a `Chart.yaml` dependency like the Tier-1 charts above).
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator |
| Chart / version | `plausible` `2.0.0` (`pascaliske/plausible`), image `ghcr.io/plausible/community-edition` |
| Replicas | `1` (Deployment) |
| Ingress | `analytics.arcodange.lab` (Traefik IngressRoute, Let's Encrypt, `localIp@file` middleware) |
| App DB | PostgreSQL via **pgbouncer** — an **init container** assembles `DATABASE_URL` from VSO dynamic creds |
| Event store | **ClickHouse** (see below) |
| GeoIP | MaxMind **GeoLite2** (`GeoLite2-Country` + `GeoLite2-City`), license key from secret `plausible-geoip` |
| Secrets | `SECRET_KEY_BASE` / `TOTP_VAULT_KEY` from existing secret `plausible-config` (VSO-fed) |
Plausible writes analytics events to ClickHouse and stores app/account state in PostgreSQL — two distinct backends, both reached through lab-internal services.
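
A Kustomize overlay that inflates a Helm chart typically looks like the sketch below. The chart name and version are the ones listed above; the repository URL, `releaseName`, `valuesFile` name and the exact `resources:` list are assumptions made for illustration.

```yaml
# kustomization.yaml (sketch): inflate the upstream Plausible chart with Kustomize.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tools

helmCharts:
  - name: plausible
    repo: https://charts.pascaliske.dev   # assumption: pascaliske chart repository URL
    version: 2.0.0
    releaseName: plausible                # assumption
    namespace: tools
    valuesFile: values.yaml               # assumption: file name

resources:
  - resources/vaultauth.yaml              # VSO objects shipped next to the chart
  - resources/vaultsecret.yaml
```

Kustomize only renders `helmCharts:` when Helm support is switched on (`kustomize build --enable-helm .`), so whatever renders this overlay in the cluster needs the same flag.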
### clickhouse
[`clickhouse/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) — the OLAP column store behind Plausible. Also a **Kustomize** overlay inflating the upstream chart, plus a `databases` sub-chart that runs an init job.
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator (`chartHome: charts`) |
| Chart / version | `clickhouse` `0.4.0` (`pascaliske/clickhouse`), image `clickhouse/clickhouse-server` |
| Workload | **StatefulSet**, `replicas: 1` |
| Storage | PVC **16Gi** at `/var/lib/clickhouse` (`ReadWriteOnce`) |
| Ports | `8123` (HTTP), `9000` (native protocol) |
| Custom user | `arcodange` (full network access, `access_management: 1`) via `custom-users.xml` |
| Security | `runAsUser/Group/fsGroup: 101`, non-root |
| Timezone | `Europe/Paris` |
> [!WARNING]
> The ClickHouse `arcodange` user password is **static and committed** in [`clickhouse/clickhouseValues.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse/clickhouseValues.yaml) (`custom-users.xml`). The same value appears in Grafana's provisioned datasource — keep the two in sync if you rotate it.
> [!CAUTION]
> ClickHouse carries a `nodeAffinity` that **excludes `pi2`** (`kubernetes.io/hostname NotIn [pi2]`). `pi2` hosts PostgreSQL and pgbouncer; ClickHouse is deliberately kept off it to avoid I/O contention on that node. A cluster where `pi2` is the only schedulable node will leave ClickHouse `Pending`.
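
The exclusion above corresponds to a pod-spec fragment of roughly this shape. It is a sketch: it assumes the rule is a hard (`required…`) affinity, which is what would produce the `Pending` behaviour described, and it omits where the fragment sits in the chart values.

```yaml
# Pod-spec fragment (sketch): keep the pod off the node whose hostname label is pi2.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # assumption: hard rule, not "preferred"
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: NotIn
              values:
                - pi2
```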
---
## Tier 2 — supporting & inactive
| Component | Status | Notes |
|---|---|---|
| [`pgcat/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgcat) | ❌ disabled | Alternative Postgres pooler (`pgcat` chart `0.1.0`). Not in service — its sole pool has empty `username`/`password`/`database` placeholders, and it is **not** keyed under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), so ArgoCD renders no Application for it. [pgbouncer](#pgbouncer) is the active pooler. |
| [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) | ✅ active (library) | Helm **library chart** (`type: library`, version `0.1.0`) consumed by **every** component chart via `dependencies:`. Ships shared templates/helpers; **not deployable** on its own. |
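
For orientation, a component chart that wraps an upstream chart and pulls in the `tool` library could declare its dependencies as sketched below. The dependency names and versions are those listed for Grafana above; the wrapper chart's own version and both `repository` values are assumptions.

```yaml
# Chart.yaml (sketch) for a component chart such as grafana/.
apiVersion: v2
name: grafana
version: 0.1.0                                           # assumption: wrapper chart version
dependencies:
  - name: grafana
    version: 10.3.0
    repository: https://grafana.github.io/helm-charts    # assumption: upstream repo URL
  - name: tool
    version: 0.1.0
    repository: file://../tool                           # assumption: local library chart path
```

Because `tool` is a library chart, it contributes named templates only; Helm refuses to install it on its own.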
---
## Gotchas
> [!WARNING]
> **No high availability.** Every Tier-1 service runs a **single replica** — Vault (`standalone`), Prometheus (`replicaCount: 1`), Grafana (`replicas: 1`), ClickHouse and Redis/KeyDB StatefulSets (`replicas: 1`), Plausible and the CrowdSec LAPI (single Deployment). Any node drain or pod restart is a brief outage for that service, not a failover.
> [!WARNING]
> **Static, committed passwords.** Grafana admin (+ its ClickHouse datasource), the ClickHouse `arcodange` user, and the pgbouncer admin/auth users all carry plaintext credentials in their `values.yaml`. They are lab-only; rotate before any exposure and never copy them to a real environment.
> [!CAUTION]
> **ClickHouse must avoid `pi2`.** The `NotIn [pi2]` `nodeAffinity` keeps it off the PostgreSQL/pgbouncer host. If `pi2` is the only schedulable node, ClickHouse (and therefore Plausible analytics) stays `Pending`. See the [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) for how PVC-backed services map onto specific nodes.
> [!CAUTION]
> **Vault is single-instance and starts sealed.** After **any** restart (pod reschedule, node reboot, chart upgrade) Vault comes up **sealed** with no automatic unseal configured, so every VSO injection and dynamic-secret lease blocks until an operator unseals it. This is the first thing to check when secrets stop flowing across the cluster. [Secrets & VSO](secrets-and-vso.md) describes the impact on VSO; the unseal procedure and key custody live in [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).


@@ -0,0 +1,234 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Secrets & VSO**
# Tools — Secrets & VSO
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools](README.md) · [Components](components.md)
> **Downstream:** consumed by every `tools`-namespace pod and by every app's CI/CD
> **Related:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming-conventions concept](../lab-ecosystem/naming-conventions.md) · [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md) · [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md)
This page maps how secrets live in **HashiCorp Vault** (engines, auth backends) and how they reach **Kubernetes pods** via the **Vault Secrets Operator (VSO)**. The keystone is the **`app_policy` + `app_roles` module pair**: the machinery that turns a single `<app>` name into a matched set of Vault policies, roles, and CI identities — the same `<app>` join key documented in the [naming-conventions concept](../lab-ecosystem/naming-conventions.md).
Vault itself runs as a component in the `tools` namespace; see the [Components](components.md) page for its deploy shape. The admin/bootstrap layer (the `kvv1` engine, the `gitea_jwt` auth backend, the base `gitea_cicd` role, the Kubernetes auth backend mount) is created **by factory's Ansible-managed Vault Terraform** in [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf); everything on this page that is *per-app* is created by the IaC under [`hashicorp-vault/iac`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac).
> [!CAUTION]
> Vault runs **standalone** with file/raft storage and starts **sealed** after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config — pods that depend on a `VaultDynamicSecret` will not start. Unseal procedure and key custody live in [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).
---
## 1) Vault engines & auth backends
All engines below are mounted by [`hashicorp-vault/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) except `kvv1`, which is bootstrapped by factory's Ansible Vault Terraform.
| Mount | Type | Holds | Defined in |
|---|---|---|---|
| `kvv1` | KV **v1** | Admin / cloud secrets: `kvv1/google/credentials`, `kvv1/gitea/*`, `kvv1/cloudflare/*`, `kvv1/ovh/*`, `kvv1/postgres/credentials`, `kvv1/admin/*` | factory [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf) |
| `kvv2` | KV **v2** (versioned) | Per-app config secrets under `kvv2/<app>/*` | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |
| `transit` | transit | The **VSO client-cache encryption key** `vso-client-cache` — lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |
| `postgres` | database | **Dynamic** Postgres creds at `postgres/creds/<app>`; connects to the DB through `pgbouncer.tools:5432` using the `credentials_editor` root account | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) |
The `postgres` connection is configured with `allowed_roles = ["*"]` and a root-rotation statement (`ALTER USER … WITH PASSWORD`); the editor username/password come from the sensitive `POSTGRES_CREDENTIALS_EDITOR_*` variables.
### Auth backends
| Backend | Mount | Who uses it | Role(s) |
|---|---|---|---|
| `kubernetes` | `kubernetes` | VSO controller + every app pod's ServiceAccount | `vault-secret-operator` (VSO itself), `<app>` (one per app), `factory_crowdsec_conf` |
| `gitea_jwt` | `gitea_jwt` | CI/OpenTofu jobs running in Gitea Actions | `gitea_cicd` (base, factory-bootstrapped) + per-app `gitea_cicd_<app>` |
- **`kubernetes`** auth ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf)) is configured against `https://kubernetes.default.svc:443`. The VSO role `vault-secret-operator` binds SA `hashicorp-vault-vault-secrets-operator-controller-manager` in ns `tools`, `audience = vault`, and carries the `edit-vso-client-cache` policy (encrypt/decrypt on `transit/.../vso-client-cache`).
- **`gitea_jwt`** is the OIDC/JWT backend for CI. Its backend, `default_role = gitea_cicd`, and the base `gitea_cicd` role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via `auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_<app>]" }` using the `TERRAFORM_VAULT_AUTH_JWT` env var. See the [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) for how the token is minted in the pipeline.
### Terraform state
Each IaC project keeps its state in the **`arcodange-tf` GCS bucket** under a distinct prefix:
| Project | GCS prefix |
|---|---|
| Vault admin/app machinery | `tools/hashicorp_vault/main` |
| Plausible | `tools/plausible/main` |
| CrowdSec | `tools/crowdsec/main` |
---
## 2) The `app_policy` + `app_roles` modules — the `<app>` join-key machinery
> [!IMPORTANT]
> These two modules are the heart of the secrets layer. Given a single `<app>` name they emit a **matched, name-derived** set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide [naming convention](../lab-ecosystem/naming-conventions.md): the same `<app>` string also names the Kubernetes namespace, the ServiceAccount, the Postgres `<app>_role`, and the Gitea repo.
The two modules live on **opposite sides of the trust boundary**:
- [`modules/app_policy`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy) is declared **once, centrally**, in the Vault admin project ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf), `for_each` over `var.applications`). It creates the **policies and the CI identity** — the privileged bits — so the app's own repo never holds them.
- [`modules/app_roles`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles) is declared **by the subordinate app project** (pulled over SSH as a Git module), running under the `<app>-ops` policy. It creates the **roles** the app needs.
### `app_roles` — runtime roles (declared by the app repo)
For `<app>`, [`app_roles/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles/main.tf) creates:
| Resource | Path | Key settings |
|---|---|---|
| Kubernetes auth role | `auth/kubernetes/role/<app>` | `bound_service_account_names = [<app>] + extras`, `bound_service_account_namespaces = [<app>] + extras`, `token_ttl = 3600` (1h), `token_policies = [default, <app>]`, `audience = vault` |
| Postgres dynamic role | `postgres/roles/<app>` | `db_name = postgres` (the Vault database **connection** name, not the Postgres database); creation SQL: `CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL …` then `GRANT <app>_role TO "{{name}}"`; revocation: `REASSIGN OWNED BY "{{name}}" TO <app>_role` then `REVOKE ALL ON DATABASE <app> FROM "{{name}}"` |
> [!IMPORTANT]
> The Postgres dynamic role's creation SQL does `GRANT <app>_role TO {{name}}` and its revocation does `REASSIGN OWNED BY {{name}} TO <app>_role`. **The non-login `<app>_role` must already exist in Postgres** — it is created by factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) (`postgresql_role.app_role["<app>"]`, owner of the `<app>` database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: **factory postgres/iac before tools app_roles**.
> [!NOTE]
> The Kubernetes auth role binds **both** SA names **and** namespaces — the check is an **AND**. A token presenting SA `<app>` from the wrong namespace (or any other SA from ns `<app>`) is rejected. The default binding is SA `<app>` in ns `<app>`; the `service_account_names` / `service_account_namespaces` inputs widen it (e.g. CrowdSec/Plausible run in ns `tools`, not a namespace named after the app).
The Postgres role can be skipped with `disable_database = true`. The Postgres database name referenced by the SQL (e.g. `REVOKE ALL ON DATABASE <app>`) defaults to `<app>` but can be overridden via `database`.
### `app_policy` — policies + CI identity (declared centrally)
For `<app>`, [`app_policy/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy/main.tf) creates:
| Resource | Name | Grants |
|---|---|---|
| **App policy** | `<app>` | `read,list` on `kvv2/data/<app>/*`; `read` on `postgres/creds/<app>*` — what the runtime pod can do |
| **Ops policy** | `<app>-ops` | The CI bundle (below) |
| **JWT role** | `gitea_cicd_<app>` (mount `gitea_jwt`) | `token_policies = [default] + <app>'s ops_policies`, `bound_audiences = [gitea_app_id]`, `user_claim = email`, `role_type = jwt` |
| **Identity group** | `<app>-ops` | Internal group carrying the `<app>-ops` policy, so Vault users mapped to their Gitea entity inherit ops rights |
The **`<app>-ops` policy** is the privilege set a CI job needs to *manage* the app's own corner of Vault and the clouds:
- `create/update` on `auth/token/create`; `read` on `sys/mounts/auth/*` (so the Vault provider works);
- full CRUD on `postgres/roles/<app>*` and on `auth/kubernetes/role/<app>*` (so `app_roles` can apply) — the k8s-role rule is **parameter-constrained**: it may only set `bound_service_account_names`/`bound_service_account_namespaces` to the whitelisted `[<app>] + extras` lists and `token_policies` to `["default","<app>"]`, preventing a CI job from minting a role with broader bindings;
- full CRUD on the app's KV-v2 data, delete/undelete/destroy, and `metadata` (`kvv2/data|delete|undelete|destroy|metadata/<app>/*`);
- `read` on `kvv1/google/credentials` (the GCS backend SA) and on `kvv1/gitea/tofu_module_reader` (the bot SSH key that lets CI pull the `app_roles` Git module);
- CRUD on `kvv1/cloudflare/<app>*` and `kvv1/ovh/<app>*` (cloud DNS/edge secrets scoped to the app).
> [!NOTE]
> The policy document is post-processed with two `replace()` calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string (`"["webapp"]"`); the replaces strip the outer quotes so Vault receives a real list. If you change those `allowed_parameter` blocks, keep the replaces in sync.
### Apps wired in `terraform.tfvars`
[`terraform.tfvars`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/terraform.tfvars) declares the `applications` set the central `app_policy` `for_each` walks:
| `<app>` | Extra SA | Extra ns | Extra ops policy | Notes |
|---|---|---|---|---|
| `webapp` | — | — | — | defaults: SA `webapp` / ns `webapp` |
| `erp` | — | — | — | defaults |
| `cms` | `cloudflared` | — | `factory__cf_r2_arcodange_tf` | extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket |
| `crowdsec` | — | `tools` | — | runs in ns `tools` |
| `plausible` | — | `tools` | — | runs in ns `tools` |
> [!NOTE]
> `terraform.tfvars` uses the key `ops_policies` for the CMS extra policy while `variables.tf` declares the optional attribute as `policies`; the central `main.tf` passes `each.value.policies` into the module's `ops_policies` input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role.
---
## 3) VSO CRDs — how a secret becomes a Kubernetes Secret
The [Vault Secrets Operator](https://developer.hashicorp.com/vault/docs/platform/k8s/vso) watches three custom resources and writes plain Kubernetes `Secret` objects that pods consume normally (env / volume). The app repo ships the custom resources (instances of the CRDs below); the operator does the Vault round-trips.
| CRD | What it does | Refresh / rotation |
|---|---|---|
| `VaultAuth` | Picks the auth method (`kubernetes`), the `mount`, the Vault `role` (= `<app>`), and the pod **ServiceAccount** (= `<app>`) used to log in; references a `VaultConnection` (here the in-cluster `default` → `http://hashicorp-vault.tools.svc.cluster.local:8200`) | n/a; used by the other two CRDs via `vaultAuthRef` |
| `VaultStaticSecret` | Reads a **KV-v2** path → writes a k8s `Secret` | `refreshAfter` (the lab uses `30s`) |
| `VaultDynamicSecret` | Reads `postgres/creds/<app>` (a **dynamic** lease) → writes a k8s `Secret`; `rolloutRestartTargets` lists Deployments to restart when creds rotate | follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets |
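
To make the table concrete, here is a minimal sketch of a `VaultAuth` plus `VaultStaticSecret` pair for a default-shaped app (`webapp`, SA `webapp` in ns `webapp`). The object names and the destination Secret name are assumptions; the mounts, role, audience and `refreshAfter` follow the values described on this page.

```yaml
# Sketch for app "webapp"; not copied from any repo.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: webapp                 # assumption: object name
  namespace: webapp
spec:
  method: kubernetes
  mount: kubernetes            # the kubernetes auth backend mount
  kubernetes:
    role: webapp               # Vault role = <app>
    serviceAccount: webapp     # pod ServiceAccount = <app>
    audiences:
      - vault
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: webapp-config          # assumption: object name
  namespace: webapp
spec:
  vaultAuthRef: webapp         # points at the VaultAuth above
  type: kv-v2
  mount: kvv2
  path: webapp/config          # i.e. kvv2/<app>/config
  refreshAfter: 30s
  destination:
    name: webapp-config        # assumption: k8s Secret written by VSO
    create: true
```

With no `vaultConnectionRef` set, the `VaultAuth` falls back to the operator's default `VaultConnection`.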
### Worked example — Plausible (`tools` namespace)
Files under [`plausible/resources`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources):
1. **`VaultAuth` `plausible`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultauth.yaml)) — `method: kubernetes`, `role: plausible`, `serviceAccount: plausible`, `audiences: [vault]`. This is the Vault role `app_roles` created in [`plausible/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/iac/main.tf).
2. **`VaultStaticSecret` `plausible`** ([`vaultsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultsecret.yaml)) — `kvv2` path `plausible/config` → Secret `plausible-config` (`refreshAfter: 30s`). The config payload holds **`SECRET_KEY_BASE`** and **`TOTP_VAULT_KEY`**, both **generated by Terraform** (`random_password`, base64-encoded) and written to `kvv2/plausible/config` via `vault_kv_secret_v2` in the plausible IaC.
3. **`VaultStaticSecret` `plausible-geoip`** ([`geoipsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/geoipsecret.yaml)) — `kvv2` path `plausible/geoip` → Secret `plausible-geoip` exposing **`LICENSE_KEY`** (the MaxMind GeoIP license, an admin-seeded value, fed to the `geoipupdate` sidecar via env `GEOIPUPDATE_LICENSE_KEY`).
4. **`VaultDynamicSecret` `plausible-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultdynamicsecret.yaml)) — `postgres/creds/plausible` → Secret `plausible-db-credentials`; `rolloutRestartTargets` restarts Deployment `plausible`. An **init container** ([`add-initcontainer.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/add-initcontainer.yaml)) reads `username`/`password` from that Secret and writes `DATABASE_URL` (`postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}`) into a shared `generated-secrets` volume the app reads.
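
The init-container step in item 4 can be pictured as the patch below. It is a hedged sketch rather than the content of `add-initcontainer.yaml`: the image, container name, mount path, output file name and the host/port/database values are all assumptions; only the Secret name, its `username`/`password` keys, the `generated-secrets` volume name and the URL shape come from the description above.

```yaml
# Strategic-merge patch (sketch) adding an init container to the plausible Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: plausible
spec:
  template:
    spec:
      volumes:
        - name: generated-secrets
          emptyDir: {}                     # assumption: in-memory/ephemeral shared volume
      initContainers:
        - name: build-database-url         # assumption: container name
          image: busybox:1.36              # assumption: any image with a POSIX shell
          env:
            - name: DB_USER
              valueFrom:
                secretKeyRef: { name: plausible-db-credentials, key: username }
            - name: DB_PASS
              valueFrom:
                secretKeyRef: { name: plausible-db-credentials, key: password }
            - { name: DB_HOST, value: pgbouncer.tools }   # assumption
            - { name: DB_PORT, value: "5432" }            # assumption
            - { name: DB_NAME, value: plausible }         # assumption
          command:
            - sh
            - -c
            # Write the assembled URL to a file on the shared volume.
            - 'printf "postgres://%s:%s@%s:%s/%s" "$DB_USER" "$DB_PASS" "$DB_HOST" "$DB_PORT" "$DB_NAME" > /generated-secrets/DATABASE_URL'
          volumeMounts:
            - name: generated-secrets
              mountPath: /generated-secrets
```

A plain string concatenation like this only works if the generated password contains no URL-reserved characters; otherwise it would need percent-encoding before being embedded.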
### Worked example — CrowdSec (`tools` namespace)
Templates under [`crowdsec/templates`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates):
1. **`VaultAuth` `crowdsec`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultauth.yaml)) — `role: crowdsec`, `serviceAccount: crowdsec`.
2. **`VaultDynamicSecret` `crowdsec-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultdynamicsecret.yaml)) — `postgres/creds/crowdsec` → Secret `crowdsec-db-credentials`; `rolloutRestartTargets` restarts Deployment **`crowdsec-lapi`** (the Local API that owns the DB connection).
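
A `VaultDynamicSecret` matching item 2 would look roughly like this. The field values follow the description above; `create: true` and the absence of any key remapping are assumptions, so the real template may differ.

```yaml
# Sketch of the CrowdSec dynamic DB credential object.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: crowdsec-db-credentials
  namespace: tools
spec:
  vaultAuthRef: crowdsec             # the VaultAuth from item 1
  mount: postgres                    # database secrets engine mount
  path: creds/crowdsec               # i.e. postgres/creds/<app>
  destination:
    name: crowdsec-db-credentials    # k8s Secret written by VSO
    create: true                     # assumption
  rolloutRestartTargets:
    - kind: Deployment
      name: crowdsec-lapi            # restarted whenever the credentials rotate
```

By default VSO writes the lease data under its Vault key names (`username`, `password`); any renaming of keys in the resulting Secret would be done by the real template and is not shown here.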
### `factory_auth.tf` — the Ansible CrowdSec/Traefik plugin reader
Separately from the per-app machinery, [`factory_auth.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/factory_auth.tf) wires a Kubernetes auth role **`factory_crowdsec_conf`** for SA **`factory-ansible-tool-crowdsec-traefik-plugin`** in ns **`kube-system`** (`token_ttl = 3600`). It carries policy `factory_crowdsec_conf`, which grants `read,list` on **`kvv2/data/cms/factory/*`**. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the **Turnstile** configuration that the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) writes into `kvv2/cms/factory/*` — a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the `vault_kv_secret_v2` write) is documented on the [CMS Cloudflare page](../cms/cloudflare.md).
---
## 4) Secret-paths inventory
| Path | Engine | Holds | Producer | Consumer |
|---|---|---|---|---|
| `kvv2/<app>/config` | KV v2 | App runtime config | app CI (KV CRUD via `<app>-ops`) | `VaultStaticSecret` → pod |
| `kvv2/plausible/config` | KV v2 | `SECRET_KEY_BASE`, `TOTP_VAULT_KEY` | Plausible IaC (`random_password` → `vault_kv_secret_v2`) | `VaultStaticSecret plausible` → `plausible-config` |
| `kvv2/plausible/geoip` | KV v2 | `LICENSE_KEY` (MaxMind) | admin-seeded | `VaultStaticSecret plausible-geoip` → `geoipupdate` sidecar |
| `kvv2/cms/factory/turnstile` | KV v2 | Cloudflare Turnstile config | `cms` repo IaC | `factory_crowdsec_conf` k8s role → Ansible CrowdSec/Traefik plugin |
| `postgres/creds/<app>` | database | Ephemeral DB user (`username`/`password`, 1h lease) | Vault on demand (role `<app>`, `GRANT <app>_role`) | `VaultDynamicSecret` → pod (e.g. `plausible-db-credentials`, `crowdsec-db-credentials`) |
| `transit/.../vso-client-cache` | transit | VSO client-cache encryption key | Vault admin IaC | VSO controller (encrypt/decrypt its cache) |
| `kvv1/cloudflare/<app>*` | KV v1 | Cloudflare DNS/edge secrets | admin | app CI (`<app>-ops` CRUD) |
| `kvv1/ovh/<app>*` | KV v1 | OVH secrets | admin | app CI (`<app>-ops` CRUD) |
| `kvv1/gitea/tofu_module_reader` | KV v1 | Bot SSH key to pull the `app_roles` Git module | admin | app CI (`<app>-ops` read) |
| `kvv1/google/credentials` | KV v1 | GCS Terraform-backend SA key | admin | every IaC CI job (read) |
---
## 5) Secrets flow
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TB
classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
classDef crd fill:#059669,stroke:#047857,color:#ffffff
classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff
subgraph VAULT["Vault (tools ns)"]
KV2["kvv2 engine<br>kvv2/&lt;app&gt;/*"]:::eng
PG["postgres engine<br>postgres/creds/&lt;app&gt;"]:::eng
TR["transit<br>vso-client-cache"]:::eng
KKUB["kubernetes auth<br>role &lt;app&gt; (SA AND ns)"]:::auth
KJWT["gitea_jwt auth<br>gitea_cicd_&lt;app&gt;"]:::auth
end
subgraph RUNTIME["Runtime path"]
VA["VaultAuth<br>role &lt;app&gt;, SA &lt;app&gt;"]:::crd
VSS["VaultStaticSecret<br>kvv2/&lt;app&gt;/config"]:::crd
VDS["VaultDynamicSecret<br>postgres/creds/&lt;app&gt;"]:::crd
SEC["k8s Secret<br>&lt;app&gt;-config / -db-credentials"]:::k8s
POD["App pod<br>(SA &lt;app&gt;)"]:::k8s
end
subgraph CICD["CI path"]
GHA["Gitea Actions<br>OpenTofu job"]:::ci
TOFU["apply app_roles<br>(under &lt;app&gt;-ops)"]:::ci
end
KKUB --> VA
VA --> VSS
VA --> VDS
KV2 --> VSS
PG --> VDS
VSS --> SEC
VDS -- "rolloutRestart on rotation" --> SEC
SEC --> POD
TR -. "encrypts client cache" .-> VA
GHA -- "JWT login" --> KJWT
KJWT --> TOFU
TOFU -- "creates" --> KKUB
TOFU -- "creates" --> PG
```
1. **Vault**, running in the `tools` namespace, mounts the engines (`kvv2`, `postgres`, `transit`) and the two auth backends (`kubernetes`, `gitea_jwt`).
2. A pod's `VaultAuth` logs in through the **`kubernetes`** backend with SA `<app>` against role `<app>`; the role accepts only when **both** the SA name **and** its namespace match (AND).
3. `VaultStaticSecret` reads `kvv2/<app>/config` and `VaultDynamicSecret` reads `postgres/creds/<app>` using that auth; VSO writes the values into ordinary k8s `Secret` objects.
4. The pod consumes the Secret (env or volume); on a dynamic-cred **rotation** VSO restarts the `rolloutRestartTargets` Deployment so it picks up the new credentials.
5. The **`transit`** key `vso-client-cache` encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
6. On the CI side, a **Gitea Actions** OpenTofu job logs into the **`gitea_jwt`** backend as `gitea_cicd_<app>` (audience = the Gitea OAuth app id, identity from the `email` claim).
7. Running under the `<app>-ops` policy, that job **applies the `app_roles` module**, creating/updating the Kubernetes auth role and the Postgres dynamic role for `<app>` — closing the loop so the runtime path in steps 2-4 works.
---
## Gotchas
- **Vault must be unsealed after every restart.** Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).
- **The Kubernetes auth role binds SA *and* namespace (AND).** The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns `tools` (CrowdSec, Plausible) widen the binding via `service_account_namespaces`.
- **The Postgres dynamic role depends on `<app>_role` existing.** `GRANT <app>_role TO {{name}}` (create) and `REASSIGN OWNED BY {{name}} TO <app>_role` (revoke) both fail if factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) hasn't created the `<app>_role` non-login role first. Order: **factory postgres/iac → tools app_roles**.
- **The `ops_policies` vs `policies` key mismatch** in `terraform.tfvars` / `variables.tf` (see §2) — read both when adding an app's extra ops policy.
- **The sandbox uses a separate Vault.** Per the [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md), the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.