diff --git a/vibe/guidebooks/README.md b/vibe/guidebooks/README.md index fd2f104..5f62593 100644 --- a/vibe/guidebooks/README.md +++ b/vibe/guidebooks/README.md @@ -36,6 +36,8 @@ flowchart LR |---|---|---| | [Lab ecosystem](lab-ecosystem/README.md) | End-to-end map of `factory` + `tools` + `cms`: repos, the `` join key, secrets via Vault, CI/CD, ArgoCD, and the data/control flows that connect them | ✅ Active | | [Factory provisioning](factory-provisioning/README.md) | Deep dive into how factory provisions everything: Ansible playbooks + roles and OpenTofu | ✅ Active | +| [Tools](tools/README.md) | Deep dive into the lab platform services in the `tools` namespace (Vault+VSO, Prometheus, Grafana, CrowdSec, poolers, Redis, Plausible, ClickHouse) | ✅ Active | +| [CMS](cms/README.md) | Deep dive into the public Nuxt site arcodange.fr + its Cloudflare DNS/tunnel/Turnstile and Zoho email IaC | ✅ Active | ## Rules to contribute diff --git a/vibe/guidebooks/cms/README.md b/vibe/guidebooks/cms/README.md new file mode 100644 index 0000000..b9a1046 --- /dev/null +++ b/vibe/guidebooks/cms/README.md @@ -0,0 +1,75 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > **CMS** + +# CMS + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) +> **Downstream:** [Site (Nuxt)](site.md) · [Cloudflare](cloudflare.md) · [Zoho email](zoho-email.md) +> **Related:** [tools CrowdSec](../tools/components.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) + +This guidebook maps the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) — the one app in the lab whose primary audience is the open Internet. 
It serves the public site **arcodange.fr** and owns the OpenTofu that wires its Cloudflare edge, its Cloudflared tunnel into the cluster, its Turnstile CAPTCHA, and its Zoho email. + +## Two faces of one repo + +The `cms` repo holds two distinct concerns that share a domain but live in different directories. + +| Face | Where | What it is | +|---|---|---| +| **The SITE** | repo root ([`app/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/app), [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content), [`chart/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart)) | A **Nuxt 4** application (Nuxt Content + Nuxt Studio) built to static output and deployed **two ways**: to **Cloudflare Pages** (public `arcodange.fr` / `www`) and into **k3s** via a Helm chart (ArgoCD app **`cms`**) reachable through the Cloudflared tunnel (e.g. `cms-rec.arcodange.fr`) or the lab ingress (`www.arcodange.lab`) | +| **The IaC** | [`cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare) | **OpenTofu** managing the `arcodange.fr` zone (registered at OVH, DNS delegated to Cloudflare), Cloudflare **Pages**, the **Cloudflared** Zero-Trust tunnel into internal Traefik, a **Turnstile** CAPTCHA feeding CrowdSec, and **Zoho** email | + +The site is *what visitors see*; the IaC is *how they reach it and how mail flows*. Both deploy from the same Gitea repo through Gitea Actions. + +## Public request + email flow + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart LR + classDef edge fill:#d97706,stroke:#b45309,color:#fff + classDef proc fill:#059669,stroke:#047857,color:#fff + classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff + + USER(["Visitor"]):::edge
+ CFDNS["Cloudflare DNS<br/>arcodange.fr zone"]:::edge
+ PAGES["Cloudflare Pages<br/>(static Nuxt build)"]:::proc + TUN["Cloudflared tunnel"]:::edge + TRAEFIK["internal Traefik"]:::proc
+ CS["CrowdSec bouncer<br/>(Turnstile-backed)"]:::proc
+ CMS["cms pod (Nuxt)<br/>cms-rec.arcodange.fr"]:::proc + MAIL(["Sender"]):::edge
+ ZOHO["Zoho<br/>MX / SPF / DKIM / DMARC / BIMI"]:::store + + USER --> CFDNS + CFDNS -- "arcodange.fr / www" --> PAGES + CFDNS -- "*.arcodange.fr" --> TUN + TUN --> TRAEFIK --> CS --> CMS + MAIL -- "MX lookup arcodange.fr" --> ZOHO +``` + +1. A **visitor** resolves a hostname under `arcodange.fr` through **Cloudflare DNS** (the zone OpenTofu manages). +2. The apex and `www` records (proxied CNAMEs) land on **Cloudflare Pages**, which serves the static Nuxt build directly from the edge. +3. Wildcard `*.arcodange.fr` hostnames route through the **Cloudflared** Zero-Trust tunnel — no home-LAN ports are opened — onto **internal Traefik**, which passes the request through the **CrowdSec** bouncer (its CAPTCHA challenge backed by Turnstile) to the in-cluster **`cms`** Nuxt pod (e.g. `cms-rec.arcodange.fr`). +4. Separately, **email** to `arcodange.fr` follows the **MX** record to **Zoho**, with **SPF/DKIM/DMARC/BIMI** authenticating and presenting the mail. + +## Index + +| Page | What it maps | Status | +|---|---|---| +| [Site (Nuxt)](site.md) | The Nuxt 4 app: Nuxt Content + Studio, static build, the dual deploy to Cloudflare Pages and to k3s via the Helm chart / ArgoCD app `cms` | ✅ Active | +| [Cloudflare](cloudflare.md) | The `cloudflare/` OpenTofu: zone (OVH-registered, CF-delegated), Pages, the Cloudflared tunnel into Traefik, and the Turnstile CAPTCHA for CrowdSec | ✅ Active | +| [Zoho email](zoho-email.md) | Zoho mail IaC: domain verification, MX/SPF/DKIM/DMARC/BIMI records, and the public aliases | ✅ Active | + +## Maintenance rule + +> [!IMPORTANT] +> **If any component documented in this guidebook is altered, update the page describing it in the same change.** A reference map that drifts from the real `cms` repo sends readers and agents down dead paths. The PR that changes a component is the PR that updates its CMS guidebook page. + +## Cross-references + +- [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) — the whole-lab view of where `cms` sits among `factory` + `tools`. 
+- [tools CrowdSec](../tools/components.md) — the Traefik bouncer the Turnstile challenge feeds for public-edge decisioning. +- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the Cloudflared tunnel token, Turnstile secret, and Cloudflare/Zoho/OVH credentials live in Vault. +- [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) — the OpenTofu apply pipeline pattern the `cloudflare/` IaC follows in Gitea Actions. +- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why public-facing surfaces like this one are isolated from a safe prod-like environment. +- Repo: [arcodange-org/cms](https://gitea.arcodange.lab/arcodange-org/cms). diff --git a/vibe/guidebooks/cms/cloudflare.md b/vibe/guidebooks/cms/cloudflare.md new file mode 100644 index 0000000..082e8a6 --- /dev/null +++ b/vibe/guidebooks/cms/cloudflare.md @@ -0,0 +1,182 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Cloudflare** + +# Cloudflare + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [CMS](README.md) · [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) +> **Downstream:** [tools CrowdSec](../tools/components.md) (consumes the Turnstile widget) +> **Related:** [Zoho email](zoho-email.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) + +This page maps [`cms/cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare) — the OpenTofu root that owns the **`arcodange.fr`** edge. 
One `tofu apply` tracks the domain's OVH registration (imported into state, not created by OpenTofu), **delegates its DNS to Cloudflare**, publishes the public site on **Cloudflare Pages**, opens a **Cloudflared** Zero-Trust tunnel into the in-cluster Traefik, mints the **Turnstile** CAPTCHA the [tools CrowdSec bouncer](../tools/components.md) challenges with, and (via a sibling module) wires **Zoho** mail. The Nuxt site itself is not built here — see [Site (Nuxt)](site.md). + +## Providers + +Declared in [`providers.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/providers.tf). Versions are pinned in [`.terraform.lock.hcl`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/.terraform.lock.hcl). + +| Provider | Source | Version | Auth | Purpose | +|---|---|---|---|---| +| `cloudflare` | `cloudflare/cloudflare` | `~> 5` | `CLOUDFLARE_API_TOKEN` env | Zone, Pages, DNS records, Zero-Trust tunnel, Turnstile, zone settings | +| `ovh` | `ovh/ovh` | `~> 2.8` | `OVH_*` env (`ovh-eu` endpoint) | Domain registration + nameserver delegation | +| `vault` | `vault` | `5.5.0` | `auth_login_jwt` (mount `gitea_jwt`, role `gitea_cicd_cms`) at `https://vault.arcodange.lab` | Persists the Turnstile secret/sitekey; reads tunnel token | + +> [!NOTE] +> The Vault provider authenticates with a **Gitea-issued OIDC JWT** (`TERRAFORM_VAULT_AUTH_JWT`), the same OIDC→Vault pattern the [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) documents lab-wide. + +## State backend — S3 on Cloudflare R2 + +[`backend.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/backend.tf) keeps state in an **S3-compatible bucket on Cloudflare R2**, not AWS. The `skip_*` flags and `use_path_style` are what let the AWS S3 backend talk to R2. 
+ +| Setting | Value | +|---|---| +| `bucket` | `arcodange-tf` | +| `key` | `cms/terraform.tfstate` | +| `region` | `auto` | +| `endpoints.s3` | `var.CLOUDFLARE_S3_ENDPOINT` (R2 S3 API URL) | +| `access_key` / `secret_key` | `var.CLOUDFLARE_S3_ACCESS_KEY` / `var.CLOUDFLARE_S3_SECRET_ACCESS_KEY` | +| Flags | `skip_credentials_validation`, `skip_metadata_api_check`, `skip_region_validation`, `skip_requesting_account_id`, `skip_s3_checksum`, `use_path_style` | + +> [!WARNING] +> The R2 backend credentials are **Terraform variables**, so they must be present in the environment *before* `tofu init` can read state. CI injects them from Vault path `kvv1/cloudflare/r2/arcodange-tf` (mapped to `TF_VAR_CLOUDFLARE_*` — see [CI](#ci--cloudflareyaml) below). Without those creds nothing — not even a read-only plan — can run. + +## Resource graph + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart TD + classDef ovh fill:#1e3a8a,stroke:#1e40af,color:#fff + classDef cf fill:#d97706,stroke:#b45309,color:#fff + classDef mod fill:#059669,stroke:#047857,color:#fff + classDef vault fill:#7c3aed,stroke:#6d28d9,color:#fff +
+ OVHDOM["ovh_domain_name<br/>arcodange.fr"]:::ovh
+ OVHNS["ovh_domain_name_servers<br/>delegate NS"]:::ovh
+ ZONE["cloudflare_zone<br/>arcodange.fr"]:::cf
+ PAGES["cloudflare_pages_project<br/>arcodange-cms (branch main)"]:::cf
+ PDOM["cloudflare_pages_domain<br/>arcodange.fr + www"]:::cf
+ DNS["cloudflare_dns_record<br/>@ + www CNAME (proxied)"]:::cf
+ TUN["module.cf_tunnel<br/>Zero-Trust tunnel 'lab'"]:::mod
+ CAP["module.cf_captcha_for_crowdsec<br/>Turnstile widget"]:::mod
+ ZOHO["module.zoho<br/>mail records"]:::mod
+ VBACK["module.vault_backend<br/>cms app role (cloudflared)"]:::vault + + OVHDOM --> ZONE + ZONE -- "name_servers" --> OVHNS + ZONE --> PAGES --> PDOM + PAGES -- "subdomain target" --> DNS + ZONE --> DNS + ZONE --> TUN + ZONE --> ZOHO + OVHDOM --> CAP +``` + +1. **`ovh_domain_name "arcodange.fr"`** anchors the registration (imported into state, not created by OpenTofu). +2. **`cloudflare_zone`** creates the Cloudflare zone for that domain under the `arcodange@gmail.com` account. +3. **`ovh_domain_name_servers`** writes Cloudflare's assigned nameservers back at OVH, **delegating DNS to Cloudflare**. +4. **`cloudflare_pages_project "arcodange-cms"`** (production branch `main`) plus two **`cloudflare_pages_domain`** resources attach `arcodange.fr` and `www.arcodange.fr` to Pages. +5. **`cloudflare_dns_record`** publishes apex (`@`) and `www` as **proxied CNAMEs** pointing at the Pages project's `.pages.dev` subdomain. +6. The three **modules** (`cf_tunnel`, `cf_captcha_for_crowdsec`, `zoho`) and `vault_backend` hang off the same zone/domain/account. 
+ +### DNS & zone resources + +| Resource | Name | Detail | +|---|---|---| +| `ovh_domain_name.arcodange_fr` | `arcodange.fr` | Registration; `# was terraform imported into state` | +| `cloudflare_zone.arcodange_fr` | `arcodange.fr` | Zone under account resolved from `arcodange@gmail.com` | +| `ovh_domain_name_servers.arcodange_fr` | — | Delegates NS to `cloudflare_zone…name_servers` (or `original_name_servers` when rolling back) | +| `terraform_data.arcodange_fr_initial_conf` | — | Snapshot of OVH's pre-Cloudflare config, kept for rollback inspection (`ignore_changes`) | +| `cloudflare_pages_project.arcodange_fr` | `arcodange-cms` | `production_branch = "main"` | +| `cloudflare_pages_domain.arcodange_fr` | `arcodange.fr` | Custom domain on Pages | +| `cloudflare_pages_domain.www_arcodange_fr` | `www.arcodange.fr` | Custom domain on Pages | +| `cloudflare_dns_record.root_cname` | `@` | CNAME → Pages `subdomain`, `proxied = true`, `ttl = 1` | +| `cloudflare_dns_record.www_cname` | `www` | CNAME → Pages `subdomain`, `proxied = true`, `ttl = 1` | + +All wiring lives in [`iac.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/iac.tf). The account id is resolved at plan time via `data.cloudflare_account` filtered on the `arcodange@gmail.com` account name. + +## Module: `cloudflared_tunnel` + +[`modules/cloudflared_tunnel/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/modules/cloudflared_tunnel). A **Zero-Trust Cloudflared tunnel** that lets public hostnames reach in-cluster services **without opening any home-LAN port**: the in-cluster `cloudflared` connector dials out to Cloudflare, so the connection originates from inside the cluster. Instantiated as `module.cf_tunnel` with `tunnel_name = "lab"`. 
+ +| Resource | Role | +|---|---| +| `cloudflare_zero_trust_tunnel_cloudflared.tunnel` | The tunnel named **`lab`** under the account | +| `cloudflare_zero_trust_tunnel_cloudflared_config.tunnel_config` | Ingress rules from `hostname_mappings`, terminating in a catch-all `http_status:404` | +| `data.cloudflare_zone.arcodange` | Looks up the zone (created by the root module) | +| `cloudflare_zone_setting.setting` | Sets **`always_use_https = on`** | +| `cloudflare_dns_record.dns` | One **proxied CNAME** per mapping → `.cfargotunnel.com` | + +The single ingress mapping passed from the root is: + +| Hostname | Service | +|---|---| +| `*.arcodange.fr` | `http://traefik.kube-system.svc.cluster.local:80` | + +So every wildcard subdomain under `arcodange.fr` lands on the cluster's **internal Traefik** (`origin_request.no_tls_verify = true`), which then routes to the right in-cluster app (e.g. the `cms` Nuxt pod, Grafana, etc.). Pairs with the apex/`www` Pages records above, which are *not* tunneled. + +> [!CAUTION] +> **The tunnel token is created by hand and rotation is not automated.** Cloudflare only issues a connector token from the web console, so it is **manually stored in Vault** under the KV-v2 mount `kvv2` at path `cms/cloudflared` (the in-repo `vault_kv_secret` resource is commented out for exactly this reason). The cluster-side `cloudflared` Deployment reads it via a `VaultStaticSecret` (Vault Secrets Operator), role `cms`, refreshed hourly. If the token is rotated in the console, the Vault entry must be updated **manually** — nothing in this IaC will do it. `module.vault_backend` provisions the `cms` Vault app role (service account `cloudflared`) that grants that read; see [secrets-and-vault](../lab-ecosystem/secrets-and-vault.md). + +## Module: `cloudflared_captcha_for_crowdsec` + +[`modules/cloudflared_captcha_for_crowdsec/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/modules/cloudflared_captcha_for_crowdsec). 
Mints a **Cloudflare Turnstile widget** and stores its keys in Vault for the [tools CrowdSec bouncer](../tools/components.md) to serve as a CAPTCHA challenge on remediated requests. + +| Resource | Detail | +|---|---| +| `cloudflare_turnstile_widget.turnstile` | `name = "crowdsec captcha"`, `mode = "invisible"`, `clearance_level = "interactive"`, `region = "world"`; `bot_fight_mode`/`ephemeral_id`/`offlabel` all `false` | +| `vault_kv_secret_v2.turnstile` | Writes `{ sitekey, secret }` to KV-v2 (`cas = 1`) | + +Instantiated as `module.cf_captcha_for_crowdsec` with `domain_names = [arcodange.fr, arcodange.lab, arcodange.duckdns.org]` and `vault_path = "cms/factory/turnstile"`. + +| What | Where | +|---|---| +| **Turnstile mode** | Invisible widget, interactive clearance — challenges only when CrowdSec flags a request | +| **Vault destination** | `kvv2/cms/factory/turnstile` → keys `sitekey` + `secret` | +| **Consumer** | The [CrowdSec Traefik bouncer in `tools`](../tools/components.md) reads sitekey + secret to render and verify the challenge | + +This is the one knot that ties the **`cms`** edge to the **`tools`** security stack: `cms` produces the Turnstile keys; `tools` consumes them. + +## Sibling module: Zoho mail + +`module.zoho` (source `./zoho`) lives in **this same OpenTofu root** and writes mail records into the same `cloudflare_zone`. It is documented separately on [Zoho email](zoho-email.md) — note that a `cms/cloudflare` apply touches mail DNS too, so plan output there is expected. + +## CI — `cloudflare.yaml` + +[`.gitea/workflows/cloudflare.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/cloudflare.yaml). Manual-only (`workflow_dispatch`), same Gitea-OIDC→Vault→`tofu apply` shape as the [tofu CI flow concept](../factory-provisioning/opentofu/ci-apply-flow.md). + +1. **`gitea_vault_auth`** — mints a Gitea OIDC id-token (decodes `vault_oauth__sh_b64` and runs it), exported as `gitea_vault_jwt`. +2. 
**`tofu`** — depends on the auth job; a shared `*vault_step` reads all secrets from Vault (role `gitea_cicd_cms`, mount `gitea_jwt`), prepares the homelab CA cert, then runs **`dflook/terraform-apply@v1`** on `path: cloudflare/` with **`auto_approve: true`** at **OpenTofu `1.8.2`**. + +### Vault secrets read by the workflow + +| Vault path | Mapped to | Used for | +|---|---|---| +| `kvv1/cloudflare/cms/cf_arcodange_cms_token` (`token`) | `CLOUDFLARE_API_TOKEN` | Cloudflare provider auth | +| `kvv1/cloudflare/r2/arcodange-tf` (`*`) | `TF_VAR_CLOUDFLARE_*` | R2/S3 state backend creds + endpoint | +| `kvv1/gitea/tofu_module_reader` (`ssh_private_key`) | `TERRAFORM_SSH_KEY` | SSH key to clone the `tools` git module (`vault_backend`) | +| `kvv1/ovh/cms/app` (`*`) | `OVH_*` | OVH provider auth | +| `kvv1/zoho/self_client` (`*`) | `ZOHO_*` **and** `TF_VAR_ZOHO_*` | Zoho API auth for `module.zoho` | + +> [!CAUTION] +> **`auto_approve: true` applies without a human gate.** Any dispatch of this workflow on any ref runs `tofu apply` straight against the live `arcodange.fr` edge and Vault. There is no plan-review step; review happens in the PR before merge, not in the apply. Treat a dispatch as a production change. + +## Gotchas + +> [!CAUTION] +> **Cloudflared tunnel token — manual, unrotated.** Created in the Cloudflare console and hand-placed in Vault under `kvv2` at path `cms/cloudflared`. No IaC rotates it. (Repeated here because it is the most common surprise.) + +> [!WARNING] +> **OVH → Cloudflare nameserver delegation is the live cutover.** `ovh_domain_name_servers` points OVH at Cloudflare's nameservers. The `use_ovh_initial_name_servers` variable (default `false`) is meant to flip delegation back to OVH's `original_name_servers`, but that **rollback path is untested** — `terraform_data.arcodange_fr_initial_conf` only *snapshots* the pre-Cloudflare config for inspection. Do not assume a clean revert. 
+ +> [!WARNING] +> **R2-backed state creds gate everything.** State lives on Cloudflare R2 and the access/secret keys are `TF_VAR_` inputs (from `kvv1/cloudflare/r2/arcodange-tf`). If those creds are missing or rotated out from under the workflow, even `tofu init` fails — there is no fallback backend. + +## Cross-references + +- [CMS](README.md) — the guidebook hub; the public-request + email flow diagram. +- [Site (Nuxt)](site.md) — the Nuxt app served by the Pages project and the in-cluster pod this tunnel fronts. +- [Zoho email](zoho-email.md) — `module.zoho` lives in this same OpenTofu root. +- [tools CrowdSec](../tools/components.md) — consumer of the Turnstile widget minted here. +- [tofu CI flow concept](../factory-provisioning/opentofu/ci-apply-flow.md) — the shared Gitea-OIDC→Vault→apply pattern. +- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the tunnel token, Turnstile keys, and provider creds live. +- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why this Internet-facing surface is isolated from the safe prod-like environment. +- Code: [`cms/cloudflare/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare). 
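
The R2 credential gotcha above lends itself to a preflight check before `tofu init`. The following is a minimal sketch, not part of the repo: it assumes the three backend inputs arrive as `TF_VAR_`-prefixed environment variables named after the `var.CLOUDFLARE_S3_*` variables in the backend table, and the `missing_backend_vars` helper is hypothetical.

```python
import os

# Assumed names: the backend variables from backend.tf with the TF_VAR_ prefix
# that the workflow maps them to. Adjust if the real mapping differs.
REQUIRED = (
    "TF_VAR_CLOUDFLARE_S3_ENDPOINT",
    "TF_VAR_CLOUDFLARE_S3_ACCESS_KEY",
    "TF_VAR_CLOUDFLARE_S3_SECRET_ACCESS_KEY",
)


def missing_backend_vars(env=None):
    """Return the required backend variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Demo with explicit dictionaries instead of the real environment.
print(missing_backend_vars({}))
print(missing_backend_vars({name: "placeholder" for name in REQUIRED}))
```

Running such a check first turns an opaque backend failure into a list of the exact variables to fetch from Vault.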
diff --git a/vibe/guidebooks/cms/site.md b/vibe/guidebooks/cms/site.md new file mode 100644 index 0000000..5e20f40 --- /dev/null +++ b/vibe/guidebooks/cms/site.md @@ -0,0 +1,165 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Site (Nuxt)** + +# Site (Nuxt) + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [CMS](README.md) · [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) +> **Downstream:** [Cloudflare](cloudflare.md) +> **Related:** [Zoho email](zoho-email.md) · [tools CrowdSec](../tools/components.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) + +The public site face of the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms): a **Nuxt 4** application built to **static HTML** and shipped two ways from one image — to **Cloudflare Pages** (the live public `arcodange.fr`) and into **k3s** via a Helm chart behind the Cloudflared tunnel. This page maps the Nuxt app, its Docker build, the Helm chart, and the Gitea Actions that drive both deploys. + +## The Nuxt 4 application + +Configured in [`nuxt.config.ts`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/nuxt.config.ts). It runs `ssr: true` for dev but is shipped via **`nuxt generate`** — a full static prerender — so production is plain HTML served by a static file server, no Node runtime. 
+ +| Concern | Setting | Notes | +|---|---|---| +| Rendering | `ssr: true`, shipped via `nuxt generate` | Static prerender to `.output/public`; Nitro `prerender.autoSubfolderIndex: false` | +| Site identity | `site.url: https://arcodange.fr`, `site.name: Arcodange`, `trailingSlash: true` | Drives canonical URLs, sitemap, robots via `@nuxtjs/seo` | +| Content | `@nuxt/content` collections | Markdown under [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content); mermaid highlight enabled | +| Editing | **Nuxt Studio** at route **`/admin`** | `nuxt-studio` module; repo `arcodange-org/cms`, commits to `main` | +| Sitemap / robots | `@nuxtjs/sitemap` (`zeroRuntime: true`), `@nuxtjs/seo` | No runtime sitemap server — fully prerendered | +| Analytics | `@nuxtjs/plausible` | `apiHost: https://analytics.arcodange.fr`, `hashMode: true`, outbound tracking on, `localhost` ignored | +| i18n | `@nuxtjs/i18n` | Single locale **`fr`** (default `fr`); `htmlAttrs.lang: fr` | +| Images | `@nuxt/image` | `webp`/`jpeg`, quality 80 | +| Fonts | `@nuxt/fonts` | Local **Noto Emoji** preloaded | +| UI | `@nuxt/ui` | Plus `@nuxt/scripts`, `@nuxtjs/device`, `nuxt-booster`, `@compodium/nuxt` | + +### Content collections + +Declared in [`content.config.ts`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content.config.ts). Every collection is wrapped with `asSeoCollection()` (from `@nuxtjs/seo`) and sourced from a folder of Markdown under [`content/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/content). 
+ +| Collection | Source glob | Type | Schema extras | +|---|---|---|---| +| `parcours` | `parcours/*.md` | `page` | — | +| `site` | `site/*.md` | `page` | — | +| `tech` | `tech/*.md` | `page` | `date` (required), `image` (media), `featured` (default `false`) | +| `experiences` | `experiences/*.md` | `page` | `date`, `enddate`, `icon` (default `i-lucide-rocket`), `image`, `secondaryImage`, `descriptionHTML` | + +A content build transformer `~~/content/transformers/description-md` runs at build time, and Markdown highlighting registers the `mermaid` language. + +## Docker build: one image, two static trees + +[`Dockerfile`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/Dockerfile) is a multi-stage build that produces **two** static outputs from the same source and packs them into a static web server image. + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart LR + classDef base fill:#7c3aed,stroke:#6d28d9,color:#fff + classDef build fill:#059669,stroke:#047857,color:#fff + classDef out fill:#d97706,stroke:#b45309,color:#fff +
+ DEPS["cms-deps:TAG<br/>(Dockerfile.deps base)"]:::base
+ BUILD["build stage<br/>npm ci"]:::build
+ PROD["nuxt generate<br/>→ /app/prod"]:::out
+ STG["NUXT_SITE_ENV=staging<br/>nuxt generate → /app/.output/public"]:::out
+ SWS["static-web-server:2<br/>serves /public"]:::build + + DEPS --> BUILD + BUILD --> PROD + BUILD --> STG + PROD --> SWS + STG --> SWS +``` + +1. The **build stage** starts `FROM gitea.arcodange.lab/arcodange-org/cms-deps:${BASE_IMAGE_TAG}` — a prebuilt base ([`Dockerfile.deps`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/Dockerfile.deps), `node:24-slim` + `python3`/`make`/`g++`/`sqlite3`/`libvips` for `better-sqlite3`/`libvips`) — copies the source and runs `npm ci`. +2. The **prod** build: `npx nuxt generate`, then the output is moved to **`/app/prod`**. +3. The **staging** build: `NUXT_SITE_ENV="staging" npx nuxt generate`, leaving its output at **`/app/.output/public`**. +4. The **server stage** is `FROM joseluisq/static-web-server:2`; it copies the staging tree to **`/public`** and the prod tree to **`/prod`**, plus [`webserver.config.toml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/webserver.config.toml) as `/sws.toml`, and serves on port 80. + +> [!NOTE] +> **`/public` is staging, `/prod` is production.** The static-web-server serves `root = "./public"` (the **staging** build) by default — that is what the in-cluster k3s deploy exposes (e.g. `cms-rec.arcodange.fr`). The **prod** build at `/prod` is the tree extracted and pushed to Cloudflare Pages by the `arcodange_fr` workflow. One image therefore carries both faces. + +The final image is pushed to **`gitea.arcodange.lab/arcodange-org/cms`** (tags `latest` and the branch ref). + +## Helm chart + +[`chart/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart) deploys the in-cluster face. The pod is just the static-web-server image above, fronted by Traefik with a CrowdSec middleware and reached either over the lab ingress (`www.arcodange.lab`) or through a sidecar Cloudflared tunnel (`cms-rec.arcodange.fr`). 
+ +| Key | Value | Source | +|---|---|---| +| Chart name / version | `arcodange-cms` / `0.1.0`, `appVersion: latest` | [`Chart.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/Chart.yaml) | +| Image | `gitea.arcodange.lab/arcodange-org/cms:latest`, `pullPolicy: Always` | [`values.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/values.yaml) | +| Replicas | `1` (autoscaling disabled) | `replicaCount: 1`, `autoscaling.enabled: false` | +| Service | `ClusterIP`, port **80** (named `http`) | `service.port: 80` | +| Probes | liveness + readiness `httpGet /` on `http` | — | +| ServiceAccount | created, name **`cms`**, automount on | `serviceAccount.name: cms` | +| Lab ingress | `www.arcodange.lab`, path `/` Prefix | Traefik `websecure`, TLS via `letsencrypt` resolver (`arcodange.lab` + SAN `www.arcodange.lab`) | +| Edge middleware | `kube-system-crowdsec@kubernetescrd` | applied on both ingresses | +| Tunnel ingress | `cms-rec.arcodange.fr`, Traefik `web` entrypoint | `ingress.cloudflared.host` | +| Cloudflared sidecar | enabled, `Deployment`, `1` replica, image `cloudflare/cloudflared:latest` | `cloudflared.*` | +| Tunnel token | Vault KV-v2 `kvv2` path `cms/cloudflared`, role `cms`, refresh `1h` | `cloudflared.vault.*` | + +### Chart templates + +The chart renders these objects (in [`chart/templates/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates)): + +| Template | Renders | +|---|---| +| [`deployment.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/deployment.yaml) | the `cms` static-web-server pod, port `http`/80 | +| [`service.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/service.yaml) | ClusterIP service on 80 | +| [`ingress.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress.yaml) | lab Traefik ingress for `www.arcodange.lab` + CrowdSec middleware | +| 
[`ingress_cloudflared.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress_cloudflared.yaml) | `-cloudflared` ingress for `cms-rec.arcodange.fr` (web entrypoint) | +| [`cloudflared_tunnel.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/cloudflared_tunnel.yaml) | `cloudflared` SA, `VaultAuth`, `VaultStaticSecret`, and the cloudflared `Deployment`/`DaemonSet` | +| [`serviceaccount.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/serviceaccount.yaml) | the `cms` ServiceAccount | +| [`ingress_gitea.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/ingress_gitea.yaml), [`hpa.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/chart/templates/hpa.yaml) | optional Gitea ingress; HPA (disabled) | + +### Cloudflared tunnel template + +The cloudflared sidecar pulls its tunnel token from Vault through the [VSO](../tools/components.md) operator, never from a static manifest: + +1. A `ServiceAccount` **`cloudflared`** is created with a `VaultAuth` (Kubernetes auth, mount `kubernetes`, role from `cloudflared.vault.role` = `cms`, audience `vault`). +2. A **`VaultStaticSecret`** named `cloudflared-tunnel-token` reads **KV-v2** mount **`kvv2`** at path **`cms/cloudflared`** (refresh `1h`) and materialises a `cloudflared-tunnel-token` Secret. +3. The cloudflared `Deployment` (1 replica, pinned to a `control-plane` node via affinity) runs `cloudflared tunnel --no-autoupdate run --token $(TUNNEL_TOKEN) --no-tls-verify`, with `TUNNEL_TOKEN` injected from that Secret's `token` key. + +This connects Cloudflare's edge to internal Traefik so `cms-rec.arcodange.fr` reaches the in-cluster `cms` service without opening any home-LAN port — the cluster side of the tunnel whose Cloudflare side lives in the [Cloudflare IaC](cloudflare.md). 
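
The routing that lets `cms-rec.arcodange.fr` reach Traefik can be sketched as first-match rule evaluation. This is an illustration only: it assumes the single `*.arcodange.fr` mapping and the catch-all `http_status:404` described on the Cloudflare page. The `resolve` function is hypothetical, and Python's `fnmatch` stands in for cloudflared's own wildcard matching, which may differ in edge cases.

```python
from fnmatch import fnmatch

# Ingress rules in evaluation order. The final rule has no hostname and
# therefore catches every request the earlier rules did not match.
INGRESS = [
    ("*.arcodange.fr", "http://traefik.kube-system.svc.cluster.local:80"),
    (None, "http_status:404"),
]


def resolve(hostname):
    """Return the service of the first rule whose pattern matches hostname."""
    for pattern, service in INGRESS:
        if pattern is None or fnmatch(hostname, pattern):
            return service
    raise LookupError("ingress list must end with a catch-all rule")


print(resolve("cms-rec.arcodange.fr"))
print(resolve("arcodange.fr"))
```

In this sketch the bare apex falls through to the catch-all. In the lab that case should not arise, since the apex and `www` records point at Cloudflare Pages rather than at the tunnel.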
+ +## CI: building and deploying + +Three Gitea Actions workflows under [`.gitea/workflows/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows) cover the site. (A fourth, `cloudflare.yaml`, drives the OpenTofu — see [Cloudflare](cloudflare.md).) + +| Workflow | Triggers | What it does | +|---|---|---| +| [`docker-dependencies.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/docker-dependencies.yaml) | `workflow_dispatch`; push to `main` touching `package.json`, `package-lock.json`, `Dockerfile.deps` | Builds the **deps** base image, pushes `gitea.arcodange.lab/-deps:{latest,YYYYMMDD-SHA8}`, then creates+pushes a **git tag `deps-YYYYMMDD-SHA8`** (with retry, up to 30 attempts) | +| [`docker-content.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/docker-content.yaml) | `workflow_dispatch`; push to `main` touching `nuxt.config.ts`, `app/**`, `content.config.ts`, `content/**`, `public/**`, `package*.json`, `Dockerfile` | Finds the latest `deps-*` git tag, strips `deps-` to get `BASE_TAG`, builds the **full image** with `--build-arg BASE_IMAGE_TAG=$BASE_TAG`, pushes `gitea.arcodange.lab/:{latest,}` | +| [`arcodange_fr.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/arcodange_fr.yaml) | `workflow_dispatch` (input `image_tag`, default `main`) | Pulls `cms:`, `docker create` + `docker cp` to extract **`/prod`** to `./public`, writes a minimal `wrangler.toml`, then **`wrangler pages deploy`** to project `arcodange-cms`, branch `main` | + +> [!IMPORTANT] +> **The deps tag is the contract between the two Docker workflows.** `docker-dependencies` publishes both the `-deps` image and a matching **git tag** `deps-YYYYMMDD-SHA8`; `docker-content` discovers that tag (`git tag --list "deps-*" | sort -V | tail -n1`) to pin its `BASE_IMAGE_TAG`. 
Touch `package.json`/lockfile/`Dockerfile.deps` and the deps build must land first, or the content build pins a stale base. + +### From image to Cloudflare Pages + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart LR + classDef ci fill:#059669,stroke:#047857,color:#fff + classDef reg fill:#7c3aed,stroke:#6d28d9,color:#fff + classDef edge fill:#d97706,stroke:#b45309,color:#fff + + DEP["docker-dependencies<br/>→ -deps image + git tag deps-*"]:::ci + CON["docker-content<br/>pins BASE_IMAGE_TAG"]:::ci + REG["registry<br/>gitea.arcodange.lab/arcodange-org/cms"]:::reg + FR["arcodange_fr<br/>extract /prod"]:::ci + PAGES["Cloudflare Pages<br/>project arcodange-cms"]:::edge + K3S["k3s Helm chart<br/>serves /public (staging)"]:::edge + + DEP --> CON --> REG + REG --> FR --> PAGES + REG --> K3S +``` + +1. **`docker-dependencies`** publishes the `-deps` base image and a `deps-YYYYMMDD-SHA8` git tag whenever dependencies change. +2. **`docker-content`** resolves that tag, builds the full dual-tree image, and pushes it to **`gitea.arcodange.lab/arcodange-org/cms`**. +3. **`arcodange_fr`** (manual) pulls that image, extracts the **`/prod`** tree, and deploys it to **Cloudflare Pages** project `arcodange-cms` on branch `main` — this is the live public `arcodange.fr`. +4. In parallel, the k3s **Helm chart** runs the same image and serves the **`/public`** (staging) tree behind Traefik + CrowdSec and the Cloudflared tunnel (`cms-rec.arcodange.fr`, `www.arcodange.lab`). + +## Cross-references + +- [CMS](README.md) — the guidebook hub: the two faces of the repo and the public request/email flow. +- [Cloudflare](cloudflare.md) — the Cloudflare side of the tunnel, the Pages project, and the zone the deploys publish into. +- [Zoho email](zoho-email.md) — mail for the same `arcodange.fr` domain. +- [tools CrowdSec](../tools/components.md) — the Traefik bouncer middleware fronting both chart ingresses. +- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — where the Cloudflared tunnel token (`kvv2` `cms/cloudflared`) and registry/CF credentials live. +- Repo: [arcodange-org/cms](https://gitea.arcodange.lab/arcodange-org/cms). 
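
Coming back to the deps-tag contract from the CI section: the tag discovery that `docker-content` performs can be sketched as a small filter over a list of tags. This is an illustration of the `git tag --list "deps-*" | sort -V | tail -n1` pipeline quoted above, not the workflow's actual step; the function name `resolve_base_tag` and the sample tags are hypothetical.

```sh
#!/bin/sh
# Sketch: resolve BASE_TAG from a list of git tags read on stdin.
# In CI the list would come from: git tag --list "deps-*"
resolve_base_tag() {
  # Keep only deps-* tags, version-sort them, take the last one,
  # then strip the "deps-" prefix to get the image tag.
  grep '^deps-' | sort -V | tail -n1 | sed 's/^deps-//'
}

# Example with made-up tags (not taken from the repo):
printf '%s\n' deps-20260101-aaaaaaaa deps-20260615-bbbbbbbb v1.0.0 | resolve_base_tag
```

One property worth keeping in mind with this approach: two tags created on the same day differ only in the SHA fragment, so `sort -V` orders them by hash characters rather than by build time.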
diff --git a/vibe/guidebooks/cms/zoho-email.md b/vibe/guidebooks/cms/zoho-email.md new file mode 100644 index 0000000..8d57549 --- /dev/null +++ b/vibe/guidebooks/cms/zoho-email.md @@ -0,0 +1,116 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [CMS](README.md) > **Zoho email** + +# Zoho email + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [CMS](README.md) · [Cloudflare](cloudflare.md) +> **Downstream:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) +> **Related:** [lab-ecosystem 03 · cms](../lab-ecosystem/03-cms.md) · [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-env PRD](../../PRD/safe-prod-like-environment/README.md) + +Email for **arcodange.fr** is hosted at **Zoho Mail (EU region)** and provisioned *entirely from OpenTofu*. There is no Zoho web-console click-ops in the steady state: the same `tofu apply` that owns the Cloudflare zone also drives the Zoho REST API to read the organization, publish the DNS records mail delivery depends on, and create one mailbox alias + one Inbox sub-folder per address. This page lives under [`cms/cloudflare/zoho/`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho), a sub-module of the [Cloudflare](cloudflare.md) tofu root. + +> [!CAUTION] +> **DNS/email changes here are high-stakes and slow to fail.** A wrong MX, SPF, DKIM, or DMARC record silently degrades or breaks `arcodange.fr` deliverability for **days** — receivers cache TTLs, reputation decays, and there is no synchronous error to catch in CI. DMARC is published as **`p=reject`**, so a broken SPF/DKIM alignment means conforming receivers *drop* legitimate mail outright rather than quarantine it. 
This is a prime motivation for the **safe environment**: changes to this module must be validated **plan-only against a throwaway/clone zone**, never iterated directly against the live `arcodange.fr` zone. See the [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) and the [safe-env PRD](../../PRD/safe-prod-like-environment/README.md). + +## How the module is wired + +The Cloudflare root ([`cloudflare/iac.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/iac.tf)) instantiates `module "zoho"`, passing it the live zone and domain plus the OAuth client credentials: + +| Input | Source | Purpose | +|---|---|---| +| `domain_name` | `ovh_domain_name.arcodange_fr.domain_name` | the domain to manage (`arcodange.fr`) | +| `dns_zone_id` | `cloudflare_zone.arcodange_fr.id` | Cloudflare zone the DNS records land in | +| `zoho_client_id` | `var.ZOHO_CLIENT_ID` (Vault `kvv1/zoho/self_client`) | OAuth2 self-client id | +| `zoho_client_secret` | `var.ZOHO_CLIENT_SECRET` (Vault `kvv1/zoho/self_client`) | OAuth2 self-client secret | + +In CI the secrets are injected by the `vault-action` step in [`.gitea/workflows/cloudflare.yaml`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/.gitea/workflows/cloudflare.yaml), which maps the whole `kvv1/zoho/self_client` KV-v1 secret into **both** the shell env (`ZOHO_*`, consumed by the helper scripts) and the tofu vars (`TF_VAR_ZOHO_*`, consumed by `config.tf`): + +``` +kvv1/zoho/self_client * | ZOHO_ ; +kvv1/zoho/self_client * | TF_VAR_ZOHO_ ; +``` + +## OAuth2: client-credentials flow + +Zoho is a self-client (machine-to-machine) integration on the **EU** datacenter — every host is `*.zoho.eu` / `accounts.zoho.eu`. Authentication uses the OAuth2 **`client_credentials`** grant; there is no interactive user consent in the running flow (a commented device-code flow remains in [`.env`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/.env) as historical bootstrap). 
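
For manual debugging, the `client_credentials` grant can be approximated from a shell. The sketch below is not the module's own code. It assumes the token endpoint is `https://accounts.zoho.eu/oauth/v2/token`, that the client id and secret are sent as form fields named `client_id` and `client_secret`, and that `ZOHO_CLIENT_ID` / `ZOHO_CLIENT_SECRET` are already exported. The function names are hypothetical.

```sh
#!/bin/sh
# Join scopes with commas: the token request takes one scope parameter.
join_scopes() {
  # "$*" joins the arguments with the first character of IFS;
  # the subshell keeps the IFS change local.
  ( IFS=,; printf '%s\n' "$*" )
}

# Sketch of the token request (assumed form-field names, see lead-in).
# Not called here, so nothing touches the network when this file is sourced.
request_zoho_token() {
  curl --silent --fail --request POST 'https://accounts.zoho.eu/oauth/v2/token' \
    --data-urlencode 'grant_type=client_credentials' \
    --data-urlencode "client_id=${ZOHO_CLIENT_ID}" \
    --data-urlencode "client_secret=${ZOHO_CLIENT_SECRET}" \
    --data-urlencode "scope=$(join_scopes "$@")"
}

# Usage (requires real credentials, so left commented out):
# request_zoho_token ZohoMail.partner.organization.READ ZohoMail.folders.ALL
join_scopes ZohoMail.partner.organization.READ ZohoMail.folders.ALL
```

Keeping the scope join separate from the HTTP call makes the request shape easy to inspect before any credentials are involved.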
+ +The token is minted in [`config.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/config.tf) via a `data "http"` POST to `https://accounts.zoho.eu/oauth/v2/token` with `grant_type=client_credentials` and the comma-joined scope list. The bearer is then folded into an `Authorization: Zoho-oauthtoken ` header (`local.auth_headers`) reused by every subsequent read. + +| Scope | Access | Why it is needed | +|---|---|---| +| `ZohoMail.partner.organization.READ` | READ (org) | resolve the org **ZOID** | +| `ZohoMail.organization.accounts.READ` | READ (accounts) | find the super-admin **account id / zuid** | +| `ZohoMail.organization.accounts.UPDATE` | UPDATE (accounts) | add / remove email aliases | +| `ZohoMail.organization.domains.READ` | READ (domains) | fetch the domain verification code + DKIM public key | +| `ZohoMail.folders.ALL` | ALL (folders) | list and create per-alias Inbox sub-folders | + +Lookup chain (each step feeds the next): + +1. `GET https://mail.zoho.eu/api/organization` → `local.org`, from which `zoid` builds `local.api_prefix = https://mail.zoho.eu/api/organization/`. +2. `GET {api_prefix}/domains/{domain_name}` ([`dns.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/dns.tf)) → `local.domain`, exposing `CNAMEVerificationCode` and `dkimDetailList[0].publicKey`. +3. `GET {api_prefix}/accounts` ([`email_aliases.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/email_aliases.tf)) → the single `iamUserRole == "super_admin"` account, giving its `accountId` and `zuid`. + +## DNS records published on the Cloudflare zone + +[`modules/zoho_mail_dns`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/modules/zoho_mail_dns) materialises every `cloudflare_dns_record` Zoho mail needs onto the live zone. 
The DKIM key and verification code are read live from the Zoho domain API (step 2 above) and passed in as module inputs, so the records always track what Zoho actually expects. All records use **TTL 3600** and apply to the apex (`@`) unless noted. + +| Name | Type | Value | Purpose | +|---|---|---|---| +| `@` | TXT | `"zoho-verification=.zmverify.zoho.eu"` | proves domain ownership to Zoho | +| `@` | MX | `mx.zoho.eu` (priority **10**) | primary inbound mail exchanger | +| `@` | MX | `mx2.zoho.eu` (priority **20**) | secondary mail exchanger | +| `@` | MX | `mx3.zoho.eu` (priority **50**) | tertiary mail exchanger | +| `@` | TXT | `"v=spf1 include:zohomail.eu ~all"` | SPF: authorise Zoho to send for the domain | +| `zmail._domainkey` | TXT | `""` (from `dkimDetailList[0].publicKey`) | DKIM public key for outbound signing | +| `_dmarc` | TXT | `"v=DMARC1; p=reject; rua=mailto:arcodange@gmail.com; ruf=mailto:arcodange@gmail.com; sp=reject; adkim=r; aspf=r; pct=100"` | DMARC policy: **reject** non-aligned mail, 100% coverage, aggregate+forensic reports to `arcodange@gmail.com` | +| `default._bimi` | TXT | `"v=BIMI1; l=https://arcodange.fr/.well-known/logo.svg; avp=brand;"` | BIMI: display the brand logo beside authenticated mail (created only when `bimi_logo_url != null`) | + +> [!WARNING] +> The DMARC policy is the strictest tier: `p=reject` **and** `sp=reject` (subdomains) with relaxed alignment (`adkim=r`, `aspf=r`) and `pct=100`. There is no `quarantine` grace band — any message that fails both SPF *and* DKIM alignment is rejected by conforming receivers. Validate SPF/DKIM correctness in the safe environment before touching the live `_dmarc` or apex records. + +## Email aliases + +Seven addresses are defined as a single map in [`email_aliases.tf`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/email_aliases.tf) (`local.email_aliases`). 
Each is provisioned **twice** against the super-admin mailbox: as an **email alias** on the account, and as a matching **Inbox sub-folder** so mail to that address can be filtered into its own folder. + +| Alias (`@arcodange.fr`) | Display name | Purpose | +|---|---|---| +| `bonjour` | `Service Bonjour` | commercial / sales | +| `bureaux` | `Bureaux Arcodange` | official bodies (URSSAF, administration) | +| `contact` | `Premier Contact` | website contact form | +| `helloworld` | `✅ Arcodange 🏹💻🪽` | social networks, newsletter | +| `analytics` | `Analytics 📊🔍` | social networks, newsletter | +| `books` | `Accounting 📒🧮` | accounting / bookkeeping | +| `abonnements` | `Abonnements 📱🤖` | subscriptions (phone, AI, services) | + +Provisioning is *imperative-inside-declarative*: each alias is a `terraform_data` resource whose `triggers_replace` watches whether the alias/folder is already present, and whose `local-exec` provisioners shell out to [`zoho_api_call.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_api_call.sh) on **create** and **destroy**: + +1. **Alias create** — `PUT {api_prefix}/accounts/{zuid}` with `mode=addEmailAlias`, scope `ZohoMail.organization.accounts.UPDATE`; fails fast if the response contains `OPERATION_NOT_PERMITTED`. +2. **Alias destroy** — same endpoint with `mode=deleteEmailAlias` (the bare local-part, split off the `alias:display` key). +3. **Folder create** — `POST /api/accounts/{accountId}/folders` with `parentFolderId` = the resolved **Inbox** folder id, scope `ZohoMail.folders.ALL`. +4. **Folder destroy** — looks the folder id up by name, `DELETE`s it, then also sweeps the corresponding `/Trash/` (or `/Trash/Inbox_`) folder Zoho leaves behind. + +> [!NOTE] +> `terraform_data` + `local-exec` is used because aliases and folders are Zoho-side mutations with no first-class Terraform provider. The `triggers_replace = { missing = !contains(...) 
}` guard makes the apply idempotent: the provisioner only re-runs when the alias/folder is genuinely absent, so a clean plan is a no-op rather than a re-create. + +## Helper scripts + +Both scripts live beside the tofu and are invoked from `local-exec`. They share the OAuth client env vars (`ZOHO_CLIENT_ID`, `ZOHO_CLIENT_SECRET`, `ZOHO_TOKEN_ENDPOINT`) injected from Vault. + +| Script | Role | +|---|---| +| [`zoho_api_call.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_api_call.sh) | Thin HTTP wrapper. Parses `--endpoint`, `-x=`, `--scope`, `--data_json` / `--data_url`, and `--fail_if_str_in_resp`; sources `zoho_gen_token.sh`, attaches the bearer header, `curl`s the call, fails if a sentinel string (e.g. `OPERATION_NOT_PERMITTED`) appears, and emits compact JSON via `jq`. | +| [`zoho_gen_token.sh`](https://gitea.arcodange.lab/arcodange-org/cms/src/branch/main/cloudflare/zoho/zoho_gen_token.sh) | OAuth token cache. `gen_zoho_token ` returns a cached token from `/tmp/zoho_oauth_tokens.cache` when fresh, otherwise mints a new `client_credentials` token and stores it. | + +`zoho_gen_token.sh` is **lock-based and TTL-bounded**: + +- A mutex is taken by `mkdir /tmp/zoho_oauth_tokens.lock` (atomic dir creation), with up to 10 one-second retries, so concurrent `local-exec` provisioners don't corrupt the cache. The lock is released on every function exit via `trap`. +- Tokens are keyed by scope in `/tmp/zoho_oauth_tokens.cache` (file mode `600`). A token is reused only while younger than **3600 s (~1 h)**; `cleanup_cache` prunes expired entries on each call. +- The wrapper runs `cleanup_cache` before each request and re-traps it on `INT TERM EXIT`, so stale tokens never leak past their TTL. + +## Cross-references + +- **Parent tofu / zone & Pages:** [Cloudflare](cloudflare.md) — owns `cloudflare_zone.arcodange_fr` that this module writes records into, and the `vault-action` CI step that supplies the credentials. 
+- **Where these secrets come from:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) (`kvv1/zoho/self_client`). +- **How apply runs:** [tofu CI flow](../factory-provisioning/opentofu/ci-apply-flow.md). +- **Why a safe environment exists:** [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) · [safe-env PRD](../../PRD/safe-prod-like-environment/README.md). diff --git a/vibe/guidebooks/lab-ecosystem/02-tools.md b/vibe/guidebooks/lab-ecosystem/02-tools.md index 116b7ef..874f342 100644 --- a/vibe/guidebooks/lab-ecosystem/02-tools.md +++ b/vibe/guidebooks/lab-ecosystem/02-tools.md @@ -5,6 +5,7 @@ > **Status:** ✅ Active > **Last Updated:** 2026-06-23 > **Upstream:** [01 · factory](01-factory.md) +> **Deeper dive:** [Tools guidebook](../tools/README.md) — deploy model, component inventory, and per-component internals > **Related:** [secrets-and-vault.md](secrets-and-vault.md) · [storage-and-recovery.md](storage-and-recovery.md) The [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) is deployed by factory's ArgoCD into the **`tools` namespace**. It is the platform layer that every app namespace depends on: secrets (Vault + VSO), observability (Prometheus + Grafana), edge security (CrowdSec), database pooling (pgbouncer / pgcat), caching (Redis/KeyDB), and analytics (Plausible + ClickHouse). Each component ships its own Helm chart or Kustomize overlay, and most carry an `iac/` directory of OpenTofu that declares the Vault config (roles, policies, dynamic-secret backends) that wires the component to secrets — see [secrets-and-vault.md](secrets-and-vault.md). @@ -69,6 +70,7 @@ flowchart TB ## Cross-references +- [Tools guidebook](../tools/README.md) — the deeper dive: deploy model (one ArgoCD app → meta-chart → per-component Applications), full component inventory, and per-component internals. - [Lab ecosystem hub](README.md) — the whole-lab map. 
- [01 · factory](01-factory.md) — the ArgoCD that deploys this namespace, and the `postgres/iac/` roles + `user_lookup()` that pgbouncer consumes. - [03 · cms](03-cms.md) — the public edge protected by **CrowdSec** (Turnstile → CrowdSec wiring). diff --git a/vibe/guidebooks/lab-ecosystem/03-cms.md b/vibe/guidebooks/lab-ecosystem/03-cms.md index 2bcdaad..930d04a 100644 --- a/vibe/guidebooks/lab-ecosystem/03-cms.md +++ b/vibe/guidebooks/lab-ecosystem/03-cms.md @@ -6,6 +6,7 @@ > **Last Updated:** 2026-06-23 > **Upstream:** [01 · factory](01-factory.md) > **Related:** [02 · tools](02-tools.md) · [secrets-and-vault.md](secrets-and-vault.md) +> **Deeper dive:** [CMS guidebook](../cms/README.md) The [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) is the **public-facing site** of the lab: a Nuxt static site served at **`arcodange.fr`**, plus the OpenTofu that owns its Cloudflare edge and its Zoho email. It is the one app whose primary audience is the open Internet, so it ties together the public-DNS, tunnel, CAPTCHA, and email plumbing. @@ -75,6 +76,7 @@ flowchart LR ## Cross-references +- [CMS guidebook](../cms/README.md) — the deeper-dive map of the `cms` repo: the Nuxt site, the Cloudflare edge, and Zoho email. - [Lab ecosystem hub](README.md) — the whole-lab map. - [01 · factory](01-factory.md) — the ArgoCD app `cms`, and `iac/cloudflare.tf` / `iac/ovh.tf` that grant the CMS its Cloudflare token and OVH nameserver-edit rights. - [02 · tools](02-tools.md) — **CrowdSec** (the Traefik bouncer the Turnstile challenge feeds). 
diff --git a/vibe/guidebooks/tools/README.md b/vibe/guidebooks/tools/README.md new file mode 100644 index 0000000..6ae0afc --- /dev/null +++ b/vibe/guidebooks/tools/README.md @@ -0,0 +1,114 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > **Tools** + +# Tools + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [Guidebooks index](../README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md) +> **Downstream:** [Components](components.md) · [Secrets & VSO](secrets-and-vso.md) +> **Related:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) + +The [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) is the lab's **platform layer**: the cluster-wide services every app namespace leans on — secrets (Vault + VSO), observability (Prometheus + Grafana), edge security (CrowdSec), database pooling (pgbouncer), caching (Redis/KeyDB), and analytics (Plausible + ClickHouse). Everything in this repo lands in the single **`tools` namespace**. + +This hub explains the **deploy model** — how one factory-owned ArgoCD Application fans out into one Application per component — and gives a **component inventory**. For per-component internals see [Components](components.md); for how secrets reach the pods see [Secrets & VSO](secrets-and-vso.md). + +## Deploy model + +The whole repo is wired into the cluster through a single **meta-chart** that factory's ArgoCD points at: + +1. Factory's ArgoCD declares **one** Application named `tools` whose source is this repo's [`chart/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart) meta-chart. +2. 
That meta-chart renders two kinds of object from [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml): + - an **AppProject** named `tools` ([`chart/templates/project.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/templates/project.yaml)) that pins every child Application to `sourceRepos: tools` and `destinations: tools` namespace only; + - one ArgoCD **Application per component** ([`chart/templates/apps.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/templates/apps.yaml) — a `range` over `.Values.tools`), each pointing `path:` at the matching **top-level directory** of the repo (`path: pgbouncer`, `path: grafana`, …). +3. Each child Application targets `namespace: tools`, with `automated` sync (`prune: true`, `selfHeal: true`) and `CreateNamespace=true`. +4. A component directory is **either** a Helm chart (`Chart.yaml` whose `dependencies:` pull the upstream chart + the `tool` library) **or** a Kustomize overlay (`kustomization.yaml` using a `helmCharts:` inflation generator). +5. [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) is a Helm **library chart** (`type: library`): it ships shared templates/helpers consumed by the component charts via `dependencies:` and is **not deployable** on its own. + +> [!NOTE] +> A component is deployed **only if it appears as a key under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml)**. `pgcat` is present in the repo but commented out there, so no Application is rendered for it. 
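
The fan-out described in the deploy model can be pictured as a loop that emits one ArgoCD `Application` per component. The sketch below imitates that `range` in plain shell; it is not the content of `chart/templates/apps.yaml`. The `argocd` metadata namespace, the `https://kubernetes.default.svc` destination server, and the `REPO_URL` default are assumptions, and `render_application` is a hypothetical helper name.

```sh
#!/bin/sh
# Assumed repo URL; the real chart may reference the repo differently.
REPO_URL="${REPO_URL:-https://gitea.arcodange.lab/arcodange-org/tools}"

# Print an illustrative Application manifest for one component directory.
render_application() {
  component="$1"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${component}
  namespace: argocd            # assumed ArgoCD namespace
spec:
  project: tools
  source:
    repoURL: ${REPO_URL}
    path: ${component}         # top-level directory of the repo
  destination:
    server: https://kubernetes.default.svc   # assumed in-cluster target
    namespace: tools
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
}

# Stand-in for the range over .Values.tools: one document per component.
for component in pgbouncer grafana; do
  render_application "$component"
  echo '---'
done
```

A component that is commented out of the list (as `pgcat` is in `chart/values.yaml`) simply never reaches the loop, which is why no Application is rendered for it.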
+ +## Component inventory + +| Component | How declared (chart + version OR Kustomize) | Ingress host | Persistence | Purpose | +|---|---|---|---|---| +| **hashicorp-vault** | Helm — `hashicorp/vault` `0.28.1` (+ `tool` lib) | `vault.arcodange.lab` (Traefik, Let's Encrypt) | `storage "file"` at `/vault/data` + audit storage (PVC) | Secrets engine: KV, transit, PostgreSQL dynamic creds; auth `kubernetes` + Gitea OIDC/JWT | +| **vault-secrets-operator (VSO)** | Helm — `hashicorp/vault-secrets-operator` `0.9.0`, a dependency of the `hashicorp-vault` chart | — | — | Injects Vault secrets into pods via `VaultAuth` / `VaultDynamicSecret` CRDs; client-cache `direct-encrypted` via transit | +| **prometheus** | Helm — `prometheus-community/prometheus` `28.13.0` (app `v3.10.0`) | none (in-cluster) | `persistentVolume` enabled, `8Gi` | Metrics scraping + TSDB storage | +| **grafana** | Helm — `grafana/grafana` `10.3.0` (+ `tool` lib) | `grafana.arcodange.lab` (Traefik, Let's Encrypt) | `persistence.enabled: false` (ephemeral; dashboards provisioned) | Dashboards; datasources Prometheus + ClickHouse | +| **crowdsec** | Helm — `crowdsecurity/crowdsec` `0.20.1` (+ `tool` lib) | none (Traefik bouncer + AppSec on the edge) | LAPI state in external PostgreSQL (via pgbouncer) | Behavioural detection; agent parses Traefik logs, AppSec virtual-patching | +| **pgbouncer** | Helm — `icoretech/pgbouncer` `2.3.1` (+ `tool` lib) | none (cluster service `pgbouncer.tools`) | stateless (config only) | Connection pooler to the **external** PostgreSQL on `pi2` (`192.168.1.202`), pinned via `kubernetes.io/hostname: pi2` | +| **redis / KeyDB** | Helm — `pascaliske/redis` `2.1.0` (+ `tool` lib) | none (cluster service) | PVC `create: true`, `1Gi` at `/data` | In-memory cache; KeyDB master + replica, Redis-compatible | +| **plausible** | **Kustomize** — inflates `pascaliske/plausible` `2.0.0` | `analytics.arcodange.lab` (Traefik `IngressRoute`, Let's Encrypt) | stateless app; data lives in 
ClickHouse | Privacy-friendly web analytics; `DB_HOST: pgbouncer.tools` | +| **clickhouse** | **Kustomize** — inflates `pascaliske/clickhouse` `0.4.0` + local `databases` chart | none (cluster service) | PVC `16Gi` (StatefulSet) | OLAP column store backing Plausible | +| **pgcat** *(disabled)* | Helm — `improwised/pgcat` `0.1.0` — **commented out** in `chart/values.yaml` | — | — | Alternative pooler; not rendered (too constraining: must list every db/user, md5-only auth) | +| **tool** *(library)* | Helm **library chart** (`type: library`), not deployable | — | — | Shared templates/helpers consumed by the component charts | + +## How tools fit together + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart TB + classDef ext fill:#7c3aed,stroke:#6d28d9,color:#fff + classDef proc fill:#059669,stroke:#047857,color:#fff + classDef edge fill:#d97706,stroke:#b45309,color:#fff + classDef meta fill:#2563eb,stroke:#1e40af,color:#fff + + ARGOCD["factory ArgoCD<br/>Application: tools"]:::meta + META["tools meta-chart<br/>chart/ (apps.yaml + project.yaml)"]:::meta + PROJ["AppProject: tools"]:::meta + + subgraph NS["tools namespace"] + VAULT[("hashicorp-vault<br/>+ VSO")]:::ext + PROM["prometheus"]:::proc + GRAF["grafana"]:::proc + CS["crowdsec<br/>Traefik bouncer + AppSec"]:::edge + PGB["pgbouncer"]:::proc + REDIS[("redis / KeyDB")]:::ext + PLA["plausible"]:::proc + CH[("clickhouse")]:::ext + PODS["app + tool pods"]:::proc + end + + PG[("external PostgreSQL<br/>pi2 · 192.168.1.202")]:::ext + TRAEFIK["Traefik ingress<br/>vault / grafana / analytics .arcodange.lab"]:::edge + + ARGOCD --> META + META --> PROJ + META -- "one Application per component" --> NS + VAULT -- "inject secrets (VSO)" --> PODS + PGB -- "pools to" --> PG + PLA -- "writes analytics" --> CH + PROM --> GRAF + CH --> GRAF + TRAEFIK --> VAULT + TRAEFIK --> GRAF + TRAEFIK --> PLA + CS -- "fronts the edge" --> TRAEFIK +``` + +1. **Factory's ArgoCD** owns a single Application named `tools` pointed at this repo's `chart/` meta-chart. +2. The **meta-chart** renders the `tools` **AppProject** (which scopes every child to the `tools` repo + `tools` namespace) and **one Application per component** listed under `tools:` in `chart/values.yaml`. +3. Every child Application deploys into the **`tools` namespace** — Vault+VSO, Prometheus, Grafana, CrowdSec, pgbouncer, Redis/KeyDB, Plausible, ClickHouse. +4. **Vault + VSO** inject secrets into app and tool pods via the `VaultAuth` / `VaultDynamicSecret` CRDs. +5. **pgbouncer** pools connections out to the **external PostgreSQL** on `pi2` (`192.168.1.202`), the same database CrowdSec's LAPI and Plausible use through it. +6. **Plausible** writes analytics into **ClickHouse**; both **Prometheus** and **ClickHouse** are wired as **Grafana** datasources. +7. **Traefik** publishes `vault.arcodange.lab`, `grafana.arcodange.lab`, and `analytics.arcodange.lab` over Let's Encrypt, with **CrowdSec** running as the bouncer/AppSec layer fronting that edge. 
+ +## Pages in this guidebook + +| Page | What it covers | Status | +|---|---|---| +| [Components](components.md) | Per-component internals: chart values, ingress, persistence, how each gets its secrets | ✅ Active | +| [Secrets & VSO](secrets-and-vso.md) | How Vault + the Vault Secrets Operator deliver static and dynamic secrets into `tools` pods | ✅ Active | + +## Maintenance rule + +> [!IMPORTANT] +> **If a component in the `tools` repo changes, update this guidebook in the same change.** Adding or removing a key under `tools:` in `chart/values.yaml`, bumping an upstream chart version, switching a component between Helm and Kustomize, or changing an ingress host or persistence size all alter the inventory above — keep the table and the diagram in sync as part of the same PR. A reference map that drifts from reality sends readers (and agents) confidently down dead paths. + +## Cross-references + +- [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md) — the parent whole-lab view of this namespace. +- [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) — the lab-wide Vault model these services depend on. +- [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) — how each component's `iac/` (Vault config) is applied. +- [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) — why a safe, prod-like environment shapes how these platform services are run. 
diff --git a/vibe/guidebooks/tools/components.md b/vibe/guidebooks/tools/components.md new file mode 100644 index 0000000..020b226 --- /dev/null +++ b/vibe/guidebooks/tools/components.md @@ -0,0 +1,218 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Components** + +# Components + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [Tools hub](README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md) +> **Downstream:** [Secrets & VSO](secrets-and-vso.md) +> **Related:** [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md) + +This is the **per-component reference** for the `tools` platform layer: pinned chart/app versions, the values that actually matter (replicas, storage, ports, auth), and the cross-service wiring. Every component lands in the single **`tools` namespace**. For the deploy model (how one ArgoCD Application fans out into one per component) see the [Tools hub](README.md); for how Vault secrets reach the pods see [Secrets & VSO](secrets-and-vso.md). + +Components split into two **tiers**: + +- **Tier 1** — the load-bearing services, each with its own subsection and value tables below. +- **Tier 2** — supporting / inactive pieces, summarised in a single table. + +Severity legend (GitHub alerts): `[!NOTE]` informational · `[!TIP]` good-to-know · `[!WARNING]` operational hazard · `[!CAUTION]` live risk. + +--- + +## Tier 1 — load-bearing services + +### hashicorp-vault + +[`hashicorp-vault/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault) — the lab's secrets brain. The chart bundles **three** dependencies: the upstream `vault` server, the `vault-secrets-operator` (VSO) that injects secrets into pods, and the shared `tool` library chart. 
+ +| Key | Value | +|---|---| +| Chart deps | `vault` `0.28.1`, `vault-secrets-operator` `0.9.0`, `tool` `0.1.0` | +| Mode | `standalone` (single instance, **not** HA / raft) | +| Storage | `storage "file"` at `/vault/data` + audit storage enabled | +| Listener | TLS **off** (`tls_disable = 1`) on `[::]:8200` — terminated at the edge | +| Ingress | `vault.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) | +| UI | enabled (`ui = true`) | +| Log level | `trace` | + +**Mounts (secret engines) exposed:** + +| Mount | Type | Purpose | +|---|---|---| +| `kvv1` | KV v1 | Static secrets (legacy / v1 layout) | +| `kvv2` | KV v2 | Versioned static secrets (primary store) | +| `transit` | transit | Encryption-as-a-service; backs VSO client-cache (`vso-client-cache` key) | +| `postgres` | database | Dynamic PostgreSQL credentials (connection via `pgbouncer.tools:5432`) | + +**Auth methods enabled:** + +| Method | Used by | +|---|---| +| `kubernetes` | In-cluster workloads (VSO, app ServiceAccounts) authenticate by SA token | +| `gitea_jwt` | Gitea Actions / OIDC-JWT pipelines authenticate from CI | + +> [!NOTE] +> The full secret-engine layout, VSO `VaultAuth` / `VaultConnection` / `VaultDynamicSecret` wiring, and the `kvv2/data/...` path conventions are documented in [Secrets & VSO](secrets-and-vso.md) — this page only inventories what the chart stands up. + +The VSO sub-chart ships a `defaultVaultConnection` pointing at `http://hashicorp-vault.tools.svc.cluster.local:8200` and a client cache with `persistenceModel: direct-encrypted`, encrypted through the `transit` mount. + +### prometheus + +[`prometheus/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) — metrics collection and TSDB, via the `kube-prometheus`-style community chart. 
+ +| Key | Value | +|---|---| +| Chart deps | `prometheus` `28.13.0` (app `v3.10.0`), `tool` `0.1.0` | +| Server replicas | `1` (Deployment, `strategy: Recreate`) | +| Server storage | `persistentVolume` enabled, **8Gi** at `/data` (`ReadWriteOnce`) | +| Retention | `15d` | +| Alertmanager | enabled, persistence **2Gi** (`ReadWriteOnce`) | +| node-exporter | enabled (DaemonSet, `prometheus-node-exporter` sub-chart) | +| kube-state-metrics | enabled | +| pushgateway | enabled (`prometheus.io/probe: pushgateway`) | +| Scrape / eval interval | `1m` (scrape timeout `10s`) | +| Ingress | none — **internal only** | + +**Scrape targets** (default `scrapeConfigs`, all enabled): the Prometheus server itself, the Kubernetes API servers, nodes + kubelet cadvisor, plus **annotation-based** service-endpoint and pod discovery (`prometheus.io/scrape`, `prometheus.io/port`, `prometheus.io/path`, `prometheus.io/scheme`), with `*-slow` (5m) variants for cheaper targets. + +### grafana + +[`grafana/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana) — dashboards over Prometheus and ClickHouse. 
+ +| Key | Value | +|---|---| +| Chart deps | `grafana` `10.3.0` (app `latest`), `tool` `0.1.0` | +| Replicas | `1` (Deployment, `RollingUpdate`) | +| Persistence | **disabled** — ephemeral; dashboards/datasources are provisioned at boot | +| Ingress | `grafana.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) | +| Plugin | `grafana-clickhouse-datasource` | +| Resources | requests `100m` / `128Mi`, limits `100m` / `512Mi` | +| Timezone | `Europe/Paris` | + +**Datasources (provisioned):** + +| Name | Type | Target | Default | +|---|---|---|---| +| Prometheus | `prometheus` | `http://prometheus-server.tools.svc.cluster.local` | ✅ yes | +| clickhouse | `grafana-clickhouse-datasource` | `clickhouse.tools.svc.cluster.local:9000` (native, `tlsSkipVerify`) | no | + +> [!WARNING] +> The Grafana **admin password is static and committed** in [`grafana/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana/values.yaml) (`adminUser: admin`). The provisioned ClickHouse datasource password is committed there too (`secureJsonData.password`). Treat these as lab-only credentials; do not reuse them outside the homelab. + +### crowdsec + +[`crowdsec/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) — behavioural edge security that feeds a Traefik blocklist. 
+ +| Key | Value | +|---|---| +| Chart deps | `crowdsec` `0.20.1`, `tool` `0.1.0` | +| LAPI | Deployment (`RollingUpdate`, `maxUnavailable: 0`) — the local API + decision store | +| Agent | DaemonSet pinned to control-plane nodes (`node-role.kubernetes.io/control-plane`) | +| Log source | parses **Traefik** pod logs in `kube-system` (`podName: traefik-*`, `program: traefik`) | +| Collections | `crowdsecurity/traefik`, `crowdsecurity/http-cve` (+ AppSec rules below) | +| AppSec (WAF) | **enabled** — `crowdsecurity/appsec-default` on `0.0.0.0:7422`; collections `appsec-virtual-patching` + `appsec-generic-rules` | +| Database | external PostgreSQL `crowdsec` via **pgbouncer** (`host: pgbouncer.tools:5432`, `type: postgresql`) | +| DB credentials | dynamic, from secret `crowdsec-db-credentials` (`DB_USER` / `DB_PASSWORD`, sourced via VSO) | +| Console | enrolled as instance `homelab` | + +The decisions CrowdSec produces are surfaced as a **Traefik middleware blocklist applied at the edge**, so malicious IPs are dropped before they reach app namespaces. `server_reset_query: DEALLOCATE ALL` on pgbouncer (below) exists specifically to keep CrowdSec's prepared statements happy through the pooler. The CAPTCHA challenge CrowdSec serves on remediated requests is a **Cloudflare Turnstile widget minted by the `cms` repo** — see the [CMS Cloudflare page](../cms/cloudflare.md), which produces the sitekey/secret this bouncer consumes from Vault. + +### pgbouncer + +[`pgbouncer/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgbouncer) — the connection pooler in front of the **external** PostgreSQL. 
+ +| Key | Value | +|---|---| +| Chart deps | `pgbouncer` `2.3.1` (`icoretech/pgbouncer`), `tool` `0.1.0` | +| Scheduling | `nodeSelector: kubernetes.io/hostname: pi2` (co-located with PostgreSQL) | +| Upstream DB | external PostgreSQL at `192.168.1.202:5432` (the `pi2` host), wildcard database `"*"` | +| Auth type | `scram-sha-256` | +| `auth_query` | `SELECT uname, phash FROM user_lookup($1)` | +| `server_reset_query` | `DEALLOCATE ALL` (clears prepared statements — fixes CrowdSec re-use) | +| `server_idle_timeout` | `7200` (2h) | +| `ignore_startup_parameters` | `extra_float_digits` (unsupported JDBC arg) | +| Exporter | disabled | +| Service | `pgbouncer.tools:5432` (cluster-internal) | + +> [!NOTE] +> pgbouncer is the single front door to the lab's PostgreSQL: CrowdSec, Plausible, and Vault's `postgres` dynamic-secret backend all connect through `pgbouncer.tools:5432`, never to `192.168.1.202` directly. + +### redis (KeyDB) + +[`redis/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis) — the in-memory cache / session store. The chart targets **KeyDB** (EqAlpha, Redis-compatible), tuned for the 2× Raspberry Pi 5 nodes. + +| Key | Value | +|---|---| +| Chart deps | `redis` `2.1.0` (`pascaliske/redis`), `tool` `0.1.0` | +| Workload | **StatefulSet** (master at index 0, replica running `replicaof` the master) | +| Storage | PVC `create: true`, **1Gi** at `/data` (`ReadWriteOnce`) | +| Tuning | `server-threads 4` (ARM-tuned for the Pi 5 cores) | +| Port | `6379` (`ClusterIP`) | +| Security | `runAsUser/Group/fsGroup: 999`, non-root | +| Timezone | `Europe/Paris` | + +> [!NOTE] +> Access the instance for inspection with `kubectl port-forward -n tools svc/redis 6379:6379` and Redis Insights (per the [chart README](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis/README.md)). + +### plausible + +[`plausible/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) — privacy-friendly web analytics. 
Deployed via a **Kustomize** overlay that inflates the upstream Helm chart (not a `Chart.yaml` dependency like the Tier-1 charts above). + +| Key | Value | +|---|---| +| Declared via | Kustomize `helmCharts:` inflation generator | +| Chart / version | `plausible` `2.0.0` (`pascaliske/plausible`), image `ghcr.io/plausible/community-edition` | +| Replicas | `1` (Deployment) | +| Ingress | `analytics.arcodange.lab` (Traefik IngressRoute, Let's Encrypt, `localIp@file` middleware) | +| App DB | PostgreSQL via **pgbouncer** — an **init container** assembles `DATABASE_URL` from VSO dynamic creds | +| Event store | **ClickHouse** (see below) | +| GeoIP | MaxMind **GeoLite2** (`GeoLite2-Country` + `GeoLite2-City`), license key from secret `plausible-geoip` | +| Secrets | `SECRET_KEY_BASE` / `TOTP_VAULT_KEY` from existing secret `plausible-config` (VSO-fed) | + +Plausible writes analytics events to ClickHouse and stores app/account state in PostgreSQL — two distinct backends, both reached through lab-internal services. + +### clickhouse + +[`clickhouse/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) — the OLAP column store behind Plausible. Also a **Kustomize** overlay inflating the upstream chart, plus a `databases` sub-chart that runs an init job. 
+ +| Key | Value | +|---|---| +| Declared via | Kustomize `helmCharts:` inflation generator (`chartHome: charts`) | +| Chart / version | `clickhouse` `0.4.0` (`pascaliske/clickhouse`), image `clickhouse/clickhouse-server` | +| Workload | **StatefulSet**, `replicas: 1` | +| Storage | PVC **16Gi** at `/var/lib/clickhouse` (`ReadWriteOnce`) | +| Ports | `8123` (HTTP), `9000` (native protocol) | +| Custom user | `arcodange` (full network access, `access_management: 1`) via `custom-users.xml` | +| Security | `runAsUser/Group/fsGroup: 101`, non-root | +| Timezone | `Europe/Paris` | + +> [!WARNING] +> The ClickHouse `arcodange` user password is **static and committed** in [`clickhouse/clickhouseValues.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse/clickhouseValues.yaml) (`custom-users.xml`). The same value appears in Grafana's provisioned datasource — keep the two in sync if you rotate it. + +> [!CAUTION] +> ClickHouse carries a `nodeAffinity` that **excludes `pi2`** (`kubernetes.io/hostname NotIn [pi2]`). `pi2` hosts PostgreSQL and pgbouncer; ClickHouse is deliberately kept off it to avoid I/O contention on that node. A cluster where `pi2` is the only schedulable node will leave ClickHouse `Pending`. + +--- + +## Tier 2 — supporting & inactive + +| Component | Status | Notes | +|---|---|---| +| [`pgcat/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgcat) | ❌ disabled | Alternative Postgres pooler (`pgcat` chart `0.1.0`). Not in service — its sole pool has empty `username`/`password`/`database` placeholders, and it is **not** keyed under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), so ArgoCD renders no Application for it. [pgbouncer](#pgbouncer) is the active pooler. 
|
+| [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) | ✅ active (library) | Helm **library chart** (`type: library`, version `0.1.0`) consumed by **every** component chart via `dependencies:`. Ships shared templates/helpers; **not deployable** on its own. |
+
+---
+
+## Gotchas
+
+> [!WARNING]
+> **No high availability.** Every Tier-1 service runs a **single replica**: Vault (`standalone`), Prometheus (`replicaCount: 1`), Grafana (`replicas: 1`), ClickHouse and Redis/KeyDB StatefulSets (`replicas: 1`), Plausible and the CrowdSec LAPI (single Deployment). Any node drain or pod restart is a brief outage for that service, not a failover.
+
+> [!WARNING]
+> **Static, committed passwords.** Grafana admin (+ its ClickHouse datasource), the ClickHouse `arcodange` user, and the pgbouncer admin/auth users all carry plaintext credentials in their committed values files. They are lab-only; rotate before any exposure and never copy them to a real environment.
+
+> [!CAUTION]
+> **ClickHouse must avoid `pi2`.** The `NotIn [pi2]` `nodeAffinity` keeps it off the PostgreSQL/pgbouncer host. If `pi2` is the only schedulable node, ClickHouse (and therefore Plausible analytics) stays `Pending`. See the [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) for how PVC-backed services map onto specific nodes.
+
+> [!CAUTION]
+> **Vault is single-instance and starts sealed.** After **any** restart (pod reschedule, node reboot, chart upgrade) Vault comes up **sealed** with no automatic unseal configured, so every VSO injection and dynamic-secret lease blocks until an operator unseals it. This is the first thing to check when secrets stop flowing across the cluster. [Secrets & VSO](secrets-and-vso.md) describes the impact on consumers; the unseal procedure and key custody live in [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).
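+
+The ClickHouse scheduling rule from the gotchas above can be pictured as a standard Kubernetes `nodeAffinity` stanza. This is only a sketch: it assumes a required (hard) rule, which is what the `Pending` behaviour implies, and the exact key under which the chart accepts it in `clickhouseValues.yaml` is not shown here and may differ.
+
+```yaml
+# Sketch only: pod-spec affinity that keeps a workload off the node named pi2.
+# "required..." is an assumption inferred from the Pending behaviour described above.
+affinity:
+  nodeAffinity:
+    requiredDuringSchedulingIgnoredDuringExecution:
+      nodeSelectorTerms:
+        - matchExpressions:
+            - key: kubernetes.io/hostname
+              operator: NotIn
+              values:
+                - pi2
+```
+
+Because a required rule has no fallback, the scheduler leaves the pod `Pending` rather than placing it on `pi2` when no other node is available.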
diff --git a/vibe/guidebooks/tools/secrets-and-vso.md b/vibe/guidebooks/tools/secrets-and-vso.md new file mode 100644 index 0000000..66e3fd3 --- /dev/null +++ b/vibe/guidebooks/tools/secrets-and-vso.md @@ -0,0 +1,234 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Secrets & VSO** + +# Tools — Secrets & VSO + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [Tools](README.md) · [Components](components.md) +> **Downstream:** consumed by every `tools`-namespace pod and by every app's CI/CD +> **Related:** [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming-conventions concept](../lab-ecosystem/naming-conventions.md) · [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md) · [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) · [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md) + +This page maps how secrets live in **HashiCorp Vault** (engines, auth backends) and how they reach **Kubernetes pods** via the **Vault Secrets Operator (VSO)**. The keystone is the **`app_policy` + `app_roles` module pair**: the machinery that turns a single `` name into a matched set of Vault policies, roles, and CI identities — the same `` join key documented in the [naming-conventions concept](../lab-ecosystem/naming-conventions.md). + +Vault itself runs as a component in the `tools` namespace; see the [Components](components.md) page for its deploy shape. 
The admin/bootstrap layer (the `kvv1` engine, the `gitea_jwt` auth backend, the base `gitea_cicd` role, the Kubernetes auth backend mount) is created **by factory's Ansible-managed Vault Terraform** in [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf); everything in this page that is *per-app* is created by the IaC under [`hashicorp-vault/iac`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac). + +> [!CAUTION] +> Vault runs **standalone** with file/raft storage and starts **sealed** after any restart or node reboot. Until it is unsealed, every VSO read fails and no app can fetch DB creds or config — pods that depend on a `VaultDynamicSecret` will not start. Unseal procedure and key custody live in [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md). + +--- + +## 1) Vault engines & auth backends + +All engines below are mounted by [`hashicorp-vault/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) except `kvv1`, which is bootstrapped by factory's Ansible Vault Terraform. 
+ +| Mount | Type | Holds | Defined in | +|---|---|---|---| +| `kvv1` | KV **v1** | Admin / cloud secrets: `kvv1/google/credentials`, `kvv1/gitea/*`, `kvv1/cloudflare/*`, `kvv1/ovh/*`, `kvv1/postgres/credentials`, `kvv1/admin/*` | factory [`hashicorp_vault.tf`](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/files/hashicorp_vault.tf) | +| `kvv2` | KV **v2** (versioned) | Per-app config secrets under `kvv2//*` | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) | +| `transit` | transit | The **VSO client-cache encryption key** `vso-client-cache` — lets VSO persist its client cache encrypted so it survives an operator restart without re-auth storms | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) | +| `postgres` | database | **Dynamic** Postgres creds at `postgres/creds/`; connects to the DB through `pgbouncer.tools:5432` using the `credentials_editor` root account | [`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf) | + +The `postgres` connection is configured with `allowed_roles = ["*"]` and a root-rotation statement (`ALTER USER … WITH PASSWORD`); the editor username/password come from the sensitive `POSTGRES_CREDENTIALS_EDITOR_*` variables. + +### Auth backends + +| Backend | Mount | Who uses it | Role(s) | +|---|---|---|---| +| `kubernetes` | `kubernetes` | VSO controller + every app pod's ServiceAccount | `vault-secret-operator` (VSO itself), `` (one per app), `factory_crowdsec_conf` | +| `gitea_jwt` | `gitea_jwt` | CI/OpenTofu jobs running in Gitea Actions | `gitea_cicd` (base, factory-bootstrapped) + per-app `gitea_cicd_` | + +- **`kubernetes`** auth ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf)) is configured against `https://kubernetes.default.svc:443`. 
The VSO role `vault-secret-operator` binds SA `hashicorp-vault-vault-secrets-operator-controller-manager` in ns `tools`, `audience = vault`, and carries the `edit-vso-client-cache` policy (encrypt/decrypt on `transit/.../vso-client-cache`). +- **`gitea_jwt`** is the OIDC/JWT backend for CI. Its backend, `default_role = gitea_cicd`, and the base `gitea_cicd` role are created by factory's Vault bootstrap; the Vault provider in each IaC project logs in via `auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd[_]" }` using the `TERRAFORM_VAULT_AUTH_JWT` env var. See the [tofu CI apply flow](../factory-provisioning/opentofu/ci-apply-flow.md) for how the token is minted in the pipeline. + +### Terraform state + +Each IaC project keeps its state in the **`arcodange-tf` GCS bucket** under a distinct prefix: + +| Project | GCS prefix | +|---|---| +| Vault admin/app machinery | `tools/hashicorp_vault/main` | +| Plausible | `tools/plausible/main` | +| CrowdSec | `tools/crowdsec/main` | + +--- + +## 2) The `app_policy` + `app_roles` modules — the `` join-key machinery + +> [!IMPORTANT] +> These two modules are the heart of the secrets layer. Given a single `` name they emit a **matched, name-derived** set of Vault objects so that an app's runtime, its CI, and its database identity all line up on the same key. This is the Vault half of the lab-wide [naming convention](../lab-ecosystem/naming-conventions.md): the same `` string also names the Kubernetes namespace, the ServiceAccount, the Postgres `_role`, and the Gitea repo. + +The two modules live on **opposite sides of the trust boundary**: + +- [`modules/app_policy`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy) is declared **once, centrally**, in the Vault admin project ([`main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/main.tf), `for_each` over `var.applications`). 
It creates the **policies and the CI identity** — the privileged bits — so the app's own repo never holds them. +- [`modules/app_roles`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles) is declared **by the subordinate app project** (pulled over SSH as a Git module), running under the ``-ops policy. It creates the **roles** the app needs. + +### `app_roles` — runtime roles (declared by the app repo) + +For ``, [`app_roles/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_roles/main.tf) creates: + +| Resource | Path | Key settings | +|---|---|---| +| Kubernetes auth role | `auth/kubernetes/role/` | `bound_service_account_names = [] + extras`, `bound_service_account_namespaces = [] + extras`, `token_ttl = 3600` (1h), `token_policies = [default, ]`, `audience = vault` | +| Postgres dynamic role | `postgres/roles/` | `db_name = postgres`; creation SQL: `CREATE ROLE "{{name}}" WITH LOGIN PASSWORD … VALID UNTIL …` then `GRANT _role TO "{{name}}"`; revocation: `REASSIGN OWNED BY "{{name}}" TO _role` then `REVOKE ALL ON DATABASE FROM "{{name}}"` | + +> [!IMPORTANT] +> The Postgres dynamic role's creation SQL does `GRANT _role TO {{name}}` and its revocation does `REASSIGN OWNED BY {{name}} TO _role`. **The non-login `_role` must already exist in Postgres** — it is created by factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) (`postgresql_role.app_role[""]`, owner of the `` database). If that role is missing, every ephemeral-user creation/revocation fails. This is the ordering dependency between the two repos: **factory postgres/iac before tools app_roles**. + +> [!NOTE] +> The Kubernetes auth role binds **both** SA names **and** namespaces — the check is an **AND**. A token presenting SA `` from the wrong namespace (or any other SA from ns ``) is rejected. 
The default binding is SA `` in ns ``; the `service_account_names` / `service_account_namespaces` inputs widen it (e.g. CrowdSec/Plausible run in ns `tools`, not a namespace named after the app). + +The Postgres role can be skipped with `disable_database = true`; the DB name defaults to `` but can be overridden via `database`. + +### `app_policy` — policies + CI identity (declared centrally) + +For ``, [`app_policy/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/modules/app_policy/main.tf) creates: + +| Resource | Name | Grants | +|---|---|---| +| **App policy** | `` | `read,list` on `kvv2/data//*`; `read` on `postgres/creds/*` — what the runtime pod can do | +| **Ops policy** | `-ops` | The CI bundle (below) | +| **JWT role** | `gitea_cicd_` (mount `gitea_jwt`) | `token_policies = [default] + 's ops_policies`, `bound_audiences = [gitea_app_id]`, `user_claim = email`, `role_type = jwt` | +| **Identity group** | `-ops` | Internal group carrying the `-ops` policy, so Vault users mapped to their Gitea entity inherit ops rights | + +The **`-ops` policy** is the privilege set a CI job needs to *manage* the app's own corner of Vault and the clouds: + +- `create/update` on `auth/token/create`; `read` on `sys/mounts/auth/*` (so the Vault provider works); +- full CRUD on `postgres/roles/*` and on `auth/kubernetes/role/*` (so `app_roles` can apply) — the k8s-role rule is **parameter-constrained**: it may only set `bound_service_account_names`/`bound_service_account_namespaces` to the whitelisted `[] + extras` lists and `token_policies` to `["default",""]`, preventing a CI job from minting a role with broader bindings; +- full CRUD on the app's KV-v2 data, delete/undelete/destroy, and `metadata` (`kvv2/data|delete|undelete|destroy|metadata//*`); +- `read` on `kvv1/google/credentials` (the GCS backend SA), `kvv1/gitea/tofu_module_reader` (the bot SSH key that lets CI pull the `app_roles` Git module); +- CRUD on `kvv1/cloudflare/*` 
and `kvv1/ovh/*` (cloud DNS/edge secrets scoped to the app). + +> [!NOTE] +> The policy document is post-processed with two `replace()` calls. The Vault provider serializes the whitelisted list parameters as a JSON-encoded string (`"["webapp"]"`); the replaces strip the outer quotes so Vault receives a real list. If you change those `allowed_parameter` blocks, keep the replaces in sync. + +### Apps wired in `terraform.tfvars` + +[`terraform.tfvars`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/terraform.tfvars) declares the `applications` set the central `app_policy` `for_each` walks: + +| `` | Extra SA | Extra ns | Extra ops policy | Notes | +|---|---|---|---|---| +| `webapp` | — | — | — | defaults: SA `webapp` / ns `webapp` | +| `erp` | — | — | — | defaults | +| `cms` | `cloudflared` | — | `factory__cf_r2_arcodange_tf` | extra SA for the Cloudflare tunnel; extra ops policy for the CF R2 Terraform-state bucket | +| `crowdsec` | — | `tools` | — | runs in ns `tools` | +| `plausible` | — | `tools` | — | runs in ns `tools` | + +> [!NOTE] +> `terraform.tfvars` uses the key `ops_policies` for the CMS extra policy while `variables.tf` declares the optional attribute as `policies`; the central `main.tf` passes `each.value.policies` into the module's `ops_policies` input. Read these together when adding a new app so the extra-policy list actually lands on the JWT role. + +--- + +## 3) VSO CRDs — how a secret becomes a Kubernetes Secret + +The [Vault Secrets Operator](https://developer.hashicorp.com/vault/docs/platform/k8s/vso) watches three custom resources and writes plain Kubernetes `Secret` objects that pods consume normally (env / volume). The app repo ships the CRDs; the operator does the Vault round-trips. 
+ +| CRD | What it does | Refresh / rotation | +|---|---|---| +| `VaultAuth` | Picks the auth method (`kubernetes`), the `mount`, the Vault `role` (= ``), and the pod **ServiceAccount** (= ``) used to log in; references a `VaultConnection` (here the in-cluster `default` → `http://hashicorp-vault.tools.svc.cluster.local:8200`) | n/a — used by the other two CRDs via `vaultAuthRef` | +| `VaultStaticSecret` | Reads a **KV-v2** path → writes a k8s `Secret` | `refreshAfter` (the lab uses `30s`) | +| `VaultDynamicSecret` | Reads `postgres/creds/` (a **dynamic** lease) → writes a k8s `Secret`; `rolloutRestartTargets` lists Deployments to restart when creds rotate | follows the Vault lease TTL (1h); VSO renews/re-issues and restarts the targets | + +### Worked example — Plausible (`tools` namespace) + +Files under [`plausible/resources`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources): + +1. **`VaultAuth` `plausible`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultauth.yaml)) — `method: kubernetes`, `role: plausible`, `serviceAccount: plausible`, `audiences: [vault]`. This is the Vault role `app_roles` created in [`plausible/iac/main.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/iac/main.tf). +2. **`VaultStaticSecret` `plausible`** ([`vaultsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultsecret.yaml)) — `kvv2` path `plausible/config` → Secret `plausible-config` (`refreshAfter: 30s`). The config payload holds **`SECRET_KEY_BASE`** and **`TOTP_VAULT_KEY`**, both **generated by Terraform** (`random_password`, base64-encoded) and written to `kvv2/plausible/config` via `vault_kv_secret_v2` in the plausible IaC. +3. 
**`VaultStaticSecret` `plausible-geoip`** ([`geoipsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/geoipsecret.yaml)) — `kvv2` path `plausible/geoip` → Secret `plausible-geoip` exposing **`LICENSE_KEY`** (the MaxMind GeoIP license, an admin-seeded value, fed to the `geoipupdate` sidecar via env `GEOIPUPDATE_LICENSE_KEY`). +4. **`VaultDynamicSecret` `plausible-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/resources/vaultdynamicsecret.yaml)) — `postgres/creds/plausible` → Secret `plausible-db-credentials`; `rolloutRestartTargets` restarts Deployment `plausible`. An **init container** ([`add-initcontainer.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/add-initcontainer.yaml)) reads `username`/`password` from that Secret and writes `DATABASE_URL` (`postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}`) into a shared `generated-secrets` volume the app reads. + +### Worked example — CrowdSec (`tools` namespace) + +Templates under [`crowdsec/templates`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates): + +1. **`VaultAuth` `crowdsec`** ([`vaultauth.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultauth.yaml)) — `role: crowdsec`, `serviceAccount: crowdsec`. +2. **`VaultDynamicSecret` `crowdsec-db-credentials`** ([`vaultdynamicsecret.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec/templates/vaultdynamicsecret.yaml)) — `postgres/creds/crowdsec` → Secret `crowdsec-db-credentials`; `rolloutRestartTargets` restarts Deployment **`crowdsec-lapi`** (the Local API that owns the DB connection). 
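+
+To make the CrowdSec wiring above concrete, here is a minimal sketch of the two resources, assuming the upstream VSO `secrets.hashicorp.com/v1beta1` schema. The role, ServiceAccount, path, Secret, and restart-target names come from the list above and the `mount` values from the tables in section 1; `audiences` and `destination.create` are assumptions, and the real templates in the repo may differ.
+
+```yaml
+# Sketch only, not a copy of crowdsec/templates/*.yaml.
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultAuth
+metadata:
+  name: crowdsec
+  namespace: tools
+spec:
+  method: kubernetes
+  mount: kubernetes            # auth backend mount from section 1
+  kubernetes:
+    role: crowdsec             # Vault role created by app_roles
+    serviceAccount: crowdsec
+    audiences:
+      - vault                  # assumption: matches the role's audience = vault
+---
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultDynamicSecret
+metadata:
+  name: crowdsec-db-credentials
+  namespace: tools
+spec:
+  vaultAuthRef: crowdsec
+  mount: postgres              # database engine mount
+  path: creds/crowdsec         # i.e. postgres/creds/crowdsec
+  destination:
+    name: crowdsec-db-credentials
+    create: true               # assumption: VSO creates the Secret
+  rolloutRestartTargets:
+    - kind: Deployment
+      name: crowdsec-lapi      # restarted when the credentials rotate
+```
+
+The Plausible resources follow the same shape, with a `VaultStaticSecret` per KV-v2 path in addition to the dynamic secret.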
+
+### `factory_auth.tf`: the Ansible CrowdSec/Traefik plugin reader
+
+Separately from the per-app machinery, [`factory_auth.tf`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/factory_auth.tf) wires a Kubernetes auth role **`factory_crowdsec_conf`** for SA **`factory-ansible-tool-crowdsec-traefik-plugin`** in ns **`kube-system`** (`token_ttl = 3600`). It carries policy `factory_crowdsec_conf`, which grants `read,list` on **`kvv2/data/cms/factory/*`**. This is how the Ansible-deployed CrowdSec/Traefik bouncer plugin reads the **Turnstile** configuration that the [`cms` repo](https://gitea.arcodange.lab/arcodange-org/cms) writes into `kvv2/cms/factory/*`: a cross-repo handoff entirely through Vault, with no shared file. The producer side (the Turnstile widget and the `vault_kv_secret_v2` write) is documented on the [CMS Cloudflare page](../cms/cloudflare.md).
+
+---
+
+## 4) Secret-paths inventory
+
+| Path | Engine | Holds | Producer | Consumer |
+|---|---|---|---|---|
+| `kvv2/<app>/config` | KV v2 | App runtime config | app CI (KV CRUD via `<app>-ops`) | `VaultStaticSecret` → pod |
+| `kvv2/plausible/config` | KV v2 | `SECRET_KEY_BASE`, `TOTP_VAULT_KEY` | Plausible IaC (`random_password` → `vault_kv_secret_v2`) | `VaultStaticSecret plausible` → `plausible-config` |
+| `kvv2/plausible/geoip` | KV v2 | `LICENSE_KEY` (MaxMind) | admin-seeded | `VaultStaticSecret plausible-geoip` → `geoipupdate` sidecar |
+| `kvv2/cms/factory/turnstile` | KV v2 | Cloudflare Turnstile config | `cms` repo IaC | `factory_crowdsec_conf` k8s role → Ansible CrowdSec/Traefik plugin |
+| `postgres/creds/<app>` | database | Ephemeral DB user (`username`/`password`, 1h lease) | Vault on demand (role `<app>`, `GRANT <app>_role`) | `VaultDynamicSecret` → pod (e.g. `plausible-db-credentials`, `crowdsec-db-credentials`) |
+| `transit/.../vso-client-cache` | transit | VSO client-cache encryption key | Vault admin IaC | VSO controller (encrypt/decrypt its cache) |
+| `kvv1/cloudflare/*` | KV v1 | Cloudflare DNS/edge secrets | admin | app CI (`<app>-ops` CRUD) |
+| `kvv1/ovh/*` | KV v1 | OVH secrets | admin | app CI (`<app>-ops` CRUD) |
+| `kvv1/gitea/tofu_module_reader` | KV v1 | Bot SSH key to pull the `app_roles` Git module | admin | app CI (`<app>-ops` read) |
+| `kvv1/google/credentials` | KV v1 | GCS Terraform-backend SA key | admin | every IaC CI job (read) |
+
+---
+
+## 5) Secrets flow
+
+```mermaid
+%%{init: {'theme': 'base'}}%%
+flowchart TB
+    classDef eng fill:#7c3aed,stroke:#5b21b6,color:#ffffff
+    classDef auth fill:#b45309,stroke:#92400e,color:#ffffff
+    classDef crd fill:#059669,stroke:#047857,color:#ffffff
+    classDef k8s fill:#2563eb,stroke:#1e40af,color:#ffffff
+    classDef ci fill:#be123c,stroke:#9f1239,color:#ffffff
+
+    subgraph VAULT["Vault (tools ns)"]
+        KV2["kvv2 engine<br/>kvv2/&lt;app&gt;/*"]:::eng
+        PG["postgres engine<br/>postgres/creds/&lt;app&gt;"]:::eng
+        TR["transit<br/>vso-client-cache"]:::eng
+        KKUB["kubernetes auth<br/>role &lt;app&gt; (SA AND ns)"]:::auth
+        KJWT["gitea_jwt auth<br/>gitea_cicd_&lt;app&gt;"]:::auth
+    end
+
+    subgraph RUNTIME["Runtime path"]
+        VA["VaultAuth<br/>role &lt;app&gt;, SA &lt;app&gt;"]:::crd
+        VSS["VaultStaticSecret<br/>kvv2/&lt;app&gt;/config"]:::crd
+        VDS["VaultDynamicSecret<br/>postgres/creds/&lt;app&gt;"]:::crd
+        SEC["k8s Secret<br/>&lt;app&gt;-config / -db-credentials"]:::k8s
+        POD["App pod<br/>(SA &lt;app&gt;)"]:::k8s
+    end
+
+    subgraph CICD["CI path"]
+        GHA["Gitea Actions<br/>OpenTofu job"]:::ci
+        TOFU["apply app_roles<br/>(under &lt;app&gt;-ops)"]:::ci
+    end
+
+    KKUB --> VA
+    VA --> VSS
+    VA --> VDS
+    KV2 --> VSS
+    PG --> VDS
+    VSS --> SEC
+    VDS -- "rolloutRestart on rotation" --> SEC
+    SEC --> POD
+    TR -. "encrypts client cache" .-> VA
+
+    GHA -- "JWT login" --> KJWT
+    KJWT --> TOFU
+    TOFU -- "creates" --> KKUB
+    TOFU -- "creates" --> PG
+```
+
+1. **Vault** mounts the engines (`kvv2`, `postgres`, `transit`) and the two auth backends (`kubernetes`, `gitea_jwt`), all in the `tools` namespace.
+2. A pod's `VaultAuth` logs in through the **`kubernetes`** backend with SA `<app>` against role `<app>`; the role accepts only when **both** the SA name **and** its namespace match (AND).
+3. `VaultStaticSecret` reads `kvv2/<app>/config` and `VaultDynamicSecret` reads `postgres/creds/<app>` using that auth; VSO writes the values into ordinary k8s `Secret` objects.
+4. The pod consumes the Secret (env or volume); on a dynamic-cred **rotation** VSO restarts the `rolloutRestartTargets` Deployment so it picks up the new credentials.
+5. The **`transit`** key `vso-client-cache` encrypts VSO's client cache so an operator restart doesn't trigger a re-auth storm.
+6. On the CI side, a **Gitea Actions** OpenTofu job logs into the **`gitea_jwt`** backend as `gitea_cicd_<app>` (audience = the Gitea OAuth app id, identity from the `email` claim).
+7. Running under the `<app>-ops` policy, that job **applies the `app_roles` module**, creating/updating the Kubernetes auth role and the Postgres dynamic role for `<app>`, which closes the loop so the runtime path in steps 2-4 works.
+
+---
+
+## Gotchas
+
+- **Vault must be unsealed after every restart.** Sealed Vault → all VSO reads fail → dynamic-secret consumers won't start. See [storage-and-recovery](../lab-ecosystem/storage-and-recovery.md).
+- **The Kubernetes auth role binds SA *and* namespace (AND).** The wrong namespace, or a different SA in the right namespace, is rejected. Apps in ns `tools` (CrowdSec, Plausible) widen the binding via `service_account_namespaces`.
+- **The Postgres dynamic role depends on `<app>_role` existing.** `GRANT <app>_role TO {{name}}` (create) and `REASSIGN OWNED BY {{name}} TO <app>_role` (revoke) both fail if factory's [postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) hasn't created the `<app>_role` non-login role first. Order: **factory postgres/iac → tools app_roles**.
+- **The `ops_policies` vs `policies` key mismatch** in `terraform.tfvars` / `variables.tf` (see §2): read both files when adding an app's extra ops policy, otherwise the extra policy may never reach the JWT role.
+- **The sandbox uses a separate Vault.** Per the [safe-env ADR](../../ADR/0001-safe-prod-like-environment.md), the prod-like sandbox stands up its own Vault instance; none of the paths or roles above are shared with it. Don't assume a secret seeded in prod exists in the sandbox.
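+
+As a companion to step 3 of the secrets flow, the static half of the runtime path might look like the following for Plausible. This is a sketch assuming the upstream VSO `v1beta1` schema; the path, Secret name, and `refreshAfter` value come from the Plausible worked example, while `destination.create` and the exact field layout of the repo's `vaultsecret.yaml` are assumptions.
+
+```yaml
+# Sketch only, not a copy of plausible/resources/vaultsecret.yaml.
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: plausible
+  namespace: tools
+spec:
+  vaultAuthRef: plausible      # the VaultAuth bound to role/SA plausible
+  type: kv-v2
+  mount: kvv2
+  path: plausible/config       # read by VSO as kvv2/data/plausible/config
+  refreshAfter: 30s
+  destination:
+    name: plausible-config     # k8s Secret consumed by the pod
+    create: true               # assumption: VSO creates the Secret
+```
+
+If the referenced `VaultAuth` cannot log in (wrong SA, wrong namespace, or a sealed Vault), the destination Secret is not refreshed, which ties this resource back to the first two gotchas.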