From 2d76eb45c1b816e10f42998ef9a9836c2762f257 Mon Sep 17 00:00:00 2001
From: Gabriel Radureau
Date: Tue, 23 Jun 2026 22:22:09 +0200
Subject: [PATCH] docs(vibe): add new-tool and new-app runbooks (grounded in real PRs)

Two agent-oriented runbooks under vibe/runbooks/ with [AGENT]/[HUMAN]
step markers, grounded in real diffs:

- new-tool.md : add a platform component to the tools repo so ArgoCD
  deploys it into the tools namespace (wrapper Chart.yaml + the tool
  library + a row in chart/values.yaml; optional iac/ for secrets).
  Mirrors the prometheus/crowdsec additions.
- new-app.md : stand up a brand-new application across THREE repos
  (app + factory + tools) with the strict ordering dependency and the
  TERRAFORM_SSH_KEY pitfall. Phase-by-phase mapped to the
  dance-lessons-coach onboarding PRs (#89/#97/#98/#99/#100), factory
  #1/#2, tools #1; the FR doc/runbooks/new-web-app is linked as the
  detailed companion.

2 mermaid diagrams MCP-validated; zero dead links across the vibe tree.

Co-Authored-By: Claude Opus 4.8
---
 vibe/runbooks/README.md   |   2 +
 vibe/runbooks/new-app.md  | 269 ++++++++++++++++++++++++++++++++++++
 vibe/runbooks/new-tool.md | 279 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 550 insertions(+)
 create mode 100644 vibe/runbooks/new-app.md
 create mode 100644 vibe/runbooks/new-tool.md

diff --git a/vibe/runbooks/README.md b/vibe/runbooks/README.md
index 3286606..1237ba4 100644
--- a/vibe/runbooks/README.md
+++ b/vibe/runbooks/README.md
@@ -45,6 +45,8 @@ flowchart LR
 | Runbook | Summary | Status |
 |---|---|---|
 | [_template](_template.md) | Skeleton for new agent-oriented runbooks (`[AGENT]`/`[HUMAN]` markers, copy-paste commands, verification + rollback) | ✅ Active |
+| [Set up a new tool](new-tool.md) | Add a platform component to the `tools` repo so ArgoCD deploys it | ✅ |
+| [Set up a new app](new-app.md) | Stand up a brand-new application — its own repo, chart, CI/CD with IaC, and database access, across the app + factory + tools repos | ✅ |
 
 > [!NOTE]
 > The first **concrete** runbook — a local sandbox game-day for the safe prod-like environment — ships with **PRD Phase 1** ([safe-prod-like-environment PRD](../PRD/safe-prod-like-environment/README.md)). Until then this folder holds the conventions and the template only.
diff --git a/vibe/runbooks/new-app.md b/vibe/runbooks/new-app.md
new file mode 100644
index 0000000..c6ef0a0
--- /dev/null
+++ b/vibe/runbooks/new-app.md
@@ -0,0 +1,269 @@
+[vibe](../README.md) > [Runbooks](README.md) > **Set up a new app**
+
+# Set up a new app
+
+> **Status:** ✅ Active
+> **Audience:** platform operator + agents (English). For the detailed human-facing procedure see the French [new-web-app runbook](../../doc/runbooks/new-web-app/README.md).
+> **Last Updated:** 2026-06-23
+
+## TL;DR
+
+> [!TIP]
+> Standing up a brand-new application touches **three repos** — the app's own Gitea repo, [`factory`](../../argocd/values.yaml), and [`tools`](../guidebooks/tools/secrets-and-vso.md) — with a **strict ordering dependency**. An agent may write every file and open every PR (`[AGENT]`), but each **merge/apply is `[HUMAN]`-gated**. The single rule that everything else hangs on: the **factory** Postgres DB+role and the **tools** Vault JWT role MUST be applied **before** the app's own `iac/` runs. Ship the app in **degraded mode first** (no DB/Vault), wire the platform sides, then turn on dynamic credentials last. The detailed companion is the French [new-web-app runbook](../../doc/runbooks/new-web-app/README.md); this page is its agent-oriented English mirror.
+
+> [!CAUTION]
+> **Ordering is load-bearing — do not reorder the phases.**
+> - The app's own `iac/` (Phase 6) calls the shared `app_roles` module, which issues `GRANT _role TO …` on every dynamic credential and authenticates to Vault as `gitea_cicd_`. So **both** of these must already exist:
+>   - the Postgres role `_role` + database `` → created by the **factory** side (Phase 4).
+>   - the Vault JWT role `gitea_cicd_` + policies `` / `-ops` → created by the **tools** side (Phase 5).
+> - The app's `vault.yaml` CI needs the **`TERRAFORM_SSH_KEY`** Actions secret (the `tofu_module_reader` SSH key from Vault) or `terraform init` cannot clone the `app_roles` module over `git::ssh://`. This is the canonical pitfall — it sank the first `iac/` push and was fixed in [dance-lessons-coach PR #100](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/100).
+> Apply Phases 4 and 5 **before** merging Phase 6.
+
+## Scope
+
+This runbook covers standing up a **brand-new application** end-to-end: its own Gitea repo, a Helm `chart/`, CI/CD with IaC (`iac/` + `.gitea/workflows/`), and database access — all deployed by factory **ArgoCD** into a dedicated namespace. Systems touched: **Gitea** (repo + Actions + container registry), **Postgres** (DB + owner role via factory), **Vault** (JWT CI role, policies, dynamic DB creds via tools + app), **k3s** (namespace, pod, SA), **ArgoCD** (Application sync + image-updater), and **Traefik** (ingress).
+
+It does **not** cover: writing the application code itself, the one-time platform foundations (Vault mounts, the Vault→Postgres connection, the `gitea_cicd` bootstrap JWT role, the `tofu_module_reader` bot, org-level Actions secrets — all already in place), or adding a non-application platform component (see [Set up a new tool](new-tool.md)).
+
+The reference onboarding is **`dance-lessons-coach`** (verified from its merged PRs), with **[webapp](../guidebooks/applications/webapp.md)** as the canonical app to clone.
+
+## Preconditions
+
+- [ ] Working in a worktree under `.claude/worktrees//` (never the trunk).
+- [ ] You can create a Gitea repo under `arcodange-org` (default) or `arcodange` (for some apps).
+- [ ] Local clones of `factory` and `tools` are available and on synced `main`.
+- [ ] The `` name is chosen — **kebab-case, lowercase**. This is the **universal join key**: the same string is reused verbatim across Gitea, Postgres, Vault, Kubernetes, ArgoCD, GCS, and DNS. One typo silently breaks the chain. See [naming-conventions](../guidebooks/lab-ecosystem/naming-conventions.md) and the FR [conventions](../../doc/runbooks/new-web-app/conventions.md).
+- [ ] The platform foundations exist (Vault mounts `kvv2`/`postgres`/`transit` + auth `kubernetes`, the Vault→Postgres connection via `credentials_editor`, the bootstrap `gitea_cicd` role, the `tofu_module_reader` SSH bot, and org Actions secrets `HOMELAB_CA_CERT` / `vault_oauth__sh_b64` / `PACKAGES_TOKEN`).
+
+## The three-repo onboarding (ordering)
+
+```mermaid
+%%{init: {'theme':'base','themeVariables':{'fontSize':'14px'}}}%%
+flowchart TB
+    classDef app fill:#2563eb,stroke:#1e40af,color:#ffffff
+    classDef plat fill:#059669,stroke:#047857,color:#ffffff
+    classDef tools fill:#7c3aed,stroke:#6d28d9,color:#ffffff
+    classDef run fill:#b45309,stroke:#92400e,color:#ffffff
+
+    P1["Phase 1-3 · APP repo<br/>chart/ degraded + Vault-ready (gated) + TLS<br/>(serves, no DB/Vault yet)"]:::app
+    P4["Phase 4 · FACTORY repo<br/>argocd/values.yaml + postgres/iac<br/>→ DB &lt;app&gt; + role &lt;app&gt;_role"]:::plat
+    P5["Phase 5 · TOOLS repo<br/>hashicorp-vault/iac<br/>→ gitea_cicd_&lt;app&gt; + policies"]:::tools
+    P6["Phase 6 · APP repo<br/>iac/ (app_roles module) + vault.yaml<br/>+ TERRAFORM_SSH_KEY secret"]:::app
+    P7["Phase 7-8 · APP repo<br/>vault.enabled=true + dockerimage.yaml<br/>→ dynamic creds on, image rollout"]:::run
+
+    P1 --> P4
+    P1 --> P5
+    P4 --> P6
+    P5 --> P6
+    P6 --> P7
+```
+
+1. **Phases 1-3 (app repo):** ship the chart in degraded mode, make it Vault-ready behind a default-off gate, and set the right ingress — none of this needs the platform sides yet.
+2. **Phase 4 (factory) and Phase 5 (tools)** are independent of each other but **both** must be applied before Phase 6.
+3. **Phase 6 (app repo)** applies the app's own `iac/`, which depends on the role/JWT created in 4 and 5, and needs the `TERRAFORM_SSH_KEY` secret.
+4. **Phases 7-8 (app repo)** flip `vault.enabled=true` for live dynamic DB creds, then add the image-build CI so ArgoCD's image-updater rolls out releases.
+
+## Procedure
+
+### Phase 0 — Choose the name and create the repo
+
+1. **[HUMAN]** Fix `` (kebab-case) and the Gitea org. Default org is **`arcodange-org`**; some apps live under **`arcodange`** (e.g. `dance-lessons-coach`, `telegram-gateway`). Create the empty repo under the chosen org. The org choice matters because the repo inherits that org's Actions secrets.
+
+### Phase 1 — App in degraded mode
+
+Mirrors [dance-lessons-coach PR #89](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/89). Clone the [webapp](../guidebooks/applications/webapp.md) pattern.
+
+2. **[AGENT]** Add a `Dockerfile` and a Helm `chart/` (`deployment`, `service`, `ingress`, `serviceaccount`, `configmap`, `_helpers.tpl`, `NOTES.txt`) with **no DB/Vault wiring**. Set:
+   - ingress host `.arcodange.lab` (internal) and/or `.arcodange.fr` (public) — TLS details land in Phase 3;
+   - a `nodeSelector` of `kubernetes.io/hostname: pi1` (network entrypoint, preserves the user IP, avoids NAT);
+   - `/healthz` (or the app's real path, e.g. `dance-lessons-coach` uses `/api/healthz`) for **both** liveness and readiness probes;
+   - leave any DB host empty so the pod serves in degraded mode.
+
+   ```bash
+   # [AGENT] lint + render before opening the PR — safe, no cluster contact
+   helm lint chart/
+   helm template chart/ --set image.repository=test --set image.tag=v1
+   ```
+
+3. **[HUMAN]** Open and merge the PR. Verify the app serves in degraded mode (binary + health endpoint reachable once ArgoCD picks it up in Phase 4+).
+
+### Phase 2 — Make the chart Vault-ready (gated, default off)
+
+Mirrors [dance-lessons-coach PR #97](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/97).
+
+4. **[AGENT]** Add `VaultAuth`, `VaultStaticSecret`, and `VaultDynamicSecret` templates, each **gated behind `.Values.vault.enabled`** (default `false`) so a plain `helm install` keeps working. The reference `values.yaml` exposes:
+
+   ```yaml
+   # chart/values.yaml — gate + the three Vault join keys (all derived from )
+   vault:
+     enabled: false
+     role:                    # k8s auth backend role (matches iac/main.tf)
+     kvv2Path: /config        # KVv2 secret path
+     postgresPath: creds/     # postgres dynamic creds path
+   ```
+
+   The `VaultAuth` targets the k8s role `` with the app's ServiceAccount and audience `vault`; the `VaultDynamicSecret` reads `postgres/creds/` into a `db-credentials` Secret and `rolloutRestartTargets` the Deployment.
+
+5. **[HUMAN]** Open and merge the PR. The chart is now Vault-ready without activating any Vault dependency.
+
+### Phase 3 — Ingress / TLS
+
+Mirrors [dance-lessons-coach PR #98](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/98). Pick by host suffix:
+
+6. **[AGENT]** For a **`.lab`** host: `traefik.../router.entrypoints: websecure` + `router.tls: "true"` + `router.tls.certresolver: letsencrypt` (with `router.tls.domains.0.main: arcodange.lab` and `…sans: .arcodange.lab`) + `router.middlewares: localIp@file`. For a **`.fr`** host: `router.entrypoints: web` + `router.middlewares: kube-system-crowdsec@kubernetescrd`. (Convention: `.lab` = internal, websecure + localIp + letsencrypt; `.fr` = public, web + crowdsec.)
+
+7. **[HUMAN]** Merge the PR.
+
+### Phase 4 — FACTORY side (DB + role, ArgoCD enrollment)
+
+Mirrors [factory PR #1](https://gitea.arcodange.lab/arcodange-org/factory/pulls/1) (ArgoCD) and [factory PR #2](https://gitea.arcodange.lab/arcodange-org/factory/pulls/2) (Postgres). Link: [postgres-iac](../guidebooks/factory-provisioning/opentofu/postgres-iac.md), [ci-apply-flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md).
+
+8. **[AGENT]** Enroll `` in [`argocd/values.yaml`](../../argocd/values.yaml) under `gitea_applications`. The [apps template](../../argocd/templates/apps.yaml) defaults the org to `arcodange-org` (`{{- $org := default "arcodange-org" $app_attr.org -}}`), so add `org: arcodange` **only** if the app is not under `arcodange-org`. Add image-updater annotations for digest-based rollout:
+
+   ```yaml
+   # argocd/values.yaml — under gitea_applications
+   :
+     org: arcodange          # ← ONLY if not arcodange-org
+     annotations:
+       argocd-image-updater.argoproj.io/image-list: =gitea.arcodange.lab//:latest
+       argocd-image-updater.argoproj.io/.update-strategy: digest
+   ```
+
+9. **[AGENT]** Add `""` to the `applications` list in [`postgres/iac/terraform.tfvars`](../../postgres/iac/terraform.tfvars). This creates the `` database, the non-login owner role `_role`, and the pgbouncer `user_lookup()` function.
+
+   ```hcl
+   # postgres/iac/terraform.tfvars
+   applications = [
+     "webapp",
+     "erp",
+     "crowdsec",
+     "plausible",
+     "dance-lessons-coach",
+     "",   # ← add
+   ]
+   ```
+
+10. **[HUMAN]** Merge both PRs. Factory CI (`postgres.yaml`) applies — the DB + role now exist. ArgoCD creates the Application and deploys the degraded chart into namespace ``.
+
+### Phase 5 — TOOLS side (Vault JWT role + policies)
+
+Mirrors [tools PR #1](https://gitea.arcodange.lab/arcodange-org/tools/pulls/1). Link: [tools secrets-and-vso](../guidebooks/tools/secrets-and-vso.md), [tools components](../guidebooks/tools/components.md).
+
+11. **[AGENT]** Add `{ name = "" }` to the `applications` list in [`tools/hashicorp-vault/iac/terraform.tfvars`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault/iac/terraform.tfvars). Via the `app_policy` / `app_roles` modules this creates the `gitea_cicd_` JWT role, the `` (runtime) and `-ops` (CI) policies, the `-ops` identity group, and the k8s auth role.
+
+    ```hcl
+    # tools/hashicorp-vault/iac/terraform.tfvars
+    applications = [
+      { name = "webapp" },
+      { name = "erp" },
+      { name = "" },   # ← add
+      # optional fields when needed:
+      # { name = "", ops_policies = ["…"], service_account_names = ["…"], service_account_namespaces = ["tools"] }
+    ]
+    ```
+
+12. **[HUMAN]** Merge the PR. Tools CI (`vault.yaml`) applies — `gitea_cicd_` and the policies now exist.
+
+### Phase 6 — App IaC + Vault workflow
+
+Mirrors [dance-lessons-coach PR #99](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/99) and the [#100 fix](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/100). See [05-app-terraform](../../doc/runbooks/new-web-app/05-app-terraform.md) for the module contract.
+
+> [!CAUTION]
+> **Phases 4 and 5 must already be applied** before merging this phase, or the first `tofu apply` fails (no `_role` to GRANT, or Vault auth fails on the missing `gitea_cicd_` role).
+
+13. **[AGENT]** Add the app's `iac/`:
+    - `providers.tf` — Vault provider with `auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd_" }`.
+    - `backend.tf` — GCS backend `bucket = "arcodange-tf"`, `prefix = "/main"`.
+    - `main.tf` — call the shared module (the exact source string used by every app):
+
+    ```hcl
+    module "app_roles" {
+      source = "git::ssh://git@192.168.1.202:2222/arcodange-org/tools.git//hashicorp-vault/iac/modules/app_roles?depth=1&ref=main"
+      name   = ""
+    }
+    ```
+
+    This provisions `postgres/creds/` (dynamic DB role inheriting `_role`) and the k8s auth role ``. Add any app-specific `kvv2//config` secrets alongside.
+
+14. **[AGENT]** Add `.gitea/workflows/vault.yaml` that authenticates via Gitea OIDC and runs `tofu apply iac/`. The `vault-action` step's `role:` and `providers.tf`'s `role` **must both** be `gitea_cicd_` (the copy-paste trap — `erp` still carries a stale `gitea_cicd_webapp`). The secrets block must read the SSH key:
+
+    ```yaml
+    # .gitea/workflows/vault.yaml — vault-action secrets block
+    secrets: |
+      kvv1/google/credentials credentials | GOOGLE_BACKEND_CREDENTIALS ;
+      kvv1/gitea/tofu_module_reader ssh_private_key | TERRAFORM_SSH_KEY ;
+    ```
+
+15. **[HUMAN]** Add the **`TERRAFORM_SSH_KEY`** secret (the `tofu_module_reader` SSH key, read from Vault at `kvv1/gitea/tofu_module_reader`) to the app repo's **Actions secrets**. Without it, `terraform init` cannot clone the `app_roles` module over `git::ssh://` — the canonical pitfall fixed in [PR #100](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/100).
+
+16. **[HUMAN]** Merge the PR. The app's `vault.yaml` runs `tofu apply` — `postgres/creds/` and the k8s role `` now exist.
+
+### Phase 7 — Turn on dynamic DB credentials
+
+17. **[AGENT]** Set `vault.enabled=true` in `chart/values.yaml` (and point the app's DB env at `pgbouncer.tools:5432`). On next ArgoCD sync, VSO authenticates with the k8s role ``, fetches dynamic Postgres creds from `postgres/creds/` into the `db-credentials` Secret, and the pod reaches the DB through **pgbouncer.tools** with a short-lived user that inherits `_role`. See [webapp](../guidebooks/applications/webapp.md) and [erp](../guidebooks/erp/README.md) for the consumption pattern.
+
+18. **[HUMAN]** Merge the PR.
+
+### Phase 8 — Image CI + deploy
+
+19. **[AGENT]** Add `.gitea/workflows/dockerimage.yaml` that builds the image and pushes it to the Gitea registry (`gitea.arcodange.lab//:latest` + branch tag), logging in with `PACKAGES_TOKEN`. No deploy step is needed — the ArgoCD image-updater annotations from Phase 4 watch `latest` (digest strategy) and roll it out. Skip this phase entirely for apps that run a public upstream image (e.g. `erp`/Dolibarr).
+
+20. **[HUMAN]** Merge the PR.
+
+## Verification
+
+The convention chain must resolve end-to-end (this is the same parity check the [safe-env PRD](../ADR/0001-safe-prod-like-environment.md) rehearses in the sandbox). All checks below are **[AGENT]** read-only:
+
+```bash
+# [AGENT] Gitea repo exists under the chosen org
+git ls-remote https://gitea.arcodange.lab// &>/dev/null && echo "repo OK"
+
+# [AGENT] Postgres DB + owner role exist (run from a host with psql access to the engine)
+psql -h 192.168.1.202 -U credentials_editor -tAc \
+  "SELECT datname FROM pg_database WHERE datname='';"
+psql -h 192.168.1.202 -U credentials_editor -tAc \
+  "SELECT rolname FROM pg_roles WHERE rolname='_role';"
+
+# [AGENT] Vault: dynamic role, policies, and CI JWT role exist
+vault read postgres/roles/
+vault policy read
+vault policy read -ops
+vault read auth/gitea_jwt/role/gitea_cicd_
+
+# [AGENT] ArgoCD Application is Synced + Healthy
+kubectl --context -n argocd get application \
+  -o jsonpath='{.status.sync.status}/{.status.health.status}'
+# expected: Synced/Healthy
+
+# [AGENT] VSO created the db-credentials Secret + pod is Running + ingress resolves
+kubectl --context -n get secret db-credentials
+kubectl --context -n get pods
+curl -fsS https://.arcodange.lab/healthz   # or the app's real health path
+```
+
+Expected: repo present; PG `` DB + `_role` exist; Vault `postgres/creds/` + policies ``/`-ops` + `gitea_cicd_` exist; ArgoCD Application `Synced/Healthy`; the `db-credentials` Secret was created by VSO; the pod is `Running`; the ingress resolves.
+
+## Rollback
+
+Revert the per-repo PRs **in reverse order**: app → tools → factory. Mark each undo step with its `[AGENT]`/`[HUMAN]` tag, as in the procedure.
+
+1. **[HUMAN]** App repo: revert Phase 8 → 7 → 6 PRs. Reverting the Phase 6 `iac/` removes `postgres/creds/` and the k8s role on the next CI run; setting `vault.enabled=false` returns the chart to degraded mode.
+2. **[HUMAN]** Tools repo: remove the `{ name = "" }` entry; tools CI prunes `gitea_cicd_` + policies.
+3. **[HUMAN]** Factory repo: remove the `` entry from `argocd/values.yaml` — ArgoCD **prunes the Application** (and its namespace) — and remove `""` from `postgres/iac/terraform.tfvars` to drop the DB + role.
+4. **[HUMAN]** For a full cluster-level recovery (power-cut, lost unseal key) consult `CLUSTER_RECOVERY.md`.
+
+> [!WARNING]
+> Removing the Postgres entry **drops the database** `` and its data. Back up first if the app already holds state.
+
+## References
+
+- French human-operator procedure: [new-web-app runbook](../../doc/runbooks/new-web-app/README.md) + [conventions](../../doc/runbooks/new-web-app/conventions.md) (the universal `` join key).
+- Exemplars: [webapp](../guidebooks/applications/webapp.md) (in-house image + DB) and [erp](../guidebooks/erp/README.md) (public image + DB).
+- Platform mechanics: [tools secrets-and-vso](../guidebooks/tools/secrets-and-vso.md), [tools components](../guidebooks/tools/components.md), [postgres-iac](../guidebooks/factory-provisioning/opentofu/postgres-iac.md), [ci-apply-flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md), [naming-conventions](../guidebooks/lab-ecosystem/naming-conventions.md), [secrets-and-vault](../guidebooks/lab-ecosystem/secrets-and-vault.md).
+- Companion runbook: [Set up a new tool](new-tool.md).
+- Parity rehearsal: [safe-prod-like-environment ADR/PRD](../ADR/0001-safe-prod-like-environment.md).
+- Factory files: [argocd/values.yaml](../../argocd/values.yaml), [argocd/templates/apps.yaml](../../argocd/templates/apps.yaml), [postgres/iac/terraform.tfvars](../../postgres/iac/terraform.tfvars).
+- Reference PRs (verified, all merged):
+  - app `dance-lessons-coach`: [#89 degraded](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/89) · [#97 Vault-ready gate](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/97) · [#98 TLS ingress](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/98) · [#99 iac + workflow](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/99) · [#100 TERRAFORM_SSH_KEY fix](https://gitea.arcodange.lab/arcodange/dance-lessons-coach/pulls/100)
+  - factory: [#1 ArgoCD enroll + org override](https://gitea.arcodange.lab/arcodange-org/factory/pulls/1) · [#2 Postgres DB + role](https://gitea.arcodange.lab/arcodange-org/factory/pulls/2)
+  - tools: [#1 Vault JWT role + policy](https://gitea.arcodange.lab/arcodange-org/tools/pulls/1)
diff --git a/vibe/runbooks/new-tool.md b/vibe/runbooks/new-tool.md
new file mode 100644
index 0000000..414ff6f
--- /dev/null
+++ b/vibe/runbooks/new-tool.md
@@ -0,0 +1,279 @@
+[vibe](../README.md) > [Runbooks](README.md) > **Set up a new tool**
+
+# Set up a new tool
+
+> **Status:** ✅ Active
+> **Audience:** platform operator + agents (English). For the application-onboarding equivalent see [Set up a new app](new-app.md).
+> **Last Updated:** 2026-06-23
+
+## TL;DR
+
+> [!TIP]
+> Adding a platform component means dropping a small **wrapper chart** into the `tools` repo and registering it in the app-of-apps. An agent can do the bulk of it: scaffold `tools//` (a wrapper `Chart.yaml` that depends on the upstream chart + the local `tool` library chart, the two `helm-chart*.yaml` templates, and a `values.yaml`), add one key under `tools:` in [`tools/chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), and lint it locally. The **human approval gate** sits in two places: (1) any Vault/database wiring under `tools//iac/` and (2) opening + merging the PR — ArgoCD auto-syncs the new Application the moment it lands on `main`.
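The file-writing half of the TL;DR is mechanical enough to script. The following is a minimal sketch, not a helper that exists in the `tools` repo: `scaffold_tool` and its argument order are hypothetical, and it covers only the wrapper-chart shape. It writes the same four files as step 2. The generated `values.yaml` holds an empty `{}` where the real file carries the upstream values (ingress, persistence, resources). Keying that block by the upstream chart name follows Helm's usual subchart convention, but it is an assumption about how this repo's `SubChart` kind picks the values up.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped in the tools repo): writes the four
# wrapper-chart files from step 2 for one tool.
set -euo pipefail

# Usage: scaffold_tool <parent-dir> <tool-name> <upstream-chart> <upstream-version> <upstream-repo-url>
scaffold_tool() {
  local dir=$1 tool=$2 chart=$3 version=$4 repo=$5
  mkdir -p "$dir/$tool/templates"

  # Wrapper Chart.yaml: the local `tool` library plus the pinned upstream chart.
  cat > "$dir/$tool/Chart.yaml" <<EOF
apiVersion: v2
name: $tool
description: A Helm chart for Kubernetes

dependencies:
  - name: tool
    version: 0.1.0
    repository: https://gitea.arcodange.lab/api/packages/arcodange-org/helm
  - name: $chart
    version: $version
    repository: $repo
type: application
version: 0.1.0
EOF

  # The two templates only delegate to the library (quoted heredoc: no expansion).
  cat > "$dir/$tool/templates/helm-chart.yaml" <<'EOF'
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart.tpl" . -}}
{{- end -}}
EOF
  cat > "$dir/$tool/templates/helm-chart-config.yaml" <<'EOF'
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart-config.tpl" . -}}
{{- end -}}
EOF

  # Skeleton values.yaml: empty upstream block under an anchor, re-used by `tool:`.
  cat > "$dir/$tool/values.yaml" <<EOF
$chart: &${tool}_config {}

tool:
  kind: 'SubChart'
  repo: $repo
  chart: $chart
  version: $version
  values: *${tool}_config
EOF
}
```

For example, `scaffold_tool tools mytool mychart 1.2.3 https://charts.example.com` (all hypothetical values) would create `tools/mytool/`. You still need to register the tool in `tools/chart/values.yaml` (step 3) and run the step 5 lint commands afterwards.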
+
+## Scope
+
+This runbook covers adding a **new platform component** (monitoring, cache, security engine, connection pooler, analytics, …) to the [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) so the factory ArgoCD `tools` project renders an Application for it and deploys it into the **`tools` namespace**.
+
+Systems touched: Gitea (`tools` repo), ArgoCD (the `tools` AppProject), k3s (the helm-controller that materialises each `HelmChart` CR), and — only for secret-backed tools — Vault + the Vault Secrets Operator (VSO).
+
+This runbook does **not** cover standing up a brand-new business application (its own repo, chart, CI/CD, database). That is the [Set up a new app](new-app.md) runbook. It also does not cover the underlying app-of-apps wiring of the `tools` project itself — read the [tools guidebook](../guidebooks/tools/README.md) for how that works.
+
+## Preconditions
+
+- [ ] Working in a worktree under `.claude/worktrees//` of a `tools` repo clone (never the trunk).
+- [ ] The tool deploys into the **`tools` namespace** (the `tools` AppProject only permits that destination).
+- [ ] You know the **upstream Helm chart** (chart name + repo URL) and a **pinned version**, OR you have decided this tool needs **Kustomize + helm inflation** (charts that require post-render patching, like `clickhouse`/`plausible`).
+- [ ] `helm` (with the upstream repo reachable) and, for the Kustomize path, `kustomize` available locally for the lint step.
+- [ ] If the tool needs secrets or a database: familiarity with the Vault `app_roles` module pattern and the `tofu-apply` CI flow — see the [tools secrets & VSO page](../guidebooks/tools/secrets-and-vso.md) and the [tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md).
+
+## Procedure
+
+1. **[HUMAN]** Choose the tool name `` (kebab-case) and the deployment shape.
+
+   Decide between the two supported shapes:
+   - **Wrapper chart (default).** A thin Helm chart that depends on the upstream chart at a pinned version and lets the local `tool` library chart emit a k3s `HelmChart` custom resource. Used by [`prometheus`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) and [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec).
+   - **Kustomize + helm inflation.** For charts that need post-render JSON6902 patches or extra `resources/`. Used by [`clickhouse`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) and [`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible).
+
+   Pin the upstream chart **version** now — it goes verbatim into the next step.
+
+2. **[AGENT]** Scaffold `tools//` (wrapper-chart shape).
+
+   Create four files. The `Chart.yaml` declares **two** dependencies — the local `tool` library chart (served from the Gitea Helm package registry) and the upstream chart pinned to your chosen version:
+
+   ```yaml
+   # tools//Chart.yaml
+   apiVersion: v2
+   name:
+   description: A Helm chart for Kubernetes
+
+   dependencies:
+     - name: tool
+       version: 0.1.0
+       repository: https://gitea.arcodange.lab/api/packages/arcodange-org/helm
+     - name:
+       version:
+       repository: https://
+   type: application
+   version: 0.1.0
+   ```
+
+   The two template files are thin wrappers that delegate to the `tool` library (they only render when `tool.kind` is `HelmChart`; under `SubChart` they are inert and the upstream chart is pulled as a normal dependency):
+
+   ```yaml
+   # tools//templates/helm-chart.yaml
+   {{- if eq .Values.tool.kind "HelmChart" -}}
+   {{- include "tool.helm-chart.tpl" . -}}
+   {{- end -}}
+   ```
+
+   ```yaml
+   # tools//templates/helm-chart-config.yaml
+   {{- if eq .Values.tool.kind "HelmChart" -}}
+   {{- include "tool.helm-chart-config.tpl" . -}}
+   {{- end -}}
+   ```
+
+   The `values.yaml` carries the upstream values under a YAML anchor and re-references it from the `tool:` block. Web-facing tools set an ingress host `.arcodange.lab`; stateful tools set persistence with the longhorn storage class and resource requests/limits. The shape, taken from [`prometheus/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus/values.yaml):
+
+   ```yaml
+   # tools//values.yaml
+   : &_config
+     # ── upstream values go here ──
+     # web-facing tools: expose an ingress host
+     ingress:
+       enabled: true
+       hosts:
+         - .arcodange.lab
+     # stateful tools: pin storage class + size
+     persistence:
+       enabled: true
+       storageClass: longhorn
+       size: 8Gi
+     resources:
+       requests:
+         cpu: 100m
+         memory: 256Mi
+       limits:
+         cpu: 500m
+         memory: 512Mi
+
+   tool:
+     # kind 'SubChart': pull the upstream chart as a dependency and pass it the values below.
+     # kind 'HelmChart': let the tool library emit a k3s HelmChart CR instead.
+     kind: 'SubChart'
+     repo: https://
+     chart:
+     version:
+     values: *_config
+   ```
+
+   > [!NOTE]
+   > Under `tool.kind: 'HelmChart'` the local [`tool` library chart](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) emits a `helm.cattle.io/v1` `HelmChart` CR (and an optional `HelmChartConfig`) pinned to `namespace: tools` / `targetNamespace: tools`, and the k3s helm-controller installs the upstream chart. Under `'SubChart'` (the default that prometheus and crowdsec use) the upstream chart is just a Helm dependency rendered in-line. Pick `SubChart` unless you specifically need the helm-controller to own the release.
+
+   For the **Kustomize shape** instead, skip the wrapper `Chart.yaml`/templates and create a `kustomization.yaml` that inflates the upstream chart plus any `resources/`, mirroring [`plausible/kustomization.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/kustomization.yaml):
+
+   ```yaml
+   # tools//kustomization.yaml
+   apiVersion: kustomize.config.k8s.io/v1beta1
+   kind: Kustomization
+   namespace: tools
+
+   helmCharts:
+     - name:
+       repo: https://
+       version:
+       releaseName:
+       valuesFile: Values.yaml
+       namespace: tools
+
+   resources:
+     - resources/ingressroute.yaml
+   # patches: / patchesJson6902: ← post-render tweaks, see plausible for a worked example
+   ```
+
+3. **[AGENT]** Register the tool in the app-of-apps.
+
+   Add a single key for `` under `tools:` in [`tools/chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml):
+
+   ```yaml
+   # tools/chart/values.yaml
+   tools:
+     pgbouncer: {}
+     hashicorp-vault: {}
+     crowdsec: {}
+     # …existing entries…
+     : {}
+   ```
+
+   The `chart/templates/apps.yaml` template ranges over `.Values.tools` and renders one ArgoCD `Application` per key, with `path: ` and `destination.namespace: tools` under the `tools` AppProject. The key **must match the directory name** you created in step 2. See the [tools guidebook](../guidebooks/tools/README.md) for how the app-of-apps meta-chart drives this.
+
+4. **[HUMAN]** If the tool needs **secrets** or a **database**, wire Vault + VSO and a tofu-apply workflow.
+
+   This step mutates Vault (creates roles/secrets) and so is gated. Use [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) (dynamic Postgres role) and [`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) (kvv2 static secrets) as the worked examples, and read the [tools secrets & VSO page](../guidebooks/tools/secrets-and-vso.md).
+
+   a. Add `tools//iac/` — OpenTofu that configures Vault. For a dynamic Postgres role, reuse the shared `app_roles` module exactly as crowdsec does:
+
+   ```hcl
+   # tools//iac/main.tf
+   module "app_roles" {
+     source                     = "git::ssh://git@192.168.1.202:2222/arcodange-org/tools.git//hashicorp-vault/iac/modules/app_roles?depth=1&ref=main"
+     name                       = ""
+     service_account_namespaces = ["tools"]
+   }
+   # for kvv2 static config, add vault_kv_secret_v2 resources (see plausible/iac/main.tf)
+   ```
+
+   Pair it with a `backend.tf` (GCS state at `prefix = "tools//main"`) and a `providers.tf` whose `auth_login_jwt` role is `gitea_cicd_` — both copied from crowdsec.
+
+   b. Add the VSO custom resources to the chart templates so VSO mints a k8s Secret the workload consumes: a `serviceaccount.yaml`, a `VaultAuth` bound to a Vault `kubernetes` role named ``, and a `VaultDynamicSecret` (or `VaultStaticSecret` for kvv2) pointing at the Vault path:
+
+   ```yaml
+   # tools//templates/vaultauth.yaml
+   apiVersion: secrets.hashicorp.com/v1beta1
+   kind: VaultAuth
+   metadata:
+     name:
+     namespace: {{ .Release.Namespace }}
+   spec:
+     vaultConnectionRef: default
+     method: kubernetes
+     mount: kubernetes
+     kubernetes:
+       role:
+       serviceAccount:
+       audiences:
+         - vault
+   ```
+
+   ```yaml
+   # tools//templates/vaultdynamicsecret.yaml
+   apiVersion: secrets.hashicorp.com/v1beta1
+   kind: VaultDynamicSecret
+   metadata:
+     name: -db-credentials
+     namespace: {{ .Release.Namespace }}
+   spec:
+     mount: postgres
+     path: creds/
+     destination:
+       create: true
+       name: -db-credentials
+     rolloutRestartTargets:
+       - kind: Deployment
+         name:
+     vaultAuthRef:
+   ```
+
+   Then reference the VSO-created secret from the workload (env `valueFrom.secretKeyRef`), as crowdsec's `values.yaml` does for `DB_USER`/`DB_PASSWORD`. For the Kustomize shape, add these resources as files under `resources/` and list them in `kustomization.yaml` instead of `templates/`.
+
+   c. Add a `.gitea/workflows/.yaml` that tofu-applies `/iac` on changes, mirroring [`crowdsec.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/.gitea/workflows/crowdsec.yaml): a path filter on `'/**/*.tf'`, a Gitea→Vault JWT auth job, and a `dflook/terraform-apply` step with `path: /iac`. See the [tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md) for what that pipeline does end to end.
+
+5. **[AGENT]** Lint and render locally before opening the PR.
+
+   For the wrapper-chart shape:
+
+   ```bash
+   helm dependency update tools/
+   helm lint tools/
+   helm template tools/ | head -n 60
+   # render the app-of-apps Application for :
+   helm template tools-apps tools/chart | grep -A12 "name: "
+   ```
+
+   For the Kustomize shape:
+
+   ```bash
+   kustomize build --enable-helm tools/ | head -n 60
+   ```
+
+6. **[HUMAN]** Open a PR on the `tools` repo, get it reviewed, and merge.
+
+   ```bash
+   git checkout -b arcodange/
+   git add tools/ tools/chart/values.yaml
+   git commit -m "declare "
+   git push -u origin arcodange/
+   ```
+
+   > [!IMPORTANT]
+   > The `tools` repo is on **Gitea**, not GitHub — open the PR with the `mcp__gitea__*` tools (load `select:mcp__gitea__pull_request_write` via `ToolSearch`), not `gh`. Once the PR merges to `main`, ArgoCD detects the new key in `chart/values.yaml`, renders the `` Application, and syncs it automatically.
+
+## Verification
+
+All read-only — an agent can run these after the PR merges and ArgoCD has reconciled.
+
+```bash
+# 1. The ArgoCD Application for is Synced + Healthy
+kubectl --context -n argocd get application \
+  -o jsonpath='{.status.sync.status}/{.status.health.status}{"\n"}'
+# expected: Synced/Healthy
+
+# 2. The pod is Running in the tools namespace
+kubectl --context -n tools get pods -l app.kubernetes.io/name=
+# expected: -… 1/1 Running
+
+# 3. Web-facing tools: the ingress is admitted and the host resolves
+kubectl --context -n tools get ingress | grep
+curl -sI https://.arcodange.lab | head -n1   # expected: HTTP/2 200 (or app login redirect)
+
+# 4. Secret-backed tools: VSO created the k8s Secret
+kubectl --context -n tools get secret -db-credentials
+# expected: the Secret exists with the keys the workload mounts
+```
+
+## Rollback
+
+- **[HUMAN]** Revert the `tools/chart/values.yaml` entry (remove the `:` key). On the next sync ArgoCD **prunes** the `` Application — `prune: true` is set in `apps.yaml` — which removes the deployed workload from the `tools` namespace.
+- **[HUMAN]** In a follow-up PR, delete the `tools//` directory to remove the wrapper chart / Kustomize source.
+- **[HUMAN]** For secret-backed tools, the Vault role/secret created by `tools//iac/` is **not** removed by ArgoCD. Destroy it explicitly (`tofu -chdir=tools//iac destroy`) or remove the IaC and let the workflow reconcile, and drop the `.gitea/workflows/.yaml` file.
+- For a full cluster-level recovery (power cut, lost quorum) follow `CLUSTER_RECOVERY.md`.
+
+## References
+
+- [Tools guidebook](../guidebooks/tools/README.md) — how the app-of-apps meta-chart turns each `tools:` key into an ArgoCD Application.
+- [Tools components](../guidebooks/tools/components.md) — the catalogue of platform components and what each provides.
+- [Tools secrets & VSO](../guidebooks/tools/secrets-and-vso.md) — the Vault `app_roles` + VaultAuth/VaultDynamicSecret pattern used in step 4.
+- [Tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md) — what the `/iac` tofu-apply workflow does end to end.
+- Real examples in the `tools` repo: [`prometheus`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) and [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) (wrapper-chart shape), the shared [`tool` library chart](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool), and [`clickhouse`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse)/[`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) (Kustomize shape). +- [Set up a new app](new-app.md) — the sibling runbook for onboarding a business application (not a platform component).
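Steps 2, 3 and 5 of this runbook hinge on one name: the `tools:` key in `chart/values.yaml` must match the tool directory, and that directory's shape (wrapper chart or Kustomize) decides which render commands apply. Below is a minimal pre-PR sketch that checks both, assuming bash plus awk and the two-space-indented, block-style `tools:` map shown in step 3. `list_tool_keys` and `check_tool` are hypothetical helpers, not scripts that exist in the `tools` repo, and the awk parsing is deliberately naive.

```bash
#!/usr/bin/env bash
# Hypothetical pre-PR check for the tools runbook; not shipped in the tools repo.

# Print the keys of the top-level `tools:` map in a values file.
# Naive on purpose: expects block-style YAML with two-space indentation.
list_tool_keys() {
  local values_file="$1"
  awk '
    /^tools:[ \t]*$/ { in_tools = 1; next }
    in_tools && /^[^ \t#]/ { in_tools = 0 }   # the next top-level key closes the map
    in_tools && /^  [A-Za-z0-9_.-]+:/ {
      key = $1
      sub(/:.*/, "", key)
      print key
    }
  ' "$values_file"
}

# Verify that TOOL is registered under tools:, that a directory of the same
# name exists under the checkout root, and report which shape it uses.
# Usage: check_tool TOOL [TOOLS_CHECKOUT_ROOT]
check_tool() {
  local tool="$1" root="${2:-.}"
  if ! list_tool_keys "$root/chart/values.yaml" | grep -qxF -- "$tool"; then
    echo "missing: no '$tool' key under tools: in $root/chart/values.yaml" >&2
    return 1
  fi
  if [ -f "$root/$tool/Chart.yaml" ]; then
    echo "$tool: wrapper-chart shape"
  elif [ -f "$root/$tool/kustomization.yaml" ]; then
    echo "$tool: kustomize shape"
  else
    echo "mismatch: '$tool' is registered but $root/$tool has no Chart.yaml or kustomization.yaml" >&2
    return 1
  fi
}
```

Running it from the directory that contains the `tools` checkout, for example `check_tool crowdsec tools`, matches the `tools/...` paths used in step 5; a non-zero exit means the key and the directory disagree. If `yq` is installed, it is a sturdier way to read the keys than awk.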
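Check 1 under Verification is a one-shot query, while ArgoCD may need a sync cycle or two after the merge before the Application settles. Below is a sketch of a polling wrapper around that same `kubectl` jsonpath query, assuming bash. `wait_app_synced_healthy` is a hypothetical name, and its defaults (30 attempts, 10 seconds apart) are illustrative rather than values taken from the cluster.

```bash
#!/usr/bin/env bash
# Poll an ArgoCD Application until it reports Synced/Healthy, using the same
# kubectl jsonpath query as check 1 in Verification.
# Usage: wait_app_synced_healthy CONTEXT APP [ATTEMPTS] [DELAY_SECONDS]
# Defaults (30 attempts, 10 s apart) are illustrative, not tuned values.
wait_app_synced_healthy() {
  local ctx="$1" app="$2" attempts="${3:-30}" delay="${4:-10}"
  local i status
  for ((i = 1; i <= attempts; i++)); do
    # A missing Application or an API error yields an empty status: treat as "not yet".
    status=$(kubectl --context "$ctx" -n argocd get application "$app" \
      -o jsonpath='{.status.sync.status}/{.status.health.status}' 2>/dev/null) || true
    if [ "$status" = "Synced/Healthy" ]; then
      echo "$app: $status"
      return 0
    fi
    echo "$app: ${status:-not found} (attempt $i/$attempts)" >&2
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_app_synced_healthy "$KUBE_CONTEXT" crowdsec` (with `KUBE_CONTEXT` standing in for your kube context) returns 0 as soon as the Application reports `Synced/Healthy` and 1 once the attempts run out. Where the `argocd` CLI is logged in, `argocd app wait crowdsec --sync --health` gives similar waiting without a custom loop.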