docs(vibe): add new-tool and new-app runbooks (grounded in real PRs)

Two agent-oriented runbooks under vibe/runbooks/ with [AGENT]/[HUMAN] step
markers, grounded in real diffs:

- new-tool.md : add a platform component to the tools repo so ArgoCD deploys it
  into the tools namespace (wrapper Chart.yaml + the tool library + a row in
  chart/values.yaml; optional iac/ for secrets). Mirrors the prometheus/crowdsec
  additions.
- new-app.md  : stand up a brand-new application across THREE repos (app +
  factory + tools) with the strict ordering dependency and the TERRAFORM_SSH_KEY
  pitfall. Phase-by-phase mapped to the dance-lessons-coach onboarding PRs
  (#89/#97/#98/#99/#100), factory #1/#2, tools #1; the FR doc/runbooks/new-web-app
  is linked as the detailed companion.

2 mermaid diagrams MCP-validated; zero dead links across the vibe tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 22:22:09 +02:00
parent 7bf83e75ed
commit 2d76eb45c1
3 changed files with 550 additions and 0 deletions

vibe/runbooks/new-tool.md Normal file

@@ -0,0 +1,279 @@
[vibe](../README.md) > [Runbooks](README.md) > **Set up a new tool**
# Set up a new tool
> **Status:** ✅ Active
> **Audience:** platform operator + agents (English). For the application-onboarding equivalent see [Set up a new app](new-app.md).
> **Last Updated:** 2026-06-23
## TL;DR
> [!TIP]
> Adding a platform component means dropping a small **wrapper chart** into the `tools` repo and registering it in the app-of-apps. An agent can do the bulk of it: scaffold `tools/<tool>/` (a wrapper `Chart.yaml` that depends on the upstream chart + the local `tool` library chart, the two `helm-chart*.yaml` templates, and a `values.yaml`), add one key under `tools:` in [`tools/chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), and lint it locally. The **human approval gate** sits at two places: (1) any Vault/database wiring under `tools/<tool>/iac/` and (2) opening + merging the PR — ArgoCD auto-syncs the new Application the moment it lands on `main`.
## Scope
This runbook covers adding a **new platform component** (monitoring, cache, security engine, connection pooler, analytics, …) to the [`tools` repo](https://gitea.arcodange.lab/arcodange-org/tools) so the factory ArgoCD `tools` project renders an Application for it and deploys it into the **`tools` namespace**.
Systems touched: Gitea (`tools` repo), ArgoCD (the `tools` AppProject), k3s (the helm-controller that materialises each `HelmChart` CR), and — only for secret-backed tools — Vault + the Vault Secrets Operator (VSO).
This runbook does **not** cover standing up a brand-new business application (its own repo, chart, CI/CD, database). That is the [Set up a new app](new-app.md) runbook. It also does not cover the underlying app-of-apps wiring of the `tools` project itself — read the [tools guidebook](../guidebooks/tools/README.md) for how that works.
## Preconditions
- [ ] Working in a worktree under `.claude/worktrees/<slug>/` of a `tools` repo clone (never the trunk).
- [ ] The tool deploys into the **`tools` namespace** (the `tools` AppProject only permits that destination).
- [ ] You know the **upstream Helm chart** (chart name + repo URL) and a **pinned version**, OR you have decided this tool needs **Kustomize + helm inflation** (charts that require post-render patching, like `clickhouse`/`plausible`).
- [ ] `helm` (with the upstream repo reachable) and, for the Kustomize path, `kustomize` available locally for the lint step.
- [ ] If the tool needs secrets or a database: confidence with the Vault `app_roles` module pattern and the `tofu-apply` CI flow — see the [tools secrets & VSO page](../guidebooks/tools/secrets-and-vso.md) and the [tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md).
## Procedure
1. **[HUMAN]** Choose the tool name `<tool>` (kebab-case) and the deployment shape.
Decide between the two supported shapes:
- **Wrapper chart (default).** A thin Helm chart that depends on the upstream chart at a pinned version. With `tool.kind: 'SubChart'` the upstream chart renders in-line as a normal dependency; with `tool.kind: 'HelmChart'` the local `tool` library chart emits a k3s `HelmChart` custom resource instead. Used by [`prometheus`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) and [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec).
- **Kustomize + helm inflation.** For charts that need post-render JSON6902 patches or extra `resources/`. Used by [`clickhouse`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) and [`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible).
Pin the upstream chart **version** now — it goes verbatim into the next step.
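Because the name chosen here becomes a directory name, a `tools:` key, and an ArgoCD Application name, it can be worth checking the kebab-case rule mechanically before scaffolding. The following is a minimal sketch only; `is_kebab_case` is a hypothetical helper, not something that exists in the `tools` repo, and the exact character set it accepts (lowercase letters, digits, single hyphens) is an assumption about what "kebab-case" means here.

```shell
# Hypothetical helper (not part of the tools repo).
# Returns 0 when the argument is made of lowercase alphanumeric words
# joined by single hyphens, e.g. "hashicorp-vault"; returns non-zero otherwise.
is_kebab_case() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Example use (commented out so that sourcing this file has no side effects):
# is_kebab_case "my-tool" && echo "name ok"
```

Names with uppercase letters, underscores, or a leading/trailing hyphen are rejected, which catches the most common slips before they reach `chart/values.yaml`.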
2. **[AGENT]** Scaffold `tools/<tool>/` (wrapper-chart shape).
Create four files. The `Chart.yaml` declares **two** dependencies — the local `tool` library chart (served from the Gitea Helm package registry) and the upstream chart pinned to your chosen version:
```yaml
# tools/<tool>/Chart.yaml
apiVersion: v2
name: <tool>
description: A Helm chart for Kubernetes
dependencies:
  - name: tool
    version: 0.1.0
    repository: https://gitea.arcodange.lab/api/packages/arcodange-org/helm
  - name: <upstream-chart>
    version: <pinned-version>
    repository: https://<upstream-helm-repo>
type: application
version: 0.1.0
```
The two template files are thin wrappers that delegate to the `tool` library (they only render when `tool.kind` is `HelmChart`; under `SubChart` they are inert and the upstream chart is pulled as a normal dependency):
```yaml
# tools/<tool>/templates/helm-chart.yaml
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart.tpl" . -}}
{{- end -}}
```
```yaml
# tools/<tool>/templates/helm-chart-config.yaml
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart-config.tpl" . -}}
{{- end -}}
```
The `values.yaml` carries the upstream values under a YAML anchor and re-references it from the `tool:` block. Web-facing tools set an ingress host `<tool>.arcodange.lab`; stateful tools set persistence with the longhorn storage class and resource requests/limits. The shape, taken from [`prometheus/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus/values.yaml):
```yaml
# tools/<tool>/values.yaml
<upstream-chart>: &<tool>_config
  # ── upstream values go here ──
  # web-facing tools: expose an ingress host
  ingress:
    enabled: true
    hosts:
      - <tool>.arcodange.lab
  # stateful tools: pin storage class + size
  persistence:
    enabled: true
    storageClass: longhorn
    size: 8Gi
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
tool:
  # kind 'SubChart': pull the upstream chart as a dependency and pass it the values below.
  # kind 'HelmChart': let the tool library emit a k3s HelmChart CR instead.
  kind: 'SubChart'
  repo: https://<upstream-helm-repo>
  chart: <upstream-chart>
  version: <pinned-version>
  values: *<tool>_config
```
> [!NOTE]
> Under `tool.kind: 'HelmChart'` the local [`tool` library chart](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) emits a `helm.cattle.io/v1` `HelmChart` CR (and an optional `HelmChartConfig`) pinned to `namespace: tools` / `targetNamespace: tools`, and the k3s helm-controller installs the upstream chart. Under `'SubChart'` (the default that prometheus and crowdsec use) the upstream chart is just a Helm dependency rendered in-line. Pick `SubChart` unless you specifically need the helm-controller to own the release.
For the **Kustomize shape** instead, skip the wrapper `Chart.yaml`/templates and create a `kustomization.yaml` that inflates the upstream chart plus any `resources/`, mirroring [`plausible/kustomization.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible/kustomization.yaml):
```yaml
# tools/<tool>/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tools
helmCharts:
  - name: <upstream-chart>
    repo: https://<upstream-helm-repo>
    version: <pinned-version>
    releaseName: <tool>
    valuesFile: <tool>Values.yaml
    namespace: tools
resources:
  - resources/ingressroute.yaml
# patches: / patchesJson6902: ← post-render tweaks, see plausible for a worked example
```
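For the wrapper-chart shape, the four files described in this step can be generated in one go. The sketch below is an illustration under stated assumptions: `scaffold_wrapper_chart` is a hypothetical helper (it does not exist in the `tools` repo), it writes an empty upstream-values map that you still need to fill in by hand, and it defaults to `kind: 'SubChart'` as the note above recommends.

```shell
# Hypothetical helper (not part of the tools repo).
# Usage: scaffold_wrapper_chart <dest-dir> <tool> <upstream-chart> <version> <repo-url>
# The ( ... ) body runs in a subshell so the variables stay local.
scaffold_wrapper_chart() (
  dest=$1 tool=$2 chart=$3 version=$4 repo=$5
  mkdir -p "$dest/$tool/templates" || exit 1

  # Wrapper Chart.yaml: the local `tool` library plus the pinned upstream chart.
  cat > "$dest/$tool/Chart.yaml" <<EOF
apiVersion: v2
name: $tool
description: A Helm chart for Kubernetes
dependencies:
  - name: tool
    version: 0.1.0
    repository: https://gitea.arcodange.lab/api/packages/arcodange-org/helm
  - name: $chart
    version: $version
    repository: $repo
type: application
version: 0.1.0
EOF

  # Quoted heredoc delimiters: the Helm template braces are written verbatim.
  cat > "$dest/$tool/templates/helm-chart.yaml" <<'EOF'
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart.tpl" . -}}
{{- end -}}
EOF

  cat > "$dest/$tool/templates/helm-chart-config.yaml" <<'EOF'
{{- if eq .Values.tool.kind "HelmChart" -}}
{{- include "tool.helm-chart-config.tpl" . -}}
{{- end -}}
EOF

  # values.yaml: empty upstream values under an anchor, re-used by the tool block.
  cat > "$dest/$tool/values.yaml" <<EOF
# upstream values go inside the map below
$chart: &${tool}_config {}
tool:
  kind: 'SubChart'
  repo: $repo
  chart: $chart
  version: $version
  values: *${tool}_config
EOF
)
```

Called as `scaffold_wrapper_chart . my-tool some-chart 1.2.3 https://charts.example.invalid` from the repo root (all argument values here are placeholders), it leaves a `my-tool/` directory ready for the upstream values to be filled in.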
3. **[AGENT]** Register the tool in the app-of-apps.
Add a single key for `<tool>` under `tools:` in [`tools/chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml):
```yaml
# tools/chart/values.yaml
tools:
  pgbouncer: {}
  hashicorp-vault: {}
  crowdsec: {}
  # …existing entries…
  <tool>: {}
```
The `chart/templates/apps.yaml` template ranges over `.Values.tools` and renders one ArgoCD `Application` per key, with `path: <tool>` and `destination.namespace: tools` under the `tools` AppProject. The key **must match the directory name** you created in step 2. See the [tools guidebook](../guidebooks/tools/README.md) for how the app-of-apps meta-chart drives this.
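Since a key without a matching directory yields an Application pointing at a path that does not exist, a quick consistency check before opening the PR can help. This is a rough sketch, not a YAML parser: `list_tool_keys` and `check_tool_dirs` are hypothetical helpers, and the `awk` script assumes the file keeps the simple layout shown above (a top-level `tools:` line followed by keys indented with exactly two spaces).

```shell
# Hypothetical helpers (not part of the tools repo).

# Prints each key found directly under the top-level `tools:` map.
# Assumes two-space indentation; comments and deeper nesting are skipped.
list_tool_keys() {
  awk '
    /^tools:[[:space:]]*$/ { in_tools = 1; next }   # entering the tools map
    /^[^[:space:]#]/       { in_tools = 0 }         # any other top-level key ends it
    in_tools && /^  [^[:space:]#][^:]*:/ {
      key = $1; sub(/:.*/, "", key); print key
    }
  ' "$1"
}

# Reports every key that has no same-named directory at the repo root.
# Returns non-zero if at least one key is unmatched.
check_tool_dirs() {
  root=$1
  missing=0
  for key in $(list_tool_keys "$root/chart/values.yaml"); do
    if [ ! -d "$root/$key" ]; then
      echo "missing directory for tools key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use from the repo root (commented out to avoid side effects):
# check_tool_dirs . && echo "every tools key has a directory"
```

The check is deliberately one-directional: it flags keys without directories, but a directory without a key is simply an unregistered tool and is not treated as an error.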
4. **[HUMAN]** If the tool needs **secrets** or a **database**, wire Vault + VSO and a tofu-apply workflow.
This step mutates Vault (creates roles/secrets) and so is gated. Use [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) (dynamic Postgres role) and [`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) (kvv2 static secrets) as the worked examples, and read the [tools secrets & VSO page](../guidebooks/tools/secrets-and-vso.md).
a. Add `tools/<tool>/iac/` — OpenTofu that configures Vault. For a dynamic Postgres role, reuse the shared `app_roles` module exactly as crowdsec does:
```hcl
# tools/<tool>/iac/main.tf
module "app_roles" {
  source                     = "git::ssh://git@192.168.1.202:2222/arcodange-org/tools.git//hashicorp-vault/iac/modules/app_roles?depth=1&ref=main"
  name                       = "<tool>"
  service_account_namespaces = ["tools"]
}
# for kvv2 static config, add vault_kv_secret_v2 resources (see plausible/iac/main.tf)
```
Pair it with a `backend.tf` (GCS state at `prefix = "tools/<tool>/main"`) and a `providers.tf` whose `auth_login_jwt` role is `gitea_cicd_<tool>` — both copied from crowdsec.
b. Add the VSO custom resources to the chart templates so VSO mints a k8s Secret the workload consumes. You need a `serviceaccount.yaml`, a `VaultAuth` bound to a Vault `kubernetes` role named `<tool>`, and a `VaultDynamicSecret` (or `VaultStaticSecret` for kvv2) pointing at the Vault path:
```yaml
# tools/<tool>/templates/vaultauth.yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: <tool>
  namespace: {{ .Release.Namespace }}
spec:
  vaultConnectionRef: default
  method: kubernetes
  mount: kubernetes
  kubernetes:
    role: <tool>
    serviceAccount: <tool>
    audiences:
      - vault
```
```yaml
# tools/<tool>/templates/vaultdynamicsecret.yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: <tool>-db-credentials
  namespace: {{ .Release.Namespace }}
spec:
  mount: postgres
  path: creds/<tool>
  destination:
    create: true
    name: <tool>-db-credentials
  rolloutRestartTargets:
    - kind: Deployment
      name: <tool>
  vaultAuthRef: <tool>
```
Then reference the VSO-created secret from the workload (env `valueFrom.secretKeyRef`), as crowdsec's `values.yaml` does for `DB_USER`/`DB_PASSWORD`. For the Kustomize shape, add these custom resources as files under `resources/` and list them in `kustomization.yaml` instead of `templates/`.
c. Add a `.gitea/workflows/<tool>.yaml` that tofu-applies `<tool>/iac` on changes, mirroring [`crowdsec.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/.gitea/workflows/crowdsec.yaml): a path filter on `'<tool>/**/*.tf'`, a Gitea→Vault JWT auth job, and a `dflook/terraform-apply` step with `path: <tool>/iac`. See the [tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md) for what that pipeline does end to end.
5. **[AGENT]** Lint and render locally before opening the PR.
For the wrapper-chart shape:
```bash
# run from the repo (worktree) root: the tools/ prefix used in the prose is the repo name, not a directory
helm dependency update <tool>
helm lint <tool>
helm template <tool> <tool> | head -n 60
# render the app-of-apps Application for <tool>:
helm template tools-apps chart | grep -A12 "name: <tool>"
```
For the Kustomize shape:
```bash
# run from the repo (worktree) root
kustomize build --enable-helm <tool> | head -n 60
```
6. **[HUMAN]** Open a PR on the `tools` repo, get it reviewed, and merge.
```bash
git checkout -b arcodange/<slug>
# paths are relative to the repo (worktree) root, so the tools/ repo-name prefix is dropped
git add <tool> chart/values.yaml
git commit -m "declare <tool>"
git push -u origin arcodange/<slug>
```
> [!IMPORTANT]
> The `tools` repo is on **Gitea**, not GitHub — open the PR with the `mcp__gitea__*` tools (load `select:mcp__gitea__pull_request_write` via `ToolSearch`), not `gh`. Once the PR merges to `main`, ArgoCD detects the new key in `chart/values.yaml`, renders the `<tool>` Application, and syncs it automatically.
## Verification
All read-only — an agent can run these after the PR merges and ArgoCD has reconciled.
```bash
# 1. The ArgoCD Application for <tool> is Synced + Healthy
kubectl --context <ctx> -n argocd get application <tool> \
  -o jsonpath='{.status.sync.status}/{.status.health.status}{"\n"}'
# expected: Synced/Healthy
# 2. The pod is Running in the tools namespace
kubectl --context <ctx> -n tools get pods -l app.kubernetes.io/name=<tool>
# expected: <tool>-… 1/1 Running
# 3. Web-facing tools: the ingress is admitted and the host resolves
kubectl --context <ctx> -n tools get ingress | grep <tool>
curl -sI https://<tool>.arcodange.lab | head -n1 # expected: HTTP/2 200 (or app login redirect)
# 4. Secret-backed tools: VSO created the k8s Secret
kubectl --context <ctx> -n tools get secret <tool>-db-credentials
# expected: the Secret exists with the keys the workload mounts
```
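ArgoCD may need a few reconcile cycles before check 1 reports `Synced/Healthy`, so an agent may prefer to poll rather than run the command once. The following is a sketch under assumptions: `app_is_ready` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay in the example are arbitrary values, not settings taken from the platform.

```shell
# Hypothetical helpers (not part of the tools repo).

# Returns 0 only for the exact "<sync>/<health>" pair expected by check 1.
app_is_ready() {
  [ "$1" = "Synced/Healthy" ]
}

# Polls a status-printing command until it reports Synced/Healthy.
# $1 = max attempts, $2 = seconds between attempts, remaining args = command.
wait_until_ready() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  status=""
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@" 2>/dev/null)
    if app_is_ready "$status"; then
      echo "ready after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready: last status '$status'" >&2
  return 1
}

# Example use, wrapping check 1 (commented out; fill in <ctx> and <tool> first):
# wait_until_ready 30 10 kubectl --context <ctx> -n argocd get application <tool> \
#   -o jsonpath='{.status.sync.status}/{.status.health.status}'
```

Keeping the comparison in its own function makes the polling loop reusable for the other checks: swap in a different predicate and status command as needed.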
## Rollback
- **[HUMAN]** Revert the `tools/chart/values.yaml` entry (remove the `<tool>:` key). On the next sync ArgoCD **prunes** the `<tool>` Application — `prune: true` is set in `apps.yaml` — which removes the deployed workload from the `tools` namespace.
- **[HUMAN]** In a follow-up PR, delete the `tools/<tool>/` directory to remove the wrapper chart / Kustomize source.
- **[HUMAN]** For secret-backed tools, the Vault role/secret created by `tools/<tool>/iac/` is **not** removed by ArgoCD. Either destroy it explicitly (`tofu -chdir=<tool>/iac destroy`, run from the repo root) or remove the resources from the IaC and let the workflow reconcile. Then drop the `.gitea/workflows/<tool>.yaml` file.
- For a full cluster-level recovery (power cut, lost quorum) follow CLUSTER_RECOVERY.md.
## References
- [Tools guidebook](../guidebooks/tools/README.md) — how the app-of-apps meta-chart turns each `tools:` key into an ArgoCD Application.
- [Tools components](../guidebooks/tools/components.md) — the catalogue of platform components and what each provides.
- [Tools secrets & VSO](../guidebooks/tools/secrets-and-vso.md) — the Vault `app_roles` + VaultAuth/VaultDynamicSecret pattern used in step 4.
- [Tofu CI apply flow](../guidebooks/factory-provisioning/opentofu/ci-apply-flow.md) — what the `<tool>/iac` tofu-apply workflow does end to end.
- Real examples in the `tools` repo: [`prometheus`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) and [`crowdsec`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) (wrapper-chart shape), the shared [`tool` library chart](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool), and [`clickhouse`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse)/[`plausible`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) (Kustomize shape).
- [Set up a new app](new-app.md) — the sibling runbook for onboarding a business application (not a platform component).