Two agent-oriented runbooks under vibe/runbooks/ with [AGENT]/[HUMAN] step markers, grounded in real diffs: - new-tool.md : add a platform component to the tools repo so ArgoCD deploys it into the tools namespace (wrapper Chart.yaml + the tool library + a row in chart/values.yaml; optional iac/ for secrets). Mirrors the prometheus/crowdsec additions. - new-app.md : stand up a brand-new application across THREE repos (app + factory + tools) with the strict ordering dependency and the TERRAFORM_SSH_KEY pitfall. Phase-by-phase mapped to the dance-lessons-coach onboarding PRs (#89/#97/#98/#99/#100), factory #1/#2, tools #1; the FR doc/runbooks/new-web-app is linked as the detailed companion. 2 mermaid diagrams MCP-validated; zero dead links across the vibe tree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Set up a new app
Status: ✅ Active · Audience: platform operator + agents (English) · Last Updated: 2026-06-23
For the detailed human-facing procedure, see the French new-web-app runbook.
TL;DR
Tip
Standing up a brand-new application touches three repos (the app's own Gitea repo, `factory`, and `tools`) with a strict ordering dependency. An agent may write every file and open every PR ([AGENT]), but each merge/apply is [HUMAN]-gated. The single rule that everything else hangs on: the factory Postgres DB + role and the tools Vault JWT role MUST be applied before the app's own `iac/` runs. Ship the app in degraded mode first (no DB/Vault), wire the platform sides, then turn on dynamic credentials last. The detailed companion is the French new-web-app runbook; this page is its agent-oriented English mirror.
Caution
Ordering is load-bearing: do not reorder the phases.

- The app's own `iac/` (Phase 6) calls the shared `app_roles` module, which issues `GRANT <app>_role TO …` on every dynamic credential and authenticates to Vault as `gitea_cicd_<app>`. So both of these must already exist:
  - the Postgres role `<app>_role` + database `<app>` → created by the factory side (Phase 4);
  - the Vault JWT role `gitea_cicd_<app>` + policies `<app>`/`<app>-ops` → created by the tools side (Phase 5).
- The app's `vault.yaml` CI needs the `TERRAFORM_SSH_KEY` Actions secret (the `tofu_module_reader` SSH key from Vault), or `terraform init` cannot clone the `app_roles` module over `git::ssh://`. This is the canonical pitfall: it sank the first `iac/` push and was fixed in dance-lessons-coach PR #100. Apply Phases 4 and 5 before merging Phase 6.
Scope
This runbook covers standing up a brand-new application end-to-end: its own Gitea repo, a Helm chart/, CI/CD with IaC (iac/ + .gitea/workflows/), and database access — all deployed by factory ArgoCD into a dedicated namespace. Systems touched: Gitea (repo + Actions + container registry), Postgres (DB + owner role via factory), Vault (JWT CI role, policies, dynamic DB creds via tools + app), k3s (namespace, pod, SA), ArgoCD (Application sync + image-updater), and Traefik (ingress).
It does not cover: writing the application code itself, the one-time platform foundations (Vault mounts, the Vault→Postgres connection, the gitea_cicd bootstrap JWT role, the tofu_module_reader bot, org-level Actions secrets — all already in place), or adding a non-application platform component (see Set up a new tool).
The reference onboarding is dance-lessons-coach (verified from its merged PRs), with webapp as the canonical app to clone.
Preconditions
- Working in a worktree under `.claude/worktrees/<slug>/` (never the trunk).
- You can create a Gitea repo under `arcodange-org` (default) or `arcodange` (for some apps).
- Local clones of `factory` and `tools` are available and on a synced `main`.
- The `<app>` name is chosen: kebab-case, lowercase. This is the universal join key: the same string is reused verbatim across Gitea, Postgres, Vault, Kubernetes, ArgoCD, GCS, and DNS. One typo silently breaks the chain. See naming-conventions and the FR conventions.
- The platform foundations exist (Vault mounts `kvv2`/`postgres`/`transit` + auth `kubernetes`, the Vault→Postgres connection via `credentials_editor`, the bootstrap `gitea_cicd` role, the `tofu_module_reader` SSH bot, and org Actions secrets `HOMELAB_CA_CERT`/`vault_oauth__sh_b64`/`PACKAGES_TOKEN`).
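Because one typo silently breaks the chain, it can help to derive every identifier from a single validated string. The sketch below is a hypothetical helper, not existing tooling: it assumes "kebab-case, lowercase" means lowercase ASCII letters and digits joined by single hyphens (no leading or trailing hyphen), and the labels left of `=` are illustrative. The values on the right are the identifiers this runbook uses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of any existing repo).
# Assumption: "kebab-case, lowercase" = ^[a-z0-9]+(-[a-z0-9]+)*$

validate_app_name() {
  local re='^[a-z0-9]+(-[a-z0-9]+)*$'
  [[ "$1" =~ $re ]]
}

# Print every identifier this runbook derives from <app> (labels are illustrative).
derive_join_keys() {
  local app="$1" org="${2:-arcodange-org}"
  if ! validate_app_name "$app"; then
    echo "invalid <app> name: '$app' (expected lowercase kebab-case)" >&2
    return 1
  fi
  printf '%s\n' \
    "gitea_repo=${org}/${app}" \
    "pg_database=${app}" \
    "pg_owner_role=${app}_role" \
    "vault_ci_jwt_role=gitea_cicd_${app}" \
    "vault_policies=${app},${app}-ops" \
    "vault_db_creds_path=postgres/creds/${app}" \
    "vault_k8s_auth_role=${app}" \
    "vault_kvv2_path=${app}/config" \
    "k8s_namespace=${app}" \
    "tf_backend_prefix=${app}/main" \
    "host_internal=${app}.arcodange.lab" \
    "host_public=${app}.arcodange.fr" \
    "image=gitea.arcodange.lab/${org}/${app}:latest"
}
```

For example, `derive_join_keys dance-lessons-coach arcodange` prints, among other lines, `pg_owner_role=dance-lessons-coach_role` and `vault_ci_jwt_role=gitea_cicd_dance-lessons-coach`, while `derive_join_keys Dance_Lessons` fails before printing anything.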
The three-repo onboarding (ordering)
```mermaid
%%{init: {'theme':'base','themeVariables':{'fontSize':'14px'}}}%%
flowchart TB
classDef app fill:#2563eb,stroke:#1e40af,color:#ffffff
classDef plat fill:#059669,stroke:#047857,color:#ffffff
classDef tools fill:#7c3aed,stroke:#6d28d9,color:#ffffff
classDef run fill:#b45309,stroke:#92400e,color:#ffffff
P1["Phase 1-3 · APP repo<br>chart/ degraded + Vault-ready (gated) + TLS<br>(serves, no DB/Vault yet)"]:::app
P4["Phase 4 · FACTORY repo<br>argocd/values.yaml + postgres/iac<br>→ DB <app> + role <app>_role"]:::plat
P5["Phase 5 · TOOLS repo<br>hashicorp-vault/iac<br>→ gitea_cicd_<app> + policies"]:::tools
P6["Phase 6 · APP repo<br>iac/ (app_roles module) + vault.yaml<br>+ TERRAFORM_SSH_KEY secret"]:::app
P7["Phase 7-8 · APP repo<br>vault.enabled=true + dockerimage.yaml<br>→ dynamic creds on, image rollout"]:::run
P1 --> P4
P1 --> P5
P4 --> P6
P5 --> P6
P6 --> P7
```
- Phases 1-3 (app repo): ship the chart in degraded mode, make it Vault-ready behind a default-off gate, and set the right ingress. None of this needs the platform sides yet.
- Phase 4 (factory) and Phase 5 (tools) are independent of each other, but both must be applied before Phase 6.
- Phase 6 (app repo) applies the app's own `iac/`, which depends on the role/JWT created in Phases 4 and 5 and needs the `TERRAFORM_SSH_KEY` secret.
- Phases 7-8 (app repo) flip `vault.enabled=true` for live dynamic DB creds, then add the image-build CI so ArgoCD's image-updater rolls out releases.
Procedure
Phase 0 — Choose the name and create the repo
- [HUMAN] Fix `<app>` (kebab-case) and the Gitea org. The default org is `arcodange-org`; some apps live under `arcodange` (e.g. `dance-lessons-coach`, `telegram-gateway`). Create the empty repo under the chosen org. The org choice matters because the repo inherits that org's Actions secrets.
Phase 1 — App in degraded mode
Mirrors dance-lessons-coach PR #89. Clone the webapp pattern.
- [AGENT] Add a `Dockerfile` and a Helm `chart/` (`deployment`, `service`, `ingress`, `serviceaccount`, `configmap`, `_helpers.tpl`, `NOTES.txt`) with no DB/Vault wiring. Set:
  - the ingress host `<app>.arcodange.lab` (internal) and/or `<app>.arcodange.fr` (public); TLS details land in Phase 3;
  - a `nodeSelector` of `kubernetes.io/hostname: pi1` (the network entrypoint: preserves the user IP and avoids NAT);
  - `/healthz` (or the app's real path, e.g. `dance-lessons-coach` uses `/api/healthz`) for both the liveness and readiness probes;
  - an empty DB host, so the pod serves in degraded mode.

  ```bash
  # [AGENT] lint + render before opening the PR (safe, no cluster contact)
  helm lint chart/
  helm template chart/ --set image.repository=test --set image.tag=v1
  ```

- [HUMAN] Open and merge the PR. Verify the app serves in degraded mode (binary + health endpoint reachable once ArgoCD picks it up in Phase 4+).
Phase 2 — Make the chart Vault-ready (gated, default off)
Mirrors dance-lessons-coach PR #97.
- [AGENT] Add `VaultAuth`, `VaultStaticSecret`, and `VaultDynamicSecret` templates, each gated behind `.Values.vault.enabled` (default `false`) so a plain `helm install` keeps working. The reference `values.yaml` exposes:

  ```yaml
  # chart/values.yaml: gate + the three Vault join keys (all derived from <app>)
  vault:
    enabled: false
    role: <app>                 # k8s auth backend role (matches iac/main.tf)
    kvv2Path: <app>/config      # KVv2 secret path
    postgresPath: creds/<app>   # postgres dynamic creds path
  ```

  The `VaultAuth` targets the k8s role `<app>` with the app's ServiceAccount and audience `vault`; the `VaultDynamicSecret` reads `postgres/creds/<app>` into a `db-credentials` Secret and `rolloutRestartTargets` the Deployment.
- [HUMAN] Open and merge the PR. The chart is now Vault-ready without activating any Vault dependency.
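One way to confirm the gate really is off by default is to count Vault Secrets Operator objects in the rendered chart. This is a minimal sketch with a hypothetical helper, assuming each rendered object declares its kind on a top-level `kind: <Kind>` line, as `helm template` output normally does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count VaultAuth / VaultStaticSecret / VaultDynamicSecret
# objects in rendered manifests read from stdin.
# Assumption: each object has a top-level "kind: <Kind>" line.
count_vault_kinds() {
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that 0
  # on stdout while letting the function succeed.
  grep -cE '^kind:[[:space:]]*Vault(Auth|StaticSecret|DynamicSecret)[[:space:]]*$' || true
}
```

Piping `helm template chart/ --set image.repository=test --set image.tag=v1` into `count_vault_kinds` should print `0`; adding `--set vault.enabled=true` should give a non-zero count, one per gated Vault template the chart defines.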
Phase 3 — Ingress / TLS
Mirrors dance-lessons-coach PR #98. Pick by host suffix:
- [AGENT] For a `.lab` host: `traefik.../router.entrypoints: websecure` + `router.tls: "true"` + `router.tls.certresolver: letsencrypt` (with `router.tls.domains.0.main: arcodange.lab` and `…sans: <app>.arcodange.lab`) + `router.middlewares: localIp@file`. For a `.fr` host: `router.entrypoints: web` + `router.middlewares: kube-system-crowdsec@kubernetescrd`. (Convention: `.lab` = internal, websecure + localIp + letsencrypt; `.fr` = public, web + crowdsec.)
- [HUMAN] Merge the PR.
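The suffix convention is a simple lookup, sketched below as a hypothetical helper that prints the router settings as `key=value` pairs. The Traefik annotation prefix is left out because it is elided in the step above, and `router.tls.domains.0.sans` is an assumption about what the `…sans` shorthand stands for.

```bash
#!/usr/bin/env bash
# Hypothetical helper: router settings for an ingress host, chosen by suffix
# (.lab = internal, .fr = public). The Traefik annotation prefix is omitted;
# "router.tls.domains.0.sans" is an assumed expansion of "…sans".
ingress_profile() {
  local host="$1"
  case "$host" in
    *.lab)
      printf '%s\n' \
        "router.entrypoints=websecure" \
        "router.tls=true" \
        "router.tls.certresolver=letsencrypt" \
        "router.tls.domains.0.main=arcodange.lab" \
        "router.tls.domains.0.sans=${host}" \
        "router.middlewares=localIp@file"
      ;;
    *.fr)
      printf '%s\n' \
        "router.entrypoints=web" \
        "router.middlewares=kube-system-crowdsec@kubernetescrd"
      ;;
    *)
      echo "unsupported host suffix: ${host}" >&2
      return 1
      ;;
  esac
}
```

Failing loudly on an unknown suffix is deliberate: a host that is neither `.lab` nor `.fr` falls outside the convention and needs a human decision.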
Phase 4 — FACTORY side (DB + role, ArgoCD enrollment)
Mirrors factory PR #1 (ArgoCD) and factory PR #2 (Postgres). Link: postgres-iac, ci-apply-flow.
- [AGENT] Enroll `<app>` in `argocd/values.yaml` under `gitea_applications`. The apps template defaults the org to `arcodange-org` (`{{- $org := default "arcodange-org" $app_attr.org -}}`), so add an `org:` key (e.g. `org: arcodange`) only if the app is not under `arcodange-org`. Add image-updater annotations for digest-based rollout:

  ```yaml
  # argocd/values.yaml: under gitea_applications
  <app>:
    org: arcodange   # ← ONLY if not arcodange-org
    annotations:
      argocd-image-updater.argoproj.io/image-list: <app>=gitea.arcodange.lab/<org>/<app>:latest
      argocd-image-updater.argoproj.io/<app>.update-strategy: digest
  ```

- [AGENT] Add `"<app>"` to the `applications` list in `postgres/iac/terraform.tfvars`. This creates the `<app>` database, the non-login owner role `<app>_role`, and the pgbouncer `user_lookup()` function.

  ```hcl
  # postgres/iac/terraform.tfvars
  applications = [
    "webapp",
    "erp",
    "crowdsec",
    "plausible",
    "dance-lessons-coach",
    "<app>", # ← add
  ]
  ```

- [HUMAN] Merge both PRs. Factory CI (`postgres.yaml`) applies, so the DB + role now exist. ArgoCD creates the Application and deploys the degraded chart into namespace `<app>`.
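The org rule mirrors the template's `default "arcodange-org" $app_attr.org`: the `org:` key only needs writing when it differs from that default. A sketch of the rule as a hypothetical generator; its output is unindented, so nest it under `gitea_applications` when pasting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the gitea_applications entry for <app>.
# Output is unindented; nest it under gitea_applications when pasting.
argocd_entry() {
  local app="$1" org="${2:-arcodange-org}"
  printf '%s:\n' "$app"
  # Same default as the apps template: omit org when it is arcodange-org.
  if [ "$org" != "arcodange-org" ]; then
    printf '  org: %s\n' "$org"
  fi
  printf '  annotations:\n'
  printf '    argocd-image-updater.argoproj.io/image-list: %s=gitea.arcodange.lab/%s/%s:latest\n' \
    "$app" "$org" "$app"
  printf '    argocd-image-updater.argoproj.io/%s.update-strategy: digest\n' "$app"
}
```

For instance, `argocd_entry dance-lessons-coach arcodange` includes an `org: arcodange` line, while `argocd_entry webapp` omits it and falls back to the template default.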
Phase 5 — TOOLS side (Vault JWT role + policies)
Mirrors tools PR #1. Link: tools secrets-and-vso, tools components.
- [AGENT] Add `{ name = "<app>" }` to the `applications` list in `tools/hashicorp-vault/iac/terraform.tfvars`. Via the `app_policy`/`app_roles` modules this creates the `gitea_cicd_<app>` JWT role, the `<app>` (runtime) and `<app>-ops` (CI) policies, the `<app>-ops` identity group, and the k8s auth role.

  ```hcl
  # tools/hashicorp-vault/iac/terraform.tfvars
  applications = [
    { name = "webapp" },
    { name = "erp" },
    { name = "<app>" }, # ← add
    # optional fields when needed:
    # { name = "<app>", ops_policies = ["…"], service_account_names = ["…"], service_account_namespaces = ["tools"] }
  ]
  ```

- [HUMAN] Merge the PR. Tools CI (`vault.yaml`) applies, so `gitea_cicd_<app>` and the policies now exist.
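Before moving on to Phase 6, a quick local check can confirm that both platform entries landed on a synced `main`. This is a minimal sketch with a hypothetical helper; it only inspects the checked-out `tfvars` files and does not prove that CI has applied them (the Verification section does that).

```bash
#!/usr/bin/env bash
# Hypothetical pre-Phase-6 check: are the factory (Phase 4) and tools (Phase 5)
# entries for <app> present in the checked-out tfvars files?
# Does NOT prove CI applied them; see the Verification section for that.
platform_entries_present() {
  local app="$1" factory="$2" tools="$3" rc=0
  local pg_tfvars="$factory/postgres/iac/terraform.tfvars"
  local vault_tfvars="$tools/hashicorp-vault/iac/terraform.tfvars"

  # Drop commented-out lines (e.g. the "optional fields" example) before matching.
  if ! grep -vE '^[[:space:]]*#' "$pg_tfvars" | grep -qF "\"${app}\""; then
    echo "Phase 4 missing: \"${app}\" not in ${pg_tfvars}" >&2
    rc=1
  fi
  if ! grep -vE '^[[:space:]]*#' "$vault_tfvars" \
      | grep -qE "name[[:space:]]*=[[:space:]]*\"${app}\""; then
    echo "Phase 5 missing: { name = \"${app}\" } not in ${vault_tfvars}" >&2
    rc=1
  fi
  return "$rc"
}
```

A typical call would be `platform_entries_present <app> ../factory ../tools`, with the two paths pointing at your local clones.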
Phase 6 — App IaC + Vault workflow
Mirrors dance-lessons-coach PR #99 and the #100 fix. See 05-app-terraform for the module contract.
Caution
Phases 4 and 5 must already be applied before merging this phase, or the first `tofu apply` fails (there is no `<app>_role` to GRANT, or Vault auth fails on the missing `gitea_cicd_<app>` role).
- [AGENT] Add the app's `iac/`:
  - `providers.tf`: Vault provider with `auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd_<app>" }`.
  - `backend.tf`: GCS backend `bucket = "arcodange-tf"`, `prefix = "<app>/main"`.
  - `main.tf`: call the shared module (the exact source string used by every app):

    ```hcl
    module "app_roles" {
      source = "git::ssh://git@192.168.1.202:2222/arcodange-org/tools.git//hashicorp-vault/iac/modules/app_roles?depth=1&ref=main"
      name   = "<app>"
    }
    ```

    This provisions `postgres/creds/<app>` (a dynamic DB role inheriting `<app>_role`) and the k8s auth role `<app>`. Add any app-specific `kvv2/<app>/config` secrets alongside.
- [AGENT] Add `.gitea/workflows/vault.yaml` that authenticates via Gitea OIDC and runs `tofu apply` against `iac/`. The `vault-action` step's `role:` and `providers.tf`'s `role` must both be `gitea_cicd_<app>` (the copy-paste trap: `erp` still carries a stale `gitea_cicd_webapp`). The secrets block must read the SSH key:

  ```yaml
  # .gitea/workflows/vault.yaml: vault-action secrets block
  secrets: |
    kvv1/google/credentials credentials | GOOGLE_BACKEND_CREDENTIALS ;
    kvv1/gitea/tofu_module_reader ssh_private_key | TERRAFORM_SSH_KEY ;
  ```

- [HUMAN] Add the `TERRAFORM_SSH_KEY` secret (the `tofu_module_reader` SSH key, read from Vault at `kvv1/gitea/tofu_module_reader`) to the app repo's Actions secrets. Without it, `terraform init` cannot clone the `app_roles` module over `git::ssh://`; this is the canonical pitfall fixed in PR #100.
- [HUMAN] Merge the PR. The app's `vault.yaml` runs `tofu apply`, after which `postgres/creds/<app>` and the k8s role `<app>` exist.
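The copy-paste trap is easy to catch mechanically before merging. The sketch below is a hypothetical pre-merge check, run against the app repo root: every `gitea_cicd_*` role named in `iac/providers.tf` and `.gitea/workflows/vault.yaml` must be `gitea_cicd_<app>`, and the GCS backend prefix must be `<app>/main`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-merge check for the copy-paste trap.
# Usage: check_cicd_role <app> [app-repo-root]
check_cicd_role() {
  local app="$1" repo="${2:-.}" rc=0 f found
  local expected="gitea_cicd_${app}"
  for f in "$repo/iac/providers.tf" "$repo/.gitea/workflows/vault.yaml"; do
    # Distinct gitea_cicd_* names in the file (empty if none or file missing).
    found="$(grep -oE 'gitea_cicd_[a-z0-9-]+' "$f" 2>/dev/null | sort -u)"
    if [ "$found" != "$expected" ]; then
      echo "${f}: expected only ${expected}, found: ${found:-<none>}" >&2
      rc=1
    fi
  done
  # The state prefix must also be derived from <app>.
  if ! grep -qE "prefix[[:space:]]*=[[:space:]]*\"${app}/main\"" "$repo/iac/backend.tf" 2>/dev/null; then
    echo "${repo}/iac/backend.tf: expected prefix = \"${app}/main\"" >&2
    rc=1
  fi
  return "$rc"
}
```

Run against an app repo that still carries `gitea_cicd_webapp` in its workflow, the check reports the stale name instead of letting the first `tofu apply` fail on Vault auth.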
Phase 7 — Turn on dynamic DB credentials
- [AGENT] Set `vault.enabled=true` in `chart/values.yaml` (and point the app's DB env at `pgbouncer.tools:5432`). On the next ArgoCD sync, VSO authenticates with the k8s role `<app>`, fetches dynamic Postgres creds from `postgres/creds/<app>` into the `db-credentials` Secret, and the pod reaches the DB through `pgbouncer.tools` with a short-lived user that inherits `<app>_role`. See webapp and erp for the consumption pattern.
- [HUMAN] Merge the PR.
Phase 8 — Image CI + deploy
- [AGENT] Add `.gitea/workflows/dockerimage.yaml` that builds the image and pushes it to the Gitea registry (`gitea.arcodange.lab/<org>/<app>:latest` + a branch tag), logging in with `PACKAGES_TOKEN`. No deploy step is needed: the ArgoCD image-updater annotations from Phase 4 watch `latest` (digest strategy) and roll it out. Skip this phase entirely for apps that run a public upstream image (e.g. `erp`/Dolibarr).
- [HUMAN] Merge the PR.
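The two references the build pushes can be computed up front. This sketch is a hypothetical helper, and its branch-tag rule is an assumption (the runbook does not specify one): characters outside `[A-Za-z0-9_.-]` become `-`, leading `.`/`-` are stripped, and the result is cut to 128 characters, following Docker's tag limits. Check the real `dockerimage.yaml` for its actual rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the :latest and branch-tag references for Phase 8.
# Assumed branch-tag rule: replace chars outside [A-Za-z0-9_.-] with '-',
# strip leading '.'/'-', truncate to 128 chars (Docker tag limits).
image_refs() {
  local org="$1" app="$2" branch="$3" tag
  local base="gitea.arcodange.lab/${org}/${app}"
  tag="$(printf '%s\n' "$branch" \
    | sed -e 's/[^A-Za-z0-9_.-]/-/g' -e 's/^[.-]*//' \
    | cut -c1-128)"
  if [ -z "$tag" ]; then
    echo "branch '${branch}' yields an empty tag" >&2
    return 1
  fi
  printf '%s\n' "${base}:latest" "${base}:${tag}"
}
```

Only the `:latest` reference matters for rollout, since the image-updater watches it by digest; the branch tag is for humans tracing which branch produced an image.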
Verification
The convention chain must resolve end-to-end (this is the same parity check the safe-env PRD rehearses in the sandbox). All checks below are [AGENT] read-only:
```bash
# [AGENT] Gitea repo exists under the chosen org
git ls-remote https://gitea.arcodange.lab/<org>/<app> &>/dev/null && echo "repo OK"

# [AGENT] Postgres DB + owner role exist (run from a host with psql access to the engine)
psql -h 192.168.1.202 -U credentials_editor -tAc \
  "SELECT datname FROM pg_database WHERE datname='<app>';"
psql -h 192.168.1.202 -U credentials_editor -tAc \
  "SELECT rolname FROM pg_roles WHERE rolname='<app>_role';"

# [AGENT] Vault: dynamic role, policies, and CI JWT role exist
vault read postgres/roles/<app>
vault policy read <app>
vault policy read <app>-ops
vault read auth/gitea_jwt/role/gitea_cicd_<app>

# [AGENT] ArgoCD Application is Synced + Healthy
kubectl --context <ctx> -n argocd get application <app> \
  -o jsonpath='{.status.sync.status}/{.status.health.status}'
# expected: Synced/Healthy

# [AGENT] VSO created the db-credentials Secret + pod is Running + ingress resolves
kubectl --context <ctx> -n <app> get secret db-credentials
kubectl --context <ctx> -n <app> get pods
curl -fsS https://<app>.arcodange.lab/healthz   # or the app's real health path
```
Expected: repo present; PG <app> DB + <app>_role exist; Vault postgres/creds/<app> + policies <app>/<app>-ops + gitea_cicd_<app> exist; ArgoCD Application Synced/Healthy; the db-credentials Secret was created by VSO; the pod is Running; the ingress resolves.
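When several links in the chain are broken at once, it helps to see all of them rather than stop at the first error. A minimal sketch of a hypothetical `check` wrapper that runs each read-only command, prints `OK`/`FAIL`, and counts failures. Note that `psql -tAc` exits 0 even when the query returns no rows, so existence checks need a `grep` on its output.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one read-only check, print OK/FAIL, count failures.
failures=0

check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    failures=$((failures + 1))
  fi
}

# Usage sketch (APP and CTX are placeholders you set). psql exits 0 even when
# a query returns no rows, so existence checks pipe its output through grep:
#   APP=myapp CTX=my-context
#   check "pg database" sh -c "psql -h 192.168.1.202 -U credentials_editor -tAc \"SELECT datname FROM pg_database WHERE datname='$APP';\" | grep -qx '$APP'"
#   check "vault db role" vault read "postgres/roles/$APP"
#   check "vault ci jwt role" vault read "auth/gitea_jwt/role/gitea_cicd_$APP"
#   check "db-credentials secret" kubectl --context "$CTX" -n "$APP" get secret db-credentials
#   check "health endpoint" curl -fsS "https://$APP.arcodange.lab/healthz"
#   [ "$failures" -eq 0 ]
```

Calling `check` in the current shell (not in a pipeline or `$(...)`) matters, because the failure counter is a shell variable that a subshell would not propagate back.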
Rollback
Revert the per-repo PRs in reverse order: app → tools → factory. Mark each undo step [AGENT]/[HUMAN] just as in the procedure.

- [HUMAN] App repo: revert the Phase 8 → 7 → 6 PRs. Reverting the Phase 6 `iac/` removes `postgres/creds/<app>` and the k8s role on the next CI run; setting `vault.enabled=false` returns the chart to degraded mode.
- [HUMAN] Tools repo: remove the `{ name = "<app>" }` entry; tools CI prunes `gitea_cicd_<app>` + policies.
- [HUMAN] Factory repo: remove the `<app>` entry from `argocd/values.yaml` (ArgoCD prunes the Application and its namespace), and remove `"<app>"` from `postgres/iac/terraform.tfvars` to drop the DB + role.
- [HUMAN] For a full cluster-level recovery (power cut, lost unseal key), consult `CLUSTER_RECOVERY.md`.
Warning
Removing the Postgres entry drops the database `<app>` and its data. Back up first if the app already holds state.
References
- French human-operator procedure: new-web-app runbook + conventions (the universal `<app>` join key).
- Exemplars: webapp (in-house image + DB) and erp (public image + DB).
- Platform mechanics: tools secrets-and-vso, tools components, postgres-iac, ci-apply-flow, naming-conventions, secrets-and-vault.
- Companion runbook: Set up a new tool.
- Parity rehearsal: safe-prod-like-environment ADR/PRD.
- Factory files: argocd/values.yaml, argocd/templates/apps.yaml, postgres/iac/terraform.tfvars.
- Reference PRs (verified, all merged):
  - app `dance-lessons-coach`: #89 degraded · #97 Vault-ready gate · #98 TLS ingress · #99 iac + workflow · #100 `TERRAFORM_SSH_KEY` fix
  - factory: #1 ArgoCD enroll + org override · #2 Postgres DB + role
  - tools: #1 Vault JWT role + policy