factory/vibe/runbooks/new-app.md
Gabriel Radureau 2d76eb45c1 docs(vibe): add new-tool and new-app runbooks (grounded in real PRs)
Two agent-oriented runbooks under vibe/runbooks/ with [AGENT]/[HUMAN] step
markers, grounded in real diffs:

- new-tool.md : add a platform component to the tools repo so ArgoCD deploys it
  into the tools namespace (wrapper Chart.yaml + the tool library + a row in
  chart/values.yaml; optional iac/ for secrets). Mirrors the prometheus/crowdsec
  additions.
- new-app.md  : stand up a brand-new application across THREE repos (app +
  factory + tools) with the strict ordering dependency and the TERRAFORM_SSH_KEY
  pitfall. Phase-by-phase mapped to the dance-lessons-coach onboarding PRs
  (#89/#97/#98/#99/#100), factory #1/#2, tools #1; the FR doc/runbooks/new-web-app
  is linked as the detailed companion.

2 mermaid diagrams MCP-validated; zero dead links across the vibe tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 22:22:09 +02:00



Set up a new app

Status: Active
Audience: platform operator + agents (English). For the detailed human-facing procedure, see the French new-web-app runbook.
Last Updated: 2026-06-23

TL;DR

Tip

Standing up a brand-new application touches three repos — the app's own Gitea repo, factory, and tools — with a strict ordering dependency. An agent may write every file and open every PR ([AGENT]), but each merge/apply is [HUMAN]-gated. The single rule that everything else hangs on: the factory Postgres DB+role and the tools Vault JWT role MUST be applied before the app's own iac/ runs. Ship the app in degraded mode first (no DB/Vault), wire the platform sides, then turn on dynamic credentials last. The detailed companion is the French new-web-app runbook; this page is its agent-oriented English mirror.

Caution

Ordering is load-bearing — do not reorder the phases.

  • The app's own iac/ (Phase 6) calls the shared app_roles module, which issues GRANT <app>_role TO … on every dynamic credential and authenticates to Vault as gitea_cicd_<app>. So both of these must already exist:
    • the Postgres role <app>_role + database <app> → created by the factory side (Phase 4).
    • the Vault JWT role gitea_cicd_<app> + policies <app> / <app>-ops → created by the tools side (Phase 5).
  • The app's vault.yaml CI needs the TERRAFORM_SSH_KEY Actions secret (the tofu_module_reader SSH key from Vault) or terraform init cannot clone the app_roles module over git::ssh://. This is the canonical pitfall — it sank the first iac/ push and was fixed in dance-lessons-coach PR #100. Apply Phases 4 and 5 before merging Phase 6.

Scope

This runbook covers standing up a brand-new application end-to-end: its own Gitea repo, a Helm chart/, CI/CD with IaC (iac/ + .gitea/workflows/), and database access — all deployed by factory ArgoCD into a dedicated namespace. Systems touched: Gitea (repo + Actions + container registry), Postgres (DB + owner role via factory), Vault (JWT CI role, policies, dynamic DB creds via tools + app), k3s (namespace, pod, SA), ArgoCD (Application sync + image-updater), and Traefik (ingress).

It does not cover: writing the application code itself, the one-time platform foundations (Vault mounts, the Vault→Postgres connection, the gitea_cicd bootstrap JWT role, the tofu_module_reader bot, org-level Actions secrets — all already in place), or adding a non-application platform component (see Set up a new tool).

The reference onboarding is dance-lessons-coach (verified from its merged PRs), with webapp as the canonical app to clone.

Preconditions

  • Working in a worktree under .claude/worktrees/<slug>/ (never the trunk).
  • You can create a Gitea repo under arcodange-org (default) or arcodange (for some apps).
  • Local clones of factory and tools are available and on synced main.
  • The <app> name is chosen — kebab-case, lowercase. This is the universal join key: the same string is reused verbatim across Gitea, Postgres, Vault, Kubernetes, ArgoCD, GCS, and DNS. One typo silently breaks the chain. See naming-conventions and the FR conventions.
  • The platform foundations exist (Vault mounts kvv2/postgres/transit + auth kubernetes, the Vault→Postgres connection via credentials_editor, the bootstrap gitea_cicd role, the tofu_module_reader SSH bot, and org Actions secrets HOMELAB_CA_CERT / vault_oauth__sh_b64 / PACKAGES_TOKEN).
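Because one typo in <app> silently breaks the chain, it can help to validate the name before Phase 0. The bash sketch below is a hypothetical helper (is_valid_app_name is not part of any repo). It assumes that "kebab-case, lowercase" means lowercase ASCII letters and digits joined by single hyphens, and it caps the name at 58 characters so that <app>_role stays within Postgres's default 63-byte identifier limit.

```shell
#!/usr/bin/env bash
# is_valid_app_name <name>: hypothetical pre-flight check for the <app> join key.
# Assumption: kebab-case = lowercase ASCII letters/digits joined by single hyphens.
is_valid_app_name() {
  local name="$1"
  local re='^[a-z0-9]+(-[a-z0-9]+)*$'
  # 58 + length("_role") = 63, Postgres's default maximum identifier length.
  [ "${#name}" -le 58 ] && [[ $name =~ $re ]]
}

# Usage: stop before Phase 0 if the name would break the chain.
# is_valid_app_name "$APP" || { echo "invalid <app>: $APP" >&2; exit 1; }
```

Run it once, before creating the Gitea repo, so the same verified string is reused everywhere afterwards.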

The three-repo onboarding (ordering)

%%{init: {'theme':'base','themeVariables':{'fontSize':'14px'}}}%%
flowchart TB
    classDef app fill:#2563eb,stroke:#1e40af,color:#ffffff
    classDef plat fill:#059669,stroke:#047857,color:#ffffff
    classDef tools fill:#7c3aed,stroke:#6d28d9,color:#ffffff
    classDef run fill:#b45309,stroke:#92400e,color:#ffffff

    P1["Phase 1-3 · APP repo<br>chart/ degraded + Vault-ready (gated) + TLS<br>(serves, no DB/Vault yet)"]:::app
    P4["Phase 4 · FACTORY repo<br>argocd/values.yaml + postgres/iac<br>→ DB &lt;app&gt; + role &lt;app&gt;_role"]:::plat
    P5["Phase 5 · TOOLS repo<br>hashicorp-vault/iac<br>→ gitea_cicd_&lt;app&gt; + policies"]:::tools
    P6["Phase 6 · APP repo<br>iac/ (app_roles module) + vault.yaml<br>+ TERRAFORM_SSH_KEY secret"]:::app
    P7["Phase 7-8 · APP repo<br>vault.enabled=true + dockerimage.yaml<br>→ dynamic creds on, image rollout"]:::run

    P1 --> P4
    P1 --> P5
    P4 --> P6
    P5 --> P6
    P6 --> P7

  1. Phases 1-3 (app repo): ship the chart in degraded mode, make it Vault-ready behind a default-off gate, and set the right ingress — none of this needs the platform sides yet.
  2. Phase 4 (factory) and Phase 5 (tools) are independent of each other but both must be applied before Phase 6.
  3. Phase 6 (app repo) applies the app's own iac/, which depends on the role/JWT created in 4 and 5, and needs the TERRAFORM_SSH_KEY secret.
  4. Phases 7-8 (app repo) flip vault.enabled=true for live dynamic DB creds, then add the image-build CI so ArgoCD's image-updater rolls out releases.
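The ordering above can be sketched as a small bash gate. phase_deps and phase_ready are hypothetical helpers, not part of factory or tools; for brevity, Phases 1-3 are tracked as phase 1 and Phases 7-8 as phase 7.

```shell
#!/usr/bin/env bash
# phase_deps <phase>: direct prerequisites (hypothetical encoding of the ordering).
# Phases 1-3 are tracked as "1" and Phases 7-8 as "7".
phase_deps() {
  case "$1" in
    1)   echo "" ;;
    4|5) echo "1" ;;     # factory and tools are independent of each other
    6)   echo "4 5" ;;   # app iac/ needs <app>_role and gitea_cicd_<app>
    7)   echo "6" ;;
    *)   echo "unknown phase: $1" >&2; return 2 ;;
  esac
}

# phase_ready <phase> <applied...>: succeed only if every prerequisite was applied.
phase_ready() {
  local phase="$1" deps dep applied found
  shift
  deps="$(phase_deps "$phase")" || return 2
  for dep in $deps; do
    found=0
    for applied in "$@"; do
      if [ "$applied" = "$dep" ]; then found=1; fi
    done
    if [ "$found" -ne 1 ]; then
      echo "phase $phase blocked: phase $dep not applied yet" >&2
      return 1
    fi
  done
  return 0
}

# Usage: phase_ready 6 1 4 5 && echo "safe to merge Phase 6"
```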

Procedure

Phase 0 — Choose the name and create the repo

  1. [HUMAN] Choose <app> (kebab-case) and the Gitea org. The default org is arcodange-org; some apps live under arcodange (e.g. dance-lessons-coach, telegram-gateway). Create the empty repo under the chosen org. The org choice matters because the repo inherits that org's Actions secrets.

Phase 1 — App in degraded mode

Mirrors dance-lessons-coach PR #89. Clone the webapp pattern.

  1. [AGENT] Add a Dockerfile and a Helm chart/ (deployment, service, ingress, serviceaccount, configmap, _helpers.tpl, NOTES.txt) with no DB/Vault wiring. Set:

    • ingress host <app>.arcodange.lab (internal) and/or <app>.arcodange.fr (public) — TLS details land in Phase 3;
    • a nodeSelector of kubernetes.io/hostname: pi1 (pi1 is the network entrypoint, so pinning the pod there preserves the client IP and avoids NAT);
    • /healthz (or the app's real path, e.g. dance-lessons-coach uses /api/healthz) for both liveness and readiness probes;
    • leave any DB host empty so the pod serves in degraded mode.
    # [AGENT] lint + render before opening the PR — safe, no cluster contact
    helm lint chart/
    helm template chart/ --set image.repository=test --set image.tag=v1
    
  2. [HUMAN] Open and merge the PR. Verify that the app serves in degraded mode: the binary starts and the health endpoint responds. In-cluster, this only becomes observable once ArgoCD picks the app up in Phase 4 or later.
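Before merging, the degraded-mode expectations from step 1 can be checked against the rendered chart. This bash sketch uses a hypothetical helper name, check_degraded_render; it only greps the helm template output for the pi1 nodeSelector and both probe blocks, and it does not validate the probe path itself.

```shell
#!/usr/bin/env bash
# check_degraded_render: read `helm template` output on stdin and report any
# missing Phase 1 invariant (hypothetical helper; substring checks only).
check_degraded_render() {
  local rendered missing=0 needle
  rendered="$(cat)"
  for needle in 'kubernetes.io/hostname: pi1' 'livenessProbe:' 'readinessProbe:'; do
    if ! grep -qF -e "$needle" <<<"$rendered"; then
      echo "rendered chart is missing: $needle" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# helm template chart/ --set image.repository=test --set image.tag=v1 | check_degraded_render
```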

Phase 2 — Make the chart Vault-ready (gated, default off)

Mirrors dance-lessons-coach PR #97.

  1. [AGENT] Add VaultAuth, VaultStaticSecret, and VaultDynamicSecret templates, each gated behind .Values.vault.enabled (default false) so a plain helm install keeps working. The reference values.yaml exposes:

    # chart/values.yaml — gate + the three Vault join keys (all derived from <app>)
    vault:
      enabled: false
      role: <app>                     # k8s auth backend role (matches iac/main.tf)
      kvv2Path: <app>/config          # KVv2 secret path
      postgresPath: creds/<app>       # postgres dynamic creds path
    

    The VaultAuth targets the k8s role <app> with the app's ServiceAccount and audience vault; the VaultDynamicSecret reads postgres/creds/<app> into a db-credentials Secret and lists the Deployment under rolloutRestartTargets, so the pods restart when the credentials rotate.

  2. [HUMAN] Open and merge the PR. The chart is now Vault-ready without activating any Vault dependency.
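The default-off gate can be checked the same way: render once with the defaults and once with vault.enabled=true, then count the Vault operator resources. count_vault_kinds is a hypothetical helper, and the expected count of 3 assumes exactly one VaultAuth, one VaultStaticSecret and one VaultDynamicSecret template.

```shell
#!/usr/bin/env bash
# count_vault_kinds: count top-level VSO resources in `helm template` output
# read from stdin (hypothetical helper). grep -c prints 0 when nothing matches.
count_vault_kinds() {
  grep -cE '^kind: Vault(Auth|StaticSecret|DynamicSecret)[[:space:]]*$'
}

# Usage (expected: 0 with the gate off, 3 with it on):
# helm template chart/ --set image.repository=test --set image.tag=v1 | count_vault_kinds
# helm template chart/ --set image.repository=test --set image.tag=v1 \
#   --set vault.enabled=true | count_vault_kinds
```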

Phase 3 — Ingress / TLS

Mirrors dance-lessons-coach PR #98. Pick by host suffix:

  1. [AGENT] Set the Traefik router annotations for the chosen host:

    • .lab host: traefik.../router.entrypoints: websecure + router.tls: "true" + router.tls.certresolver: letsencrypt (with router.tls.domains.0.main: arcodange.lab and …sans: <app>.arcodange.lab) + router.middlewares: localIp@file.
    • .fr host: router.entrypoints: web + router.middlewares: kube-system-crowdsec@kubernetescrd.

    Convention: .lab = internal, websecure + localIp + letsencrypt; .fr = public, web + crowdsec.

  2. [HUMAN] Merge the PR.
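The host-suffix convention is mechanical enough to encode. traefik_profile is a hypothetical bash helper that prints the entrypoint, the middleware and the certresolver ("-" for none) for a host; it assumes hosts end in .arcodange.lab or .arcodange.fr and rejects anything else.

```shell
#!/usr/bin/env bash
# traefik_profile <host>: "<entrypoint> <middleware> <certresolver>" per the
# .lab = internal / .fr = public convention (hypothetical helper).
traefik_profile() {
  case "$1" in
    *.arcodange.lab) echo "websecure localIp@file letsencrypt" ;;
    *.arcodange.fr)  echo "web kube-system-crowdsec@kubernetescrd -" ;;
    *) echo "no ingress convention for host: $1" >&2; return 1 ;;
  esac
}

# Usage:
# read -r entrypoint middleware resolver <<<"$(traefik_profile "$HOST")"
```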

Phase 4 — FACTORY side (DB + role, ArgoCD enrollment)

Mirrors factory PR #1 (ArgoCD) and factory PR #2 (Postgres). Link: postgres-iac, ci-apply-flow.

  1. [AGENT] Enroll <app> in argocd/values.yaml under gitea_applications. The apps template defaults the org to arcodange-org ({{- $org := default "arcodange-org" $app_attr.org -}}), so add org: arcodange only if the app is not under arcodange-org. Add image-updater annotations for digest-based rollout:

    # argocd/values.yaml — under gitea_applications
    <app>:
      org: arcodange                 # ← ONLY if not arcodange-org
      annotations:
        argocd-image-updater.argoproj.io/image-list: <app>=gitea.arcodange.lab/<org>/<app>:latest
        argocd-image-updater.argoproj.io/<app>.update-strategy: digest
    
  2. [AGENT] Add "<app>" to the applications list in postgres/iac/terraform.tfvars. This creates the <app> database, the non-login owner role <app>_role, and the pgbouncer user_lookup() function.

    # postgres/iac/terraform.tfvars
    applications = [
        "webapp",
        "erp",
        "crowdsec",
        "plausible",
        "dance-lessons-coach",
        "<app>",   # ← add
    ]
    
  3. [HUMAN] Merge both PRs. Factory CI (postgres.yaml) applies — the DB + role now exist. ArgoCD creates the Application and deploys the degraded chart into namespace <app>.
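To avoid hand-editing mistakes, the gitea_applications entry can be generated from <app> and the org. argocd_entry is a hypothetical helper. It mirrors the template's default (org: is emitted only when the org is not arcodange-org) and prints the entry at column 0, so indent it to match gitea_applications when pasting.

```shell
#!/usr/bin/env bash
# argocd_entry <app> [org]: print the argocd/values.yaml entry (hypothetical).
# org defaults to arcodange-org, like the apps template.
argocd_entry() {
  local app="$1" org="${2:-arcodange-org}"
  printf '%s:\n' "$app"
  if [ "$org" != "arcodange-org" ]; then
    printf '  org: %s\n' "$org"   # only needed outside the default org
  fi
  printf '  annotations:\n'
  printf '    argocd-image-updater.argoproj.io/image-list: %s=gitea.arcodange.lab/%s/%s:latest\n' \
    "$app" "$org" "$app"
  printf '    argocd-image-updater.argoproj.io/%s.update-strategy: digest\n' "$app"
}

# Usage: argocd_entry dance-lessons-coach arcodange
```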

Phase 5 — TOOLS side (Vault JWT role + policies)

Mirrors tools PR #1. Link: tools secrets-and-vso, tools components.

  1. [AGENT] Add { name = "<app>" } to the applications list in tools/hashicorp-vault/iac/terraform.tfvars. Via the app_policy / app_roles modules this creates the gitea_cicd_<app> JWT role, the <app> (runtime) and <app>-ops (CI) policies, the <app>-ops identity group, and the k8s auth role.

    # tools/hashicorp-vault/iac/terraform.tfvars
    applications = [
      { name = "webapp" },
      { name = "erp" },
      { name = "<app>" },   # ← add
      # optional fields when needed:
      # { name = "<app>", ops_policies = ["…"], service_account_names = ["…"], service_account_namespaces = ["tools"] }
    ]
    
  2. [HUMAN] Merge the PR. Tools CI (vault.yaml) applies — gitea_cicd_<app> and the policies now exist.
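Both terraform.tfvars lists key on the exact quoted <app> string, so a quick grep guards against a duplicate or misspelled entry before and after the edit. tfvars_lists_app is a hypothetical helper; it skips full-line comments and matches the quoted name in either the factory form ("<app>") or the tools form ({ name = "<app>" }).

```shell
#!/usr/bin/env bash
# tfvars_lists_app <file> <app>: succeed if "<app>" (with its quotes) appears
# on a non-comment line of <file>. Quotes prevent "app" matching "webapp".
tfvars_lists_app() {
  local file="$1" app="$2"
  grep -v -e '^[[:space:]]*#' -- "$file" | grep -qF -e "\"${app}\""
}

# Usage:
# tfvars_lists_app postgres/iac/terraform.tfvars "$APP"
# tfvars_lists_app tools/hashicorp-vault/iac/terraform.tfvars "$APP"
```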

Phase 6 — App IaC + Vault workflow

Mirrors dance-lessons-coach PR #99 and the #100 fix. See 05-app-terraform for the module contract.

Caution

Phases 4 and 5 must already be applied before merging this phase, or the first tofu apply fails (no <app>_role to GRANT, or Vault auth fails on the missing gitea_cicd_<app> role).

  1. [AGENT] Add the app's iac/:

    • providers.tf — Vault provider with auth_login_jwt { mount = "gitea_jwt", role = "gitea_cicd_<app>" }.

    • backend.tf — GCS backend bucket = "arcodange-tf", prefix = "<app>/main".

    • main.tf — call the shared module (the exact source string used by every app):

      module "app_roles" {
        source = "git::ssh://git@192.168.1.202:2222/arcodange-org/tools.git//hashicorp-vault/iac/modules/app_roles?depth=1&ref=main"
        name   = "<app>"
      }
      

      This provisions postgres/creds/<app> (dynamic DB role inheriting <app>_role) and the k8s auth role <app>. Add any app-specific kvv2/<app>/config secrets alongside.

  2. [AGENT] Add .gitea/workflows/vault.yaml that authenticates via Gitea OIDC and runs tofu apply iac/. The role: in the vault-action step and the role in providers.tf must both be gitea_cicd_<app>. This is the copy-paste trap: erp still carries a stale gitea_cicd_webapp. The secrets block must read the SSH key:

    # .gitea/workflows/vault.yaml — vault-action secrets block
    secrets: |
        kvv1/google/credentials credentials | GOOGLE_BACKEND_CREDENTIALS ;
        kvv1/gitea/tofu_module_reader ssh_private_key | TERRAFORM_SSH_KEY ;
    
  3. [HUMAN] Add the TERRAFORM_SSH_KEY secret (the tofu_module_reader SSH key, read from Vault at kvv1/gitea/tofu_module_reader) to the app repo's Actions secrets. Without it, terraform init cannot clone the app_roles module over git::ssh:// — the canonical pitfall fixed in PR #100.

  4. [HUMAN] Merge the PR. The app's vault.yaml runs tofu apply; postgres/creds/<app> and the k8s role <app> now exist.
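The copy-paste trap and the missing-key pitfall can both be caught before the Phase 6 merge by inspecting the two files. check_ci_wiring is a hypothetical bash helper; it assumes the role name appears literally in .gitea/workflows/vault.yaml and iac/providers.tf, and that the workflow names TERRAFORM_SSH_KEY in its secrets block as shown above. It cannot tell whether the Actions secret itself was added to the repo.

```shell
#!/usr/bin/env bash
# check_ci_wiring <app> [repo_dir]: every gitea_cicd_* role in the workflow and
# in providers.tf must be gitea_cicd_<app>, and the workflow must request
# TERRAFORM_SSH_KEY (hypothetical helper).
check_ci_wiring() {
  local app="$1" dir="${2:-.}" status=0 f roles
  local wf="$dir/.gitea/workflows/vault.yaml"
  local prov="$dir/iac/providers.tf"
  for f in "$wf" "$prov"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
      continue
    fi
    # Distinct role tokens, space-joined; must be exactly one, the right one.
    roles="$(grep -oE 'gitea_cicd_[a-z0-9-]+' "$f" | sort -u | paste -s -d ' ' -)"
    if [ "$roles" != "gitea_cicd_${app}" ]; then
      echo "$f: found [$roles], expected gitea_cicd_${app}" >&2
      status=1
    fi
  done
  # Without this, tofu init cannot clone app_roles over git::ssh://.
  if [ -f "$wf" ] && ! grep -qF 'TERRAFORM_SSH_KEY' "$wf"; then
    echo "$wf: TERRAFORM_SSH_KEY is not requested" >&2
    status=1
  fi
  return "$status"
}

# Usage, from the app repo root: check_ci_wiring "$APP"
```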

Phase 7 — Turn on dynamic DB credentials

  1. [AGENT] Set vault.enabled=true in chart/values.yaml (and point the app's DB env at pgbouncer.tools:5432). On next ArgoCD sync, VSO authenticates with the k8s role <app>, fetches dynamic Postgres creds from postgres/creds/<app> into the db-credentials Secret, and the pod reaches the DB through pgbouncer.tools with a short-lived user that inherits <app>_role. See webapp and erp for the consumption pattern.

  2. [HUMAN] Merge the PR.

Phase 8 — Image CI + deploy

  1. [AGENT] Add .gitea/workflows/dockerimage.yaml that builds the image and pushes it to the Gitea registry (gitea.arcodange.lab/<org>/<app>:latest + branch tag), logging in with PACKAGES_TOKEN. No deploy step is needed — the ArgoCD image-updater annotations from Phase 4 watch latest (digest strategy) and roll it out. Skip this phase entirely for apps that run a public upstream image (e.g. erp/Dolibarr).

  2. [HUMAN] Merge the PR.
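For reference, the two image references that Phase 8 pushes can be derived from <org>, <app> and the branch. image_refs is a hypothetical helper. The branch-tag sanitization (replacing characters that Docker tags do not allow, such as /, with -) is an assumption about how a workflow might name the branch tag, not something the reference dockerimage.yaml is known to do.

```shell
#!/usr/bin/env bash
# image_refs <org> <app> <branch>: print the :latest and branch-tag references.
image_refs() {
  local org="$1" app="$2" branch="$3" base tag
  base="gitea.arcodange.lab/${org}/${app}"
  # Assumption: characters outside [A-Za-z0-9_.-] become "-" in the branch tag.
  tag="${branch//[^A-Za-z0-9_.-]/-}"
  printf '%s:latest\n%s:%s\n' "$base" "$base" "$tag"
}

# Usage: image_refs arcodange dance-lessons-coach "$GITHUB_REF_NAME"
```

The image-updater annotations from Phase 4 only watch the :latest reference (digest strategy); the branch tag is for humans and rollbacks.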

Verification

The convention chain must resolve end-to-end (this is the same parity check the safe-env PRD rehearses in the sandbox). All checks below are [AGENT] read-only:

# [AGENT] Gitea repo exists under the chosen org
git ls-remote https://gitea.arcodange.lab/<org>/<app> &>/dev/null && echo "repo OK"

# [AGENT] Postgres DB + owner role exist (run from a host with psql access to the engine;
# each query prints the name if it exists and nothing otherwise)
psql -h 192.168.1.202 -U credentials_editor -d postgres -tAc \
  "SELECT datname FROM pg_database WHERE datname='<app>';"
psql -h 192.168.1.202 -U credentials_editor -d postgres -tAc \
  "SELECT rolname FROM pg_roles WHERE rolname='<app>_role';"

# [AGENT] Vault: dynamic role, policies, and CI JWT role exist
vault read   postgres/roles/<app>
vault policy read <app>
vault policy read <app>-ops
vault read   auth/gitea_jwt/role/gitea_cicd_<app>

# [AGENT] ArgoCD Application is Synced + Healthy
kubectl --context <ctx> -n argocd get application <app> \
  -o jsonpath='{.status.sync.status}/{.status.health.status}'
# expected: Synced/Healthy

# [AGENT] VSO created the db-credentials Secret + pod is Running + ingress resolves
kubectl --context <ctx> -n <app> get secret db-credentials
kubectl --context <ctx> -n <app> get pods
curl -fsS https://<app>.arcodange.lab/healthz   # or the app's real health path

Expected: repo present; PG <app> DB + <app>_role exist; Vault postgres/creds/<app> + policies <app>/<app>-ops + gitea_cicd_<app> exist; ArgoCD Application Synced/Healthy; the db-credentials Secret was created by VSO; the pod is Running; the ingress resolves.
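The checks above can be wrapped so that a single run reports every gap. This bash sketch is illustrative: run_check, pg_has, argocd_ok and verify_app are hypothetical helper names. It assumes vault, psql and kubectl are already authenticated, that credentials_editor can connect to the postgres database, and that the health path defaults to /healthz. One detail worth keeping: psql -tAc exits 0 even when a query returns no row, so pg_has tests the output rather than the exit status.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the read-only checks above.

run_check() {  # run_check <label> <command...>: print OK/FAIL, return status
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# psql -tAc exits 0 even when the query returns no row, so test the output.
pg_has() {
  [ -n "$(psql -h 192.168.1.202 -U credentials_editor -d postgres -tAc "$1")" ]
}

argocd_ok() { [ "$1" = "Synced/Healthy" ]; }

verify_app() {  # verify_app <app> <org> <kube-context> [health-path]
  local app="$1" org="$2" ctx="$3" health="${4:-/healthz}" failed=0 status
  status="$(kubectl --context "$ctx" -n argocd get application "$app" \
    -o jsonpath='{.status.sync.status}/{.status.health.status}' 2>/dev/null)"

  run_check "gitea repo" env GIT_TERMINAL_PROMPT=0 \
    git ls-remote "https://gitea.arcodange.lab/${org}/${app}" || failed=$((failed + 1))
  run_check "pg database ${app}" \
    pg_has "SELECT 1 FROM pg_database WHERE datname='${app}'" || failed=$((failed + 1))
  run_check "pg role ${app}_role" \
    pg_has "SELECT 1 FROM pg_roles WHERE rolname='${app}_role'" || failed=$((failed + 1))
  run_check "vault postgres/roles/${app}" vault read "postgres/roles/${app}" || failed=$((failed + 1))
  run_check "vault policy ${app}" vault policy read "$app" || failed=$((failed + 1))
  run_check "vault policy ${app}-ops" vault policy read "${app}-ops" || failed=$((failed + 1))
  run_check "vault gitea_cicd_${app}" \
    vault read "auth/gitea_jwt/role/gitea_cicd_${app}" || failed=$((failed + 1))
  run_check "argocd ${status:-unknown}" argocd_ok "$status" || failed=$((failed + 1))
  run_check "db-credentials secret" \
    kubectl --context "$ctx" -n "$app" get secret db-credentials || failed=$((failed + 1))
  run_check "ingress ${health}" \
    curl -fsS "https://${app}.arcodange.lab${health}" || failed=$((failed + 1))

  echo "${failed} check(s) failed"
  [ "$failed" -eq 0 ]
}

# Usage: verify_app dance-lessons-coach arcodange <ctx> /api/healthz
```

The pod-phase check is left to the manual kubectl get pods above; a passing ingress probe already implies a serving pod.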

Rollback

Revert the per-repo PRs in reverse order: app → tools → factory. Tag each undo step [AGENT] or [HUMAN], as in the procedure.

  1. [HUMAN] App repo: revert Phase 8 → 7 → 6 PRs. Reverting the Phase 6 iac/ removes postgres/creds/<app> and the k8s role on the next CI run; setting vault.enabled=false returns the chart to degraded mode.
  2. [HUMAN] Tools repo: remove the { name = "<app>" } entry; tools CI prunes gitea_cicd_<app> + policies.
  3. [HUMAN] Factory repo: remove the <app> entry from argocd/values.yaml — ArgoCD prunes the Application (and its namespace) — and remove "<app>" from postgres/iac/terraform.tfvars to drop the DB + role.
  4. [HUMAN] For a full cluster-level recovery (power-cut, lost unseal key) consult CLUSTER_RECOVERY.md.

Warning

Removing the Postgres entry drops the database <app> and its data. Back up first if the app already holds state.
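Before removing the Postgres entry, a custom-format dump can be taken with pg_dump. backup_app_db is a hypothetical helper; it assumes credentials_editor has read access to the <app> database (not verified here) and supports DRY_RUN=1 to print the command without running it.

```shell
#!/usr/bin/env bash
# backup_app_db <app>: dump the <app> database in pg_dump custom format
# before its tfvars entry is removed (hypothetical helper).
# Assumption: credentials_editor can read the <app> database.
backup_app_db() {
  local app="$1" out
  out="${app}-$(date +%Y%m%dT%H%M%S).dump"
  local -a cmd=(pg_dump -h 192.168.1.202 -U credentials_editor -Fc -d "$app" -f "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show what would run
    return 0
  fi
  "${cmd[@]}" && echo "wrote $out"
}

# Usage: DRY_RUN=1 backup_app_db "$APP"; then backup_app_db "$APP"
```

A custom-format dump can later be restored with pg_restore -d <app> <file> into a recreated database.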

References