From 7bf83e75ed75ccc2241d8945df6e27b8061bda57 Mon Sep 17 00:00:00 2001
From: Gabriel Radureau
Date: Tue, 23 Jun 2026 22:12:11 +0200
Subject: [PATCH] docs(vibe): add erp/ guidebook (Dolibarr deployment + backup/recovery + ops)

Dedicated tree-docs guidebook under vibe/guidebooks/erp/ for the lab's most data-critical app, cross-linked from the applications hub (bidirectional):

- README.md : Dolibarr 22.0.4 on Postgres; data-criticality; overview diagram; the Vault-unseal-before-scale recovery ordering (CAUTION).
- deployment.md : upstream image + custom entrypoint (MySQL->psql), the 50Gi Longhorn RWX documents PVC, Vault CRDs + the shared app_roles iac, init scripts (conf.php creds, table-ownership), ingress, CI.
- backup-and-recovery.md: the Ansible CronJob pg_dump (daily 04:00, 15-day retention) + restore Job (scale-0 -> restore -> scale-1); the cluster recovery ordering (Longhorn -> Vault unseal -> erp scale-up).
- operations.md : the read-only bin/arcodange CLI, static/company.json, Deno+Playwright tests, day-2 ops.

erp code via full gitea URLs; CLUSTER_RECOVERY.md by name; 2 mermaid diagrams MCP-validated; zero dead links.

Co-Authored-By: Claude Opus 4.8 --- vibe/guidebooks/README.md | 1 + vibe/guidebooks/applications/README.md | 4 +- vibe/guidebooks/erp/README.md | 87 +++++++++ vibe/guidebooks/erp/backup-and-recovery.md | 207 +++++++++++++++++++++ vibe/guidebooks/erp/deployment.md | 189 +++++++++++++++++++ vibe/guidebooks/erp/operations.md | 173 +++++++++++++++++ 6 files changed, 659 insertions(+), 2 deletions(-) create mode 100644 vibe/guidebooks/erp/README.md create mode 100644 vibe/guidebooks/erp/backup-and-recovery.md create mode 100644 vibe/guidebooks/erp/deployment.md create mode 100644 vibe/guidebooks/erp/operations.md diff --git a/vibe/guidebooks/README.md b/vibe/guidebooks/README.md index 55e1ff2..0df40d0 100644 --- a/vibe/guidebooks/README.md +++ b/vibe/guidebooks/README.md @@ -39,6 +39,7 @@ flowchart LR | [Tools](tools/README.md) | Deep dive into the lab platform services in the `tools` namespace (Vault+VSO, Prometheus, Grafana, CrowdSec, poolers, Redis, Plausible, ClickHouse) | ✅ Active | | [CMS](cms/README.md) | Deep dive into the public Nuxt site arcodange.fr + its Cloudflare DNS/tunnel/Turnstile and Zoho email IaC | ✅ Active | | [Applications](applications/README.md) | The deployed apps and the common pattern they share — webapp (Go + Postgres) and url-shortener (Rust + SQLite); erp has its own guidebook | ✅ Active | +| [ERP](erp/README.md) | The lab’s Dolibarr ERP — its deployment on Postgres, its document storage + backup/restore, and the read-only ops CLI (the most data-critical app) | ✅ Active | ## Rules to contribute diff --git a/vibe/guidebooks/applications/README.md b/vibe/guidebooks/applications/README.md index 29c4ffc..96913dd 100644 --- a/vibe/guidebooks/applications/README.md +++ b/vibe/guidebooks/applications/README.md @@ -10,7 +10,7 @@ This guidebook maps the **deployed applications** — the workloads ArgoCD runs in their own `` namespace — and, more importantly, the **single repeatable pattern** every one of them follows. 
Once you know the pattern, every app reads as a variation on the same skeleton: a Gitea repo whose contents (Dockerfile + Helm chart + optional Vault IaC + CI) and whose `` name fully determine how it builds, deploys, gets its secrets, and is reached from the network. -Two apps are presented in depth as the canonical archetypes: [webapp](webapp.md) (Go + external Postgres) and [url-shortener](url-shortener.md) (Rust + embedded SQLite). Other apps in the cluster — `erp`, `dance-lessons-coach`, `telegram-gateway`, `plausible` — instantiate the same pattern; `erp` has its own guidebook (forthcoming) and is not linked here yet. +Two apps are presented in depth as the canonical archetypes: [webapp](webapp.md) (Go + external Postgres) and [url-shortener](url-shortener.md) (Rust + embedded SQLite). Other apps in the cluster — `erp`, `dance-lessons-coach`, `telegram-gateway`, `plausible` — instantiate the same pattern; `erp` has its own [ERP guidebook](../erp/README.md) because it carries far more moving parts than the two archetypes. ## The common app pattern @@ -102,7 +102,7 @@ flowchart LR | [webapp](webapp.md) | Canonical **Go + external Postgres** exemplar — `iac/` + Vault dynamic creds, scalable stateless pods | ✅ Active | | [url-shortener](url-shortener.md) | **Rust + embedded SQLite** counterpart — single replica on a Longhorn RWO PVC, no Vault | ✅ Active | -`erp` and the other apps (`dance-lessons-coach`, `telegram-gateway`, `plausible`) follow the same pattern; `erp` will be cross-linked here once its dedicated guidebook ships. +`erp` and the other apps (`dance-lessons-coach`, `telegram-gateway`, `plausible`) follow the same pattern; `erp` is documented in depth in its own [ERP guidebook](../erp/README.md). 
## Maintenance rule diff --git a/vibe/guidebooks/erp/README.md b/vibe/guidebooks/erp/README.md new file mode 100644 index 0000000..ba8e1a8 --- /dev/null +++ b/vibe/guidebooks/erp/README.md @@ -0,0 +1,87 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > **ERP** + +# ERP + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [Applications hub](../applications/README.md) · [01 · factory](../lab-ecosystem/01-factory.md) +> **Downstream:** [Deployment](deployment.md) · [Backup & recovery](backup-and-recovery.md) · [Operations](operations.md) +> **Related:** [tools secrets-and-vso](../tools/secrets-and-vso.md) · [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) · [storage concept](../lab-ecosystem/storage-and-recovery.md) · [factory recover playbooks](../factory-provisioning/ansible/06-recover.md) · [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) + +This guidebook maps **erp** — the lab's [Dolibarr **22.0.4**](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/Chart.yaml) accounting/business ERP and its **single most data-critical application**. It is a PHP/Apache workload built from the upstream `dolibarr/dolibarr` image, served internally at `erp.arcodange.lab` (Traefik `websecure` + `localIp@file` + a `letsencrypt`-resolver cert). Everything a reader needs to deploy it, keep its data safe, and operate it lives in the three child pages below; this page is the orientation map. + +## What makes erp special + +`erp` is the complex sibling of the [webapp / url-shortener archetypes](../applications/README.md). 
It carries the **same four-ingredient app pattern** (Dockerfile-less reuse of an upstream image, a `chart/`, an `iac/`, `.gitea/workflows`) but layers several things on top that the archetypes do not: + +| Trait | erp specifics | Why it matters | +|---|---|---| +| **Upstream image** | `dolibarr/dolibarr:22.0.4` — not a repo-built image | No custom Dockerfile; the chart adapts the upstream container at runtime | +| **Postgres, not MySQL** | Dolibarr classically assumes MySQL; erp runs on **PostgreSQL** (`DOLI_DB_TYPE: pgsql`) | A [custom entrypoint](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/custom_entrypoint.sh) rewrites the upstream `docker-run.sh` `mysql` invocation into `psql` before launch | +| **DB path** | pod → `pgbouncer.tools:5432` → the `erp` Postgres database on `pi2` | Shares the [tools pgbouncer pooler](../tools/secrets-and-vso.md) like the webapp archetype | +| **Vault wiring** | **dynamic** rotating DB creds (`postgres/creds/erp`) + **static** KV config (`kvv2` `erp/config`) via the shared [`app_roles` module](../tools/secrets-and-vso.md) | The pod cannot start without VSO-injected `DOLI_DB_USER` / `DOLI_DB_PASSWORD` | +| **Document persistence** | a **50Gi Longhorn RWX PVC** (`storageClassName: longhorn`, `accessModes: ReadWriteMany`, `helm.sh/resource-policy: keep`) mounting `/var/www/documents`, `/var/www/html/custom`, and `/var/backups` | Uploaded invoices/PDFs/attachments are real business records — losing them is the worst case | +| **Backup + ops** | its own [backup/restore subsystem](backup-and-recovery.md) plus a **read-only ops CLI** (`bin/arcodange`) | Data-criticality demands both an escape hatch for restores and a safe way to inspect live state | + +## Overview — how erp is wired + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart LR + classDef src fill:#2563eb,stroke:#1e40af,color:#fff + classDef proc fill:#059669,stroke:#047857,color:#fff + classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff + 
 classDef net fill:#b45309,stroke:#92400e,color:#fff + + CI["factory / erp CI<br/>tofu apply (iac/)"]:::src + VAULT["Vault<br/>postgres/creds/erp (dynamic)<br/>kvv2 erp/config (static)"]:::store + ARGO["ArgoCD<br/>syncs chart/ (ns erp)"]:::proc + POD["Dolibarr pod<br/>dolibarr/dolibarr:22.0.4<br/>custom entrypoint → psql"]:::proc + VSO["VSO<br/>VaultAuth + VaultDynamicSecret<br/>+ VaultStaticSecret"]:::proc + PGB["pgbouncer.tools:5432"]:::net + PG["Postgres erp db<br/>(pi2)"]:::store + PVC["50Gi Longhorn RWX PVC<br/>/var/www/documents"]:::store + BK["backup CronJob / runner<br/>
pg_dump → documents/admin/backup"]:::proc + + CI --> VAULT + ARGO --> POD + VAULT -. "creds + config" .-> VSO + VSO -- "vso-db-credentials + secretkv" --> POD + POD --> PGB --> PG + PVC -- "mounts /var/www/documents" --- POD + BK -- "dumps DB + writes to" --> PVC +``` + +1. **factory / erp CI** runs `tofu apply` over `iac/` to declare erp's Vault objects — a Postgres dynamic-secret role and a Kubernetes auth role — through the shared [`app_roles` module](../tools/secrets-and-vso.md), and seeds the static `kvv2` `erp/config` KV (admin login, instance UUID). +2. **ArgoCD** (factory's [app-of-apps](../lab-ecosystem/01-factory.md)) syncs the `chart/` into the `erp` namespace. +3. The **Dolibarr pod** comes up from `dolibarr/dolibarr:22.0.4`; its [custom entrypoint](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/custom_entrypoint.sh) rewrites the upstream `docker-run.sh` so SQL runs through `psql` instead of `mysql`. +4. **VSO** authenticates to **Vault** (the `auth` `VaultAuth` CRD), materialising `vso-db-credentials` (dynamic, rotating DB user/password from `postgres/creds/erp`) and `secretkv` (static config from `kvv2` `erp/config`); both are injected into the pod, and a credential rotation triggers a rollout restart. +5. The pod connects to the **`erp` Postgres database** through the [tools `pgbouncer.tools:5432` pooler](../tools/secrets-and-vso.md). +6. A **50Gi Longhorn RWX PVC** mounts `/var/www/documents` (plus `/var/www/html/custom` and `/var/backups`), holding every uploaded document and generated PDF. +7. The **backup subsystem** dumps the `erp` database with a version-matched `pg_dump` and lands the archive under `documents/admin/backup` on that same PVC — see [Backup & recovery](backup-and-recovery.md). 
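
Step 4 is the one the pod cannot start without, so a read-only check along the following lines could confirm that VSO has materialised both secrets before erp is scaled. This is only a sketch: it assumes the secrets live in the `erp` namespace under the names used on this page, and the `KUBECTL` override and the `erp_secrets_ready` function name are hypothetical conveniences, not part of the erp repo.

```bash
#!/usr/bin/env bash
# [AGENT] read-only sketch: verify the VSO-managed secrets exist before scaling erp.
# Assumptions: namespace "erp"; secret names as used in this guidebook.
# KUBECTL is a hypothetical override so the check can target a wrapper or a stub.
KUBECTL="${KUBECTL:-kubectl}"

erp_secrets_ready() {
  local ns="erp" secret missing=0
  for secret in vso-db-credentials secretkv; do
    # `kubectl get secret NAME -n NS` exits non-zero when the secret is absent.
    if "$KUBECTL" get secret "$secret" -n "$ns" >/dev/null 2>&1; then
      echo "ok: $secret"
    else
      echo "missing: $secret"
      missing=1
    fi
  done
  # Non-zero return means at least one secret has not been reconciled yet.
  return "$missing"
}
```

Sourcing the snippet and running `erp_secrets_ready && echo "secrets present"` gives a quick yes/no; a secret that exists but holds stale credentials would still pass, so this complements rather than replaces checking the VSO resources themselves.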
+ +> [!CAUTION] +> **Recovery ordering: Vault MUST be unsealed before erp is scaled up.** The Dolibarr pod has no usable DB credentials of its own — it depends entirely on VSO materialising `vso-db-credentials` from `postgres/creds/erp`. If erp is scaled up while Vault is still sealed, the pod crash-loops with no database access. During a cluster rebuild, unseal Vault first, confirm VSO has reconciled the erp secrets, and only then scale erp. The full sequence (cluster bring-up → Vault unseal → storage → apps) is covered by [Backup & recovery](backup-and-recovery.md), the [storage concept](../lab-ecosystem/storage-and-recovery.md), the [factory recover playbooks](../factory-provisioning/ansible/06-recover.md), and the cluster-wide CLUSTER_RECOVERY.md runbook. + +## Index + +| Page | What it covers | Status | +|---|---|---| +| [Deployment](deployment.md) | The chart, the upstream image + custom entrypoint, the Postgres-over-pgbouncer wiring, the Vault CRDs (dynamic creds + static config), and the ingress | ✅ Active | +| [Backup & recovery](backup-and-recovery.md) | The document PVC, the `pg_dump`-based backup subsystem, restore procedure, and where erp sits in cluster-recovery ordering | ✅ Active | +| [Operations](operations.md) | The read-only `bin/arcodange` ops CLI and day-to-day operational tasks (table-ownership fix-ups, liveness checks, audits) | ✅ Active | + +## Maintenance rule + +> [!IMPORTANT] +> **When the erp repo changes shape, these pages change in the same PR.** If you alter the chart structure, the custom entrypoint, the Vault wiring, the document PVC, the backup subsystem, or the ops CLI, update this hub and the relevant child page in the same change. A reference map that drifts from the real `chart/`, `iac/`, and `backup/` sends agents confidently down dead paths — and for the lab's most data-critical app that risk is highest here. 
+ +## Cross-references + +- [Applications hub](../applications/README.md) — the common four-ingredient app pattern; erp is its complex sibling, beside the webapp and url-shortener archetypes. +- [01 · factory](../lab-ecosystem/01-factory.md) — the ArgoCD app-of-apps that emits erp's `Application` CRD. +- [tools secrets-and-vso](../tools/secrets-and-vso.md) — the `app_roles` module + VSO runtime that delivers erp's dynamic DB creds and static config, and the pgbouncer pooler the pod connects through. +- [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) — the per-app `erp` PostgreSQL database + role erp runs on. +- [storage concept](../lab-ecosystem/storage-and-recovery.md) — how the 50Gi Longhorn RWX document PVC is provisioned and recovered. +- [factory recover playbooks](../factory-provisioning/ansible/06-recover.md) — the Ansible recovery steps that must precede scaling erp back up. +- [safe-prod-like-environment ADR](../../ADR/0001-safe-prod-like-environment.md) — why the lab keeps erp deployed prod-like and the data-criticality trade-offs behind it. 
diff --git a/vibe/guidebooks/erp/backup-and-recovery.md b/vibe/guidebooks/erp/backup-and-recovery.md new file mode 100644 index 0000000..e4528f5 --- /dev/null +++ b/vibe/guidebooks/erp/backup-and-recovery.md @@ -0,0 +1,207 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [ERP](README.md) > **Backup & recovery** + +# Backup & recovery + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [ERP](README.md) · [Deployment](deployment.md) +> **Downstream:** [Operations](operations.md) +> **Related:** [storage concept](../lab-ecosystem/storage-and-recovery.md) · [factory recover playbooks](../factory-provisioning/ansible/06-recover.md) · [tools secrets-and-vso](../tools/secrets-and-vso.md) · [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) + +`erp` is the lab's **single most data-critical application**, so it carries its own backup/restore subsystem layered on top of the cluster's storage and secrets machinery. Two independent data stores have to survive an incident: the **`erp` PostgreSQL database** (captured by a `pg_dump`) and the **uploaded documents** on the Longhorn PVC (captured by Longhorn snapshots/backups, *not* the `pg_dump`). This page covers both, the daily backup CronJob, the restore Job, and the load-bearing recovery ordering that keeps erp from crash-looping during a cluster rebuild. + +## Backup mechanism + +The recurring backup is an Ansible-deployed Kubernetes **CronJob** named `dolibarr-backup` in namespace `erp`, declared by [`ansible/arcodange/erp/playbooks/recurrentBackup.yml`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/ansible/arcodange/erp/playbooks/recurrentBackup.yml). Each scheduled tick spawns a one-shot `postgres:16.3` Job that takes a **logical** dump of the `erp` database, gzips it, and lands the archive on the same Longhorn PVC that holds erp's documents. + +The pipeline inside each run: + +1. **Detect version** — `psql ... 
-c "SELECT value FROM llx_const WHERE name='MAIN_VERSION_LAST_UPGRADE';"` reads the live Dolibarr version straight from the database, so the archive name records exactly which schema it came from. +2. **Dump + compress** — `pg_dump -d erp --no-tablespaces --inserts | gzip > `. The `--inserts` flag emits row-by-row `INSERT` statements (portable, version-tolerant restores) and `--no-tablespaces` strips host-specific tablespace clauses. +3. **Write to PVC** — the archive lands at `/documents/admin/backup/pg_dump_erp__.sql.gz`, where the container mounts the `erp` PVC with `subPath: documents/admin/backup`. +4. **Prune** — `find /documents/admin/backup -name "pg_dump_erp_*.sql.gz" -type f -mtime +15 -delete` removes anything older than 15 days. + +DB credentials are supplied by the **VSO-materialised `vso-db-credentials` secret** (`envFrom` + `PGPASSWORD` from its `password` key), the same dynamic `postgres/creds/erp` secret the pod uses — see [tools secrets-and-vso](../tools/secrets-and-vso.md). The Job runs `backoffLimit: 0` with `restartPolicy: Never`, so a failed run leaves an inspectable terminated pod rather than retrying blindly. 
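
The prune step in item 4 can be tried safely away from the cluster. The sketch below replays the same `find` expression against a scratch directory; the directory, the file names, and the ages are made up for illustration, and `touch -d` with a relative date assumes GNU coreutils.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory standing in for /documents/admin/backup.
backup_dir="$(mktemp -d)"

# Illustrative archives: one past the retention window, one recent.
touch -d '20 days ago' "$backup_dir/pg_dump_erp_22.0.4_2606031819.sql.gz"
touch -d '2 days ago'  "$backup_dir/pg_dump_erp_22.0.4_2606211819.sql.gz"
# An old file that does not match the archive pattern and must survive.
touch -d '20 days ago' "$backup_dir/notes.txt"

# Same expression as the CronJob's prune step, pointed at the scratch directory.
find "$backup_dir" -name "pg_dump_erp_*.sql.gz" -type f -mtime +15 -delete

# List what is left after pruning.
ls "$backup_dir"
```

With GNU `find`, `-mtime +15` counts whole 24-hour periods and discards the fraction, so an archive is only removed once it is at least 16 days old. Here the 20-day-old archive is deleted, while the 2-day-old archive and the non-matching `notes.txt` are left alone.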
+ +### Schedule, retention & artifacts + +| Property | Value | Source | +|---|---|---| +| Resource | CronJob `dolibarr-backup` (ns `erp`) | [recurrentBackup.yml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/ansible/arcodange/erp/playbooks/recurrentBackup.yml) | +| Schedule | `0 4 * * *` (04:00 daily) | `spec.schedule` | +| Successful job history | `successfulJobsHistoryLimit: 3` | `spec` | +| Failed job history | `failedJobsHistoryLimit: 3` | `spec` | +| Retention | 15 days (`find -mtime +15 -delete`) | dump script | +| Dump image | `postgres:16.3` | `jobTemplate` container | +| Dump command | `pg_dump --no-tablespaces --inserts` (logical) | dump script | +| Compression | `gzip` (CronJob) / `tar -czf` (ad-hoc) | dump scripts | +| Archive path | `/documents/admin/backup/pg_dump_erp__.sql.gz` | mount + dump script | +| Mount | PVC `erp`, `subPath: documents/admin/backup` | `volumeMounts` | +| Failure policy | `backoffLimit: 0`, `restartPolicy: Never` | `jobTemplate` | + +### Ad-hoc & manual alternatives + +Two escape hatches exist for an on-demand dump outside the 04:00 schedule: + +| Tool | What it does | When to reach for it | +|---|---|---| +| [`ansible/.../playbooks/backup.yml`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/ansible/arcodange/erp/playbooks/backup.yml) | One-shot Ansible **Job** `dolibarr-backup` (`postgres:16.3`); fetches the ERP version by scraping `https://erp.arcodange.lab/`, dumps with the same `--no-tablespaces --inserts` flags, `tar -czf` into the PVC, and waits for completion | A single immediate dump driven from a control host that can reach the cluster API | +| [`backup/create_backup.sh`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/backup/create_backup.sh) | Pure `kubectl` shell: `kubectl run pg-dump-temp` (`postgres:16.3`) to `pg_dump -c` locally, then `kubectl cp` the archive into the running erp pod's `/var/www/documents/admin/backup/` | A laptop with `kubectl` access but no Ansible 
setup | + +> [!WARNING] +> **`pg_dump` and the server must match major versions.** The lab's Postgres is **16.3**, so every dump/restore container pins `postgres:16.3`. Dolibarr's built-in *Tools → Database backup* page (`/admin/tools/dolibarr_export.php`) historically shells out to the image's bundled `pg_dump` (e.g. 11.x), which aborts with `server version mismatch`. Use the CronJob, the Ansible playbooks, or `create_backup.sh` — never the in-app export against a newer server. + +## What is — and is NOT — in the dump + +The `pg_dump` captures **only the relational database**. Everything users *upload* lives on the Longhorn PVC and is protected by a completely separate mechanism. Conflating the two is the classic way to lose business records. + +| Data | Where it lives | Protected by | +|---|---|---| +| Invoices, third parties, accounting rows, config rows (`llx_*` tables) | `erp` Postgres DB | `pg_dump` archives (this page) | +| Uploaded documents, generated PDFs, attachments | `/var/www/documents` on the Longhorn RWX PVC | **Longhorn** snapshots / backups | +| Custom modules / overrides | `/var/www/html/custom` on the same PVC | **Longhorn** snapshots / backups | + +> [!IMPORTANT] +> A `pg_dump` alone does **not** make erp recoverable. A full recovery needs *both* the latest `pg_dump_erp_*.sql.gz` **and** the Longhorn-restored document volume. The backup archive itself sits on that same PVC (`/documents/admin/backup`), so it rides along with the Longhorn snapshot — but treat the database and the documents as two artifacts that must be restored together. See the [storage concept](../lab-ecosystem/storage-and-recovery.md) for how the Longhorn volume is snapshotted and recovered. + +## Restore + +The restore is the Ansible-driven Job `dolibarr-restore` from [`ansible/.../playbooks/restore.yml`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/ansible/arcodange/erp/playbooks/restore.yml) (`postgres:16.3`). 
It **auto-discovers the most recent** `pg_dump_erp_*.sql.gz` via `ls -t ... | head -n1`, or you can pin a specific archive with `-e backup_file=...`. It runs `backoffLimit: 0` / `restartPolicy: Never` so a failed restore leaves a terminated pod you can inspect. **Restoring into a live database corrupts state**, so scale erp to zero first. + +Ordered procedure: + +1. **[AGENT]** Confirm the cluster is healthy and inspect available archives (read-only). +2. **[HUMAN]** Scale the `erp` Deployment to **0** to stop all writes. +3. **[HUMAN]** Run the restore Job (latest archive, or a pinned `backup_file`); it `tar -xzf`s the archive and `psql -f`s it into the `erp` DB. +4. **[AGENT]** Watch the Job to completion and read its logs. +5. **[HUMAN]** Scale the `erp` Deployment back to **1**. +6. **[AGENT]** Validate erp is serving and the data is present. + +```bash +# [AGENT] read-only: cluster health + list backup archives on the PVC +kubectl get deploy,pods -n erp +# sh -c so the glob is expanded inside the container, not by the local shell +kubectl exec -n erp deploy/erp -- sh -c 'ls -t /var/www/documents/admin/backup/pg_dump_erp_*.sql.gz' +``` + +```bash +# [HUMAN] prod-mutating: stop writes before restoring +kubectl scale deploy/erp -n erp --replicas=0 +kubectl rollout status deploy/erp -n erp --watch=false # expect 0 replicas +``` + +```bash +# [HUMAN] prod-mutating: run the restore Job +# default = newest pg_dump_erp_*.sql.gz auto-discovered on the PVC +ansible-playbook ansible/arcodange/erp/playbooks/restore.yml +# or pin an explicit archive: +ansible-playbook ansible/arcodange/erp/playbooks/restore.yml \ + -e backup_file=/documents/admin/backup/pg_dump_erp_22.0.4_2606231819.sql.gz +``` + +```bash +# [AGENT] read-only: follow the restore Job and its logs +kubectl get job/dolibarr-restore -n erp -o wide +kubectl logs -n erp job/dolibarr-restore +``` + +```bash +# [HUMAN] prod-mutating: bring erp back up +kubectl scale deploy/erp -n erp --replicas=1 +kubectl rollout status deploy/erp -n erp +``` + +```bash +# [AGENT] read-only: validate erp is serving
after restore +kubectl get pods -n erp +kubectl exec -n erp deploy/erp -- curl -sf -o /dev/null -w '%{http_code}\n' http://localhost/ +``` + +> [!WARNING] +> **Always scale erp to 0 before restoring.** The restore loads SQL straight into the live `erp` database; concurrent writes from a running Dolibarr pod produce a half-restored, inconsistent state. Scaling back to 1 only after the Job succeeds is part of the procedure, not an optional flourish. + +## Recovery ordering (cluster rebuild) + +> [!CAUTION] +> **Vault MUST be unsealed before erp is scaled up.** The Dolibarr pod has no DB credentials of its own — it depends entirely on VSO materialising `vso-db-credentials` from `postgres/creds/erp` (`DOLI_DB_USER` / `DOLI_DB_PASSWORD`). If erp is scaled up while Vault is still sealed, VSO cannot reconcile the secret and the pod crash-loops with no database access. During a cluster rebuild the order is fixed: +> +> 1. **Recover Longhorn volumes** — bring the document PVC (and the `documents/admin/backup` archives riding on it) back online. +> 2. **Unseal Vault** — so VSO can issue erp's dynamic DB credentials and static config. +> 3. **Scale erp to 1** — only now does the pod come up with usable creds. +> 4. **(Optional) restore data** — if the DB needs rolling back to a `pg_dump`, scale to 0, run the restore Job, scale back to 1 (see [Restore](#restore) above). +> +> This sequence is the storage→secrets→apps backbone described in the [storage concept](../lab-ecosystem/storage-and-recovery.md) and executed by the [factory recover playbooks](../factory-provisioning/ansible/06-recover.md); the cluster-wide ordering lives in the CLUSTER_RECOVERY.md runbook. + +## The ownership fix + +After activating a new Dolibarr module — or whenever the dynamic DB user rotates and creates tables under a fresh role — `public` schema tables can end up owned by a credential that no longer exists, breaking subsequent migrations and dumps. 
Two SQL helpers reassign ownership back to the stable **`erp_role`**: + +| Script | Mechanism | Use | +|---|---|---| +| [`backup/erp_role_as_table_owner.sql`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/backup/erp_role_as_table_owner.sql) | Loops every `public` table and `ALTER TABLE ... OWNER TO erp_role` | Force-set ownership table-by-table | +| [`chart/scripts/update_ownership.sql`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/update_ownership.sql) | Detects the current schema owner and `REASSIGN OWNED BY TO erp_role` only when it differs | The idempotent, chart-shipped fix | + +The chart wires `update_ownership.sql` into a `pg-fix-table-ownership` CronJob; trigger it on demand after a module activation: + +```bash +# [HUMAN] prod-mutating: reassign public-schema table ownership to erp_role +kubectl create job \ + --from=cronjob/pg-fix-table-ownership \ + pg-fix-table-ownership-manual-trigger-$(date +%Y%m%d%H%M%S) \ + -n kube-system +``` + +Run this **before** a backup if you suspect ownership drift, so the dump records the correct owner. More on day-to-day fix-ups and audits in [Operations](operations.md). + +## Flow + +```mermaid +%%{init: {'theme': 'base'}}%% +flowchart LR + classDef sched fill:#2563eb,stroke:#1e40af,color:#fff + classDef proc fill:#059669,stroke:#047857,color:#fff + classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff + classDef db fill:#b45309,stroke:#92400e,color:#fff + + VSO["VSO secret
<br/>vso-db-credentials<br/>(postgres/creds/erp)"]:::store + CRON["CronJob dolibarr-backup<br/>schedule 0 4 * * *"]:::sched + DUMPJOB["pg_dump Job<br/>postgres:16.3"]:::proc + GZIP["gzip stream<br/>--inserts --no-tablespaces"]:::proc + PVC["Longhorn RWX PVC<br/>/documents/admin/backup<br/>pg_dump_erp_*.sql.gz"]:::store + RESTOREJOB["Restore Job dolibarr-restore<br/>postgres:16.3"]:::proc + PSQL["psql -f dump.sql"]:::proc + DB["erp Postgres DB<br/>
via pgbouncer.tools"]:::db + + CRON -- "spawns" --> DUMPJOB + DUMPJOB -- "pg_dump" --> GZIP + GZIP -- "writes archive" --> PVC + PVC -- "ls -t latest .sql.gz" --> RESTOREJOB + RESTOREJOB -- "tar -xzf then" --> PSQL + PSQL -- "loads into" --> DB + DUMPJOB -- "dumps from" --> DB + VSO -. "DB creds" .-> DUMPJOB + VSO -. "DB creds" .-> RESTOREJOB +``` + +1. The **CronJob `dolibarr-backup`** fires at `0 4 * * *` and **spawns** a `pg_dump` Job (`postgres:16.3`). +2. The Job **dumps** the live `erp` database (logical, `--inserts --no-tablespaces`) — reading credentials from the **VSO `vso-db-credentials`** secret. +3. The dump streams through **gzip** and the resulting `pg_dump_erp__.sql.gz` is **written** to `/documents/admin/backup` on the **Longhorn RWX PVC**; archives older than 15 days are pruned. +4. On restore, the **`dolibarr-restore` Job** picks the **newest** `.sql.gz` (`ls -t | head -n1`, or a pinned `backup_file`) from the PVC — also using the **VSO** credentials. +5. The restore Job **`tar -xzf`s** the archive and **`psql -f`s** it back **into** the `erp` database (with erp scaled to 0 first). + +## Gotchas + +> [!WARNING] +> - **15-day retention only.** The CronJob deletes any `pg_dump_erp_*.sql.gz` older than 15 days. If you need long-term or compliance copies, pull archives **off-cluster** before they age out — nothing here keeps a month-old dump. +> - **Version match is mandatory.** `pg_dump`/`psql` major version must equal the server's (16.x). Every Job pins `postgres:16.3`; the in-app Dolibarr export against the newer server aborts with `server version mismatch`. +> - **Scale to 0 before restore.** Restoring into a running erp produces an inconsistent database; scale the Deployment to 0, restore, then back to 1. +> - **Vault unseal precedes scale-up.** erp's DB creds come from VSO; a sealed Vault means a crash-looping pod. Follow the [recovery ordering](#recovery-ordering-cluster-rebuild) on any rebuild. 
+> - **The admin/Postgres password lives in OpenTofu state.** The per-app database and role are declared in IaC, so the authoritative credential material is held in the **TF state** — treat that state as a secret and recover it alongside Vault. See [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md). + +## Cross-references + +- [Deployment](deployment.md) — the chart, the document PVC, and the Vault CRDs (dynamic creds + static config) that this subsystem depends on. +- [Operations](operations.md) — day-to-day operational tasks including the table-ownership fix-ups and liveness checks. +- [storage concept](../lab-ecosystem/storage-and-recovery.md) — how the Longhorn document PVC (and its riding backup archives) is snapshotted and recovered. +- [factory recover playbooks](../factory-provisioning/ansible/06-recover.md) — the Ansible recovery steps that must run before erp is scaled back up. +- [tools secrets-and-vso](../tools/secrets-and-vso.md) — the VSO runtime that materialises `vso-db-credentials`, feeding both the backup and restore Jobs. +- [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) — the per-app `erp` PostgreSQL database + role, and the TF state that holds its admin password. 
diff --git a/vibe/guidebooks/erp/deployment.md b/vibe/guidebooks/erp/deployment.md new file mode 100644 index 0000000..4c7ad07 --- /dev/null +++ b/vibe/guidebooks/erp/deployment.md @@ -0,0 +1,189 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [ERP](README.md) > **Deployment** + +# Deployment + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [ERP hub](README.md) · [Applications hub](../applications/README.md) · [01 · factory](../lab-ecosystem/01-factory.md) +> **Downstream:** [Backup & recovery](backup-and-recovery.md) · [Operations](operations.md) +> **Related:** [tools secrets-and-vso](../tools/secrets-and-vso.md) · [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) · [factory ci-apply-flow](../factory-provisioning/opentofu/ci-apply-flow.md) · [naming-conventions](../lab-ecosystem/naming-conventions.md) · [webapp](../applications/webapp.md) + +This page maps how **erp** is deployed: the chart that wraps the **upstream Dolibarr image**, the runtime trick that makes a MySQL-assuming application speak **PostgreSQL**, the **50Gi document PVC** that holds every business record, the Vault CRDs that feed it credentials, and the OpenTofu + CI that declare its Vault objects. It is the most data-critical app in the lab; the `iac/` runs through the same `tofu apply` pipeline as every other app — see [factory ci-apply-flow](../factory-provisioning/opentofu/ci-apply-flow.md). + +## 1 · App & image + +erp is **Dolibarr** pulled straight from the upstream `dolibarr/dolibarr` Docker Hub image — there is **no repo-built image** and no `Dockerfile`. The chart adapts the upstream container at runtime instead of forking it. 
+ +| Field | Value | Source | +|---|---|---| +| Application | Dolibarr ERP/CRM (PHP / Apache) | [chart/Chart.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/Chart.yaml) | +| Version | **22.0.4** (chart `appVersion: "22.0.4"`) | [chart/Chart.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/Chart.yaml) | +| Image | `dolibarr/dolibarr:22.0.4` — upstream, `pullPolicy: IfNotPresent` | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| Image tag | `image.tag` empty → defaults to chart `appVersion` (`{{ .Values.image.tag \| default .Chart.AppVersion }}`) | [chart/templates/deployment.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/deployment.yaml) | +| Served at | `https://erp.arcodange.lab` (internal only) | [chart/templates/config.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/config.yaml) | +| Container command | `["/bin/bash", "/usr/local/bin/custom-entrypoint.sh", "apache2-foreground"]` | [chart/templates/deployment.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/deployment.yaml) | + +> [!NOTE] +> Because erp consumes an upstream image, there is **no `docker-build-and-push` workflow** in `.gitea/workflows/` — unlike [webapp](../applications/webapp.md), which builds and pushes its own image. erp's only workflow is the OpenTofu/Vault one (see [§8](#8--ci--vaultyaml)). + +## 2 · Postgres, not MySQL + +Dolibarr classically assumes MySQL, but erp runs on **PostgreSQL**. Two pieces make that work, both at startup, both inside the [custom entrypoint](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/custom_entrypoint.sh) which wraps the upstream `docker-run.sh`: + +1. **MySQL → psql rewrite.** When `DOLI_DB_TYPE == "pgsql"`, the entrypoint `sed`s the upstream `/usr/local/bin/docker-run.sh` in place, replacing its `mysql -u ... 
< ${file}` SQL invocation with `PGPASSWORD=... psql -U ... -h ... -p ... -d ... < ${file}`. +2. **Apache `ServerName`.** It strips the scheme from `DOLI_URL_ROOT` and sets the Apache `ServerName` in `000-default.conf` (and appends to `apache2.conf`) so the vhost matches `erp.arcodange.lab`. +3. It then `exec`s the original `docker-run.sh "$@"` (i.e. `apache2-foreground`). + +The non-secret database wiring lives in the `erp-config` ConfigMap, injected via `envFrom`: + +| Env var | Value | Meaning | +|---|---|---| +| `DOLI_DB_TYPE` | `pgsql` | Selects PostgreSQL — triggers the entrypoint rewrite | +| `DOLI_DB_HOST` | `pgbouncer.tools` | Connects through the [tools pgbouncer pooler](../tools/secrets-and-vso.md) | +| `DOLI_DB_HOST_PORT` | `5432` | Pooler port | +| `DOLI_DB_NAME` | `erp` | The per-app database (provisioned by [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md)) | +| `DOLI_URL_ROOT` | `https://erp.arcodange.lab` | Drives the Apache `ServerName` | +| `DOLI_ENABLE_MODULES` | `Societe,Facture` | Third-parties + invoicing modules | +| `DOLI_COMPANY_NAME` | `Arcodange` | Seeded company name | +| `DOLI_COMPANY_COUNTRYCODE` | `FR` | Seeded country | +| `PHP_INI_DATE_TIMEZONE` | `Europe/Paris` | PHP timezone | +| `DOLI_AUTH` | `dolibarr` | Native Dolibarr auth | +| `DOLI_CRON` | `0` | In-container cron disabled | + +`DOLI_DB_USER` / `DOLI_DB_PASSWORD` are **not** in the ConfigMap — they come from Vault (see [§5](#5--vault-crds)). + +> [!WARNING] +> The psql rewrite is a **textual `sed` against an upstream file**. If a future Dolibarr image changes the exact `mysql ... < ${file}` line in `docker-run.sh`, the substitution silently stops matching and SQL imports fall back to `mysql` (which is absent) — startup SQL then fails. Re-verify the entrypoint pattern whenever the `appVersion` is bumped. + +## 3 · Persistence — the document PVC + +A single PVC named `erp` holds every business record. It is the most important object in the chart. 
+ +| Field | Value | Source | +|---|---|---| +| Name | `erp` (`erp.fullname`) | [chart/templates/pvc.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/pvc.yaml) | +| Access mode | `ReadWriteMany` (RWX) | [chart/templates/pvc.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/pvc.yaml) | +| Size | `50Gi` | [chart/templates/pvc.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/pvc.yaml) | +| StorageClass | `longhorn` | [chart/templates/pvc.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/pvc.yaml) | +| Retention | annotation `helm.sh/resource-policy: keep` — survives a `helm uninstall` | [chart/templates/pvc.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/pvc.yaml) | + +The Deployment mounts the **same PVC** at three paths via `subPath`: + +| Mount path | subPath | Holds | +|---|---|---| +| `/var/www/documents` | `documents` | **Invoices, attachments, generated PDFs — the critical business data** | +| `/var/www/html/custom` | `custom` | Custom/installed Dolibarr modules | +| `/var/backups` | `backups` | In-pod backup landing area | + +> [!CAUTION] +> **Losing this PVC loses all business documents.** `/var/www/documents` contains the only copy of uploaded invoices, attachments, and generated PDFs — these are real accounting records, not regenerable cache. The `helm.sh/resource-policy: keep` annotation protects it from a chart uninstall, but it does **not** protect against a Longhorn-volume loss or a node failure. Treat the PVC as primary data and rely on [Backup & recovery](backup-and-recovery.md) for off-volume copies. 
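
Because the three mount paths are `subPath`s of one volume, a single `df` on any of them reports the fill level of the whole 50Gi PVC. The following is a minimal sketch of a fill-level check. The helper names `usage_percent` and `check_usage` and any threshold passed to them are illustrative assumptions, not part of the chart.

```sh
# Sketch only: helper names and thresholds are hypothetical.

# usage_percent: read POSIX-format `df -P` output on stdin and print the
# Capacity column of the last data row as a bare integer (e.g. "42").
usage_percent() {
  awk 'NR > 1 { gsub(/%/, "", $5); pct = $5 } END { print pct }'
}

# check_usage PERCENT THRESHOLD: print "warn" when PERCENT >= THRESHOLD,
# otherwise "ok".
check_usage() {
  if [ "$1" -ge "$2" ]; then
    echo "warn"
  else
    echo "ok"
  fi
}

# Possible use from a workstation (pod name left as a placeholder):
#   pct="$(kubectl -n erp exec <pod> -- df -P /var/www/documents | usage_percent)"
#   check_usage "$pct" 80
```

The parsing assumes the filesystem name contains no spaces, which is what keeps the Capacity value in the fifth column.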
+ +## 4 · Chart shape + +| Aspect | Value | Source | +|---|---|---| +| `replicaCount` | **1** (single replica) | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| Autoscaling | **disabled** (`autoscaling.enabled: false`; no HPA rendered) | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| Service | `ClusterIP`, port `80` → `targetPort: http` | [chart/templates/service.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/service.yaml) | +| Ingress host | `erp.arcodange.lab`, path `/` (`Prefix`) | [chart/templates/ingress.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/ingress.yaml) | +| Ingress entrypoint | Traefik `websecure` + `router.tls: "true"` | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| TLS cert | `certresolver: letsencrypt`, domain `arcodange.lab` / SAN `erp.arcodange.lab` | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| Middleware | `localIp@file` — **internal only**, no public `.fr` host | [chart/values.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| revisionHistoryLimit | `5` | [chart/templates/deployment.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/deployment.yaml) | + +> [!WARNING] +> **Single replica on an RWX PVC.** `replicaCount: 1` with autoscaling off means erp has **no redundancy** — a node or pod failure is a full outage until rescheduled. A credential rotation or config change triggers a rollout that briefly takes the only pod down. This is deliberate for a stateful, low-traffic internal app, but do not raise the replica count without first confirming Dolibarr tolerates concurrent writes to the shared `documents` volume. 
+ +The Deployment carries `configmap-hash` / `configmap2-hash` / `configmap3-hash` annotations (sha256 of the three ConfigMaps) so a change to config or the init scripts forces a pod roll. + +## 5 · Vault CRDs + +erp cannot start without VSO-injected credentials. Three CRDs (from the chart) wire it to Vault — see [tools secrets-and-vso](../tools/secrets-and-vso.md) for the VSO runtime. + +| CRD | Name | What it does | +|---|---|---| +| `VaultAuth` | `auth` | Kubernetes auth — `mount: kubernetes`, `role: erp`, ServiceAccount `erp`, audience `vault`. Every other CRD references it via `vaultAuthRef: auth`. | +| `VaultStaticSecret` | `vault-kv-app` | `type: kv-v2`, `mount: kvv2`, `path: erp/config` → k8s Secret **`secretkv`**, `refreshAfter: 24h`. Injected via `envFrom` `secretRef`. Holds `DOLI_ADMIN_LOGIN`, `DOLI_ADMIN_PASSWORD`, `DOLI_INSTANCE_UNIQUE_ID`. | +| `VaultDynamicSecret` | `vso-db` | `mount: postgres`, `path: creds/erp` → k8s Secret **`vso-db-credentials`** (rotating DB user/password). `rolloutRestartTargets` the erp Deployment so a rotation rolls the pod. `DOLI_DB_USER` / `DOLI_DB_PASSWORD` are wired into the pod via `secretKeyRef`. | + +Credential delivery in the Deployment: + +- `envFrom: secretRef: secretkv` — static admin config + instance UUID. +- `env: DOLI_DB_USER` / `DOLI_DB_PASSWORD` ← `secretKeyRef` on `vso-db-credentials` (`username` / `password`). + +| Sources | [chart/templates/vaultauth.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/vaultauth.yaml) · [vaultsecret.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/vaultsecret.yaml) · [vaultdynamicsecret.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/templates/vaultdynamicsecret.yaml) | +|---|---| + +## 6 · Init scripts (mounted from ConfigMaps) + +Three scripts ship in `chart/scripts/` and are mounted into the pod via ConfigMaps. 
The entrypoint runs at container start; the `before-starting.d/` scripts run before Apache. + +| Script | Mounted at | Role | +|---|---|---| +| [custom_entrypoint.sh](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/custom_entrypoint.sh) | `/usr/local/bin/custom-entrypoint.sh` (ConfigMap `dolibarr-custom-entrypoint-script`) | Wraps `docker-run.sh`: MySQL→psql `sed` rewrite + Apache `ServerName` from `DOLI_URL_ROOT` (see [§2](#2--postgres-not-mysql)) | +| [update_conf_db_credentials.sh](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/update_conf_db_credentials.sh) | `/var/www/scripts/before-starting.d/` (ConfigMap `dolibarr-before-start-scripts`) | `sed`s the Vault-injected `DOLI_DB_USER` / `DOLI_DB_PASSWORD` into Dolibarr's `conf.php` at startup, so the running app uses the freshly rotated creds | +| [update_ownership.sql](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/update_ownership.sql) | `/var/www/scripts/before-starting.d/update_table_ownership.sql` | `REASSIGN OWNED BY` the current `public`-schema owner → `erp_role`. Run if you hit read-only-filesystem / permission errors after a credential change | + +> [!CAUTION] +> **The ownership SQL must run after the DB role behind the dynamic creds changes.** Because `postgres/creds/erp` mints a **new** Postgres user on each rotation, freshly created tables can end up owned by a transient user. If the in-pod `update_table_ownership.sql` cannot write its temp file (`Read-only file system`), it is skipped and Dolibarr eventually loses query rights once Vault rotates creds. The fix is to run that SQL by hand against the `erp` database — see [Operations](operations.md). The script reassigns ownership to the stable **`erp_role`** created by [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md). 
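
The two entrypoint adaptations from §2 can be illustrated with a small sketch. This is not the repository's `custom_entrypoint.sh`: both function names are hypothetical, and the `mysql -u` pattern is only the fragment quoted in §2, since the exact upstream line is not reproduced on this page. The second helper is a guard one might run after an `appVersion` bump to notice that the rewrite target has disappeared from `docker-run.sh`.

```sh
# Illustrative sketch, not the real entrypoint. Function names are hypothetical.

# server_name_from_url URL: strip the scheme and any path, leaving the host
# that Apache's ServerName should match. The subshell body keeps `host` local.
server_name_from_url() (
  host="${1#*://}"             # drop "https://" or "http://"
  printf '%s\n' "${host%%/*}"  # drop anything after the first "/"
)

# rewrite_target_present FILE: exit 0 if FILE still contains a `mysql -u`
# invocation for the sed rewrite to match, non-zero otherwise.
rewrite_target_present() {
  grep -q 'mysql -u' "$1"
}

# Possible use inside the container after an image bump:
#   rewrite_target_present /usr/local/bin/docker-run.sh \
#     || echo "psql rewrite no longer matches; re-verify the entrypoint" >&2
```

With `DOLI_URL_ROOT` set to `https://erp.arcodange.lab`, `server_name_from_url "$DOLI_URL_ROOT"` prints `erp.arcodange.lab`.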
+ +## 7 · iac/ — Vault objects via the shared module + +erp's `iac/` declares only its Vault footprint; the Postgres database and `erp_role` themselves come from factory ([postgres-iac](../factory-provisioning/opentofu/postgres-iac.md)). + +| Element | Value | Source | +|---|---|---| +| Shared module | `app_roles` from `arcodange-org/tools` (`hashicorp-vault/iac/modules/app_roles`, `ref=main`), `name = "erp"` | [iac/main.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/main.tf) | +| What the module provisions | the `postgres/creds/erp` dynamic role + a Kubernetes auth `role erp` + the `kvv2` path prefix | [tools secrets-and-vso](../tools/secrets-and-vso.md) | +| Admin password | `random_password.admin_initial_password` (length 32) → `DOLI_ADMIN_PASSWORD` | [iac/main.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/main.tf) | +| Instance ID | `random_uuid.dolibarr_id` with `lifecycle { prevent_destroy = true }` → `DOLI_INSTANCE_UNIQUE_ID` (encryption salt + module licensing) | [iac/main.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/main.tf) | +| KV secret | `vault_kv_secret_v2` at `config` (i.e. 
`erp/config`), data = `DOLI_ADMIN_LOGIN` + `DOLI_ADMIN_PASSWORD` + `DOLI_INSTANCE_UNIQUE_ID` | [iac/main.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/main.tf) | +| Backend | GCS bucket `arcodange-tf`, prefix `erp/main` | [iac/backend.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/backend.tf) | +| Vault provider | `address = https://vault.arcodange.lab`, `auth_login_jwt` `mount = gitea_jwt`, `role = gitea_cicd_erp`, provider `vault` `4.4.0` | [iac/providers.tf](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/iac/providers.tf) | + +The Postgres dynamic role created here `GRANT`s **`erp_role`** — the stable role created by factory ([postgres-iac](../factory-provisioning/opentofu/postgres-iac.md)) — so every rotated DB user inherits the right schema privileges. The `erp/config` KV secret written here is the same one the chart's `VaultStaticSecret` reads back as `secretkv`, closing the loop between `iac/` (writes config) and `chart/` (consumes it). + +> [!WARNING] +> **The OpenTofu state holds the plaintext admin password.** `random_password.admin_initial_password` is stored unencrypted in the GCS state at `arcodange-tf/erp/main`. Anyone with read access to that state bucket can read `DOLI_ADMIN_PASSWORD`. Treat the `erp/main` state prefix as a secret; do not copy it locally unprotected. The `random_uuid` instance ID also lives in state; it is additionally guarded by `prevent_destroy` because losing it breaks decryption of stored data and invalidates purchased modules.
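
One way to confirm the `iac/` to `chart/` loop end to end is to compare the keys of the materialised `secretkv` Secret against the three keys `iac/` writes. The sketch below assumes a hypothetical helper, `missing_keys`, that reads key names on stdin. Only the three key names come from this page.

```sh
# Sketch only: `missing_keys` is a hypothetical helper.

# missing_keys: read Secret key names on stdin (one per line) and print every
# expected key that is absent. Prints nothing when all three are present.
# The subshell body keeps `present` and `key` local.
missing_keys() (
  present="$(cat)"
  for key in DOLI_ADMIN_LOGIN DOLI_ADMIN_PASSWORD DOLI_INSTANCE_UNIQUE_ID; do
    printf '%s\n' "$present" | grep -qx "$key" || printf '%s\n' "$key"
  done
)

# Possible use (prints key names only, never secret values):
#   kubectl -n erp get secret secretkv \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_keys
```

Extra keys in the Secret are ignored, so the check only reports what the chart would be missing.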
+ +## 8 · CI — vault.yaml + +| Element | Value | Source | +|---|---|---| +| Workflow | `Hashicorp Vault` (`.gitea/workflows/vault.yaml`) | [.gitea/workflows/vault.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/.gitea/workflows/vault.yaml) | +| Triggers | `workflow_dispatch`, plus `push` / `pull_request` on `iac/*.tf` | [.gitea/workflows/vault.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/.gitea/workflows/vault.yaml) | +| Job 1 | `gitea_vault_auth` — mints a Gitea OIDC JWT for Vault | [.gitea/workflows/vault.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/.gitea/workflows/vault.yaml) | +| Job 2 | `tofu` — `dflook/terraform-apply` over `iac/`, `auto_approve: true`, **OpenTofu `1.8.2`** | [.gitea/workflows/vault.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/.gitea/workflows/vault.yaml) | +| Secrets | `TERRAFORM_SSH_KEY` (SSH key to clone the `app_roles` module from `tools`) + `HOMELAB_CA_CERT` (Vault self-signed CA) + `GOOGLE_BACKEND_CREDENTIALS` (GCS state) | [.gitea/workflows/vault.yaml](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/.gitea/workflows/vault.yaml) | + +This `tofu apply` follows the lab-wide pattern documented in [factory ci-apply-flow](../factory-provisioning/opentofu/ci-apply-flow.md). There is **no application-image build step** — the chart is delivered by ArgoCD ([01 · factory](../lab-ecosystem/01-factory.md)), and the image is upstream. + +## 9 · `` convention mapping + +erp follows the lab's per-app naming convention — see [naming-conventions](../lab-ecosystem/naming-conventions.md). 
With ` = erp`: + +| `` slot | erp value | +|---|---| +| Repo | `arcodange-org/erp` | +| K8s namespace | `erp` | +| Internal host | `erp.arcodange.lab` | +| ServiceAccount | `erp` | +| Vault Kubernetes auth role | `erp` | +| Vault KV path | `kvv2` `erp/config` → Secret `secretkv` | +| Vault dynamic DB path | `postgres/creds/erp` → Secret `vso-db-credentials` | +| Postgres database | `erp` | +| Postgres stable role | `erp_role` | +| OpenTofu state prefix | GCS `arcodange-tf/erp/main` | +| Gitea CI Vault role | `gitea_cicd_erp` | +| Document PVC | `erp` (50Gi Longhorn RWX) | + +## Cross-references + +- [ERP hub](README.md) — the orientation map for the whole guidebook. +- [Backup & recovery](backup-and-recovery.md) — protecting the 50Gi document PVC and the `erp` database; cluster-recovery ordering (unseal Vault before scaling erp up). +- [Operations](operations.md) — day-to-day operational tasks, including running the table-ownership SQL by hand. +- [tools secrets-and-vso](../tools/secrets-and-vso.md) — the `app_roles` module, the VSO runtime that materialises `secretkv` + `vso-db-credentials`, and the `pgbouncer.tools` pooler. +- [factory postgres-iac](../factory-provisioning/opentofu/postgres-iac.md) — provisions the `erp` database and the stable `erp_role` the dynamic creds inherit. +- [factory ci-apply-flow](../factory-provisioning/opentofu/ci-apply-flow.md) — the shared `tofu apply` CI pattern erp's `vault.yaml` follows. +- [naming-conventions](../lab-ecosystem/naming-conventions.md) — the `` slots filled in [§9](#9--app-convention-mapping). +- [webapp](../applications/webapp.md) — the archetype that *does* build its own image; erp differs by reusing the upstream Dolibarr image. 
diff --git a/vibe/guidebooks/erp/operations.md b/vibe/guidebooks/erp/operations.md new file mode 100644 index 0000000..9e9b54b --- /dev/null +++ b/vibe/guidebooks/erp/operations.md @@ -0,0 +1,173 @@ +[vibe](../../README.md) > [Guidebooks](../README.md) > [ERP](README.md) > **Operations** + +# ERP Operations + +> **Status:** ✅ Active +> **Last Updated:** 2026-06-23 +> **Upstream:** [ERP hub](README.md) · [Deployment](deployment.md) +> **Downstream:** [Backup & recovery](backup-and-recovery.md) +> **Related:** [Applications hub](../applications/README.md) · [Web app](../applications/webapp.md) · [Secrets & VSO](../tools/secrets-and-vso.md) · [Postgres IaC](../factory-provisioning/opentofu/postgres-iac.md) + +This page covers day-2 operation of the Arcodange Dolibarr ERP: the read-only operations CLI, the static identity assets, the Playwright bootstrap test suite, and the recurring scaling / module-activation / storage chores. For how the workload is deployed onto the cluster, see [Deployment](deployment.md). For backups and disaster recovery, see [Backup & recovery](backup-and-recovery.md). + +--- + +## 1. The read-only ops CLI — `bin/arcodange` + +[`bin/arcodange`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/bin/arcodange) is a Bash dispatcher that gives a human-friendly entry point to safe, **strictly read-only** Dolibarr operations against `erp.arcodange.lab`. Every subcommand `exec`s a script under `.claude/skills//scripts/`; the dispatcher itself only locates the project root (via `git rev-parse --show-toplevel`, falling back to walking up from the script) and routes arguments. + +> [!IMPORTANT] +> The CLI authenticates with credentials read from [`.claude/skills/dolibarr/.env`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/bin/arcodange) — a **gitignored** file expected at **mode `600`**. The underlying API key belongs to the `ai_agent` service account, which has **no write permissions**: the CLI cannot mutate Dolibarr. 
Corrections always go through the Dolibarr web UI. + +### Command map + +| Command | Subcommand | What it does | +| --- | --- | --- | +| `ping` | — | `GET /status` — liveness probe + reports the running Dolibarr version | +| `whoami` | — | `GET /users/info` — confirms auth is the `ai_agent` service account | +| `invoice` | `list [--since YYYY-MM-DD]` | Table of KissMetrics customer invoices with payment state | +| `invoice` | `audit ` | JSON facts + PDF mandatory-mention audit for one invoice | +| `payments` | `state [--since YYYY-MM-DD]` | Per-invoice TTC vs payments reconciliation | +| `payments` | `timeline [--year\|--since\|--until]` | Payment timeline with cumulative balance | +| `payments` | `by-month [--year\|--all-clients]` | Monthly cash-receipt aggregation | +| `tva` | `summary [--year\|--since\|--until]` | CA3-ready monthly TVA summary (collectée − déductible) | +| `tva` | `collect` / `collect-detail` | TVA collectée by month × rate (CA3 A1/A4/E2) + per-line audit | +| `tva` | `deductible` / `deductible-detail` | TVA déductible by month × rate (CA3 19/20/17+24) + per-line audit | +| `thirdparty` | `audit ` | Country-aware completeness audit for one thirdparty | +| `thirdparty` | `audit-all [--clients-only\|--suppliers-only]` | Audit every visible thirdparty | +| `templates` | `list [--max-id N]` / `inspect ` | Enumerate / health-check recurring invoice templates | +| `bank` | `probe` / `balance` / `match` / `qonto-transactions` / `wise-transactions` / `curl` | Qonto + Wise bank data and Dolibarr reconciliation | +| `email` | `list` / `inspect ` / `curl` | Supplier-invoice ingestion from the Zoho mailbox | +| `snapshot` | `--out FILE` (or `--print-only`) | Bundle the full read-only state into one JSON dump (with `content_hash`) | +| `curl` | `` | Raw read-only call through `dol-curl.sh` (e.g. 
`arcodange curl /invoices/12`) | +| `help` | `[command]` | Full command tree, or per-command help | + +### Health checks first + +| Check | Command | Expected outcome | +| --- | --- | --- | +| Is Dolibarr up? | `bin/arcodange ping` | HTTP `200` + Dolibarr version string | +| Is auth wired? | `bin/arcodange whoami` | The `ai_agent` user record | +| Full state dump | `bin/arcodange snapshot --out /tmp/erp.json` | One JSON file with a `content_hash` | + +> [!TIP] +> `snapshot` is the fastest way to capture a point-in-time, read-only view of the ERP (invoices, payments, TVA, thirdparties, templates) for offline diffing or for attaching to an incident. It does not touch the database — it only reads. + +The CLI's per-domain credentials beyond Dolibarr (Qonto/Wise for `bank`, Zoho OAuth for `email`) also live in the same gitignored `.env`. The skills' `SKILL.md` files remain the source of business-logic documentation; the CLI is just the ergonomic front door. + +--- + +## 2. Static identity assets — `static/` + +[`static/`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/static) holds the company's legal identity and branding, consumed by the Playwright bootstrap suite when it configures a fresh Dolibarr install. 
+ +| Path | Purpose | +| --- | --- | +| [`static/config/company.json`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/static/config/company.json) | Legal identity used for Dolibarr company setup and display | +| [`static/img/logo512.png`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/static/img/logo512.png) | Company logo (referenced from `company.json` as `$IMG/logo512.png`) | +| [`static/img/loginBackground.jpeg`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/static/img/loginBackground.jpeg) | Login-page background image | + +`company.json` carries two blocks — `info` (postal/contact identity) and `ID` (legal identity): + +| Field | Value | +| --- | --- | +| Raison sociale | Arcodange | +| Forme juridique | SAS (Société par actions simplifiée) | +| Adresse | 73 Boulevard de l'Yerres, 91000 Évry-Courcouronnes, France (FR) | +| Site / email | arcodange.fr · gabrielradureau@arcodange.fr | +| SIREN / SIRET | (legal registration identifiers) | +| NAF / APE | 62.02A | +| N° TVA | (intra-community VAT number) | +| Capital | (share capital) | +| RCS | R.C.S. Évry | +| Mois début d'exercice | Juillet | +| Logo | `$IMG/logo512.png` | + +> [!NOTE] +> The `$IMG` token in `company.json` resolves to `static/img/` via the test harness's `IMG_FOLDER` (see §3). The same image folder feeds the optional login-page background and logo upload during display setup. + +--- + +## 3. Bootstrap test suite — `test/` + +[`test/`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/test) is a **Deno + Playwright** UI suite that drives a real browser through the Dolibarr first-install and admin configuration flows. It is not a unit-test runner — it is the scripted bootstrap that stands up a fresh Dolibarr instance and applies the company identity. 
+ +| File | Role | +| --- | --- | +| [`test/main.ts`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/test/main.ts) | Entry point: launches Chromium (`fr-FR` locale), wires `globalCtx`, runs install + admin setup steps | +| [`test/deno.json`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/test/deno.json) | Imports `npm:playwright` and `jsr:@std/dotenv/load`; `checkJs: true` | +| [`test/.env.example`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/test/.env.example) | Template for `DOLIBARR_ADDRESS`, DB password, admin login, `ROOT_FOLDER` | +| [`test/scripts/admin/`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/test/scripts/admin) | `initialSetup.ts`, `companySetup.ts`, `displaySetup.ts`, `moduleSetup.ts` | + +### Run the suite + +1. Install Deno and the Playwright browsers: + - `curl -fsSL https://deno.land/install.sh | sh` + - `deno run --allow-all npm:playwright install` +2. Populate `test/.env` from the live cluster secrets (DB password from the `vso-db-credentials` secret; admin password from `secretkv`). See [Secrets & VSO](../tools/secrets-and-vso.md) for how those secrets land in the namespace. +3. Run: `deno run --allow-all main.ts`. + +### Lock the installer after install + +> [!CAUTION] +> Dolibarr's `install/` wizard stays reachable until an `install.lock` exists. After a successful first install, **always** create the lock — an unlocked installer is a live takeover risk on a production-like instance. + +The post-install step touches the lock file inside the pod and chowns it to `www-data`: + +```sh +kubectl -n erp exec $(kubectl get pod -n erp -l app.kubernetes.io/name=erp -o name) -- \ + /bin/bash -c '/usr/bin/touch /var/www/html/install.lock && /bin/chown www-data:www-data /var/www/html/install.lock' +``` + +`initialSetup.isUpgradeLocked()` checks for the same locked state before deciding whether to (re)run the installer, so the lock is both a safety gate and the suite's idempotency signal. 
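
A quick way to confirm the lock is in place is sketched below. `lock_state` is a hypothetical helper, and the path in the comment is the one used by the command above.

```sh
# Sketch only: `lock_state` is a hypothetical helper.

# lock_state FILE: print "locked" when FILE exists, "unlocked" otherwise.
lock_state() {
  if [ -e "$1" ]; then
    echo "locked"
  else
    echo "unlocked"
  fi
}

# Inside the pod this would be called as:
#   lock_state /var/www/html/install.lock
```

From a workstation, an equivalent one-off check is `kubectl -n erp exec <pod> -- test -e /var/www/html/install.lock`, whose exit status is `0` when the lock exists.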
+ +--- + +## 4. Day-2 operations + +### Scaling — manual only + +| Setting | Value | Source | +| --- | --- | --- | +| `replicaCount` | `1` | [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | +| `autoscaling.enabled` | `false` | [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/values.yaml) | + +Dolibarr runs as a **single replica with no HorizontalPodAutoscaler**. The document PVC is `ReadWriteMany` (see [Deployment](deployment.md)), so a second pod could mount it, but Dolibarr has not been confirmed to tolerate concurrent writes to the shared `documents` volume, so scaling out is not a supported topology. Any scaling is a deliberate, manual `replicaCount` change in the chart values, applied through the normal [deployment](deployment.md) path. Treat the workload as a single-writer system. + +### After activating a new Dolibarr module — fix table ownership + +Activating a Dolibarr module creates new SQL tables, and Dolibarr's migration runner creates them under whatever role the live VSO-rotated credentials happen to map to. If that role is not `erp_role`, a subsequent credential rotation by Vault can leave the new tables unreadable. + +> [!IMPORTANT] +> After enabling any new module (e.g. via `moduleSetup.configureModule`), run the table-ownership reassignment so the new objects are owned by `erp_role`. The `REASSIGN OWNED BY … TO erp_role` logic lives in [`chart/scripts/update_ownership.sql`](https://gitea.arcodange.lab/arcodange-org/erp/src/branch/main/chart/scripts/update_ownership.sql) and is also mounted into the pod under `/var/www/scripts/before-starting.d/` as `update_table_ownership.sql`.
+ +Apply it inside the pod: + +```sh +kubectl exec -n erp $(kubectl get pod -n erp -l app.kubernetes.io/name=erp -o name) -c erp -- \ + sh -c 'PGPASSWORD=${DOLI_DB_PASSWORD} psql -U ${DOLI_DB_USER} -h ${DOLI_DB_HOST} \ + -p ${DOLI_DB_HOST_PORT} ${DOLI_DB_NAME} -f /var/www/scripts/before-starting.d/update_table_ownership.sql' +``` + +If the pod logged `Read-only file system` for the `update_table_ownership.sql` step at startup (the entrypoint cannot write its temp file), the reassignment never ran — run the command above by hand. If the live DB user has already lost rights, run the same SQL with the **admin Postgres credentials** instead. The role model is described in [Postgres IaC](../factory-provisioning/opentofu/postgres-iac.md). + +### Watch PVC usage so backups do not fill the volume + +ERP data and (where co-located) backup artifacts share storage on a single PVC. If on-volume backup snapshots accumulate, they can exhaust the volume and take Dolibarr down. + +| Watch | Why | +| --- | --- | +| PVC used vs. capacity | A full volume crashes Dolibarr (no room for sessions/temp/migrations) | +| Backup artifact growth | Old dumps left on the volume eat the same space data needs | + +> [!WARNING] +> Monitor PVC usage and prune/offload old backup artifacts before they fill the **50Gi** volume. Backup retention, artifact layout, and the off-volume target are documented in [Backup & recovery](backup-and-recovery.md) — keep on-volume copies short-lived. See [Storage & recovery](../lab-ecosystem/storage-and-recovery.md) for the durable-copy model. + +--- + +## See also + +- [ERP hub](README.md) — overview and entry point for the ERP guidebook. +- [Deployment](deployment.md) — how the workload, chart, and credentials reach the cluster. +- [Backup & recovery](backup-and-recovery.md) — backup artifacts, retention, restore drills. +- [Secrets & VSO](../tools/secrets-and-vso.md) — how `vso-db-credentials` / `secretkv` land in the namespace.