Builds on the dedicated backup (erp#31).

Skip-if-unchanged: each half (DB / documents) carries a content fingerprint at
erp/<env>/.fp-{db,docs} and is dumped+uploaded only if it differs from the last
run, so a quiet ERP day re-uploads nothing. Fingerprint = durable BUSINESS content
only: DB = count+max(tms) over tms tables EXCEPT volatile churn (llx_const,
llx_user, session/cron); docs EXCLUDE */temp/* (Dolibarr stats cache) from both
the fingerprint and the tar. Proven live: 1st run uploads both, immediate 2nd run
skips both (uploaded=0).

Automation: the in-container logic moves to chart/files/backup-job.sh (single
source of truth, read by the orchestrator AND the chart). New
chart/templates/backup-cronjob.yaml renders a daily CronJob + ConfigMap +
VaultStaticSecret, gated by backup.enabled (default false). Helm-verified: off by
default (0 CronJobs), on renders correctly, env-aware (PREFIX erp/prod vs
erp/sandbox), script embedded.

Activation (documented): store GCS HMAC creds at kvv2/<backup.vaultS3Path>
(default erp/backup), grant the erp `auth` Vault role read on it (tools change),
set backup.enabled=true. Until then the orchestrator runs on demand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Dolibarr dedicated backup

A backup strategy **dedicated to Dolibarr**, because the accounting data and the
issued documents are critical and legally retained **10 years**: they warrant more
than the generic platform backup.

## Why this exists (the gap it closes)

On 2026-06-30 an audit of the Longhorn external backup found that **the erp documents
volume had never been backed up offsite** (`lastBackupAt = never`): its Longhorn
volume is enrolled only in the `default` recurring-job group, but the single backup
job (`thrice-a-month-backup`) has `groups=[]`, so it serves *no* group, and the erp
volume (and erp-sandbox) fell through the cracks. Only in-cluster Longhorn replicas
protected `/var/www/documents` (issued invoice PDFs, supplier documents, contracts, ECM),
a protection that does not survive cluster loss, corruption, or a power cut.

This tool backs up **both halves** of Dolibarr state to the existing object store
(`s3://arcodange-backup`, GCS via the S3-compatible API), under `erp/<env>/`:

| half | how | key |
|---|---|---|
| Postgres DB | `pg_dump -Fc` (restorable) | `erp/<env>/db/<ts>.dump` |
| documents PVC | `tar -czf` of `/var/www/documents` (RWX, mounted read-only) | `erp/<env>/docs/<ts>.tar.gz` |

It then prunes to a **tiered retention**: daily for 30 days, monthly for 12 months,
yearly for ~10 years.

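The tiered pruning can be sketched as a filter over the object timestamps. This is a minimal sketch, not the logic in `backup-job.sh`: it assumes `<ts>` starts with a `YYYYMMDD` date (the real format is not stated here) and that the newest backup of each month or year is the one kept. The function name `keep_set` and its three cutoff arguments are hypothetical.

```shell
# keep_set DAILY_CUT MONTHLY_CUT YEARLY_CUT   (each cutoff as YYYYMMDD)
# Reads backup timestamps on stdin (assumed to start with YYYYMMDD) and prints
# the ones to KEEP, newest first. Anything not printed is a prune candidate.
keep_set() {
  sort -r | awk -v d="$1" -v m="$2" -v y="$3" '
    { day = substr($0, 1, 8) + 0 }
    day >= d + 0 { print; next }          # daily tier: keep every backup
    day >= m + 0 {                        # monthly tier: newest per month
      k = substr($0, 1, 6)
      if (!(k in seen_m)) { seen_m[k] = 1; print }
      next
    }
    day >= y + 0 {                        # yearly tier: newest per year
      k = substr($0, 1, 4)
      if (!(k in seen_y)) { seen_y[k] = 1; print }
    }
  '
}
```

With GNU `date`, the three cutoffs for a given run could be derived as `date -u -d '30 days ago' +%Y%m%d`, `date -u -d '12 months ago' +%Y%m%d` and `date -u -d '10 years ago' +%Y%m%d`. Passing them in as arguments keeps the filter itself deterministic and easy to check against a known list.
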
**Skip-if-unchanged:** each half carries a content fingerprint at `erp/<env>/.fp-{db,docs}`
and is dumped+uploaded only if it **differs** from the last run, so a quiet ERP day
re-uploads nothing. The fingerprint covers **durable business content only**: the DB
side is `count + max(tms)` over every table with a `tms` column *except* volatile ones
(`llx_const`, `llx_user`, sessions/cron), and the documents side excludes `*/temp/*`
(Dolibarr's constantly-regenerated stats cache) from both the fingerprint *and* the tar.

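The documents half of the skip logic can be sketched as follows. This is an illustration under assumptions, not the shipped script: the fingerprint recipe (per-file `cksum`, hashed with `sha256sum`) and the names `docs_fingerprint` and `needs_upload` are hypothetical, and a local file stands in for the `erp/<env>/.fp-docs` object.

```shell
# docs_fingerprint DIR: one hash over the content of every file under DIR,
# skipping anything below a temp/ directory (assumed recipe, see the lead-in).
docs_fingerprint() {
  (
    cd "$1" || exit 1
    # cksum prints "CRC SIZE PATH"; sorting on PATH makes the result stable
    # regardless of the order in which find walks the tree.
    find . -type f ! -path '*/temp/*' -exec cksum {} + |
      LC_ALL=C sort -k3 | sha256sum | cut -d' ' -f1
  )
}

# needs_upload NEW_FP STORED_FP_FILE: succeeds when the fingerprint changed
# (or none was stored yet), i.e. when a tar + upload is warranted.
needs_upload() {
  old=$(cat "$2" 2>/dev/null || true)
  [ "$1" != "$old" ]
}
```

A caller would compute `fp=$(docs_fingerprint /var/www/documents)`, run the tar and upload only when `needs_upload "$fp" <stored-file>` succeeds, and then record `$fp` as the new stored value. Because `temp/` is left out of the hash, a regenerated stats cache alone never triggers an upload.
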
## Safety (mirrors `ops/sandbox/sandbox-lifecycle.sh`)

- **prod is read-only**: `pg_dump` and `tar` only read; the only writes go to the
  backup bucket, never to prod. The DB is read with the env's *own* dynamic creds
  (`vso-db-credentials`); prod and sandbox never cross.
- **S3 creds are never exposed**: the GCS HMAC secret is copied into a *transient*
  secret in the app namespace (values stay base64), deleted on exit. The whole
  in-container script is shipped base64; no secret is ever printed.

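The delete-on-exit behaviour in the second bullet is commonly obtained with an `EXIT` trap. The sketch below shows only that cleanup pattern, under assumptions: `with_transient_secret`, the `KUBECTL` override, and the namespace and secret names in the usage note are hypothetical, and the step that copies the secret in is left out.

```shell
# KUBECTL can be overridden (e.g. with a stub) for dry runs; defaults to kubectl.
KUBECTL=${KUBECTL:-kubectl}

# with_transient_secret NAMESPACE SECRET_NAME COMMAND [ARGS...]
# Runs COMMAND in a subshell and deletes the secret when that subshell exits,
# whether COMMAND succeeded or failed. The exit status of COMMAND is preserved.
with_transient_secret() {
  ns=$1 name=$2
  shift 2
  (
    trap '$KUBECTL -n "$ns" delete secret "$name" --ignore-not-found >/dev/null' EXIT
    "$@"
  )
}
```

For example, `with_transient_secret erp backup-s3-tmp some-backup-command` would remove `backup-s3-tmp` from the `erp` namespace even if `some-backup-command` fails midway. Running the command in a subshell keeps the trap from leaking into the calling script.
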
## Usage

```sh
# one-shot backup + prune (run from anywhere; needs kubectl on the lab cluster)
ops/backup/dolibarr-backup.sh backup --env prod
ops/backup/dolibarr-backup.sh backup --env sandbox

# what's in the store
ops/backup/dolibarr-backup.sh list --env prod
```

`chart/files/backup-job.sh` is the in-container logic (env-driven: `BUCKET`, `PREFIX`,
`DB`, `PGHOST` + the mounted DB/S3 creds). It is the single source of truth shared by
this orchestrator and the scheduled CronJob (see "Automation" below).

**Status:** the first real prod backup was taken 2026-06-30
(`erp/prod/db/…` 1.2 MB, `erp/prod/docs/…` 12.5 MB). Proven end-to-end live on the
sandbox (dump + tar + GCS upload + retention prune).

## Restore (manual, for now)

```sh
# DB: aws s3 cp s3://arcodange-backup/erp/<env>/db/<ts>.dump - | pg_restore -h <host> -U <user> -d <db> --clean
# docs: aws s3 cp s3://arcodange-backup/erp/<env>/docs/<ts>.tar.gz - | tar -C /var/www/documents -xzf -
```

The sandbox iso-prod refresh (`ops/sandbox/sandbox-lifecycle.sh`) is the natural
restore-drill bench. A `restore` subcommand is wired next.

## Automation: the CronJob (gated on creds)

The recurring form ships in the chart (`chart/templates/backup-cronjob.yaml`,
`backup.enabled=false` by default): a daily **CronJob** (ConfigMap-mounted
`backup-job.sh`) with its **own** S3 creds via a `VaultStaticSecret`, with no
cross-namespace borrowing of the Longhorn secret. To activate:

1. store the GCS HMAC creds (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` /
   `AWS_ENDPOINTS`, same shape as `longhorn-gcs-backup-credentials`) at
   `kvv2/<backup.vaultS3Path>` (default `erp/backup`);
2. grant the erp `auth` Vault role read on that path (a `tools` change) if its
   policy doesn't already cover it;
3. set `backup.enabled: true` (+ tune `schedule`).

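Steps 1 and 3 above could be driven from a shell along these lines. This is a hedged sketch, not the documented procedure: it assumes `kvv2` is the KV mount name, and it invents the release name `erp`, the namespace `erp`, the chart path `./chart`, the file `gcs-hmac.json` and the `run` dry-run wrapper. Step 2 (the Vault policy) lives in `tools` and is not shown.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# activate_backup [VAULT_PATH]
activate_backup() {
  vault_path=${1:-erp/backup}   # default of backup.vaultS3Path per the text
  # Step 1: load the HMAC creds from a JSON file so they never appear in
  # the argument list or in the dry-run output.
  run vault kv put "kvv2/$vault_path" @gcs-hmac.json
  # Step 3: flip the gate, keeping every other value of the release as is.
  run helm upgrade erp ./chart -n erp --reuse-values --set backup.enabled=true
}
```

Called as `activate_backup`, this only prints the two commands. `DRY_RUN=0 activate_backup` would execute them, which needs a logged-in `vault` CLI and `helm` access to the cluster.
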
Until then, run the orchestrator above on demand / from a host cron; it works
today by borrowing the Longhorn creds transiently.

> The generic Longhorn gap (the orphaned `default` group) should be fixed too, as a
> platform concern, but this dedicated, offsite, 10-year-retention backup is the
> one that matches Dolibarr's legal criticality.