# 05 · Backup: daily cron dumps

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Upstream:** [Ansible sub-hub](README.md) · [Factory provisioning hub](../README.md)
> **Downstream:** [06 · Recover](06-recover.md), which covers how these dumps are replayed
> **Related:** [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [04 · Tools](04-tools.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

Stage 5 installs three independent **cron-driven backup jobs** that protect the platform's persistent state: the PostgreSQL database, the Gitea instance, and the K3s volume metadata (PV/PVC + Longhorn CRDs). The entry point [`playbooks/05_backup.yml`](../../../../ansible/arcodange/factory/playbooks/05_backup.yml) imports [`playbooks/backup/backup.yml`](../../../../ansible/arcodange/factory/playbooks/backup/backup.yml), which chains the three sub-playbooks, each passing `backup_root_dir: /mnt/backups`.

Every job follows the **same anatomy**: run a daily cron at **04:00**, write a date-stamped archive to the job's own subdirectory of `/mnt/backups/`, prune anything older than **3 days**, and drop a matching `restore.sh` next to the backup script. `/mnt/backups` is a Longhorn RWX volume, so Longhorn itself snapshots, replicates, and ships these archives off-site; the cron jobs only produce the dumps.

> [!NOTE]
> All three sub-playbooks **install** scripts and cron entries; they do not run a backup themselves (beyond a one-shot `test backup_cmd` smoke check that pipes to `/dev/null`). The actual backups fire from cron. To read failures, SSH to the host and use `sudo su` → `mails` (see [`backup/README.md`](../../../../ansible/arcodange/factory/playbooks/backup/README.md)).
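
The shared anatomy described above (dump, date-stamp, prune) can be sketched in a few lines of shell. This is an illustration only, not the `backup.sh` that the playbooks generate: the `run_backup` helper, its argument order, and the temporary-file step are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the generated backup.sh.
# run_backup and its arguments are hypothetical names for illustration.
set -eu

# run_backup <backup_dir> <suffix> <keep_days> <dump command...>
# Writes the dump command's stdout to <backup_dir>/backup_YYYYMMDD.<suffix>,
# then deletes backup_* files that find considers older than <keep_days> days.
run_backup() {
  local backup_dir="$1" suffix="$2" keep_days="$3"
  shift 3
  local target
  target="${backup_dir}/backup_$(date +%Y%m%d).${suffix}"

  mkdir -p "$backup_dir"

  # Write to a temporary name first so a failed dump never replaces
  # a good archive with a truncated one (assumption, not from the source).
  "$@" > "${target}.tmp"
  mv "${target}.tmp" "$target"

  # Same pruning idea as the documented `find ... -mtime +N -delete`.
  find "$backup_dir" -maxdepth 1 -type f -name 'backup_*' \
    -mtime "+${keep_days}" -delete

  printf '%s\n' "$target"
}
```

A job would pass its own backup command as the trailing arguments. Note that `find -mtime +3` discards fractional days, so it only matches files that are at least four full 24-hour periods old.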
---

## The three jobs

| Job | Sub-playbook | Host | Backup command | Artifact | Scripts dir |
| --- | --- | --- | --- | --- | --- |
| **Postgres** | [`backup/postgres.yml`](../../../../ansible/arcodange/factory/playbooks/backup/postgres.yml) | `postgres` | `docker exec pg_dumpall -U ` ∣ `gzip` | `backup_YYYYMMDD.sql.gz` | `…/docker_composes/postgres/scripts` |
| **Gitea** | [`backup/gitea.yml`](../../../../ansible/arcodange/factory/playbooks/backup/gitea.yml) | `gitea` | `docker exec -u git gitea dump --skip-log --skip-db --skip-package-data --type tar.gz` | `backup_YYYYMMDD.gitea.gz` | `…/docker_composes/gitea/scripts` |
| **K3s PVC** | [`backup/k3s_pvc.yml`](../../../../ansible/arcodange/factory/playbooks/backup/k3s_pvc.yml) | `pi1` | `kubectl get pv,pvc` + `volumes.longhorn.io` + `settings.longhorn.io` (YAML) | `backup_YYYYMMDD.volumes` | `/opt/k3s_volumes` |

All three share `keep_days: 3`, cron `minute: 0 hour: 4 user: root`, and a per-job `backup_dir` under `/mnt/backups/`.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart TD
    classDef cron fill:#5f4a1e,stroke:#d97706,color:#fffbeb;
    classDef job fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
    classDef store fill:#4c1d95,stroke:#7c3aed,color:#f5f3ff;
    classDef ship fill:#14532d,stroke:#22c55e,color:#f0fdf4;

    C["cron · daily 04:00 · user root"]:::cron
    PG["postgres.yml<br/>pg_dumpall ∣ gzip"]:::job
    GT["gitea.yml<br/>gitea dump tar.gz"]:::job
    PV["k3s_pvc.yml<br/>PV · PVC · Longhorn CRDs"]:::job
    D["/mnt/backups/{postgres,gitea,k3s_pvc}/<br/>keep 3 days"]:::store
    L["Longhorn:<br/>snapshot · replicate · off-site"]:::ship

    C --> PG --> D
    C --> GT --> D
    C --> PV --> D
    D --> L
```

1. A single daily **04:00 root cron** triggers each job's `backup.sh`.
2. **postgres.yml** runs `pg_dumpall` through `gzip`, **gitea.yml** streams a `gitea dump` tarball, and **k3s_pvc.yml** serialises the volume metadata.
3. Each job writes a date-stamped archive into its own subdirectory of `/mnt/backups/` and prunes files older than 3 days (`find … -mtime +3 -delete`).
4. Because `/mnt/backups` is a Longhorn RWX volume, Longhorn snapshots, replicates across nodes, and ships an off-site copy, so the cron needs no separate upload step.

---

## Job details

### Postgres: `postgres.yml`

The backup command is built from the Postgres host's docker-compose facts (`container_name`, `POSTGRES_USER`). `pg_dumpall` captures **all databases plus globals (roles)** in one logical dump, gzipped. The generated `restore.sh` takes an optional `YYYYMMDD` argument (defaults to the latest dump), `docker cp`s it into the container, gunzips, and replays with `psql -f`. If the restore misbehaves, the script reminds you to wipe the data dir before replaying.

### Gitea: `gitea.yml`

The dump runs as the `git` user with `--skip-db` (Postgres is backed up separately by the Postgres job) and `--skip-package-data`, streamed to stdout (`-f -`) so it never lands on the container's own disk. The `restore.sh` unpacks the tarball back into `/data/gitea` (config/data) and `/data/git/repositories` (repos), fixes `git:git` ownership, and **regenerates hooks** (`gitea admin regenerate hooks`). Without that step the restored repos have stale hook paths.

### K3s PVC: `k3s_pvc.yml`

This job does **not** back up volume *data* (Longhorn handles the bytes). It backs up the **Kubernetes objects** needed to re-bind those volumes: all `pv` + `pvc`, the **`volumes.longhorn.io` CRDs**, and `settings.longhorn.io`, concatenated into one `.volumes` YAML (`---`-separated).
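
As an illustration of that concatenation, the following sketch prints the three object groups as one `---`-separated YAML stream. It is not the playbook's actual `backup_cmd`: the `dump_volume_metadata` name and the `--all-namespaces -o yaml` flags are assumptions chosen for the example.

```shell
# Sketch only: the real backup_cmd lives in backup/k3s_pvc.yml.
# dump_volume_metadata is a hypothetical helper name.
# Prints PV/PVC objects and the Longhorn volume and settings objects
# as a single multi-document YAML stream on stdout.
dump_volume_metadata() {
  # Assumption: query every namespace and emit YAML.
  kubectl get pv,pvc --all-namespaces -o yaml
  echo '---'
  kubectl get volumes.longhorn.io --all-namespaces -o yaml
  echo '---'
  kubectl get settings.longhorn.io --all-namespaces -o yaml
}

# Example (requires cluster access):
#   dump_volume_metadata > "backup_$(date +%Y%m%d).volumes"
```

Keeping everything in one file means a restore can hand the whole dump to a single `kubectl apply -f`.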
The job writes the dump to both `/mnt/backups/k3s_pvc/` *and* a copy alongside the script. The `restore.sh` looks in a fallback dir (`/home/pi/arcodange/backups/k3s_pvc`) first, then the primary, picks the latest (or a dated) dump, and `kubectl apply`s it.

> [!IMPORTANT]
> **Backing up the Longhorn `volumes.longhorn.io` CRDs is what enables *fast* recovery.** With the Volume CRDs in the backup, recovery is a single `kubectl apply` that re-associates the surviving on-disk replicas with their PVs (see [06 · Recover → `longhorn.yml`](06-recover.md)). **Without** the Volume CRDs, a Longhorn reinstall assigns **new engine IDs**, cannot adopt the orphaned replica directories, and you fall through to the slow **block-device data recovery** (`longhorn_data.yml`). The k3s_pvc backup_cmd carries an inline comment to this effect and points at the [Longhorn PVC recovery ADR](../../../../ansible/arcodange/factory/docs/adr/20260414-longhorn-pvc-recovery.md). This is the prevention half of the [storage failure mode](../../lab-ecosystem/storage-and-recovery.md).

---

## Gotchas

> [!WARNING]
> - **3-day retention is tight.** A failure that goes unnoticed for 3 days loses all recoverable history. The off-site Longhorn copy is the longer-horizon safety net; the local `/mnt/backups` files are short-lived.
> - **The smoke test runs the real dump.** Each play has a `test backup_cmd` task that executes the backup command (output discarded) at provisioning time. If Postgres/Gitea/kubectl is unreachable when you run stage 5, provisioning fails fast, by design.
> - **Cron runs as `root`, scripts live in app dirs.** The `backup.sh`/`restore.sh` are written into the app's docker-compose `scripts/` dir (or `/opt/k3s_volumes`); the cron job invokes them as root. Don't relocate the compose dirs without re-running stage 5.
> - **Gitea restore needs the hook regeneration.** Skipping `gitea admin regenerate hooks` leaves repos with broken push hooks. The `restore.sh` already does it, so use the script rather than a manual untar.
> - **Postgres and Gitea DB are backed up by *different* jobs.** Gitea dumps with `--skip-db`; its database rows come from the Postgres `pg_dumpall`. Restoring Gitea fully means restoring **both** archives.

---

## Where stage 5 sits

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart LR
    classDef done fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
    classDef here fill:#4c1d95,stroke:#7c3aed,color:#f5f3ff;
    classDef rec fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;

    s04["04 · Tools"]:::done
    s05["05 · Backup<br/>Postgres · Gitea · K3s PVC"]:::here
    rec["recover/*<br/>(on disaster)"]:::rec

    s04 --> s05
    s05 -. "feeds restore" .-> rec
```

1. **04 · Tools** stood up Vault and CrowdSec, including the secret store that stage 5's dumps help protect.
2. **05 · Backup** (this page) is the last linear stage: it schedules the daily dumps.
3. The artifacts here are the **input** to the on-demand [06 · Recover](06-recover.md) branch. The `.volumes` dump in particular gates whether recovery is fast (CRDs present) or slow (block-device).
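
Because these artifacts feed the restore scripts, it helps to see how a dump might be selected. The restore scripts on this page accept an optional `YYYYMMDD` argument and otherwise use the latest dump. The sketch below shows one way to implement that selection. `pick_dump` is a hypothetical helper, not code taken from the generated `restore.sh`, and it assumes the `backup_YYYYMMDD.<suffix>` naming shown in the jobs table.

```shell
# Sketch only: hypothetical helper, not the generated restore.sh.
# pick_dump <backup_dir> <suffix> [YYYYMMDD]
# Prints the path of the requested dump, or of the newest one when no
# date is given. Returns 1 if no matching file exists.
pick_dump() {
  local dir="$1" suffix="$2" day="${3:-}"
  local candidate="" f

  if [ -n "$day" ]; then
    candidate="${dir}/backup_${day}.${suffix}"
  else
    # File names embed YYYYMMDD, and globs expand in sorted order,
    # so the last existing match is the most recent dump.
    for f in "$dir"/backup_*."$suffix"; do
      if [ -f "$f" ]; then
        candidate="$f"
      fi
    done
  fi

  if [ -z "$candidate" ] || [ ! -f "$candidate" ]; then
    echo "no dump found in ${dir}" >&2
    return 1
  fi
  printf '%s\n' "$candidate"
}
```

Selecting by file name rather than modification time keeps the choice stable even if an archive is copied or touched after it was written.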