
# 06 · Recover — Longhorn disaster recovery

> [!NOTE]
> **Status:** 🟡 beta · **Last Updated:** 2026-06-23
> **Upstream:** [Ansible sub-hub](README.md) · [Factory provisioning hub](../README.md)
> **Downstream:** [05 · Backup](05-backup.md) — the dumps these playbooks consume
> **Related:** [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [PRD — QA strategy](../../../PRD/safe-prod-like-environment/qa-strategy.md) · [Longhorn PVC recovery ADR](../../../../ansible/arcodange/factory/docs/adr/20260414-longhorn-pvc-recovery.md)

The `recover/` playbooks are **not** part of the linear `01..05` pipeline — they are an **on-demand disaster-recovery branch**, invoked only after a power cut or data loss. There are two, and which one you run depends on a single question: **do the Longhorn Volume CRDs still exist?**

> [!IMPORTANT]
> **Decision — pick the right playbook before you start:**
> - **Volume CRDs still present** (e.g. they were captured by the [05 · Backup k3s_pvc dump](05-backup.md), or never wiped) → run [`recover/longhorn.yml`](../../../../ansible/arcodange/factory/playbooks/recover/longhorn.yml). Fast: it re-applies the CRDs and the surviving on-disk replicas are re-adopted.
> - **Volume CRDs are GONE** (a nuclear Longhorn reinstall assigned new engine IDs) but the raw replica `.img` files survive on disk → run [`recover/longhorn_data.yml`](../../../../ansible/arcodange/factory/playbooks/recover/longhorn_data.yml). Slow: it merges replica layers at the block-device level and injects the data into a fresh volume.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart TD
  classDef q fill:#5f4a1e,stroke:#d97706,color:#fffbeb;
  classDef fast fill:#14532d,stroke:#22c55e,color:#f0fdf4;
  classDef slow fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;
  classDef dead fill:#6b7280,stroke:#4b5563,color:#fff;

  Q{"Do the Longhorn<br/>Volume CRDs<br/>still exist?"}:::q
  F["longhorn.yml<br/>CSI/CRD recovery (fast)"]:::fast
  S{"Raw replica<br/>.img files<br/>survive?"}:::q
  D["longhorn_data.yml<br/>block-device recovery (slow)"]:::slow
  X["Data unrecoverable<br/>(replicas zeroed)"]:::dead

  Q -- "yes" --> F
  Q -- "no" --> S
  S -- "yes" --> D
  S -- "no" --> X
```

1. **CRDs present?** Yes → `longhorn.yml` re-applies the Volume CRDs and the on-disk replicas re-attach. Done fast.
2. **CRDs gone?** Then ask whether the raw replica `.img` files survived on disk.
3. **Replicas survive?** Yes → `longhorn_data.yml` reconstructs the filesystem at the block level and injects it into a new volume.
4. **Replicas zeroed** by Longhorn reconciliation → the data is unrecoverable; there is no playbook for this.
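
The decision above can be written as a small shell helper. This is a minimal sketch, not something shipped in `recover/`. The function names are hypothetical, and the probe assumes the Volume objects are listed under the `volumes.longhorn.io` resource in `longhorn-system`. Whether the replica files still hold real data is passed in as a yes/no input, because making that call is the job of Phase 0 of `longhorn_data.yml`.

```sh
# Hypothetical helpers (not part of the collection) encoding this page's
# CRDs-present vs CRDs-gone decision.

# Prints the playbook to run, or "unrecoverable".
#   $1 = "yes" if Longhorn Volume objects still exist, else "no"
#   $2 = "yes" if replica .img files with real data survive on disk, else "no"
choose_recovery_playbook() {
  if [ "$1" = "yes" ]; then
    echo "recover/longhorn.yml"        # fast: re-apply CRDs, re-adopt replicas
  elif [ "$2" = "yes" ]; then
    echo "recover/longhorn_data.yml"   # slow: block-device reconstruction
  else
    echo "unrecoverable"               # replicas zeroed: no playbook applies
  fi
}

# Exit 0 if any Longhorn Volume object exists, 1 if none, 2 if kubectl failed.
# Assumption: Volumes are exposed as volumes.longhorn.io in longhorn-system.
volume_crs_present() {
  out=$(kubectl -n longhorn-system get volumes.longhorn.io -o name) || return 2
  [ -n "$out" ]
}
```

For example, `choose_recovery_playbook no yes` prints `recover/longhorn_data.yml`. The probe returns 2 when `kubectl` itself fails, so an unreachable API server is not mistaken for "CRDs gone".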

> [!NOTE]
> This branch sits at step 1 of the broader tested startup order: **Longhorn first, then Vault unseal, then VSO re-auth, ERP scaled up last**. The full order, the engine-ID failure mode, and the history (one real recovery, one rehearsal) are in [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md). The single tested-recovery record (1-key/threshold-1 unseal, the four-step order) lives in `CLUSTER_RECOVERY.md`, kept at the lab root outside this repo.

---

## `longhorn.yml` — CSI/CRD recovery (CRDs present)

Runs against `raspberries:&local` as root. It diagnoses how broken Longhorn is and applies the **least invasive** fix that works, escalating only if needed. Most logic runs `run_once` on `pi1`, delegating cluster reads to `localhost`.

| Phase | What it does |
| --- | --- |
| **0 · Pre-flight** | Verifies the data dir `/mnt/arcodange/longhorn` exists on `pi1` (fails hard if missing) and that at least one `backup_*.volumes` dump exists in the primary or fallback backup dir. |
| **1 · Diagnosis** | Checks the `longhorn-system` namespace, the `driver.longhorn.io` **CSIDriver** registration, and the `longhorn-manager` pods, then sets `recovery_phase` = `soft` (CSI driver gone), `hard` (managers unhealthy), or `none`. |
| **2 · Soft** | Touches `longhorn-install.yaml` to make k3s reconcile the HelmChart, waits, and checks that the pods are recreated. |
| **3 · Hard** | Force-deletes the `longhorn-driver-deployer` pods so the HelmChart recreates them. |
| **4 · Nuclear** | Full reinstall: delete the HelmChart, strip finalizers off all Longhorn CRs / PVCs / the namespace, delete + redeploy the `longhorn-install` HelmChart manifest (`v1.9.1`, `defaultDataPath` preserved), wait for pods. |
| **5 · Restore** | Waits for managers to be ready, then `kubectl apply`s the latest `backup_*.volumes` dump (PV/PVC + Longhorn CRDs) and any `longhorn_metadata_*.yaml`. |
| **6 · Verify** | Polls until the CSIDriver is registered, ≥3 managers are Running, the CSI socket exists, and the replica data dir is present; prints a summary. |
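
The Phase 1 diagnosis can be sketched as a pure function plus two `kubectl` probes. This is a hypothetical approximation, not the playbook's task list. It skips the namespace check, assumes the upstream `app=longhorn-manager` pod label, defaults to three healthy managers (one per node), and reports `hard` when both checks fail, a precedence the real playbook may not share.

```sh
# Hypothetical sketch of the longhorn.yml Phase 1 diagnosis.

# Prints soft | hard | none.
#   $1 = "yes" if the driver.longhorn.io CSIDriver is registered
#   $2 = number of longhorn-manager pods in phase Running
#   $3 = managers expected (default 3, one per node)
classify_recovery_phase() {
  csi_registered=$1
  managers_running=$2
  managers_expected=${3:-3}
  # Assumption: if both checks fail, report "hard".
  if [ "$managers_running" -lt "$managers_expected" ]; then
    echo hard     # managers unhealthy
  elif [ "$csi_registered" != "yes" ]; then
    echo soft     # CSI driver registration gone
  else
    echo none
  fi
}

# CSIDriver objects are cluster-scoped, so no namespace flag.
csi_driver_registered() {
  if kubectl get csidriver driver.longhorn.io >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi
}

# Assumption: manager pods carry the upstream app=longhorn-manager label.
running_manager_count() {
  kubectl -n longhorn-system get pods -l app=longhorn-manager \
    --field-selector=status.phase=Running --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}
```

Feed it live observations with `classify_recovery_phase "$(csi_driver_registered)" "$(running_manager_count)"`. Keeping the mapping in a pure function makes it testable without a cluster.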

> [!IMPORTANT]
> Phase 5 is exactly where the [05 · Backup k3s_pvc dump](05-backup.md) pays off: re-applying the captured **Volume CRDs** lets Longhorn re-adopt the surviving replica directories instead of forcing the block-device path. The playbook is **idempotent** — it re-diagnoses and escalates only as far as needed, so re-running after a partial recovery is safe.
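
Selecting and replaying the dump in Phases 0 and 5 can be sketched like this. It is a minimal sketch that assumes the newest dump is chosen by modification time and that the primary directory wins over the fallback whenever it holds any dump. The directory arguments are placeholders, and the `longhorn_metadata_*.yaml` handling is left out.

```sh
# Hypothetical sketch: print the newest backup_*.volumes file from the first
# directory (in argument order) that contains one. Returns 1 if none exist.
latest_volumes_dump() {
  for dir in "$@"; do
    newest=""
    for f in "$dir"/backup_*.volumes; do
      [ -e "$f" ] || continue               # glob matched nothing
      if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
        newest=$f                           # assumption: newest = latest mtime
      fi
    done
    if [ -n "$newest" ]; then
      echo "$newest"
      return 0
    fi
  done
  return 1
}

# Re-apply the dump (PV/PVC + Longhorn Volume objects) once managers are Ready.
# Reading from stdin sidesteps kubectl's file-extension handling.
restore_volumes_dump() {
  dump=$(latest_volumes_dump "$@") || {
    echo "no backup_*.volumes dump found" >&2
    return 1
  }
  kubectl apply -f - < "$dump"
}
```

For example, `restore_volumes_dump /path/to/primary /path/to/fallback` (placeholder paths) applies the newest dump from the first directory that has one, and fails if neither does, like the Phase 0 pre-flight.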

---

## `longhorn_data.yml` — block-device data recovery (CRDs gone)

This is the fallback when a nuclear reinstall has destroyed the Volume CRDs and assigned new engine IDs, leaving the real data in **orphaned** replica directories. It bypasses Kubernetes objects entirely and reconstructs the filesystem at the block level. It is **driven by a vars file** — `vars/recovery_volumes.yml`, one entry per volume — and the format is documented in [`longhorn_data_vars.example.yml`](../../../../ansible/arcodange/factory/playbooks/recover/longhorn_data_vars.example.yml).

```sh
ansible-playbook -i inventory/hosts.yml \
  playbooks/recover/longhorn_data.yml \
  -e @vars/recovery_volumes.yml
```

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart TD
  classDef pre fill:#5f4a1e,stroke:#d97706,color:#fffbeb;
  classDef merge fill:#4c1d95,stroke:#7c3aed,color:#f5f3ff;
  classDef k8s fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
  classDef done fill:#14532d,stroke:#22c55e,color:#f0fdf4;

  P0["Pre-flight + Phase 0:<br/>auto-discover largest replica dir (>16K)"]:::pre
  P1["Phase 1: back up untouched replica dir<br/>(safe copy before any op)"]:::merge
  P2["Phase 2: merge-longhorn-layers.py<br/>→ single .img · test-mount RO"]:::merge
  P3["Phase 3: create Volume CRD<br/>(scale down workload, clear stuck PVCs)"]:::k8s
  P5["Phase 5: attach via maintenance ticket<br/>→ /dev/longhorn/&lt;pv&gt;"]:::k8s
  P6["Phase 6: mkfs + rsync merged image<br/>into live block device"]:::merge
  P8["Phase 8: recreate PV (Retain) + PVC<br/>pinned by volumeName"]:::k8s
  P9["Phase 9: scale workload up · verify"]:::done

  P0 --> P1 --> P2 --> P3 --> P5 --> P6 --> P8 --> P9
```

1. **Pre-flight + Phase 0.** Fail fast if no volumes are defined, the merge tool is missing, or Longhorn managers aren't Running. Then **auto-discover** the best replica source for each volume — the **largest dir >16 MiB** across `pi1/pi2/pi3`, skipping any replica still `Rebuilding`. `source_node`/`source_dir` in the vars file override this.
2. **Phase 1.** `cp -a` the untouched replica dir to a backup location *before* touching anything, and verify it contains `volume.meta`.
3. **Phase 2.** Run `merge-longhorn-layers.py` to collapse the snapshot + head `.img` layers into one image, then test-mount it read-only to confirm the filesystem is sound.
4. **Phase 3.** Scale the workload to 0 and clear any stuck `Terminating` PV/PVCs *before* creating a fresh Longhorn `Volume` CRD (order matters — StatefulSet controllers re-provision empty PVCs otherwise).
5. **Phase 5.** Attach the volume via a Longhorn `VolumeAttachment` **maintenance ticket** so `/dev/longhorn/<pv>` appears on the source node, with the frontend enabled.
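6. **Phase 6.** `mkfs.ext4` the live block device if it is unformatted, then `rsync` the files from the mounted merged recovery image onto it (with `--ignore-errors`; rsync exit code 23, "partial transfer due to error", is treated as success because a filesystem damaged by the power cut may contain unreadable files).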
7. **Phase 8.** Detach the recovery ticket, recreate the PV (`Retain`, no `claimRef`) and a PVC pinned by `volumeName`, and wait for Bound.
8. **Phase 9.** Scale the workload back up, wait for ready replicas, and run the optional per-volume `verify_cmd` inside the pod.
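
Phase 0's replica scan can be approximated per node with `du` and a look at `volume.meta`. This is a hypothetical sketch rather than the playbook's logic. It assumes Longhorn's usual `<dataPath>/replicas/<volume>-<suffix>/` layout and that `volume.meta` is compact JSON with a `Rebuilding` field. It measures *allocated* size with `du -sk`, so the holes in sparse `.img` files do not make a nearly empty replica look large.

```sh
# Hypothetical sketch of Phase 0 auto-discovery on ONE node.
# Prints "<dir>\t<KiB>" for the largest replica dir of a volume whose
# allocated size exceeds the threshold; returns 1 if none qualifies.
#   $1 = replicas root, e.g. /mnt/arcodange/longhorn/replicas (assumed layout)
#   $2 = Longhorn volume (PV) name
#   $3 = threshold in KiB (default 16384 = 16 MiB)
find_best_replica() {
  replicas_root=$1
  volume=$2
  min_kib=${3:-16384}
  best=""
  best_kib=0
  for d in "$replicas_root/$volume"-*/; do
    [ -d "$d" ] || continue                 # glob matched nothing
    d=${d%/}
    # Assumption: volume.meta is compact JSON, e.g. {"Rebuilding":true,...}
    if grep -q '"Rebuilding":true' "$d/volume.meta" 2>/dev/null; then
      continue                              # mid-rebuild: not a safe source
    fi
    kib=$(du -sk "$d" | cut -f1)            # allocated KiB, sparse holes excluded
    if [ "$kib" -gt "$min_kib" ] && [ "$kib" -gt "$best_kib" ]; then
      best=$d
      best_kib=$kib
    fi
  done
  [ -n "$best" ] && printf '%s\t%s\n' "$best" "$best_kib"
}
```

Run it on each of `pi1/pi2/pi3` (for example `find_best_replica /mnt/arcodange/longhorn/replicas <pv-name>`) and keep the largest result. If you would rather not rely on auto-discovery, pin that node and directory with `source_node`/`source_dir` in the vars file.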

> [!CAUTION]
> The `merge-longhorn-layers.py` tool runs **per replica dir and uses `dmsetup`** to stack the copy-on-write layers correctly. Never recover by simply renaming the orphaned replica directory to the new engine ID: Longhorn reconciliation can pick the *empty* new replica as the rebuild source and **overwrite your data**. The block-device injection is the only proven-safe path. The full method comparison is in the [Longhorn PVC recovery ADR](../../../../ansible/arcodange/factory/docs/adr/20260414-longhorn-pvc-recovery.md).
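
The Phase 6 injection and its rc=23 tolerance can be sketched as follows. It is a minimal sketch that assumes both the merged image and the freshly formatted `/dev/longhorn/<pv>` are already mounted. The mount points are placeholders, and `-a` is an assumed flag alongside the documented `--ignore-errors`. Putting the exit-code policy in its own function lets it be checked without `rsync` installed.

```sh
# Hypothetical sketch of the Phase 6 injection (not the playbook's tasks).

# Exit status 0 for rsync return codes this sketch accepts as success.
rsync_rc_acceptable() {
  case $1 in
    0|23) return 0 ;;   # 23 = "partial transfer due to error"
    *)    return 1 ;;
  esac
}

# Copy files from the mounted merged image into the mounted live volume.
#   $1 = mount point of the merged recovery image (mounted read-only)
#   $2 = mount point of /dev/longhorn/<pv> after mkfs.ext4
inject_recovered_files() {
  src=$1
  dst=$2
  rc=0
  rsync -a --ignore-errors "$src"/ "$dst"/ || rc=$?   # capture rc even under set -e
  if rsync_rc_acceptable "$rc"; then
    if [ "$rc" -eq 23 ]; then
      echo "warning: partial transfer (rc=23) accepted" >&2
    fi
    return 0
  fi
  echo "rsync failed with rc=$rc" >&2
  return "$rc"
}
```

Capturing the code with `|| rc=$?` keeps the function correct under `set -e`, where a bare failing `rsync` would abort the script before its exit code could be inspected.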

> [!NOTE]
> **Tested 2026-04-13 power-cut.** This block-device path was proven end to end by recovering the **url-shortener's SQLite database** after that power cut forced a nuclear Longhorn reinstall (verified `2026-04-14` with `sqlite3 … 'SELECT COUNT(*) FROM urls;'`). That scenario is the worked example in [`longhorn_data_vars.example.yml`](../../../../ansible/arcodange/factory/playbooks/recover/longhorn_data_vars.example.yml).
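
Phase 8's statically bound PV/PVC pair can be sketched as a manifest generator. This is a hypothetical sketch, not the playbook's template. It assumes the Longhorn volume created in Phase 3 carries the PV's name (so the name doubles as the CSI `volumeHandle`), the `ext4` filesystem formatted in Phase 6, a `longhorn` storage class, and `ReadWriteOnce` access. The PV has no `claimRef`; the PVC's `volumeName` does the pinning.

```sh
# Hypothetical sketch of the Phase 8 manifests for an existing Longhorn volume.
#   $1 = PV name (assumed equal to the Longhorn volume name)
#   $2 = PVC name   $3 = PVC namespace   $4 = size, e.g. 1Gi
render_pv_pvc() {
  pv=$1 pvc=$2 ns=$3 size=$4
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${pv}
spec:
  capacity:
    storage: ${size}
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain   # keep the data if the PVC is deleted
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: ${pv}                  # the Longhorn volume from Phase 3
  # no claimRef here: the PVC below binds by volumeName instead
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc}
  namespace: ${ns}
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  volumeName: ${pv}                      # pin to this exact PV
  resources:
    requests:
      storage: ${size}
EOF
}
```

Pipe it straight into the cluster, for example `render_pv_pvc <pv> <pvc> <namespace> 1Gi | kubectl apply -f -`, then wait for the PVC to report `Bound`.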

---

## Gotchas

> [!WARNING]
> - **Run `longhorn.yml` first if there is any chance the CRDs survived.** It is fast and idempotent; falling straight to `longhorn_data.yml` is unnecessary block-level work when a `kubectl apply` would have sufficed.
> - **`longhorn_data.yml` needs a healthy Longhorn control plane.** Its pre-flight aborts unless ≥1 `longhorn-manager` is Running — it recovers *data into* a working Longhorn, it does not bring Longhorn back. Use `longhorn.yml` for that.
> - **Process volumes one at a time first.** The example vars file recommends validating a single volume before batching — a misidentified `source_dir` can pin the PVC to the wrong (empty) replica.
> - **`python3` on every node.** Phase 0's replica scan and the merge tool both require `python3` on `pi1/pi2/pi3`.
> - **The merge tool path is repo-relative.** `longhorn_data.yml` resolves `merge-longhorn-layers.py` from `docs/incidents/2026-04-13-power-cut/tools/` and `scp`s it to the source node — run the playbook from inside the collection so that path resolves.

---

## Why this is rehearsed

A recovery procedure run once under outage stress is a liability. These two playbooks — and the CRDs-present-vs-gone decision — are **rehearsed deliberately in the production-like sandbox**: kill the cluster, lose the engine IDs on a test volume, and walk both recovery paths back to green without risking production data. That turns the drill into routine QA rather than one-shot incident memory. See the PRD's [QA strategy](../../../PRD/safe-prod-like-environment/qa-strategy.md) for how recovery drills become a regular exercise, and [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) for the full startup order these drills validate.

---

## Where this branch sits

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart LR
  classDef done fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
  classDef here fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;

  s05["05 · Backup<br/>(produces .volumes dump)"]:::done
  rec["recover/*<br/>longhorn.yml · longhorn_data.yml"]:::here
  s01["01 · System<br/>(rejoin pipeline)"]:::done

  s05 -. "on disaster" .-> rec
  rec -. "once recovered" .-> s01
```

1. **05 · Backup** produced the `.volumes` dump that `longhorn.yml`'s restore phase replays.
2. **recover/** (this page) is invoked only on disaster — pick `longhorn.yml` (CRDs present) or `longhorn_data.yml` (CRDs gone).
3. Once volumes are healthy, the cluster **re-enters the normal pipeline** at [01 · System](01-system.md), and you re-run a fresh [05 · Backup](05-backup.md) once everything is green.