add dolibarr-tva-reconciliation, dolibarr-recurring-templates, dolibarr-data-snapshot

V3 bundle: three sibling skills under .claude/skills/, all read-only, all depending on the dolibarr base skill.

dolibarr-tva-reconciliation:
- tva-by-month.sh: HT + TVA grouped by (year-month × tva_tx), ready for CA3 / CA12 transcription.
- tva-line-detail.sh: per-line audit trail with country-based bucket assignment (A1 domestic / A4 intra-UE autoliquidation / E2 export hors UE). Documents the French TVA mental model.
- Today every Arcodange line is E2 (KissMetrics, US, autoliquidation 259-1° CGI). The skill scales for the day a French B2B is invoiced.

dolibarr-recurring-templates:
- list-templates.sh: probes /invoices/templates/{id} since there's no list endpoint. Stops after 5 consecutive empty responses.
- inspect-template.sh: full audit per template, with health checks.
- Surfaces that the "Kiss Metrics Invoice" template has frequency=0 and nb_gen_done=0: it is NOT auto-firing. Every KM invoice today was manually duplicated. Cohort-review implication: the deferred 9-month cycle depends on Gabriel clicking "Generate" each month, not on a Dolibarr cron.

dolibarr-data-snapshot:
- snapshot.sh: bundles every read endpoint the dolibarr-* family uses into one JSON with a content_hash (sha256 of data only, excluding timestamp, so identical state hashes identically across runs).
- Use cases: cohort evidence packs, drift detection, archival before a known-risky UI change.
- V1 baseline summary captured at examples/snapshot-summary.txt (the ~246 KB snapshot file itself is intentionally not committed).

Also extends dolibarr/SKILL.md endpoint catalogue with /invoices/templates/{id} (and its no-list-endpoint quirk + the id-null sentinel for missing ids), plus links to the three new sibling skills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 00:01:06 +02:00
parent d34cba3fa0
commit f19b1d2ef2
16 changed files with 1554 additions and 1 deletions
--- a/.claude/skills/dolibarr-data-snapshot/SKILL.md
+++ b/.claude/skills/dolibarr-data-snapshot/SKILL.md
@@ -0,0 +1,114 @@
+---
+name: dolibarr-data-snapshot
+description: Snapshot the read-only state of the Arcodange Dolibarr instance into a single content-addressable JSON file — status, thirdparties (list + per-id detail), invoices (list + per-id detail + per-id payments), recurring templates, products, bank accounts. Each snapshot includes a `content_hash` (sha256 of the data, EXCLUDING the captured-at timestamp) so two snapshots of identical state hash identically — drift detection is one comparison. Use cases: cohort-review evidence packs, archival before a known-risky change, time-series drift detection between two dates, point-in-time forensics. Use when the user asks "snapshot Dolibarr", "dump the state", "archiver l'ERP", "preuve cohort", "diff entre deux dates". Depends on the `dolibarr` skill. SKIP for one-shot reads (use the other workflow skills directly), for PDF / binary attachments (intentionally excluded — would bloat the snapshot), for write-side changes (this is read-only forensics), and for snapshotting NON-Dolibarr state (bank statements, k8s, etc. — those would be sibling skills).
+requires:
+  bins: ["curl", "jq", "python3"]
+  auth: true
+---
+
+# dolibarr-data-snapshot — point-in-time JSON dump of Dolibarr read state
+
+One script: [`snapshot.sh`](scripts/snapshot.sh). Pulls every read-only endpoint the `dolibarr-*` family uses and bundles into a single JSON file with a content hash. Read-only, no side effects.
+
+Depends on the [dolibarr](../dolibarr/SKILL.md) base skill.
+
+## What's in the snapshot
+
+```json
+{
+  "schema_version": "1",
+  "captured_at": "2026-05-28T21:58:50Z",
+  "instance": "erp.arcodange.lab",
+  "content_hash": "sha256:6b94cd312d55a693d3c533ae6c9a5abef2734dd5bca8d4b1bdd5ca6ea6fc1f9a",
+  "data": {
+    "status":         { ... GET /status ... },
+    "thirdparties":   { "list": [ ... GET /thirdparties ... ], "detail": { "1": { ... GET /thirdparties/1 ... }, ... } },
+    "invoices":       { "list": [ ... ], "detail": { "12": { ... }, ... }, "payments": { "12": [ ... ], ... } },
+    "recurring_templates": { "1": { ... GET /invoices/templates/1 ... }, ... },
+    "products":       [ ... GET /products ... ],
+    "bank_accounts":  [ ... GET /bankaccounts ... ]
+  }
+}
+```
+
+**`content_hash` is the sha256 of `data` only** — it deliberately excludes `captured_at`, `schema_version`, `instance`, and the hash field itself. So two snapshots taken at different moments but reflecting **identical Dolibarr state** have the same `content_hash`. That's what makes drift detection trivial:
+
+```bash
+jq -r .content_hash snap-2026-05.json snap-2026-06.json
+# Same hash → no data changed between the two captures.
+# Different hash → use `jq` / `diff` to find what moved.
+```
+
+## Usage
+
+```bash
+./scripts/snapshot.sh                          # writes ./snapshot-YYYY-MM-DDTHHMMSSZ.json
+./scripts/snapshot.sh --out /tmp/baseline.json
+./scripts/snapshot.sh --print-only             # stdout, no file (pipe-friendly)
+./scripts/snapshot.sh --max-template-id 100    # raise the template-probe upper bound
+```
+
+Live output of the current Dolibarr (summarized in [examples/snapshot-summary.txt](examples/snapshot-summary.txt); the actual JSON is too big to commit verbatim, ~246 KB):
+
+```
+wrote ./snapshot-2026-05-28T215850Z.json (246186 bytes)
+  sha256:6b94cd312d55a693d3c533ae6c9a5abef2734dd5bca8d4b1bdd5ca6ea6fc1f9a
+```
+
+Contents (V1 baseline):
+- 10 thirdparties (KissMetrics + 9 others — prospects / suppliers / etc.)
+- 5 invoices, all KM (1 avoir + 4 regular)
+- 5 per-invoice payment arrays
+- 1 recurring template (Kiss Metrics Invoice — frequency=0, see `dolibarr-recurring-templates`)
+- 2 products (KM-audit, KM-cloud-devops)
+- 3 bank accounts (QONTO, WISE EURO, G.RADUREAU CCA)
+
+## What's intentionally excluded
+
+- **PDF attachments.** `/documents/download` returns base64 bodies of up to about a megabyte each. Including them would inflate the snapshot 10× to 100×. Workflow skills (`dolibarr-invoice-audit`) fetch PDFs on demand.
+- **`users/info`** — leaks the `ai_agent` account internals. Out of scope for a read-only state dump.
+- **`/setup/modules`** — admin-only, not available to `ai_agent`.
+- **Anything that requires writing** (cron triggers, etc.).
+- **`/payments` list-all** — returns 501 (see base skill catalogue); we get payment data via per-invoice fetches.
+
+## Use cases
+
+### 1. Cohort-review evidence pack
+
+```bash
+./scripts/snapshot.sh --out evidence/dolibarr-2026-05-28.json
+# Send the file as proof of the billing state at that moment.
+# The content_hash fingerprints the data: it is an unkeyed sha256 digest, not a cryptographic signature.
+```
+
+### 2. Drift detection between two dates
+
+```bash
+./scripts/snapshot.sh --out snap-may.json
+# ... a month passes ...
+./scripts/snapshot.sh --out snap-jun.json
+jq -r .content_hash snap-may.json snap-jun.json
+# Different → something moved. Find what:
+diff <(jq -S .data snap-may.json) <(jq -S .data snap-jun.json) | head -50
+```
+
+### 3. Archive before a known-risky change
+
+Before manually firing the next M-N invoice, regenerating the PDF template, or any UI change with billing consequences:
+
+```bash
+./scripts/snapshot.sh --out before-change-$(date -u +%Y%m%d).json
+# Make the change ...
+./scripts/snapshot.sh --out after-change-$(date -u +%Y%m%d).json
+# Diff to confirm only the intended state moved.
+```
+
+## Performance
+
+On the current Arcodange instance (5 invoices, 10 thirdparties, 1 template), the snapshot completes in **~2 seconds**: one HTTP call per top-level resource, one per thirdparty, two per invoice (detail + payments), and one per probed template id. At ~30 thirdparties + ~100 invoices + ~10 templates that comes to roughly 250 calls, so expect a run on the order of 10 to 20 s.
+
+## Out of scope
+
+- **Snapshotting bank statements** (Qonto / Wise CSV exports). Different data source — would be a sibling skill (`arcodange-bank-snapshot` or similar).
+- **Snapshotting Kubernetes state** of the ERP deployment. Sibling skill candidate (`arcodange-k8s-snapshot`).
+- **Schema migrations / drift in the Dolibarr DB itself.** That requires `dolibarr-postgres-readonly` or similar; out of scope here.
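
Note on the `content_hash` contract documented in the SKILL.md above: it can be checked outside the script. The following is a minimal sketch, assuming the snapshot is parsed with Python's `json` module and that the hash is computed exactly as in the compose step of `snapshot.sh` (`json.dumps(data, sort_keys=True, ensure_ascii=False)`, UTF-8, sha256). The helper names `compute_content_hash` and `verify_snapshot` are illustrative and not part of the commit; the sample payloads are made up.

```python
import hashlib
import json


def compute_content_hash(data):
    """Recompute the hash the way snapshot.sh does: sha256 over key-sorted JSON of `data`."""
    serialized = json.dumps(data, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(serialized).hexdigest()


def verify_snapshot(snapshot):
    """Return True when the stored content_hash matches the recomputed one."""
    return snapshot.get("content_hash") == compute_content_hash(snapshot.get("data"))


# Tiny made-up payloads (not real Dolibarr data) to show the invariants.
data = {"status": {"ok": True}, "products": [{"id": "1", "ref": "demo"}]}
snap_a = {"captured_at": "2026-05-28T21:58:50Z", "content_hash": compute_content_hash(data), "data": data}
snap_b = {"captured_at": "2026-06-28T09:00:00Z", "content_hash": compute_content_hash(data), "data": data}

print(verify_snapshot(snap_a))                           # True
print(snap_a["content_hash"] == snap_b["content_hash"])  # True: the timestamp is not hashed
```

Because the digest is unkeyed, a match only shows that `data` and `content_hash` agree with each other; anyone who edits `data` can recompute the hash, so an evidence pack still needs an out-of-band record of the hash.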
--- a/.claude/skills/dolibarr-data-snapshot/examples/snapshot-summary.txt
+++ b/.claude/skills/dolibarr-data-snapshot/examples/snapshot-summary.txt
@@ -0,0 +1,24 @@
+# dolibarr-data-snapshot - V1 baseline summary
+#
+# The actual JSON snapshot file is ~246 KB and is intentionally NOT committed.
+# This file is the structural digest captured at the same moment.
+
+captured_at:   2026-05-28T22:00:15Z
+instance:      erp.arcodange.lab
+content_hash:  sha256:6b94cd312d55a693d3c533ae6c9a5abef2734dd5bca8d4b1bdd5ca6ea6fc1f9a
+schema_ver:    1
+
+section                       count
+------------------------------------------------
+  status                        1 (dolibarr_version=22.0.4)
+  thirdparties.list             10
+  thirdparties.detail           10
+  invoices.list                 5
+  invoices.detail               5
+  invoices.payments             5
+  recurring_templates           1 (ids: ['1'])
+  products                      2
+  bank_accounts                 3
+
+# Stable content_hash check: run snapshot.sh twice quickly. Both content_hash
+# values should be identical when Dolibarr state has not changed between runs.
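
Note on the drift check: when two `content_hash` values differ, the top-level hash does not say which section moved. One possible way to narrow it down, sketched here under the assumption that both snapshots follow the `data` layout shown in SKILL.md, is to hash each top-level section separately and compare. `section_hashes` and `changed_sections` are hypothetical helpers, not functions shipped in this commit, and the sample states are invented.

```python
import hashlib
import json


def section_hashes(data):
    """Hash each top-level section of `data` separately (key-sorted JSON, sha256)."""
    out = {}
    for name, value in data.items():
        blob = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
        out[name] = hashlib.sha256(blob).hexdigest()
    return out


def changed_sections(old_data, new_data):
    """Return sorted names of sections that differ or exist on one side only."""
    old_h, new_h = section_hashes(old_data), section_hashes(new_data)
    return sorted(n for n in old_h.keys() | new_h.keys() if old_h.get(n) != new_h.get(n))


# Made-up miniature states, not real Dolibarr data.
may = {"products": [{"id": "1"}], "invoices": {"list": [{"id": "12"}]}, "bank_accounts": []}
jun = {"products": [{"id": "1"}], "invoices": {"list": [{"id": "12"}, {"id": "13"}]}, "bank_accounts": []}

print(changed_sections(may, jun))  # ['invoices']
```

With real files, `old_data` and `new_data` would be the `data` objects loaded from the two snapshot JSONs; the `diff <(jq -S ...)` recipe from SKILL.md can then be limited to the sections this reports.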
--- a/.claude/skills/dolibarr-data-snapshot/scripts/snapshot.sh
+++ b/.claude/skills/dolibarr-data-snapshot/scripts/snapshot.sh
@@ -0,0 +1,174 @@
+#!/usr/bin/env bash
+# Snapshot the read-only state of the Arcodange Dolibarr into one JSON file.
+#
+# Usage:
+#   snapshot.sh [--out PATH] [--max-template-id N]   # default out: ./snapshot-YYYY-MM-DDTHHMMSSZ.json; default N: 20
+#   snapshot.sh --print-only      # write to stdout instead of a file
+#
+# The snapshot is content-addressable: it includes a SHA-256 of the
+# serialized `data` section only (key-sorted; timestamp and metadata excluded) so two snapshots
+# of the same state hash identically. Useful for:
+#   - cohort review evidence packs (sign + send)
+#   - drift detection between dates (diff two snapshots)
+#   - archival before a known-risky change
+#
+# What's included (everything the dolibarr-* family reads):
+#   - status (Dolibarr version)
+#   - thirdparties (full list + detail)
+#   - invoices (full list + per-invoice detail + per-invoice payments)
+#   - recurring invoice templates (probed 1..MAX_TEMPLATE_ID)
+#   - products
+#   - bank accounts
+#
+# Excluded by design:
+#   - PDF attachments (binary, would bloat the snapshot ~50KB each)
+#   - users/info (would leak ai_agent details)
+#   - any non-read endpoints
+#
+# Requires: curl, jq, python3 (with hashlib — standard lib).
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+DOL_CURL="${SCRIPT_DIR}/../../dolibarr/scripts/dol-curl.sh"
+
+MAX_TEMPLATE_ID=20
+EMPTY_TPL_TOLERANCE=5
+OUT=""
+PRINT_ONLY=0
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --out) OUT="$2"; shift 2 ;;
+    --print-only) PRINT_ONLY=1; shift ;;
+    --max-template-id) MAX_TEMPLATE_ID="$2"; shift 2 ;;
+    -h|--help) awk 'NR>1 && /^#/ {sub(/^# ?/, ""); print; next} NR>1 {exit}' "$0"; exit 0 ;;  # print the whole header comment, not a fixed line range
+    *) echo "snapshot.sh: unknown arg: $1" >&2; exit 2 ;;
+  esac
+done
+
+WORK="$(mktemp -d -t dolsnap.XXXXXX)"
+trap 'rm -rf "${WORK}"' EXIT
+
+fetch_into() {
+  local out_file="$1" path="$2"
+  "${DOL_CURL}" "${path}" > "${out_file}" 2>/dev/null || {
+    # On HTTP error, dol-curl prints body+stderr; capture body for record.
+    "${DOL_CURL}" "${path}" > "${out_file}" 2>&1 || true
+  }
+}
+
+# 1. Liveness + status
+fetch_into "${WORK}/status.json"       /status
+
+# 2. Thirdparties (list + detail)
+fetch_into "${WORK}/tps_list.json"     /thirdparties
+TP_IDS=$(python3 -c "
+import json,sys
+try: d = json.load(open(sys.argv[1]))
+except: d = []
+if isinstance(d, list): print(' '.join(str(t['id']) for t in d if t.get('id')))
+" "${WORK}/tps_list.json")
+mkdir -p "${WORK}/tps"
+for id in ${TP_IDS}; do fetch_into "${WORK}/tps/${id}.json" "/thirdparties/${id}"; done
+
+# 3. Invoices (list + detail + payments)
+fetch_into "${WORK}/inv_list.json"     '/invoices?limit=500&sortfield=t.datef&sortorder=ASC'
+INV_IDS=$(python3 -c "
+import json,sys
+try: d = json.load(open(sys.argv[1]))
+except: d = []
+if isinstance(d, list): print(' '.join(str(i['id']) for i in d if i.get('id')))
+" "${WORK}/inv_list.json")
+mkdir -p "${WORK}/inv" "${WORK}/pay"
+for id in ${INV_IDS}; do
+  fetch_into "${WORK}/inv/${id}.json"  "/invoices/${id}"
+  fetch_into "${WORK}/pay/${id}.json"  "/invoices/${id}/payments"
+done
+
+# 4. Recurring templates (probe)
+mkdir -p "${WORK}/tpl"
+CONSECUTIVE_EMPTY=0
+for tid in $(seq 1 "${MAX_TEMPLATE_ID}"); do
+  fetch_into "${WORK}/tpl/${tid}.json" "/invoices/templates/${tid}"
+  REAL=$(python3 -c "import json,sys
+try: d=json.load(open(sys.argv[1])); print('1' if d.get('id') else '0')
+except: print('0')" "${WORK}/tpl/${tid}.json")
+  if [[ "${REAL}" == "1" ]]; then
+    CONSECUTIVE_EMPTY=0
+  else
+    CONSECUTIVE_EMPTY=$((CONSECUTIVE_EMPTY+1))
+    rm "${WORK}/tpl/${tid}.json"
+    [[ ${CONSECUTIVE_EMPTY} -ge ${EMPTY_TPL_TOLERANCE} ]] && break
+  fi
+done
+
+# 5. Products + bank accounts
+fetch_into "${WORK}/products.json"      /products
+fetch_into "${WORK}/bankaccounts.json"  /bankaccounts
+
+# 6. Compose the snapshot
+python3 - "${WORK}" <<'PY' > "${WORK}/snapshot.json"
+import json, os, sys, datetime, hashlib
+
+work = sys.argv[1]
+
+def load(path, default):
+    try: return json.load(open(path))
+    except (FileNotFoundError, json.JSONDecodeError): return default
+
+def load_dir(dirname):
+    out = {}
+    full = os.path.join(work, dirname)
+    if not os.path.isdir(full): return out
+    for fn in sorted(os.listdir(full)):
+        if not fn.endswith(".json"): continue
+        key = fn[:-len(".json")]
+        out[key] = load(os.path.join(full, fn), None)
+    return out
+
+data = {
+    "status":         load(os.path.join(work, "status.json"),       {}),
+    "thirdparties":   {
+        "list":   load(os.path.join(work, "tps_list.json"),  []),
+        "detail": load_dir("tps"),
+    },
+    "invoices":       {
+        "list":     load(os.path.join(work, "inv_list.json"), []),
+        "detail":   load_dir("inv"),
+        "payments": load_dir("pay"),
+    },
+    "recurring_templates": load_dir("tpl"),
+    "products":       load(os.path.join(work, "products.json"),      []),
+    "bank_accounts":  load(os.path.join(work, "bankaccounts.json"),  []),
+}
+
+# content_hash is the sha256 of `data` only — excludes timestamp + metadata,
+# so two snapshots of identical Dolibarr state hash identically.
+# (Drift detection is then: compare content_hash, done.)
+content_serialized = json.dumps(data, sort_keys=True, ensure_ascii=False).encode("utf-8")
+content_hash = "sha256:" + hashlib.sha256(content_serialized).hexdigest()
+
+payload = {
+    "schema_version": "1",
+    "captured_at":    datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+    "instance":       "erp.arcodange.lab",
+    "content_hash":   content_hash,
+    "data":           data,
+}
+
+print(json.dumps(payload, indent=2, ensure_ascii=False, sort_keys=True))
+PY
+
+# 7. Output
+if [[ "${PRINT_ONLY}" == "1" ]]; then
+  cat "${WORK}/snapshot.json"
+else
+  if [[ -z "${OUT}" ]]; then
+    OUT="./snapshot-$(date -u +%Y-%m-%dT%H%M%SZ).json"
+  fi
+  cp "${WORK}/snapshot.json" "${OUT}"
+  SIZE=$(wc -c < "${OUT}" | tr -d '[:space:]')  # portable byte count; `stat -f` means "filesystem" on GNU stat
+  HASH=$(python3 -c "import json,sys; print(json.load(open(sys.argv[1]))['content_hash'])" "${OUT}")
+  echo "wrote ${OUT} (${SIZE} bytes)"
+  echo "  ${HASH}"
+fi
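
Note on the template probe in step 4 of `snapshot.sh`: the stop rule (give up after `EMPTY_TPL_TOLERANCE` consecutive misses, capped at `MAX_TEMPLATE_ID`) is easier to reason about in isolation. The sketch below restates that loop in Python under stated assumptions: `fetch` is a hypothetical callable standing in for `dol-curl.sh`, and a miss is modelled as a body whose `id` is null, which is how the commit message describes missing ids.

```python
def probe_templates(fetch, max_id=20, tolerance=5):
    """Probe ids 1..max_id; stop after `tolerance` consecutive misses.

    `fetch` is a hypothetical callable returning the parsed JSON body for an id.
    A body counts as a real template only when it is a dict with a truthy `id`.
    """
    found = {}
    consecutive_empty = 0
    for tid in range(1, max_id + 1):
        body = fetch(tid)
        if isinstance(body, dict) and body.get("id"):
            found[str(tid)] = body
            consecutive_empty = 0  # a hit resets the miss counter
        else:
            consecutive_empty += 1
            if consecutive_empty >= tolerance:
                break
    return found


# Fake backend: templates exist at ids 1 and 4; every other id returns the id-null sentinel.
calls = []


def fake_fetch(tid):
    calls.append(tid)
    return {"id": str(tid), "title": "demo"} if tid in (1, 4) else {"id": None}


templates = probe_templates(fake_fetch)
print(sorted(templates))  # ['1', '4']
print(calls[-1])          # 9: ids 5..9 are five consecutive misses
```

The trade-off this makes visible: a template sitting behind a gap of five or more unused ids (id 10 in the fake backend, had it existed) would never be probed, so raising `--max-template-id` alone does not help once the tolerance has been hit.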