Compare commits


10 Commits

Author SHA1 Message Date
b4bdbe75df Merge pull request 'chart: Phase C of multi-env evolution — template literals, add sandbox overlay' (#11) from claude/chart-multi-env-prep into main 2026-05-31 23:27:05 +02:00
ec4df4719f chart: template hardcoded single-env literals; add values-sandbox.yaml overlay
Phase C of the multi-env evolution discussed in the runbook design thread
(see PR description). Pure refactor — the prod helm template render is
verified byte-identical (10857 bytes both before and after, diff exit 0).

What was hardcoded, now templated:
- chart/templates/vaultauth.yaml          role: erp                       → role: {{ .Values.vault.k8sRole }}
- chart/templates/vaultdynamicsecret.yaml path: creds/erp                 → path: {{ .Values.vault.dynamicPath }}
- chart/templates/vaultsecret.yaml        path: erp/config                → path: {{ .Values.vault.staticPath }}
- chart/templates/config.yaml             DOLI_DB_NAME: erp               → DOLI_DB_NAME: {{ .Values.db.name }}
                                          DOLI_URL_ROOT: https://erp..lab → DOLI_URL_ROOT: 'https://{{ .Values.host }}'

values.yaml gains a documented multi-env coordinate block with prod defaults
(env, instance, host, db.name, vault.k8sRole, vault.dynamicPath, vault.staticPath).
The elision rule (env=prod → no suffix, env=non-prod → "<app>-<env>" suffix)
guarantees the prod render is unchanged.

chart/values-sandbox.yaml is added as the ready-to-use overlay for Phase D.
It is NOT wired into any helm install / ArgoCD app today — the platform side
(factory/postgres/iac tfvars, tools/hashicorp-vault/iac module signature) is
not yet evolved. The file documents the convention so the Phase D commit can
just `helm install -f values.yaml -f values-sandbox.yaml`.

Also fixes .gitea/workflows/vault.yaml CI typo: the vault_step JWT role was
gitea_cicd_webapp (copy-paste from the template repo) instead of
gitea_cicd_erp. Real bug — the erp CI would have failed JWT auth against
Vault. Fix unrelated to multi-env but bundled here because it's small and
touches the same file family.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 23:26:20 +02:00
444886b91a Merge pull request 'arcodange-email-ingest V8.1: filter calendar invites + newsletter senders' (#10) from claude/arcodange-email-ingest-v81 into main 2026-05-31 15:18:58 +02:00
1d38f25c23 arcodange-email-ingest V8.1: filter calendar invites + newsletter senders
email-list.sh gains two hard-exclusion filters (applied before the
candidate test, regardless of attachments):

- EXCLUDE_PATTERN matches subjects starting with Invitation: / Updated
  invitation: / Canceled event: / Accepted: / Declined: / Tentative: /
  Maybe: (after stripping Re:/Fwd:/Tr: prefixes). Filters Google Calendar
  events that always carry an .ics attachment.
- EXCLUDE_SENDER matches updates.<domain>, noreply@*calendar, news@,
  newsletter@. Filters newsletter blast traffic.

Effect on --all-folders --candidates-only baseline: 27 noisy → 12
actionable (calendar invites + the staying-ahead.ai newsletter blast
removed). Real supplier docs intact: Darnis F1042 in /Notification, 3 Free
Mobile factures in /Inbox/abonnements, Mistral + Anthropic in /Inbox/books.

The originally-planned --mark-ingested feature is deferred to V8.2:
flag-set requires the Zoho OAuth scope ZohoMail.messages.UPDATE which our
read-only refresh_token doesn't have. Documented in SKILL.md: once the
user opts in to the wider scope, --mark-ingested becomes a one-line flag
on email-inspect.sh and is_candidate() learns to skip flag_info messages.

Captured the new --all-folders baseline at examples/email-list-all-folders.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 15:18:31 +02:00
794aa18d2a Merge pull request 'add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts' (#9) from claude/arcodange-email-ingest into main 2026-05-31 14:56:58 +02:00
c2d8479f5e add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts
V8 — first inbound-side skill. Closes the loop from "bill arrives by email"
to "ready to enter in Dolibarr UI". Read-only at every layer.

What ships:
- arcodange-email-ingest/scripts/zoho-curl.sh   OAuth wrapper with token cache
                                                (50 min TTL, mode 600) — avoids
                                                hitting Zoho OAuth rate limit on
                                                every invocation.
- arcodange-email-ingest/scripts/email-list.sh   List candidates in /Inbox/books
                                                (where the books@ alias auto-
                                                routes mail). --candidates-only
                                                filter on supplier patterns or
                                                attachments. --all-folders to
                                                scan everything.
- arcodange-email-ingest/scripts/email-inspect.sh   Pull message + attachments,
                                                pdftotext on each PDF, heuristic
                                                extract (supplier, ref, dates,
                                                totals, VAT rate), emit Dolibarr
                                                supplier-invoice draft JSON.

Architecture choice — Zoho API (not IMAP):
- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth
  refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@) — no Google
  API setup, no app-passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.

V8.0 baseline (in /Inbox/books):
- 3 candidates: Mistral AI facture, Anthropic Stripe receipt (Fwd Gmail),
  INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some
  templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when
  the heuristic falls short.

Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle
("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER
(mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once
and retries.

V8.1+ ideas, listed as out of scope in SKILL.md:
- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes

CLI: bin/arcodange email {list|inspect|curl} integrated.
Base updates: dolibarr/SKILL.md cross-link, dolibarr/README.md env schema
extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 14:56:15 +02:00
a1042a483b Merge pull request 'arcodange-bank-reco V7: avoir netting + fk_account context + wire-ref matching' (#8) from claude/arcodange-bank-reco-v7 into main 2026-05-31 14:20:37 +02:00
246c7fc5a9 arcodange-bank-reco V7: avoir netting + fk_account context + wire-ref matching
Three improvements that reduce the V6.1 exit-1 signal from 10 to 1 on
the current Arcodange baseline. Every bucket now has a single, clear
purpose; the only entry counted as a failure is a genuine action item.

A. fk_account context on dolibarr-only
   - Fetches /bankaccounts and tags each dolibarr-only with the account
     ref + label (e.g. "CCA1 (G.RADUREAU Compte Courant Asso)").
   - Splits dolibarr-only into "on API-tracked accounts" (QON*/WIS* — real
     gaps) vs "not in API scope" (CCA1 / personal — expected gaps).
   - Personal-account entries no longer count toward the failure verdict.

B. Avoir-cycle netting
   - Pairs AVC entries of -X on socid S with FAC entries of +X on the
     same socid within ±5d.
   - Both surface in a dedicated AVOIR-NETTED bucket and are excluded from
     dolibarr-only, since the bank only sees the net of the cycle.
   - Resolves the V6.1 noise where AVC001-CL0001001 + FAC001-CL00001
     appeared as fake gaps for a 510€ cancel-and-reissue dance.

C. Wire-reference strong matching (--enrich flag, opt-in)
   - When --enrich is passed, bank-match.sh fetches /v1/transfers/{id}
     per Wise TRANSFER and reads the wire `reference` field.
   - References containing a FAC\d+(CL\d+)? pattern strong-match against
     the corresponding Dolibarr customer invoice (annotated [wire-ref]
     vs the loose [amt+date] kind).
   - Verified on FAC002 5100€: KM's wire memo "FOR INVOICE FAC002CL0001002"
     gives an unambiguous match independent of date drift.

Baseline (Jan-May 2026, --enrich on):
  6 matched · 1 internal · 2 avoir-netted · 7 bank-known · 1 bank-UNKNOWN
  0 dol-only-API · 7 dol-only-personal
  → exit-1 count = 1 (just the +2147€ KM Wise 2026-05-29 to record).

The CLI (bin/arcodange) gains --enrich on the match subcommand. The
SKILL.md has a new "V7 bucket structure" section explaining the seven
buckets and a before/after table showing the signal/noise improvement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 14:20:06 +02:00
0f5b6bcbad Merge pull request 'arcodange-bank-reco: known-patterns catalog + annotated bank-only buckets' (#7) from claude/arcodange-bank-reco-patterns into main 2026-05-31 14:07:39 +02:00
4b6a5f7529 arcodange-bank-reco: add known-patterns.json catalog + bank-match annotation
V6.1 follow-up to the bank-reco V6 ship. Splits the BANK-ONLY bucket into
"known patterns" (intentional gaps, documented and classified) vs
"unknown" (real action items).

What the catalog covers today:
- FOUREZ Quentin → capital_deposit (apport en capital 1000 € initial,
  notaire FOUREZ centralisateur du dépôt). Maps to Dolibarr account 1013.
- URSSAF → social_charges (account 645100)
- MISTRAL.AI, CLAUDE.AI → ai_subscription (account 6262)
- Wise *Plan, qonto_fee → bank_fee (account 627)
- BALANCE_DEPOSIT / FEATURE_CHARGE on Wise → internal_topup (self-funding
  pair, often nets to zero)

Effect on the V6 baseline (Jan-May 2026):
- Before catalog: 8 BANK-ONLY mixed entries (noise + signal)
- After catalog:  7 known + 1 UNKNOWN (just the +2147 € KM Wise payment
  2026-05-29 that genuinely needs a Dolibarr entry)

The catalog is JSON rather than YAML, so it parses with the standard
library and adds no dependency. The schema is documented in SKILL.md.
Each pattern is a case-insensitive regex matched against both the bank
label and the operation type. Optional filters: bank, side, amount_min,
amount_max.

Exit code now reflects only the UNKNOWN bank-only and dolibarr-only
counts — the verdict is no longer noisy because of intentional gaps.

Edit known-patterns.json as new recurring patterns emerge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 14:06:46 +02:00
21 changed files with 1173 additions and 65 deletions


@@ -124,6 +124,90 @@ Current state (V1 baseline):
- Wise **STANDARD EUR** : 5 308,25 € live
- **Total bank-side** : 9 499,79 €
## Known-patterns catalog ([known-patterns.json](known-patterns.json))
Bank movements that have no Dolibarr counterpart fall into two groups:
1. **Intentional gaps**: operational expenses or one-off events the operator already knows about (monthly URSSAF contributions, AI subscriptions, the capital deposit, Wise plan fees). These keep recurring, but their accounting treatment is well understood.
2. **Real action items**: incoming payments not yet entered, expenses missing a supplier invoice, anomalies.
Without a catalog, both look identical in the BANK-ONLY bucket — noise drowns the signal. The catalog is an operator-curated list of patterns; `bank-match.sh` reads it and splits BANK-ONLY into two sub-buckets:
- **BANK-ONLY — known patterns** : annotated with `[classification]` + a one-line note (which Dolibarr account to use, etc.). Don't action; just verify.
- **BANK-ONLY — unknown** : the real signal. Each entry is either a missing supplier invoice, an unrecorded payment, or a new pattern to add to the catalog.
**Schema** (JSON, see [known-patterns.json](known-patterns.json) for current entries):
```json
{
"patterns": [
{
"pattern": "regex (case-insensitive, matched against bank label + operation type)",
"classification": "capital_deposit | social_charges | ai_subscription | bank_fee | internal_topup | personal_apport | needs_classification",
"bank": "qonto | wise (optional, default both)",
"side": "credit | debit (optional, default both)",
"amount_min": 0.0, "amount_max": 99999.0, // optional numeric bounds
"note": "human-readable context — what this is, which Dolibarr account, recurring schedule, etc."
}
]
}
```
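A minimal sketch of how a catalog entry could be applied to one bank movement, assuming the rules described above (case-insensitive regex against label plus operation type, every present filter must match). The `classify` helper and the movement dictionaries are illustrative assumptions; the field names mirror those used in `bank-match.sh`, but this is not the script's actual code.

```python
import re

def classify(movement, patterns):
    """Return the first catalog entry matching a bank movement, else None.

    Hypothetical helper: `movement` is assumed to carry bank, sign,
    amount, label and op keys.
    """
    # The regex is tested against the bank label and the operation type.
    haystack = f"{movement['label']} {movement['op']}"
    side = "credit" if movement["sign"] == "+" else "debit"
    for entry in patterns:
        if not re.search(entry["pattern"], haystack, re.IGNORECASE):
            continue
        # Optional filters: an absent key means "no restriction".
        if "bank" in entry and entry["bank"] != movement["bank"].lower():
            continue
        if "side" in entry and entry["side"] != side:
            continue
        if "amount_min" in entry and movement["amount"] < entry["amount_min"]:
            continue
        if "amount_max" in entry and movement["amount"] > entry["amount_max"]:
            continue
        return entry
    return None

patterns = [
    {"pattern": "URSSAF", "classification": "social_charges",
     "bank": "qonto", "side": "debit"},
    {"pattern": "qonto_fee", "classification": "bank_fee", "bank": "qonto"},
]
urssaf = {"bank": "Qonto", "sign": "-", "amount": 493.00,
          "label": "URSSAF D ILE DE FRANCE", "op": "direct_debit"}
fee = {"bank": "Qonto", "sign": "+", "amount": 5.22,
       "label": "Qonto", "op": "qonto_fee"}
unknown = {"bank": "Wise", "sign": "+", "amount": 2147.00,
           "label": "Kissmetrics Holdings Inc", "op": "TRANSFER"}

for mov in (urssaf, fee, unknown):
    hit = classify(mov, patterns)
    print(mov["label"], "->", hit["classification"] if hit else "unknown")
```

The `fee` movement only matches because the operation type is part of the tested text; its label alone (`Qonto`) would not match `qonto_fee`.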
**Editing workflow:**
1. Run `bin/arcodange bank match` → look at BANK-ONLY unknown.
2. For each recurring entry that's "expected", add a pattern to `known-patterns.json`.
3. Re-run match → the entry should now appear in the known sub-bucket.
4. For one-off action items (e.g. "+2147 € KM May 29 not in Dolibarr"), don't add a pattern — enter it in Dolibarr instead.
**V6.1 catalog** ships with these patterns for the current Arcodange baseline:
- `FOUREZ Quentin` → capital_deposit (initial 1000 € apport via notaire, 2026-01-21)
- `URSSAF` → social_charges
- `MISTRAL.AI` / `CLAUDE.AI` → ai_subscription
- `Wise *Plan` → bank_fee (Wise account plan billed via Qonto card)
- `qonto_fee` → bank_fee
- `BALANCE_DEPOSIT|For your account plan` → internal_topup (the Wise +50/-50 self-funding pair)
After applying the catalog to the V6 baseline, the **only remaining BANK-UNKNOWN** is the **+2147 € Kissmetrics payment on 2026-05-29** that hasn't been entered in Dolibarr. That entry is the actual signal.
## V7 bucket structure
V7 adds three improvements that reshape the output buckets:
| Bucket | Meaning | Counts toward exit-1? |
|---|---|---|
| **MATCHED** | Bank ↔ Dolibarr paired. Annotated with match kind: `[wire-ref]` (strong, via `--enrich`) or `[amt+date]` (loose). | No |
| **INTERNAL** | Wise↔Qonto consolidations (5000€ moved between Arcodange's own accounts). | No |
| **AVOIR-NETTED** | Dolibarr AVC + FAC cancellation cycles paired and excluded (the bank only saw the net). | No |
| **BANK-ONLY — known patterns** | Bank movement with a `known-patterns.json` annotation. Intentional gap. | No |
| **BANK-ONLY — unknown** | Bank movement with no Dolibarr counterpart AND no catalog pattern. **Real action item**. | Yes |
| **DOLIBARR-ONLY — on API-tracked accounts** (QON*/WIS*) | Dolibarr payment that the bank should have shown. **Real gap**. | Yes |
| **DOLIBARR-ONLY — not in API scope** (CCA1 perso etc.) | Expected gap — we have no API on those accounts. | No |
Exit code 0 iff the two "real gap" buckets are empty.
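The exit-code rule can be sketched as follows. The bucket key names and the `exit_code` helper are hypothetical, chosen only for illustration; the counts are the V7 baseline figures reported in the commit message.

```python
def exit_code(buckets):
    """Return 0 when both "real gap" buckets are empty, else 1.

    `buckets` maps a (hypothetical) bucket name to its list of entries.
    """
    failing = ("bank_only_unknown", "dolibarr_only_api_tracked")
    return 1 if any(buckets.get(name) for name in failing) else 0

# V7 baseline, represented by placeholder entries: only one bucket
# that counts toward the verdict is non-empty.
v7_baseline = {
    "matched": ["m"] * 6,
    "internal": ["i"],
    "avoir_netted": ["a"] * 2,
    "bank_only_known": ["k"] * 7,
    "bank_only_unknown": ["+2147.00 Kissmetrics 2026-05-29"],
    "dolibarr_only_api_tracked": [],
    "dolibarr_only_out_of_scope": ["p"] * 7,
}
print(exit_code(v7_baseline))
```

Once the unknown payment is recorded in Dolibarr, `bank_only_unknown` becomes empty and the same function returns 0, however many entries remain in the non-failing buckets.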
### `--enrich` — wire-reference strong matching
`bank-match.sh --enrich` fetches `/v1/transfers/{id}` for each Wise TRANSFER and reads the `reference` field (the wire memo from the sender, e.g. `FROM KISSMETRICS HOLDINGS INC FOR INVOICE FAC002CL0001002/ VENDOR:DEV`). When the reference contains a `FAC\d+(CL\d+)?` pattern matching a Dolibarr customer invoice, that pairing takes precedence over the loose date+amount match. Only the strong-matched ones get `[wire-ref]`; the rest fall through to `[amt+date]`. Cost: 1 extra HTTP call per Wise transfer.
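As an illustration of the reference extraction, the sketch below applies the same regular expressions that appear in the `bank-match.sh` diff to the wire memo quoted above. The function names `normalize_ref` and `refs_in_wire` are assumptions made for this example.

```python
import re

def normalize_ref(ref):
    """Reduce a Dolibarr ref to upper-case alphanumerics, so that
    FAC002-CL0001002 and FAC002CL0001002 compare equal."""
    return re.sub(r"[^A-Z0-9]", "", ref.upper())

def refs_in_wire(memo):
    """Extract FAC...(CL...) invoice references from a wire memo."""
    cleaned = (memo or "").upper().replace("-", "")
    return re.findall(r"FAC\d+(?:CL\d+)?", cleaned)

memo = "FROM KISSMETRICS HOLDINGS INC FOR INVOICE FAC002CL0001002/ VENDOR:DEV"
found = refs_in_wire(memo)
print(found)
print(found[0] == normalize_ref("FAC002-CL0001002"))
```

Normalizing both sides before comparison is what makes the match independent of whether the sender typed the hyphen.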
### Avoir cycle netting
When Arcodange cancels and reissues an invoice (FAC001 → AVC001 + FAC001-NEW), the bank sees one net credit but Dolibarr stores 3 payment entries. V7 pairs each AVC entry of -X with a FAC entry of +X for the same socid within ±5 days, surfaces both in **AVOIR-NETTED**, and excludes them from `dolibarr-only`. This removes the V6.1 noise where AVC001 + FAC001-CL00001 appeared as fake gaps.
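The pairing rule can be sketched on the three customer payments involved in the FAC001 cycle. This is a simplified version of the logic in the `bank-match.sh` diff, wrapped in a hypothetical `net_avoirs` function; the `socid` value `"1"` is an assumption, since the real identifier is not shown.

```python
import datetime

def net_avoirs(payments, max_days=5):
    """Pair each credit note (AVC, negative amount) with a payment of
    opposite amount on the same socid within max_days, in place."""
    avcs = [p for p in payments
            if p["ref"].startswith("AVC") and p["amount"] < 0]
    for avc in avcs:
        candidates = [p for p in payments
                      if p is not avc
                      and p["socid"] == avc["socid"]
                      and abs(p["amount"] + avc["amount"]) < 0.01
                      and abs((p["date"] - avc["date"]).days) <= max_days
                      and p["netted_against"] is None]
        if candidates:
            # Smallest date delta first, then lexicographic ref order.
            candidates.sort(
                key=lambda p: (abs((p["date"] - avc["date"]).days), p["ref"]))
            partner = candidates[0]
            avc["netted_against"] = partner["ref"]
            partner["netted_against"] = avc["ref"]
    return payments

day = datetime.date(2026, 2, 5)
payments = [
    {"ref": "AVC001-CL0001001", "amount": -510.0, "date": day,
     "socid": "1", "netted_against": None},
    {"ref": "FAC001-CL00001", "amount": 510.0, "date": day,
     "socid": "1", "netted_against": None},
    {"ref": "FAC001-CL0001001", "amount": 510.0, "date": day,
     "socid": "1", "netted_against": None},
]
for p in net_avoirs(payments):
    print(p["ref"], "->", p["netted_against"])
```

With equal date deltas, the tie-break on `ref` picks `FAC001-CL00001` (the original, cancelled invoice), which leaves the reissued `FAC001-CL0001001` free to match the bank credit.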
### fk_account context
`bank-match.sh` now fetches `/bankaccounts` and tags `dolibarr-only` entries with their account ref + label. Splits into API-tracked (QON*/WIS* — real gaps) vs not-in-scope (everything else — expected). The 7 CCA1 personal-account entries that used to look like failures are now correctly classified as expected gaps.
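A minimal sketch of the split, assuming each `/bankaccounts` record exposes `id`, `ref` and `label` fields. The `split_dolibarr_only` helper, the `WIS1` account and its label are hypothetical; the CCA1 label is the one shown in the sample output.

```python
def split_dolibarr_only(entries, accounts):
    """Tag each entry with its account and split the list by API scope.

    Accounts whose ref starts with QON or WIS are treated as API-tracked;
    everything else is considered out of scope (an expected gap).
    """
    by_id = {str(a["id"]): a for a in accounts}
    tracked, out_of_scope = [], []
    for entry in entries:
        acct = by_id.get(str(entry.get("fk_account")), {})
        ref = acct.get("ref") or ""
        entry["account"] = f"{ref} ({acct.get('label', '')})" if ref else "unknown"
        if ref.startswith(("QON", "WIS")):
            tracked.append(entry)
        else:
            out_of_scope.append(entry)
    return tracked, out_of_scope

accounts = [
    {"id": "2", "ref": "WIS1", "label": "Wise EUR"},  # hypothetical account
    {"id": "3", "ref": "CCA1", "label": "G.RADUREAU Compte Courant Asso"},
]
entries = [
    {"ref": "FAF2026003", "amount": 1.99, "fk_account": "3"},
    {"ref": "FAF2026005", "amount": 202.80, "fk_account": "3"},
]
tracked, out_of_scope = split_dolibarr_only(entries, accounts)
print(len(tracked), len(out_of_scope))
print(out_of_scope[0]["account"])
```

Only the `tracked` list would feed the failure verdict; entries with an unresolved `fk_account` fall into the out-of-scope list in this sketch, which is a design choice of the example rather than documented behavior.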
### Effect on the baseline
| | V6 | V6.1 | V7 |
|---|---|---|---|
| MATCHED | 6 (all amt+date) | 6 | 6 (1 wire-ref strong + 5 amt+date when --enrich) |
| BANK-ONLY total | 8 mixed | 7 known + 1 UNKNOWN | 7 known + 1 UNKNOWN |
| AVOIR-NETTED | — | — | 2 (silently absorbed) |
| DOL-only TRUE GAP | 9 (noisy) | 9 (noisy) | **0** |
| DOL-only EXPECTED | — | — | 7 (CCA1 personal) |
| Exit-1 signal count | 17 (noise) | 10 (less noise) | **1** (just the +2147€ KM) |
## Matching heuristic — what's in v1 and what's V7
Today's match logic:


@@ -1,36 +1,50 @@
# Bank reconciliation: 2026-01-01 → 2026-05-31 (window ±7d, fees: off)
# Bank reconciliation: 2026-01-01 → 2026-05-31 (window ±7d, fees: off, enrich: on)
=== MATCHED (6 bank ↔ Dolibarr) ===
Qonto 2026-01-27 - 50.00 card Wise *Plan ↔ supplier FAF2026001 (2026-01-26, Δ-1d)
Wise 2026-02-05 + 510.00 TRANSFER Kissmetrics Holdings Inc ↔ customer FAC001-CL0001001 (2026-02-05, Δ+0d)
Wise 2026-03-06 + 5100.00 TRANSFER Kissmetrics Holdings Inc ↔ customer FAC002-CL0001002 (2026-03-12, Δ+6d)
Qonto 2026-03-13 - 612.00 transfer DARNIS OPERATIONS ↔ supplier FAF2026008 (2026-03-13, Δ+0d)
Wise 2026-04-20 + 2550.00 TRANSFER Kissmetrics Holdings Inc ↔ customer FAC003-CL0001003 (2026-04-20, Δ+0d)
Qonto 2026-05-10 - 306.00 transfer DARNIS OPERATIONS ↔ supplier FAF2026009 (2026-05-10, Δ+0d)
Qonto 2026-01-27 - 50.00 card Wise *Plan ↔[amt+date] supplier FAF2026001 (2026-01-26, Δ-1d)
Wise 2026-02-05 + 510.00 TRANSFER Kissmetrics Holdings Inc ↔[amt+date] customer FAC001-CL0001001 (2026-02-05, Δ+0d)
Wise 2026-03-06 + 5100.00 TRANSFER Kissmetrics Holdings Inc ↔[wire-ref] customer FAC002-CL0001002 (2026-03-12, Δ+6d)
Qonto 2026-03-13 - 612.00 transfer DARNIS OPERATIONS ↔[amt+date] supplier FAF2026008 (2026-03-13, Δ+0d)
Wise 2026-04-20 + 2550.00 TRANSFER Kissmetrics Holdings Inc ↔[amt+date] customer FAC003-CL0001003 (2026-04-20, Δ+0d)
Qonto 2026-05-10 - 306.00 transfer DARNIS OPERATIONS ↔[amt+date] supplier FAF2026009 (2026-05-10, Δ+0d)
=== INTERNAL (Wise↔Qonto consolidations, 1) ===
Wise 2026-03-13 - 5000.00 TRANSFER ARCODANGE ↔ Qonto 2026-03-13 +5000.00
=== BANK-ONLY (8 bank movements without Dolibarr counterpart) ===
Qonto 2026-01-16 + 5.22 qonto_fee Qonto
Qonto 2026-01-21 + 1000.00 income FOUREZ Quentin
Wise 2026-01-26 - 50.00 FEATURE_CHARGE For your account plan
Wise 2026-01-26 + 50.00 BALANCE_DEPOSIT To EUR
Qonto 2026-04-03 - 172.68 card MISTRAL.AI
Qonto 2026-04-13 - 180.00 card CLAUDE.AI SUBSCRIPTION
Qonto 2026-05-22 - 493.00 direct_debit URSSAF D ILE DE FRANCE
=== AVOIR-NETTED (2 Dolibarr entries pairing AVC↔FAC cancellation cycles) ===
customer 2026-02-05 -510.00 AVC001-CL0001001 ↔ netted against FAC001-CL00001
customer 2026-02-05 510.00 FAC001-CL00001 ↔ netted against AVC001-CL0001001
=== BANK-ONLY — known patterns (7, intentional gaps documented in known-patterns.json) ===
Qonto 2026-01-16 + 5.22 qonto_fee Qonto [bank_fee]
└─ Qonto fees ou refunds. Petites valeurs. Dolibarr: account 627.
Qonto 2026-01-21 + 1000.00 income FOUREZ Quentin [capital_deposit]
└─ Apport en capital social initial 1000 €. Maître FOUREZ Quentin, notaire centralisateur du dépôt. Date typique : 2026-01-21. Dolibarr: account 1013.
Wise 2026-01-26 - 50.00 FEATURE_CHARGE For your account plan [internal_topup]
└─ Solde Wise rechargé pour couvrir un frais immédiat (souvent net zéro avec le FEATURE_CHARGE du même jour).
Wise 2026-01-26 + 50.00 BALANCE_DEPOSIT To EUR [internal_topup]
└─ Solde Wise rechargé pour couvrir un frais immédiat (souvent net zéro avec le FEATURE_CHARGE du même jour).
Qonto 2026-04-03 - 172.68 card MISTRAL.AI [ai_subscription]
└─ Mistral AI API subscription. Récurrent mensuel. Dolibarr: account 6262 + supplier 'Mistral AI'.
Qonto 2026-04-13 - 180.00 card CLAUDE.AI SUBSCRIPTION [ai_subscription]
└─ Claude AI subscription (Anthropic). Récurrent mensuel. Dolibarr: account 6262 + supplier 'Anthropic'.
Qonto 2026-05-22 - 493.00 direct_debit URSSAF D ILE DE FRANCE [social_charges]
└─ Cotisations sociales URSSAF (régime mensuel/trimestriel). Dolibarr: account 645100 (charges de sécurité sociale).
=== BANK-ONLY — unknown (1, NEEDS attention: missing supplier invoice / unrecorded payment / new pattern) ===
Wise 2026-05-29 + 2147.00 TRANSFER Kissmetrics Holdings Inc
=== DOLIBARR-ONLY (9 Dolibarr payments without bank movement) ===
supplier 2026-01-04 1.99 FAF2026003 (fk_account=3)
supplier 2026-01-06 202.80 FAF2026005 (fk_account=3)
supplier 2026-01-09 55.93 FAF2026002 (fk_account=3)
supplier 2026-01-09 148.80 FAF2026004 (fk_account=3)
supplier 2026-01-12 8.43 FAF2026006 (fk_account=3)
supplier 2026-01-15 1.30 FAF2026002 (fk_account=3)
supplier 2026-01-17 3.20 FAF2026007 (fk_account=3)
customer 2026-02-05 -510.00 AVC001-CL0001001 (fk_account=2)
customer 2026-02-05 510.00 FAC001-CL00001 (fk_account=2)
=== DOLIBARR-ONLY — on API-tracked accounts (0, REAL GAP: bank should have shown this) ===
--------------------------------------------------------------------------------
# 6 matched, 1 internal, 8 bank-only, 9 dolibarr-only
=== DOLIBARR-ONLY — on accounts NOT in API scope (7, expected gap: CCA1 perso etc.) ===
supplier 2026-01-04 1.99 FAF2026003 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-06 202.80 FAF2026005 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-09 55.93 FAF2026002 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-09 148.80 FAF2026004 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-12 8.43 FAF2026006 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-15 1.30 FAF2026002 (CCA1 (G.RADUREAU Compte Courant Asso))
supplier 2026-01-17 3.20 FAF2026007 (CCA1 (G.RADUREAU Compte Courant Asso))
----------------------------------------------------------------------------------------------------
# 6 matched, 1 internal, 2 avoir-netted, 7 bank-known, 1 bank-UNKNOWN, 0 dol-only-API, 7 dol-only-personal
# patterns loaded from /Users/gabrielradureau/Work/Arcodange/erp/.claude/worktrees/happy-wilson-ee5645/.claude/skills/arcodange-bank-reco/scripts/../known-patterns.json: 7 pattern(s)


@@ -0,0 +1,60 @@
{
"_schema": "v1",
"_description": "Operator-curated catalogue of known recurring/intentional bank movements. Used by bank-match.sh to annotate the BANK-ONLY bucket so the operator can immediately tell 'needs Dolibarr entry' from 'documented intentional gap'. Edit this file as new recurring patterns emerge.",
"_match_rules": "Pattern matched case-insensitively as a regex against the bank label. Optional filters: bank (qonto|wise), side (credit|debit), amount_min, amount_max, type (Wise activity type). All present filters must match.",
"_classifications": {
"capital_deposit": "Apport en capital social. Dolibarr account 1013 (capital souscrit appelé versé).",
"social_charges": "URSSAF, retraite complémentaire, etc. Dolibarr account 645x.",
"ai_subscription": "Claude / Mistral / OpenAI / similar. Dolibarr account 6262 (frais télécom / abonnements logiciels).",
"bank_fee": "Plan bancaire, frais d'opération, refunds. Dolibarr account 627 (services bancaires).",
"internal_topup": "Solde Wise/Qonto rechargé pour couvrir un frais immédiat. Often nets out.",
"personal_apport": "Apport en compte courant d'associé (Gabriel finançant Arcodange depuis son perso). Dolibarr account 4551.",
"needs_classification": "Pattern caught but no Dolibarr account assignment defined yet; surface for review."
},
"patterns": [
{
"pattern": "FOUREZ.*Quentin",
"classification": "capital_deposit",
"bank": "qonto",
"side": "credit",
"note": "Apport en capital social initial 1000 €. Maître FOUREZ Quentin, notaire centralisateur du dépôt. Date typique : 2026-01-21. Dolibarr: account 1013."
},
{
"pattern": "URSSAF",
"classification": "social_charges",
"bank": "qonto",
"side": "debit",
"note": "Cotisations sociales URSSAF (régime mensuel/trimestriel). Dolibarr: account 645100 (charges de sécurité sociale)."
},
{
"pattern": "MISTRAL\\.AI",
"classification": "ai_subscription",
"side": "debit",
"note": "Mistral AI API subscription. Récurrent mensuel. Dolibarr: account 6262 + supplier 'Mistral AI'."
},
{
"pattern": "CLAUDE\\.AI",
"classification": "ai_subscription",
"side": "debit",
"note": "Claude AI subscription (Anthropic). Récurrent mensuel. Dolibarr: account 6262 + supplier 'Anthropic'."
},
{
"pattern": "Wise.*Plan",
"classification": "bank_fee",
"side": "debit",
"note": "Wise account plan billed via card. Wise's internal fee for keeping the BUSINESS profile active."
},
{
"pattern": "qonto_fee",
"classification": "bank_fee",
"bank": "qonto",
"note": "Qonto fees ou refunds. Petites valeurs. Dolibarr: account 627."
},
{
"pattern": "BALANCE_DEPOSIT|For your account plan",
"classification": "internal_topup",
"bank": "wise",
"note": "Solde Wise rechargé pour couvrir un frais immédiat (souvent net zéro avec le FEATURE_CHARGE du même jour)."
}
]
}


@@ -24,7 +24,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BANK_CURL="${SCRIPT_DIR}/bank-curl.sh"
DOL_CURL="${SCRIPT_DIR}/../../dolibarr/scripts/dol-curl.sh"
SINCE=""; UNTIL=""; MONTH=""; WINDOW=7; INCLUDE_FEES=0
SINCE=""; UNTIL=""; MONTH=""; WINDOW=7; INCLUDE_FEES=0; ENRICH=0
while [[ $# -gt 0 ]]; do
case "$1" in
--since) SINCE="$2"; shift 2 ;;
@@ -32,6 +32,7 @@ while [[ $# -gt 0 ]]; do
--month) MONTH="$2"; shift 2 ;;
--window-days) WINDOW="$2"; shift 2 ;;
--include-fees) INCLUDE_FEES=1; shift ;;
--enrich) ENRICH=1; shift ;;
-h|--help) sed -n '2,18p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) echo "bank-match.sh: unknown arg: $1" >&2; exit 2 ;;
esac
@@ -66,9 +67,10 @@ QURL="/v2/transactions?bank_account_id=${QONTO_ACCT}&settled_at_from=${SINCE}T00
# --- 2. Pull Wise activities ---
"${BANK_CURL}" wise "/v1/profiles/${WISE_PROFILE_ID}/activities?size=100&since=${SINCE}T00:00:00.000Z&until=${UNTIL}T23:59:59.999Z" > "${WORK}/wise.json"
# --- 3. Pull Dolibarr customer + supplier invoices and their payments ---
# --- 3. Pull Dolibarr customer + supplier invoices, payments, and bank accounts ---
"${DOL_CURL}" '/invoices?limit=500&sortfield=t.datef&sortorder=ASC' > "${WORK}/dol_inv.json"
"${DOL_CURL}" '/supplierinvoices?limit=500' > "${WORK}/dol_sup.json"
"${DOL_CURL}" '/bankaccounts' > "${WORK}/dol_acct.json"
mkdir -p "${WORK}/dol_pay" "${WORK}/dol_supay"
for id in $(python3 -c "import json,sys; print(' '.join(str(r['id']) for r in json.load(open(sys.argv[1])) if r.get('id')))" "${WORK}/dol_inv.json"); do
@@ -78,11 +80,26 @@ for id in $(python3 -c "import json,sys; print(' '.join(str(r['id']) for r in js
"${DOL_CURL}" "/supplierinvoices/${id}/payments" > "${WORK}/dol_supay/${id}.json" 2>/dev/null || echo "[]" > "${WORK}/dol_supay/${id}.json"
done
# --- 3b. Optional: enrich Wise TRANSFER activities with wire references ---
if [[ "${ENRICH}" == "1" ]]; then
mkdir -p "${WORK}/wise_refs"
for tid in $(python3 -c "
import json, sys
acts = json.load(open(sys.argv[1])).get('activities') or []
for a in acts:
r = a.get('resource') or {}
if r.get('type')=='TRANSFER' and r.get('id'): print(r['id'])
" "${WORK}/wise.json"); do
"${BANK_CURL}" wise "/v1/transfers/${tid}" > "${WORK}/wise_refs/${tid}.json" 2>/dev/null || true
done
fi
# --- 4. Match in python ---
python3 - "${WORK}" "${SINCE}" "${UNTIL}" "${WINDOW}" "${INCLUDE_FEES}" <<'PY'
PATTERNS_FILE="${SCRIPT_DIR}/../known-patterns.json"
python3 - "${WORK}" "${SINCE}" "${UNTIL}" "${WINDOW}" "${INCLUDE_FEES}" "${PATTERNS_FILE}" "${ENRICH}" <<'PY'
import json, sys, os, re, datetime, collections
work, since, until, window_days, include_fees = sys.argv[1:6]
window = int(window_days); include_fees = include_fees == "1"
work, since, until, window_days, include_fees, patterns_file, enrich = sys.argv[1:8]
window = int(window_days); include_fees = include_fees == "1"; enrich = enrich == "1"
since_d = datetime.date.fromisoformat(since); until_d = datetime.date.fromisoformat(until)
def strip(s): return re.sub(r'<[^>]+>', '', s or '').strip()
@@ -110,7 +127,22 @@ for a in (json.load(open(os.path.join(work,"wise.json"))).get("activities") or [
m = re.search(r'([\d,.]+)\s*([A-Z]{3})', pa)
amt = float(m.group(1).replace(",", "")) if m else 0.0
title = strip(a.get("title") or "")[:40]
wise_movs.append({"bank":"Wise", "date":dt, "sign":sign, "amount":amt, "label":title, "op":typ, "matched_dol":None, "matched_internal":False})
res = a.get("resource") or {}
resource_id = str(res.get("id")) if res.get("type") == "TRANSFER" else None
wise_movs.append({"bank":"Wise", "date":dt, "sign":sign, "amount":amt, "label":title, "op":typ, "matched_dol":None, "matched_internal":False, "wise_resource_id":resource_id, "wire_ref":""})
# 4b'. If --enrich, load per-transfer wire references and attach to Wise movs
if enrich:
ref_dir = os.path.join(work, "wise_refs")
if os.path.isdir(ref_dir):
for m in wise_movs:
if not m["wise_resource_id"]: continue
p = os.path.join(ref_dir, f"{m['wise_resource_id']}.json")
if not os.path.isfile(p): continue
try:
t = json.load(open(p))
m["wire_ref"] = (t.get("reference") or "")
except Exception: pass
bank_movs = qonto_movs + wise_movs
@@ -121,7 +153,7 @@ for w in [m for m in bank_movs if m["bank"]=="Wise" and m["sign"]=="-"]:
w["matched_internal"] = q; q["matched_internal"] = w
break
# 4d. Normalize Dolibarr payments
# 4d. Normalize Dolibarr payments — carry socid too for avoir netting
dol_pays = []
inv_by_id = {str(r["id"]): r for r in json.load(open(os.path.join(work,"dol_inv.json")))}
for fn in os.listdir(os.path.join(work,"dol_pay")):
@@ -131,7 +163,9 @@ for fn in os.listdir(os.path.join(work,"dol_pay")):
d = datetime.datetime.strptime(p["date"], "%Y-%m-%d %H:%M:%S").date()
if d < since_d or d > until_d: continue
amt = float(p.get("amount") or 0)
dol_pays.append({"side":"customer", "ref":inv["ref"], "date":d, "amount":amt, "fk_account":inv.get("fk_account"), "matched_bank":None})
dol_pays.append({"side":"customer", "ref":inv["ref"], "date":d, "amount":amt,
"fk_account":inv.get("fk_account"), "socid":inv.get("socid"),
"matched_bank":None, "netted_against":None})
sup_by_id = {str(r["id"]): r for r in json.load(open(os.path.join(work,"dol_sup.json")))}
for fn in os.listdir(os.path.join(work,"dol_supay")):
@@ -141,37 +175,136 @@ for fn in os.listdir(os.path.join(work,"dol_supay")):
d = datetime.datetime.strptime(p["date"], "%Y-%m-%d %H:%M:%S").date()
if d < since_d or d > until_d: continue
amt = float(p.get("amount") or 0)
dol_pays.append({"side":"supplier", "ref":sup["ref"], "date":d, "amount":amt, "fk_account":sup.get("fk_account"), "matched_bank":None})
dol_pays.append({"side":"supplier", "ref":sup["ref"], "date":d, "amount":amt,
"fk_account":sup.get("fk_account"), "socid":sup.get("socid"),
"matched_bank":None, "netted_against":None})
# 4e. Match: each bank movement (non-internal) tries to find a Dolibarr counterpart
for m in [x for x in bank_movs if not x["matched_internal"]]:
bank_signed = m["amount"] if m["sign"]=="+" else -m["amount"]
# For customer payments (Dol records them as positive amounts): +bank credit matches +dol customer payment
# For supplier payments: -bank debit matches +dol supplier payment (positive in Dol since it's the amount paid out)
# Heuristic: match abs(amount) within 0.01 and date within window.
candidates = [p for p in dol_pays if p["matched_bank"] is None and abs(p["amount"] - m["amount"]) < 0.01 and abs((p["date"] - m["date"]).days) <= window]
# 4d.1. AVOIR cycle netting: an AVC (credit note) for -X on socid S cancels out
# a FAC for +X on the same socid, within a small date window. Bank sees the NET
# of the cycle (typically +X for the reissued FAC with the new ref scheme).
# Pair an AVC with a FAC of opposite sign + equal abs(amount) + same socid +
# within ±5d. Mark both as "netted" so they're excluded from matching and
# excluded from the dolibarr-only failure count.
avcs = [p for p in dol_pays if p["side"]=="customer" and p["ref"].startswith("AVC") and p["amount"] < 0]
for avc in avcs:
candidates = [p for p in dol_pays
if p is not avc
and p["side"]=="customer"
and p["socid"] == avc["socid"]
and abs(p["amount"] + avc["amount"]) < 0.01 # opposite signs equal magnitude
and abs((p["date"] - avc["date"]).days) <= 5
and p["netted_against"] is None
and p["matched_bank"] is None]
if candidates:
# Intent: pair the AVC with the original cancelled FAC, not the reissue.
# As implemented below: smallest date delta first, then ref in ascending
# order as a tie-break (older numbering schemes are expected to sort first).
candidates.sort(key=lambda p: (abs((p["date"] - avc["date"]).days), p["ref"]))
partner = candidates[0]
avc["netted_against"] = partner["ref"]
partner["netted_against"] = avc["ref"]
# 4e. Match — two-pass:
# PASS 1 (strong) : Wise transfers with an --enrich'd wire reference containing
# a "FAC***" pattern try to match the Dolibarr invoice with
# that exact ref. This is the highest-confidence match.
# PASS 2 (loose) : remaining bank movements use the date+amount heuristic.
# Netted Dolibarr entries (avoir cycle) are excluded from both passes.
# Build customer ref -> dol payment index (only un-netted, un-matched entries)
ref_index = collections.defaultdict(list)
for p in dol_pays:
if p["matched_bank"] is None and p["netted_against"] is None:
# Strip trailing dash/suffix variants — FAC002CL0001002 vs FAC002-CL0001002 are equivalent
normalized = re.sub(r'[^A-Z0-9]', '', p["ref"].upper())
ref_index[normalized].append(p)
# Pass 1: strong match on wire references
for m in [x for x in bank_movs if not x["matched_internal"] and x.get("wire_ref")]:
refs_in_wire = re.findall(r'FAC\d+(?:CL\d+)?', (m["wire_ref"] or "").upper().replace("-",""))
for r in refs_in_wire:
if r in ref_index and ref_index[r]:
p = ref_index[r].pop(0)
m["matched_dol"] = p; m["match_kind"] = "wire-ref"
p["matched_bank"] = m
break
# Pass 2: loose date+amount match for remaining bank movements
for m in [x for x in bank_movs if not x["matched_internal"] and not x["matched_dol"]]:
candidates = [p for p in dol_pays
if p["matched_bank"] is None and p["netted_against"] is None
and abs(p["amount"] - m["amount"]) < 0.01
and abs((p["date"] - m["date"]).days) <= window]
if candidates:
# Pick smallest date delta
candidates.sort(key=lambda p: abs((p["date"] - m["date"]).days))
p = candidates[0]
m["matched_dol"] = p; p["matched_bank"] = m
m["matched_dol"] = p; m["match_kind"] = "amt+date"
p["matched_bank"] = m
# 4f. Annotate non-matched movements with known-patterns catalog
patterns = []
if os.path.isfile(patterns_file):
try: patterns = json.load(open(patterns_file)).get("patterns", [])
except Exception as e: print(f" /!\\ failed to load {patterns_file}: {e}", file=sys.stderr)
def match_pattern(mov):
# Match against both the bank label AND the operation type — different
# banks surface useful info in different fields (Qonto puts the operation
# type in `op`, e.g. "qonto_fee"; Wise puts the activity type in `op`,
# e.g. "BALANCE_DEPOSIT", and the human title in `label`).
haystack = (mov.get("label") or "") + " | " + (mov.get("op") or "")
for pat in patterns:
if pat.get("bank") and pat["bank"] != mov["bank"].lower(): continue
if pat.get("side") and pat["side"] != ("credit" if mov["sign"]=="+" else "debit"): continue
amin = pat.get("amount_min"); amax = pat.get("amount_max")
if amin is not None and mov["amount"] < amin: continue
if amax is not None and mov["amount"] > amax: continue
if re.search(pat["pattern"], haystack, re.IGNORECASE):
return pat
return None
for m in bank_movs:
if m["matched_dol"] or m["matched_internal"]: continue
m["known"] = match_pattern(m)
# --- 5. Render ---
# Load Dolibarr bank accounts (for fk_account context on dolibarr-only)
dol_accts = {}
try:
for a in json.load(open(os.path.join(work, "dol_acct.json"))):
dol_accts[str(a["id"])] = {"ref": a.get("ref","-"), "label": a.get("label","-"), "country": a.get("country_code","")}
except Exception: pass
# Heuristic: which Dolibarr accounts are NOT covered by Qonto/Wise API today?
# Convention: CCA = Compte Courant d'Associé (personal). Anything not QON*/WIS*
# is treated as "API-invisible" and tagged as such.
def account_kind(fk_account):
if not fk_account: return ("unknown", "fk_account=None")
a = dol_accts.get(str(fk_account))
if not a: return ("unknown", f"fk_account={fk_account} (not in /bankaccounts)")
ref = (a["ref"] or "").upper()
if ref.startswith(("QON", "WIS")):
return ("api_tracked", f"{a['ref']} ({a['label']})")
return ("personal_or_other", f"{a['ref']} ({a['label']})")
def fmt_bank(m):
return f" {m['bank']:<5} {m['date']} {m['sign']:<2}{m['amount']:>9.2f} {m['op'][:18]:<18} {m['label']}"
print(f"# Bank reconciliation: {since} → {until} (window ±{window}d, fees: {'on' if include_fees else 'off'})")
print(f"# Bank reconciliation: {since} → {until} (window ±{window}d, fees: {'on' if include_fees else 'off'}, enrich: {'on' if enrich else 'off'})")
print()
matched = [m for m in bank_movs if m["matched_dol"]]
internal = [m for m in bank_movs if m["matched_internal"] and m["sign"]=="-"]
bank_only = [m for m in bank_movs if not m["matched_dol"] and not m["matched_internal"]]
dol_only = [p for p in dol_pays if p["matched_bank"] is None]
netted_dol_pairs = [p for p in dol_pays if p["netted_against"]]
dol_only = [p for p in dol_pays if p["matched_bank"] is None and p["netted_against"] is None]
print(f"=== MATCHED ({len(matched)} bank ↔ Dolibarr) ===")
for m in sorted(matched, key=lambda m: m["date"]):
p = m["matched_dol"]
delta = (p["date"] - m["date"]).days
print(fmt_bank(m) + f" ↔ {p['side']:<8} {p['ref']:<24} ({p['date']}, Δ{delta:+d}d)")
kind = m.get("match_kind", "?")
print(fmt_bank(m) + f" ↔[{kind}] {p['side']:<8} {p['ref']:<24} ({p['date']}, Δ{delta:+d}d)")
print()
print(f"=== INTERNAL (Wise↔Qonto consolidations, {len(internal)}) ===")
@@ -180,19 +313,51 @@ for m in sorted(internal, key=lambda m: m["date"]):
print(fmt_bank(m) + f" ↔ {other['bank']} {other['date']} {other['sign']}{other['amount']:.2f}")
print()
print(f"=== BANK-ONLY ({len(bank_only)} bank movements without Dolibarr counterpart) ===")
for m in sorted(bank_only, key=lambda m: m["date"]):
# Avoir cycles netted out (informational; bank correctly sees only the net)
if netted_dol_pairs:
print(f"=== AVOIR-NETTED ({len(netted_dol_pairs)} Dolibarr entries pairing AVC↔FAC cancellation cycles) ===")
for p in sorted(netted_dol_pairs, key=lambda p: (p["date"], p["ref"])):
print(f" {p['side']:<8} {p['date']} {p['amount']:>9.2f} {p['ref']:<24} ↔ netted against {p['netted_against']}")
print()
bank_known = [m for m in bank_only if m.get("known")]
bank_unknown = [m for m in bank_only if not m.get("known")]
print(f"=== BANK-ONLY — known patterns ({len(bank_known)}, intentional gaps documented in known-patterns.json) ===")
for m in sorted(bank_known, key=lambda m: m["date"]):
k = m["known"]
cls = k.get("classification","?")
print(fmt_bank(m) + f" [{cls}]")
print(f" └─ {k.get('note','')}")
print()
print(f"=== BANK-ONLY — unknown ({len(bank_unknown)}, NEEDS attention: missing supplier invoice / unrecorded payment / new pattern) ===")
for m in sorted(bank_unknown, key=lambda m: m["date"]):
print(fmt_bank(m))
print()
print(f"=== DOLIBARR-ONLY ({len(dol_only)} Dolibarr payments without bank movement) ===")
for p in sorted(dol_only, key=lambda p: p["date"]):
print(f" {p['side']:<8} {p['date']} {p['amount']:>9.2f} {p['ref']} (fk_account={p['fk_account']})")
# Split dolibarr-only by whether the fk_account is API-tracked (real gap)
# or personal_or_other (expected gap — we have no API on those accounts)
dol_only_api = [p for p in dol_only if account_kind(p["fk_account"])[0] == "api_tracked"]
dol_only_personal = [p for p in dol_only if account_kind(p["fk_account"])[0] != "api_tracked"]
print(f"=== DOLIBARR-ONLY — on API-tracked accounts ({len(dol_only_api)}, REAL GAP: bank should have shown this) ===")
for p in sorted(dol_only_api, key=lambda p: p["date"]):
_, ctx = account_kind(p["fk_account"])
print(f" {p['side']:<8} {p['date']} {p['amount']:>9.2f} {p['ref']:<24} ({ctx})")
print()
# Verdict
fails = len(bank_only) + len(dol_only)
print("-" * 80)
print(f"# {len(matched)} matched, {len(internal)} internal, {len(bank_only)} bank-only, {len(dol_only)} dolibarr-only")
print(f"=== DOLIBARR-ONLY — on accounts NOT in API scope ({len(dol_only_personal)}, expected gap: CCA1 perso etc.) ===")
for p in sorted(dol_only_personal, key=lambda p: p["date"]):
_, ctx = account_kind(p["fk_account"])
print(f" {p['side']:<8} {p['date']} {p['amount']:>9.2f} {p['ref']:<24} ({ctx})")
print()
# Verdict: only UNKNOWN bank-only AND dol-only-on-API-tracked count as failures.
# Avoir-netted pairs and personal-account dolibarr entries are intentional/expected.
fails = len(bank_unknown) + len(dol_only_api)
print("-" * 100)
print(f"# {len(matched)} matched, {len(internal)} internal, {len(netted_dol_pairs)} avoir-netted, {len(bank_known)} bank-known, {len(bank_unknown)} bank-UNKNOWN, {len(dol_only_api)} dol-only-API, {len(dol_only_personal)} dol-only-personal")
print(f"# patterns loaded from {patterns_file}: {len(patterns)} pattern(s)")
sys.exit(0 if fails == 0 else 1)
PY

@@ -0,0 +1,113 @@
---
name: arcodange-email-ingest
description: Scrape supplier-invoice emails from the Arcodange Zoho mailbox (`gabrielradureau@arcodange.fr` + its `books@arcodange.fr` alias + forwarded Gmail) via the Zoho Mail OAuth API, list candidates matching supplier patterns, download PDF attachments, run pdftotext + heuristic extract, and emit Dolibarr-ready supplier-invoice draft JSON for the operator to paste into the Dolibarr UI. Two workflows — (1) list candidates in a folder (default `/Inbox/books` where the alias auto-routes mail); (2) inspect one message by id, download + parse PDFs, propose draft entries. Surfaces concrete data: supplier name guess (first PDF line), invoice ref, invoice date, total HT/TVA/TTC, VAT rate. Read-only at every layer (Zoho scopes are READ-only; no write to Dolibarr). Use when the user asks "list pending supplier invoices in mail", "ingest invoices from email", "draft Dolibarr entry from this email", "audit cohort supplier docs from mail". Depends on `dolibarr` for the shared `.env`. SKIP for write-side Dolibarr operations (V9 candidate), for non-Zoho mailboxes (use IMAP fallback in a future skill if needed), and for attachments that aren't PDFs (only PDF text extraction is wired today).
requires:
bins: ["curl", "jq", "python3", "pdftotext"]
auth: true
---
# arcodange-email-ingest — supplier-invoice emails → Dolibarr draft
Close the inbound side of the accounting loop: bills land in `books@arcodange.fr`, this skill turns them into Dolibarr-ready draft entries for the operator to validate + create.
Depends on the [dolibarr](../dolibarr/SKILL.md) base skill (shared `.env`).
**CLI shortcuts:** `bin/arcodange email list | inspect | curl`
## Architecture choice — Zoho API, not IMAP
We chose the Zoho Mail OAuth API over IMAP because:
- **Richer metadata** — folder paths, attachment IDs, search operators, threads.
- **One account covers everything** — `books@arcodange.fr` is an alias of `gabrielradureau@arcodange.fr`. One refresh_token + the `/accounts` endpoint exposes both, plus all the other aliases (`contact@`, `bonjour@`, etc.).
- **Gmail folded in via forwarding** — `arcodange@gmail.com` forwards incoming to `books@` (configured in Gmail UI). No Google API setup, no app-passwords, no second OAuth flow.
- **Token-only auth** — no app-password fragility, no SCA dance (unlike Wise).
The single canonical inbox path: **`/Inbox/books`** — Zoho's auto-filter routes incoming mail to the `books@` alias into this sub-folder. Scan it first; widen with `--all-folders` only if needed.
## Prerequisites
1. Base skill set up ([dolibarr/README.md](../dolibarr/README.md)).
2. Zoho OAuth Self-Client created and a refresh_token generated. The `.env` extension:
```
ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
ZOHO_CLIENT_SECRET=<same>
ZOHO_REFRESH_TOKEN=<exchanged from one-time code>
ZOHO_DC=eu # eu | com | in | au
```
Setup walkthrough is in the V8 prep section of the cohort review notes.
3. Gmail forwarding to `books@arcodange.fr` enabled (Gmail Settings → Forwarding and POP/IMAP).
4. `pdftotext` (`brew install poppler` on macOS).
## Workflows
### 1. List candidates
```bash
bin/arcodange email list # default: /Inbox/books, last 30 msgs, no filter
bin/arcodange email list --candidates-only # filter to subjects/attachments matching supplier patterns
bin/arcodange email list --folder /Inbox/contact --limit 50
bin/arcodange email list --all-folders --candidates-only # scan everything (slower, more API calls)
```
Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment.
**Hard exclusions** (V8.1) — applied before the candidate test, regardless of attachments:
- Subjects starting with `Invitation:` / `Updated invitation:` / `Canceled event:` / `Accepted:` / `Declined:` / `Tentative:` / `Maybe:` (after stripping `Re:` / `Fwd:` / `Tr:` prefixes) → filters calendar events that always carry an `.ics` attachment.
- Senders matching newsletter/marketing patterns (`updates.<domain>`, `noreply@*calendar*`, `news@`, `newsletter@`, etc.).
The `[*]` column marks candidates, `[Y]` marks emails with attachments. Compared to V8.0, V8.1 cuts the `--all-folders --candidates-only` baseline from ~27 noisy entries down to ~12 actionable ones.
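As a rough illustration of the ordering described above (hard exclusions first, then the subject-or-attachment test), here is a minimal Python sketch. The regexes are abridged from the patterns listed in this section; the function signature and the sample subjects used to exercise it are assumptions made for the example, not the skill's actual interface.

```python
import re

# Subject keywords that make a message a candidate (list taken from this section).
CANDIDATE = re.compile(
    r"facture|invoice|receipt|re[cç]u|payment|paiement|abonnement"
    r"|subscription|order|commande|bill",
    re.IGNORECASE,
)
# Calendar-style subjects, tolerated after any number of Re:/Fwd:/Tr: prefixes.
CALENDAR = re.compile(
    r"^(?:re:\s*|fwd:\s*|tr:\s*)*"
    r"(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:",
    re.IGNORECASE,
)
# Abridged newsletter/marketing sender patterns.
NOISY_SENDER = re.compile(r"updates\.|noreply@.*calendar|news@|newsletter@", re.IGNORECASE)


def is_candidate(subject: str, sender: str, has_attachment: bool) -> bool:
    """Hypothetical helper: exclusions win over inclusions."""
    if CALENDAR.match(subject.strip()):
        return False
    if NOISY_SENDER.search(sender):
        return False
    return bool(CANDIDATE.search(subject)) or has_attachment
```

Checking exclusions before inclusions is what keeps calendar invites out even though they always carry an `.ics` attachment.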
### 2. Inspect one email + draft Dolibarr entry
```bash
bin/arcodange email inspect 1775141901205014300
bin/arcodange email inspect 1775141901205014300 --folder /Inbox/books # default
bin/arcodange email inspect 1775141901205014300 --save-pdf ~/Documents/factures-2026-Q2/
bin/arcodange email inspect 1775141901205014300 --json # machine-readable
```
The script:
1. Fetches the email metadata (subject / from / date) via `/messages/view`.
2. Lists attachments via `/messages/{mid}/attachmentinfo`.
3. Downloads each attachment via `/messages/{mid}/attachments/{attId}`.
4. For each `.pdf`, runs `pdftotext -layout`, applies regex heuristics to extract:
- Supplier name guess (first non-empty PDF line — often the supplier letterhead).
- Invoice reference (`facture/invoice n° XXX`).
- Invoice date.
- Total HT / TVA / TTC + VAT rate %.
5. Emits a draft JSON record per attachment — paste into the Dolibarr UI manually.
Heuristics are intentionally conservative (regex-based, no LLM dependency). For PDF templates where the heuristic fails, note that the raw `pdftotext` output lives only in the script's temporary work dir, which is removed when the script exits; rerun with `--save-pdf` to keep the PDF for manual entry.
Captured at [examples/email-inspect.txt](examples/email-inspect.txt) for the V8 baseline (Mistral AI receipt).
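To make the "conservative regex" idea concrete, here is a minimal Python sketch of the total-TTC heuristic: find a labelled amount on a line, normalise French decimal formatting, and reject implausible values. The regex is a shortened version of the one in `email-inspect.sh`; the function names and the sample text used to exercise them are illustrative assumptions.

```python
import re
from typing import Optional

# Shortened label list; the real script accepts more variants.
TOTAL_TTC = re.compile(
    r"(?:total\s*ttc|amount\s*due|grand\s*total)[^\d-]*([\d \.,]+)",
    re.IGNORECASE,
)


def parse_amount(raw: str) -> Optional[float]:
    """Turn '1 234,56' into 1234.56; return None for anything implausible."""
    clean = raw.replace(",", ".").replace(" ", "")
    try:
        value = float(clean)
    except ValueError:
        return None
    # Amounts are assumed to stay below 1M EUR, which filters out long digit runs.
    return value if 0 <= value < 1_000_000 else None


def total_ttc(text: str) -> Optional[str]:
    """Return the first plausible labelled total as a 2-decimal string."""
    for line in text.splitlines():
        match = TOTAL_TTC.search(line)
        if match:
            value = parse_amount(match.group(1))
            if value is not None:
                return f"{value:.2f}"
    return None
```

Returning `None` rather than a guess is deliberate: an empty field in the draft JSON tells the operator to read the PDF, whereas a wrong amount could slip through unnoticed.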
## What it doesn't do (V8.0 scope)
- **Does not write to Dolibarr.** The supplier invoice is still created manually in the Dolibarr UI from the draft JSON. V9 candidate: automate via `/supplierinvoices` POST.
- **Does not mark emails as ingested.** Each run re-emits the same candidates. Implementing this requires extending the OAuth scope: the current refresh_token only has READ scopes (`ZohoMail.messages.READ` etc.). The flag-set endpoint (`PUT /api/accounts/{aid}/updatemessage`) requires `ZohoMail.messages.UPDATE`, which would force the user to regenerate the refresh_token. **V8.2 candidate** — once the user opts in to the wider scope, `--mark-ingested` becomes a one-line flag on `email-inspect.sh` and `is_candidate()` in `email-list.sh` learns to skip messages with `flagid == flag_info`.
- **No body extraction yet.** We only parse PDF attachments. Inline-HTML invoices (rare — most suppliers send PDFs) would need body fetch via `/content`.
- **Heuristic extraction is best-effort.** Different supplier PDF templates yield different field-extraction reliability. The draft JSON is a starting point, not ground truth.
## Token cache
`zoho-curl.sh` caches the OAuth access_token in `$TMPDIR/zoho-access-$USER` (mode 600, TTL 50 min). Avoids hitting Zoho's OAuth refresh rate-limit on every invocation. On 401, the wrapper auto-refreshes once and retries.
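The caching behaviour described above can be sketched in Python as a small file-backed cache with a TTL. This is an illustration of the technique only: `cached_token`, its `refresh` callback, and the `now` parameter are hypothetical names, and the real wrapper is a shell script whose internals are not shown here.

```python
import os
import time
from pathlib import Path
from typing import Callable, Optional

TTL_SECONDS = 50 * 60  # the 50-minute TTL mentioned above


def cached_token(cache: Path, refresh: Callable[[], str], now: Optional[float] = None) -> str:
    """Return the cached token if it is fresh, otherwise refresh and rewrite the cache."""
    now = time.time() if now is None else now
    try:
        if now - cache.stat().st_mtime < TTL_SECONDS:
            token = cache.read_text().strip()
            if token:
                return token
    except FileNotFoundError:
        pass  # no cache yet: fall through to a refresh
    token = refresh()
    # Create the file with mode 600 so other local users cannot read the token.
    fd = os.open(cache, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.chmod(cache, 0o600)  # also tighten a pre-existing file
    return token
```

Using the file's modification time as the freshness signal avoids storing a separate expiry timestamp; a 401 retry, as the wrapper does, would simply call `refresh` once more and rewrite the file.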
## API endpoints used (Zoho Mail)
| Endpoint | Purpose |
|---|---|
| `POST /oauth/v2/token` (accounts.zoho.{dc}) | Refresh access_token from refresh_token |
| `GET /accounts` | Discover accountId + aliases on the account |
| `GET /accounts/{aid}/folders` | List folders (with paths like `/Inbox/books`) |
| `GET /accounts/{aid}/messages/view?folderId=&limit=&start=` | List messages in a folder |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachmentinfo` | List attachments metadata |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachments/{attId}` | Download attachment bytes (`{attId}` is the attachment id, distinct from the account id `{aid}`) |
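For readers wiring these endpoints by hand, the following Python sketch shows how the attachment URL is assembled from the four ids. The base URL shape (`https://mail.zoho.<dc>/api`) follows what `email-inspect.sh` uses; the helper names and the placeholder ids are assumptions for illustration.

```python
def mail_base(dc: str = "eu") -> str:
    """Base URL of the Zoho Mail API for a data centre suffix such as 'eu' or 'com'."""
    return f"https://mail.zoho.{dc}/api"


def attachment_url(dc: str, account_id: str, folder_id: str,
                   message_id: str, attachment_id: str) -> str:
    """Hypothetical helper: URL that returns the raw bytes of one attachment."""
    return (
        f"{mail_base(dc)}/accounts/{account_id}/folders/{folder_id}"
        f"/messages/{message_id}/attachments/{attachment_id}"
    )
```

The request itself needs an `Authorization: Zoho-oauthtoken <access_token>` header, as in the script's `curl` call.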
## Out of scope
- **Writing to Dolibarr** (V9 candidate — would lift the read-only constraint on the API key, or use a separate write-scoped key).
- **Marking ingested emails** (V8.2 candidate; requires regenerating the refresh_token with the `ZohoMail.messages.UPDATE` scope).
- **Non-PDF attachments** (heuristics are PDF-specific).
- **Body-text extraction** (would need `/content` endpoint, deferred).
- **IMAP fallback** for non-Zoho mailboxes (deferred — Gmail forwarding to books@ covers the only known external mailbox today).
- **LLM-based extraction** (deferred — regex covers the current set of supplier templates well enough).

@@ -0,0 +1,31 @@
================================================================================
Email 1775141901205014300
================================================================================
subject : Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
from : no-reply@mistral.ai
date : 2026-04-02
attached : True
-- Attachment 1: invoice-MSTRL-API-814045-001.pdf (74377 bytes, 1771 chars extracted) --
pdf_top_line = 'Facture'
invoice_ref = 'API-814045-001'
invoice_date_raw = None
total_ht = None
total_tva = None
total_ttc = None
vat_rate_pct = '20.0'
Suggested Dolibarr supplier-invoice draft entries:
[
{
"supplier_hint": "Facture",
"invoice_ref": "API-814045-001",
"invoice_date": null,
"total_ht": null,
"total_tva": null,
"total_ttc": null,
"vat_rate_pct": "20.0",
"source_email": "1775141901205014300",
"source_attachment": "invoice-MSTRL-API-814045-001.pdf"
}
]

@@ -0,0 +1,16 @@
date cand att messageId folder from subject
----------------------------------------------------------------------------------------------------------------------------------
2026-05-20 [*] [Y] 1779312401677014300 /clients/KissMetrics rsirvent@digitalocean.com Re: VM not running despite status=active, after volume
2026-05-20 [*] [Y] 1779298419301014300 /clients/KissMetrics tdziuba@kissmetrics.io Re: VM not running despite status=active, after volume
2026-05-20 [*] [Y] 1779285954272004400 /clients/KissMetrics tdziuba@kissmetrics.io Re: VM not running despite status=active, after volume
2026-05-05 [*] [ ] 1777970798248014300 /Inbox/abonnements freemobile@free-mobile.fr Votre facture mobile Free est disponible
2026-04-21 [*] [Y] 1776785469477004300 /Notification noreply@hiway.fr Darnis Operations - Facture F1042
2026-04-12 [*] [Y] 1776017238960014300 /Inbox/books arcodange@gmail.com Fwd: Your receipt from Anthropic Ireland, Limited #2109
2026-04-04 [*] [ ] 1775264759983014300 /Inbox/abonnements freemobile@free-mobile.fr Votre facture mobile Free est disponible
2026-04-02 [*] [Y] 1775141901205014300 /Inbox/books no-reply@mistral.ai Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
2026-03-05 [*] [ ] 1772689535069004400 /Inbox/helloworld freemobile@free-mobile.fr Votre facture mobile Free est disponible
2026-02-08 [*] [Y] 1770582421208004400 /Inbox/bureaux ne-pas-repondre@portailpro.gouv.fr Valider votre espace personnel sur Portailpro.gouv
2026-01-09 [*] [ ] 1767989744791004400 /Inbox/books gabrielradureau@gmail.com Fwd: INPI - Votre paiement pour la commande Réf. 181876
2026-01-06 [*] [Y] 1767710535894005600 /Inbox gabrielradureau@gmail.com Statuts
----------------------------------------------------------------------------------------------------------------------------------
# 12 message(s) (candidates only)

@@ -0,0 +1,7 @@
date cand att messageId folder from subject
----------------------------------------------------------------------------------------------------------------------------------
2026-04-12 [*] [Y] 1776017238960014300 /Inbox/books arcodange@gmail.com Fwd: Your receipt from Anthropic Ireland, Limited #2109
2026-04-02 [*] [Y] 1775141901205014300 /Inbox/books no-reply@mistral.ai Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
2026-01-09 [*] [ ] 1767989744791004400 /Inbox/books gabrielradureau@gmail.com Fwd: INPI - Votre paiement pour la commande Réf. 181876
----------------------------------------------------------------------------------------------------------------------------------
# 3 message(s) (candidates only)

@@ -0,0 +1,256 @@
#!/usr/bin/env bash
# Inspect one email by id and propose a Dolibarr supplier-invoice draft.
#
# Usage:
# email-inspect.sh <messageId> [--folder PATH] # default folder: /Inbox/books
# [--save-pdf DIR] # save PDF attachments under DIR/
# [--json] # emit a single JSON object on stdout
#
# Pipeline (read-only):
# 1. Find the message (in the given folder, default /Inbox/books).
# 2. List attachments via /attachmentinfo.
# 3. For each PDF attachment: download, run pdftotext, extract supplier-side
# heuristics (name, totals, dates, ref).
# 4. Emit a draft "Dolibarr-ready" record per attachment so the operator can
# hand-create the supplier invoice in the Dolibarr UI.
#
# This skill DOES NOT write to Dolibarr. Auto-creation of supplier invoices is
# V9 candidate.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
if [[ $# -lt 1 ]]; then
echo "email-inspect.sh: missing <messageId>" >&2
echo " Hint: bin/arcodange email list to see candidate ids." >&2
exit 2
fi
MID="$1"; shift || true
FOLDER="/Inbox/books"; SAVE_PDF_DIR=""; FMT="text"
while [[ $# -gt 0 ]]; do
case "$1" in
--folder) FOLDER="$2"; shift 2 ;;
--save-pdf) SAVE_PDF_DIR="$2"; shift 2 ;;
--json) FMT="json"; shift ;;
-h|--help) sed -n '2,18p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) echo "email-inspect.sh: unknown arg: $1" >&2; exit 2 ;;
esac
done
command -v pdftotext >/dev/null || { echo "email-inspect.sh: pdftotext not found (brew install poppler)" >&2; exit 2; }
WORK="$(mktemp -d -t emailinspect.XXXXXX)"
trap 'rm -rf "${WORK}"' EXIT
# 1. accountId + folderId
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
[[ -z "${AID}" ]] && { echo "email-inspect.sh: no accountId in /accounts response" >&2; exit 1; }
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
FID=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
if f.get('path') == target:
print(f.get('folderId')); break" "${WORK}/folders.json" "${FOLDER}")
[[ -z "${FID}" ]] && { echo "email-inspect.sh: folder '${FOLDER}' not found" >&2; exit 2; }
# 2. Find the message in the folder listing (to grab metadata: subject, from, date)
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${FID}&limit=100&sortorder=false&start=1" > "${WORK}/folder_msgs.json"
python3 - "${WORK}/folder_msgs.json" "${MID}" > "${WORK}/meta.json" <<'PY'
import json, sys
d = json.load(open(sys.argv[1]))
mid = sys.argv[2]
for m in (d.get("data") or []):
if str(m.get("messageId")) == mid:
json.dump(m, sys.stdout); sys.exit(0)
sys.exit(f"messageId {mid} not found in this folder")
PY
# 3. Attachment metadata
"${ZOHO_CURL}" "/accounts/${AID}/folders/${FID}/messages/${MID}/attachmentinfo" > "${WORK}/attachinfo.json"
# 4. Download each attachment — needs raw bytes (Accept: */*), not the JSON
# wrapper's default. We bypass zoho-curl.sh for the attachment download but
# reuse the cached access_token it wrote.
set -a; source "${SCRIPT_DIR}/../../dolibarr/.env"; set +a
: "${ZOHO_DC:=eu}"
TOKEN_CACHE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
if [[ ! -s "${TOKEN_CACHE}" ]]; then
echo "email-inspect.sh: missing access token cache — run any zoho-curl call first to populate it" >&2
exit 2
fi
ACCESS_TOKEN=$(cat "${TOKEN_CACHE}")
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
mkdir -p "${WORK}/atts" "${WORK}/text"
ATT_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
data = d.get('data') or {}
for a in (data.get('attachments') or []):
print(f\"{a.get('attachmentId')}|{a.get('attachmentName','-')}\")" "${WORK}/attachinfo.json")
while IFS='|' read -r aid aname; do
[[ -z "${aid}" ]] && continue
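aname="${aname##*/}" # keep only the basename: attachment names come from the email and are untrusted
outpath="${WORK}/atts/${aname}"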
curl -sS \
-H "Authorization: Zoho-oauthtoken ${ACCESS_TOKEN}" \
-H "Accept: */*" \
--max-time 60 \
-o "${outpath}" \
"${MAIL_BASE}/accounts/${AID}/folders/${FID}/messages/${MID}/attachments/${aid}" || true
# If pdf, extract text (bash 3.2 compatible — no ${var,,})
aname_lc=$(echo "${aname}" | tr '[:upper:]' '[:lower:]')
if [[ "${aname_lc}" == *.pdf ]]; then
# Strip the extension whatever its case, so X.PDF yields X.txt (the name the Python step looks up)
pdftotext -layout "${outpath}" "${WORK}/text/${aname%.*}.txt" 2>/dev/null || true
fi
done <<< "${ATT_IDS}"
# Optional save
if [[ -n "${SAVE_PDF_DIR}" ]]; then
mkdir -p "${SAVE_PDF_DIR}"
cp "${WORK}/atts/"*.[pP][dD][fF] "${SAVE_PDF_DIR}/" 2>/dev/null || true
fi
# 5. Heuristic extract + render
python3 - "${WORK}" "${FMT}" <<'PY'
import json, sys, os, re, datetime, glob
work, fmt = sys.argv[1:3]
meta = json.load(open(os.path.join(work,"meta.json")))
ts = int(meta.get("sentDateInGMT") or meta.get("receivedTime") or 0) // 1000
mail_date = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else None
mail_from = (meta.get("fromAddress") or meta.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")
mail_subject = meta.get("subject") or "-"
# Heuristics on PDF text
def extract(text):
out = {}
# First non-empty line is often the supplier name (or the address block first line)
lines = [l.strip() for l in text.splitlines() if l.strip()]
out["pdf_top_line"] = lines[0] if lines else None
# Total TTC / HT / TVA — try multiple French/English patterns
def first_match(*patterns):
for p in patterns:
for line in lines:
m = re.search(p, line, re.IGNORECASE)
if m: return m.group(1).replace(",", ".").replace(" ", "")
return None
def parse_amount(s):
if not s: return None
clean = s.replace(",", ".").replace(" ", "")
try:
v = float(clean)
# Money amounts < 1M EUR; filters out VAT-number false positives (FR12345678901)
return v if 0 <= v < 1_000_000 else None
except ValueError: return None
def first_amount(*patterns):
for p in patterns:
for line in lines:
m = re.search(p, line, re.IGNORECASE)
if m:
v = parse_amount(m.group(1))
if v is not None: return f"{v:.2f}"
return None
out["total_ht"] = first_amount(r'(?:total\s*ht|montant\s*ht|net\s*amount|subtotal)[^\d-]*([\d \.,]+)')
# TVA: require currency suffix to avoid matching VAT-number digits
out["total_tva"] = first_amount(r'(?:tva|vat)[^\d-]*([\d \.,]+)\s*(?:€|eur)\b')
out["total_ttc"] = first_amount(r'(?:total\s*ttc|amount\s*due|total\s*due|grand\s*total|montant\s*total|amount\s*paid)[^\d-]*([\d \.,]+)')
# Invoice ref — must contain a digit (filters "umber", "Invoice", etc.)
m = re.search(r'(?:facture|invoice|receipt|reçu)\s*(?:n[°o]?|number|#|:)\s*([A-Za-z0-9][\w\d/-]{2,})', text, re.IGNORECASE)
if m and any(c.isdigit() for c in m.group(1)):
out["invoice_ref"] = m.group(1)
else:
# Fallback: any reasonable ref-shaped token after "Invoice" / "Facture" header
m = re.search(r'\b([A-Z]{2,}[-/]?\d[\w\d/-]{2,})\b', text)
out["invoice_ref"] = m.group(1) if m else None
# Invoice date — try ISO, French DD/MM/YYYY, English MM/DD/YYYY, French long form
out["invoice_date_raw"] = None
for p in (
r'\b(\d{4}-\d{2}-\d{2})\b',
r'(?:date|émise\s*le|invoice\s*date|date\s*de\s*facturation)[:\s]*(\d{1,2}[\s/.-]\d{1,2}[\s/.-]\d{2,4})',
r'(?:date|émise\s*le|invoice\s*date)[:\s]*(\d{1,2}\s+\w{3,9}\.?\s+\d{4})',
):
m = re.search(p, text, re.IGNORECASE)
if m: out["invoice_date_raw"] = m.group(1).strip(); break
# VAT rate (e.g. "20%") — restrict to 0-25% so "100%" / page footers don't match.
vrate = None
for line in lines:
m = re.search(r'\b(\d{1,2}([.,]\d+)?)\s*%', line)
if m:
v = float(m.group(1).replace(",", "."))
if 0 <= v <= 25:
vrate = m.group(1).replace(",", "."); break
out["vat_rate_pct"] = vrate
return out
pdfs = []
for pdf in sorted(glob.glob(os.path.join(work,"atts","*.pdf")) +
glob.glob(os.path.join(work,"atts","*.PDF"))):
name = os.path.basename(pdf)
txt_path = os.path.join(work,"text", os.path.splitext(name)[0] + ".txt")
text = open(txt_path).read() if os.path.isfile(txt_path) else ""
h = extract(text)
h["attachment_name"] = name
h["pdf_size_bytes"] = os.path.getsize(pdf)
h["pdf_text_len"] = len(text)
pdfs.append(h)
result = {
"email": {
"messageId": meta.get("messageId"),
"subject": mail_subject,
"from": mail_from,
"date": mail_date,
"hasAttachment": str(meta.get("hasAttachment","")) == "1",
},
"attachments": pdfs,
"dolibarr_draft_suggestions": [
{
"supplier_hint": p.get("pdf_top_line"),
"invoice_ref": p.get("invoice_ref"),
"invoice_date": p.get("invoice_date_raw"),
"total_ht": p.get("total_ht"),
"total_tva": p.get("total_tva"),
"total_ttc": p.get("total_ttc"),
"vat_rate_pct": p.get("vat_rate_pct"),
"source_email": meta.get("messageId"),
"source_attachment": p.get("attachment_name"),
} for p in pdfs
]
}
if fmt == "json":
print(json.dumps(result, indent=2, ensure_ascii=False))
sys.exit(0)
print("=" * 80)
print(f" Email {meta.get('messageId')}")
print("=" * 80)
print(f" subject : {mail_subject}")
print(f" from : {mail_from}")
print(f" date : {mail_date}")
print(f" attached : {result['email']['hasAttachment']}")
print()
if not pdfs:
print(" (no PDF attachments — try inspecting body or other types)")
for i, p in enumerate(pdfs, 1):
print(f" -- Attachment {i}: {p['attachment_name']} ({p['pdf_size_bytes']} bytes, {p['pdf_text_len']} chars extracted) --")
for k in ("pdf_top_line","invoice_ref","invoice_date_raw","total_ht","total_tva","total_ttc","vat_rate_pct"):
v = p.get(k)
print(f" {k:<16} = {v!r}")
print()
print(" Suggested Dolibarr supplier-invoice draft entries:")
print(json.dumps(result["dolibarr_draft_suggestions"], indent=4, ensure_ascii=False))
PY

@@ -0,0 +1,141 @@
#!/usr/bin/env bash
# List candidate supplier-invoice emails from the books@ Zoho mailbox.
#
# Usage:
# email-list.sh [--folder PATH] # default: /Inbox/books (the books@ alias-filtered folder)
# [--limit N] # default: 30
# [--candidates-only] # filter by subject pattern OR attachment
# [--all-folders] # scan every folder (slow, lots of API calls)
#
# Output: table with date, candidate marker, attachment marker, messageId, folder, from, subject.
# A "candidate" is a message whose subject matches a supplier-like pattern
# (facture/invoice/receipt/reçu/payment/paiement/abonnement/subscription/order/commande/bill)
# OR which has an attachment.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
FOLDER="/Inbox/books"
LIMIT=30
CANDIDATES_ONLY=0
ALL_FOLDERS=0
while [[ $# -gt 0 ]]; do
case "$1" in
--folder) FOLDER="$2"; shift 2 ;;
--limit) LIMIT="$2"; shift 2 ;;
--candidates-only) CANDIDATES_ONLY=1; shift ;;
--all-folders) ALL_FOLDERS=1; shift ;;
-h|--help) sed -n '2,13p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) echo "email-list.sh: unknown arg: $1" >&2; exit 2 ;;
esac
done
WORK="$(mktemp -d -t emailist.XXXXXX)"
trap 'rm -rf "${WORK}"' EXIT
# 1. Discover accountId
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
[[ -z "${AID}" ]] && { echo "email-list.sh: no accountId in /accounts response" >&2; exit 1; }
# 2. Resolve folder path → folderId
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
# Build list of (folderId, path) tuples to scan
if [[ "${ALL_FOLDERS}" == "1" ]]; then
FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
for f in (d.get('data') or []):
fid = f.get('folderId'); path = f.get('path') or f.get('folderName','-')
# Skip noisy system folders
if path in ('/Drafts','/Templates','/Snoozed','/Sent','/Spam','/Trash','/Outbox'): continue
print(f\"{fid}|{path}\")" "${WORK}/folders.json")
else
FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
    if f.get('path') == target:
        print(f\"{f.get('folderId')}|{f.get('path')}\")
        break" "${WORK}/folders.json" "${FOLDER}")
if [[ -z "${FOLDER_IDS}" ]]; then
echo "email-list.sh: folder '${FOLDER}' not found. Available:" >&2
python3 -c "import json,sys; [print(f' {f.get(\"path\",\"-\")}') for f in json.load(open(sys.argv[1])).get('data',[])]" "${WORK}/folders.json" >&2
exit 2
fi
fi
# 3. Fetch messages per folder
mkdir -p "${WORK}/msgs"
COUNT=0
while IFS='|' read -r fid fpath; do
[[ -z "${fid}" ]] && continue
COUNT=$((COUNT+1))
out="${WORK}/msgs/$(printf '%03d' "${COUNT}").json"
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${fid}&limit=${LIMIT}&sortorder=false&start=1" > "${out}" 2>/dev/null || echo '{"data":[]}' > "${out}"
echo "${fpath}" > "${out}.path"
done <<< "${FOLDER_IDS}"
# 4. Render
python3 - "${WORK}/msgs" "${CANDIDATES_ONLY}" <<'PY'
import json, sys, os, re, datetime, glob
msgs_dir, candidates_only_str = sys.argv[1:3]
candidates_only = candidates_only_str == "1"
CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|re[cç]u|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)
# Subjects that look like calendar invites / event updates / generic notifications
# get filtered out of --candidates-only — they always have a .ics attachment so
# the "has-attachment" heuristic alone catches them as false positives.
EXCLUDE_PATTERN = re.compile(
r'^(?:re:\s*|fwd:\s*|tr:\s*)*' # strip Re:/Fwd:/Tr: prefixes
r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
re.IGNORECASE,
)
# Senders that are pure noise — newsletter/marketing patterns.
EXCLUDE_SENDER = re.compile(
r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
re.IGNORECASE,
)
def is_candidate(m):
    subj = m.get("subject","") or ""
    sender = m.get("fromAddress","") or m.get("sender","") or ""
    # Hard exclusions take precedence over inclusions
    if EXCLUDE_PATTERN.match(subj.strip()): return False
    if EXCLUDE_SENDER.search(sender): return False
    if str(m.get("hasAttachment","")) == "1": return True
    if CANDIDATE_PATTERN.search(subj): return True
    return False
rows = []
for f in sorted(glob.glob(os.path.join(msgs_dir, "*.json"))):
    fpath = open(f + ".path").read().strip()
    try: data = json.load(open(f)).get("data") or []
    except Exception: continue  # unreadable or malformed response: skip this folder
    for m in data:
        if candidates_only and not is_candidate(m): continue
        ts = int(m.get("sentDateInGMT") or m.get("receivedTime") or 0) // 1000
        dt = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else "-"
        frm = (m.get("fromAddress") or m.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")[:36]
        subj = (m.get("subject") or "-")[:55]
        has = "Y" if str(m.get("hasAttachment","")) == "1" else " "
        cand = "*" if is_candidate(m) else " "
        rows.append((dt, fpath, cand, has, m.get("messageId","-"), frm, subj))
rows.sort(key=lambda r: r[0], reverse=True)
print(f"{'date':<10} {'cand':<4} {'att':<3} {'messageId':<22} {'folder':<22} {'from':<36} subject")
print("-" * 130)
for dt, fpath, cand, has, mid, frm, subj in rows:
    # "[x]" is 3 wide under the 4-wide 'cand' header, hence the extra space
    print(f"{dt:<10} [{cand}]  [{has}] {mid:<22} {fpath[:22]:<22} {frm:<36} {subj}")
print("-" * 130)
print(f"# {len(rows)} message(s)" + (" (candidates only)" if candidates_only else ""))
PY
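
Review note: the filtering order in `is_candidate` (hard exclusions first, then attachment, then subject keyword) can be exercised outside the heredoc. The sketch below copies the three regexes from the script, with the duplicated `invoice` alternative dropped. The sample messages and addresses are invented for illustration, and the field names (`subject`, `fromAddress`, `hasAttachment`) are only those the script itself reads, not a claim about the full Zoho response schema.

```python
import re

# Regexes copied from email-list.sh (duplicate "invoice" alternative removed).
CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|re[cç]u|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)


def is_candidate(m):
    """Same decision order as the script: exclusions win over inclusions."""
    subj = m.get("subject", "") or ""
    sender = m.get("fromAddress", "") or m.get("sender", "") or ""
    if EXCLUDE_PATTERN.match(subj.strip()):
        return False
    if EXCLUDE_SENDER.search(sender):
        return False
    if str(m.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(subj))


# Hypothetical messages, one per branch of is_candidate.
SAMPLES = [
    {"subject": "Facture 2026-041", "fromAddress": "billing@supplier.example", "hasAttachment": "0"},
    {"subject": "Re: Invitation: Kickoff", "fromAddress": "alice@example.org", "hasAttachment": "1"},
    {"subject": "Hello", "fromAddress": "news@shop.example", "hasAttachment": "1"},
    {"subject": "Photos", "fromAddress": "bob@example.org", "hasAttachment": "1"},
    {"subject": "Lunch?", "fromAddress": "bob@example.org", "hasAttachment": "0"},
]

if __name__ == "__main__":
    for msg in SAMPLES:
        print(is_candidate(msg), msg["subject"])
```

One thing the sketch makes visible: `CANDIDATE_PATTERN` is an unanchored substring search, so a subject such as "Border control" matches on `order`. Whether word boundaries are worth adding depends on how noisy the mailbox is.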


@@ -0,0 +1,126 @@
#!/usr/bin/env bash
# Read-only curl wrapper for the Zoho Mail API.
#
# Usage:
# zoho-curl.sh <path> # e.g. zoho-curl.sh /accounts
# zoho-curl.sh -i <path> # include curl's -i (response headers)
# zoho-curl.sh -o file.json <path> # write body to file
#
# Reads credentials from ../../dolibarr/.env (the shared canonical file).
# Required vars:
# ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC
#
# Token strategy: the access_token obtained from the refresh_token is cached
# in ${TMPDIR:-/tmp}/zoho-access-<user> (mode 600) and reused for 50 minutes
# (Zoho access_tokens live 1h). On 401 from the mail API we refresh once and
# retry (covers refresh-token rotation cases).
#
# Exits non-zero on HTTP >= 400 and writes body to stdout + a short message
# to stderr — same shape as dol-curl.sh / bank-curl.sh.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ENV_FILE="${SCRIPT_DIR}/../../dolibarr/.env"
if [[ ! -f "${ENV_FILE}" ]]; then
echo "zoho-curl.sh: missing ${ENV_FILE}" >&2
echo " Required vars: ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC." >&2
echo " See arcodange-email-ingest/SKILL.md for the OAuth setup." >&2
exit 2
fi
set -a; source "${ENV_FILE}"; set +a
: "${ZOHO_CLIENT_ID:?zoho-curl.sh: ZOHO_CLIENT_ID not set in .env}"
: "${ZOHO_CLIENT_SECRET:?zoho-curl.sh: ZOHO_CLIENT_SECRET not set in .env}"
: "${ZOHO_REFRESH_TOKEN:?zoho-curl.sh: ZOHO_REFRESH_TOKEN not set in .env}"
: "${ZOHO_DC:=eu}"
ACCOUNTS_BASE="https://accounts.zoho.${ZOHO_DC}"
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
# Parse pass-through curl args (everything before the last positional)
PASSTHRU=()
while [[ $# -gt 1 ]]; do
PASSTHRU+=("$1"); shift
done
if [[ $# -lt 1 ]]; then
echo "zoho-curl.sh: missing API path. Example: zoho-curl.sh /accounts" >&2
exit 2
fi
API_PATH="$1"
# Cache access_token in tmpfs to avoid hitting OAuth rate limits on every
# zoho-curl invocation. Zoho access_tokens live 1h; we refresh after 50 min.
CACHE_FILE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
CACHE_TTL_SECONDS=$((50 * 60))
get_access_token() {
if [[ -f "${CACHE_FILE}" ]]; then
local age
# GNU form first: on Linux, "stat -f %m" prints filesystem info to stdout before failing.
age=$(( $(date +%s) - $(stat -c %Y "${CACHE_FILE}" 2>/dev/null || stat -f %m "${CACHE_FILE}") ))
if [[ ${age} -lt ${CACHE_TTL_SECONDS} ]]; then
cat "${CACHE_FILE}"
return 0
fi
fi
local token
if ! token=$(curl -sS -X POST "${ACCOUNTS_BASE}/oauth/v2/token" \
--max-time 15 \
-d "grant_type=refresh_token" \
-d "client_id=${ZOHO_CLIENT_ID}" \
-d "client_secret=${ZOHO_CLIENT_SECRET}" \
-d "refresh_token=${ZOHO_REFRESH_TOKEN}" \
| python3 -c "
import json, sys
try: d = json.load(sys.stdin)
except: sys.exit('failed to parse OAuth response')
if 'access_token' not in d:
    sys.exit(f'OAuth refresh failed: {d}')
print(d['access_token'])"); then
return 1
fi
if [[ -z "${token}" ]]; then
return 1
fi
# Store cache only on success; the umask makes a new file mode 600 from creation
( umask 077; printf '%s' "${token}" > "${CACHE_FILE}" )
chmod 600 "${CACHE_FILE}"
printf '%s' "${token}"
}
do_call() {
local token="$1"
local body_file="$2"
local headers_file="$3"
curl -sS \
-H "Authorization: Zoho-oauthtoken ${token}" \
-H "Accept: application/json" \
--max-time 30 \
-o "${body_file}" \
-D "${headers_file}" \
-w "%{http_code}" \
${PASSTHRU[@]+"${PASSTHRU[@]}"} \
"${MAIL_BASE}${API_PATH}"
}
ACCESS_TOKEN=$(get_access_token)
[[ -z "${ACCESS_TOKEN}" ]] && { echo "zoho-curl.sh: empty access_token" >&2; exit 1; }
BODY_FILE="$(mktemp -t zohocurl.XXXXXX)"
HEADERS_FILE="$(mktemp -t zohohdr.XXXXXX)"
trap 'rm -f "${BODY_FILE}" "${HEADERS_FILE}"' EXIT
HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
# Retry once on 401 with a fresh token (handles edge cases of refresh-token rotation).
# Drop the cache first, otherwise get_access_token returns the same rejected token.
if [[ "${HTTP_CODE}" == "401" ]]; then
  rm -f "${CACHE_FILE}"
  ACCESS_TOKEN=$(get_access_token)
  HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
fi
cat "${BODY_FILE}"
if [[ "${HTTP_CODE}" -ge 400 ]]; then
echo "zoho-curl.sh: HTTP ${HTTP_CODE} on ${API_PATH}" >&2
exit 1
fi
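
Review note: the cache-then-retry flow of `zoho-curl.sh` is easier to reason about in isolation. The following is a minimal sketch, not the script's implementation: `refresh` and `call` are hypothetical callables standing in for the OAuth POST and the mail API request, and the function names are invented here. It keeps the 50-minute TTL from the script and shows the one detail that matters on a 401: the cached token has to be discarded before asking for a new one.

```python
import os
import tempfile
import time

CACHE_TTL_SECONDS = 50 * 60  # same TTL as zoho-curl.sh


def read_cached_token(cache_file, now=None):
    """Return the cached token if the file is younger than the TTL, else None."""
    now = time.time() if now is None else now
    try:
        age = now - os.stat(cache_file).st_mtime
    except FileNotFoundError:
        return None
    if age >= CACHE_TTL_SECONDS:
        return None
    with open(cache_file) as fh:
        return fh.read() or None


def write_cached_token(cache_file, token):
    """Write the token; the 0600 mode applies when the file is created."""
    fd = os.open(cache_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)


def call_with_retry(cache_file, refresh, call):
    """refresh() -> token, call(token) -> HTTP status (both hypothetical)."""
    token = read_cached_token(cache_file)
    if token is None:
        token = refresh()
        write_cached_token(cache_file, token)
    status = call(token)
    if status == 401:
        # Discard the rejected token so the retry cannot reuse it.
        try:
            os.remove(cache_file)
        except FileNotFoundError:
            pass
        token = refresh()
        write_cached_token(cache_file, token)
        status = call(token)
    return status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "zoho-access-demo")
        issued = iter(["token-1", "token-2"])
        status = call_with_retry(
            cache,
            refresh=lambda: next(issued),
            call=lambda tok: 401 if tok == "token-1" else 200,
        )
        print(status, read_cached_token(cache))
```

The retry is bounded to a single attempt, as in the script, so a revoked refresh token still surfaces as a 401 instead of looping.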


@@ -19,6 +19,12 @@ WISE_API_TOKEN=<from wise.com/settings/api-tokens>
WISE_PROFILE_ID=<numeric id of the BUSINESS profile — bank probe prints it>
# Optional: only needed if Wise ever opens the EU statement endpoint
WISE_SCA_KEY_PATH=~/.config/arcodange-erp/wise-sca-private.pem
# Required by arcodange-email-ingest only
ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
ZOHO_CLIENT_SECRET=<same>
ZOHO_REFRESH_TOKEN=<exchanged from one-time code via /oauth/v2/token>
ZOHO_DC=eu # eu | com | in | com.au (suffix of the accounts.zoho.<dc> host)
EOF
chmod 600 .claude/skills/dolibarr/.env
```


@@ -151,7 +151,8 @@ Not available on this account (intentionally): `/setup/modules` (admin-only), `/
- Workflow skill for supplier-side TVA déductible (CA3 lignes 19 / 20 / 17+24): [dolibarr-tva-deductible](../dolibarr-tva-deductible/SKILL.md).
- Workflow skill for composite CA3-ready TVA summary (collectée + déductible + net): [dolibarr-tva-summary](../dolibarr-tva-summary/SKILL.md).
- **Bank-side reconciliation** (Qonto + Wise ↔ Dolibarr matching): [arcodange-bank-reco](../arcodange-bank-reco/SKILL.md).
-- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system, like bank reconciliation). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.
+- **Email ingestion** (Zoho Mail → supplier-invoice draft for Dolibarr): [arcodange-email-ingest](../arcodange-email-ingest/SKILL.md).
+- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.
## Out of scope


@@ -22,7 +22,7 @@ concurrency:
url: https://vault.arcodange.lab
caCertificate: ${{ secrets.HOMELAB_CA_CERT }}
jwtGiteaOIDC: ${{ needs.gitea_vault_auth.outputs.gitea_vault_jwt }}
-role: gitea_cicd_webapp
+role: gitea_cicd_erp
method: jwt
path: gitea_jwt
secrets: |


@@ -72,10 +72,15 @@ COMMANDS
probe Auth + discovery (org slug, profile id, balance ids)
qonto-transactions [--month|--since|--until] Qonto transactions table (incoming + outgoing)
wise-transactions [--month|--since|--until|--type|--enrich] Wise activities (incoming + outgoing)
-match [--month|--since|--until|--window-days N] Match bank ↔ Dolibarr (3 buckets)
+match [--month|--since|--until|--window-days N|--enrich] Match bank ↔ Dolibarr (split buckets)
balance Live balances + Dolibarr cross-check per fk_account
curl <qonto|wise> <path> Raw read-only curl through bank-curl.sh
email Supplier-invoice emails from the Zoho mailbox
list [--folder|--limit|--candidates-only|--all-folders] List candidates
inspect <messageId> [--folder|--save-pdf|--json] Parse PDFs + draft Dolibarr entry
curl <path> Raw read-only curl through zoho-curl.sh
whoami GET /users/info — confirm auth
ping GET /status — liveness + Dolibarr version
curl <path> Raw read-only curl through dol-curl.sh
@@ -236,6 +241,30 @@ EOF
esac
;;
email)
sub="${1:-help}"; shift || true
case "${sub}" in
list) exec "${SKILLS}/arcodange-email-ingest/scripts/email-list.sh" "$@" ;;
inspect) exec "${SKILLS}/arcodange-email-ingest/scripts/email-inspect.sh" "$@" ;;
curl) exec "${SKILLS}/arcodange-email-ingest/scripts/zoho-curl.sh" "$@" ;;
help|-h|--help)
cat <<'EOF'
arcodange email — supplier-invoice ingestion from the Zoho mailbox.
list [--folder PATH|--limit N|--candidates-only|--all-folders]
List messages (default: /Inbox/books)
inspect <messageId> [--folder PATH|--save-pdf DIR|--json]
Parse PDF attachments, propose Dolibarr supplier-invoice draft
curl <path> Raw read-only call through zoho-curl.sh
Requires ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC in .env.
See arcodange-email-ingest/SKILL.md for OAuth setup.
EOF
;;
*) echo "arcodange email: unknown subcommand '${sub}' (try 'arcodange email help')" >&2; exit 2 ;;
esac
;;
whoami)
exec "${DOLC}" /users/info
;;


@@ -10,8 +10,8 @@ data:
DOLI_DB_HOST_PORT: !!str 5432
# DOLI_DB_USER: root
# DOLI_DB_PASSWORD: root
-DOLI_DB_NAME: erp
-DOLI_URL_ROOT: 'https://erp.arcodange.lab'
+DOLI_DB_NAME: {{ .Values.db.name }}
+DOLI_URL_ROOT: 'https://{{ .Values.host }}'
# DOLI_ADMIN_LOGIN: 'admin'
# DOLI_ADMIN_PASSWORD: 'admininitialpassword'
DOLI_ENABLE_MODULES: Societe,Facture


@@ -7,7 +7,7 @@ spec:
method: kubernetes
mount: kubernetes
kubernetes:
-role: erp
+role: {{ .Values.vault.k8sRole }}
serviceAccount: {{ include "erp.serviceAccountName" . }}
audiences:
- vault


@@ -9,7 +9,7 @@ spec:
mount: postgres
# Path to the secret
-path: creds/erp
+path: {{ .Values.vault.dynamicPath }}
# Where to store the secrets, VSO will create the secret
destination:


@@ -10,7 +10,7 @@ spec:
mount: kvv2
# path of the secret
-path: erp/config
+path: {{ .Values.vault.staticPath }}
# dest k8s secret
destination:

chart/values-sandbox.yaml

@@ -0,0 +1,39 @@
# Sandbox overlay — to be combined with values.yaml:
# helm install erp-sandbox chart/ -f chart/values.yaml -f chart/values-sandbox.yaml \
# --namespace erp-sandbox --create-namespace
#
# Activates Phase D of the multi-env evolution (cf. PR thread). Prerequisites:
# - factory/postgres/iac/terraform.tfvars: erp has envs = ["prod", "sandbox"]
# - tools/hashicorp-vault/iac/modules/app_roles: env parameter applied
# - arcodange-org/erp/iac/main.tf: for_each over local.envs (Phase D commit)
# - ArgoCD: Application "erp-sandbox" registered (Phase E)
#
# Derived names follow the elision rule: env=sandbox → suffix "-sandbox".
env: sandbox
instance: erp-sandbox
host: erp-sandbox.arcodange.lab
db:
  name: erp-sandbox
vault:
  k8sRole: erp-sandbox
  dynamicPath: creds/erp-sandbox
  staticPath: erp-sandbox/config
# Ingress annotations + hosts — override to point at the sandbox FQDN
ingress:
  enabled: true
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
    traefik.ingress.kubernetes.io/router.tls.domains.0.main: arcodange.lab
    traefik.ingress.kubernetes.io/router.tls.domains.0.sans: erp-sandbox.arcodange.lab
    traefik.ingress.kubernetes.io/router.middlewares: localIp@file
  hosts:
    - host: erp-sandbox.arcodange.lab
      paths:
        - path: /
          pathType: Prefix
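
Review note: the elision rule that ties `values.yaml` and `values-sandbox.yaml` together (env=prod gives no suffix, any other env gives `<app>-<env>`) can be written down as a small function. This is only a sketch for checking the overlay values: the chart stores these coordinates as literals and does not compute them, and `derive_coordinates` and its `domain` parameter are names invented here. The Postgres owner role is left out because the diff only states its non-prod form.

```python
def derive_coordinates(app, env, domain="arcodange.lab"):
    """Apply the elision rule: prod keeps the bare app name, other envs get '-<env>'."""
    instance = app if env == "prod" else f"{app}-{env}"
    return {
        "env": env,
        "instance": instance,
        "host": f"{instance}.{domain}",
        "db": {"name": instance},
        "vault": {
            "k8sRole": instance,
            "dynamicPath": f"creds/{instance}",   # under the postgres mount
            "staticPath": f"{instance}/config",   # under the kvv2 mount
        },
    }


if __name__ == "__main__":
    for env in ("prod", "sandbox"):
        print(derive_coordinates("erp", env))
```

Comparing the output for `sandbox` with the overlay above is a quick way to catch a value that was missed when copying the file for a new environment.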


@@ -2,6 +2,26 @@
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# ----------------------------------------------------------------------------
# Multi-environment coordinates (default = prod, elision rule applies).
# Override in values-<env>.yaml for any non-prod instance — see SKILL.md
# of the factory runbook (doc/runbooks/new-web-app/conventions.md).
# By the elision rule, env=prod produces names identical to single-env apps;
# env=sandbox produces "<app>-sandbox" everywhere except the Postgres owner
# role which uses snake-case "<app>_sandbox_role".
# ----------------------------------------------------------------------------
env: prod
instance: erp # derived id: env=prod → erp, else <app>-<env>
host: erp.arcodange.lab # internal hostname for this instance
db:
  name: erp # PostgreSQL database name (matches factory tfvars)
vault:
  k8sRole: erp # VaultAuth role (postgres/iac issues this per instance)
  dynamicPath: creds/erp # path under postgres/ mount for short-lived DB creds
  staticPath: erp/config # path under kvv2/ mount for the static admin config
replicaCount: 1
image: