add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts

V8 — first inbound-side skill. Closes the loop from "bill arrives by email"
to "ready to enter in Dolibarr UI". Read-only at every layer.

What ships:
- arcodange-email-ingest/scripts/zoho-curl.sh   OAuth wrapper with token cache
                                                (50 min TTL, mode 600) — avoids
                                                hitting Zoho OAuth rate limit on
                                                every invocation.
- arcodange-email-ingest/scripts/email-list.sh   List candidates in /Inbox/books
                                                (where the books@ alias auto-
                                                routes mail). --candidates-only
                                                filter on supplier patterns or
                                                attachments. --all-folders to
                                                scan everything.
- arcodange-email-ingest/scripts/email-inspect.sh   Pull message + attachments,
                                                pdftotext on each PDF, heuristic
                                                extract (supplier, ref, dates,
                                                totals, VAT rate), emit Dolibarr
                                                supplier-invoice draft JSON.

Architecture choice — Zoho API (not IMAP):
- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth
  refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@) — no Google
  API setup, no app-passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.

V8.0 baseline (in /Inbox/books):
- 3 candidates: Mistral AI facture, Anthropic Stripe receipt (Fwd Gmail),
  INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some
  templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when
  the heuristic falls short.

Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle
("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER
(mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once
and retries.

V8.1+ ideas in SKILL.md out-of-scope:
- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes

CLI: bin/arcodange email {list|inspect|curl} integrated.
Base updates: dolibarr/SKILL.md cross-link, dolibarr/README.md env schema
extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 14:56:15 +02:00
parent a1042a483b
commit c2d8479f5e
9 changed files with 685 additions and 1 deletions

@@ -0,0 +1,107 @@
---
name: arcodange-email-ingest
description: Scrape supplier-invoice emails from the Arcodange Zoho mailbox (`gabrielradureau@arcodange.fr` + its `books@arcodange.fr` alias + forwarded Gmail) via the Zoho Mail OAuth API, list candidates matching supplier patterns, download PDF attachments, run pdftotext + heuristic extract, and emit Dolibarr-ready supplier-invoice draft JSON for the operator to paste into the Dolibarr UI. Two workflows — (1) list candidates in a folder (default `/Inbox/books` where the alias auto-routes mail); (2) inspect one message by id, download + parse PDFs, propose draft entries. Surfaces concrete data: supplier name guess (first PDF line), invoice ref, invoice date, total HT/TVA/TTC, VAT rate. Read-only at every layer (Zoho scopes are READ-only; no write to Dolibarr). Use when the user asks "list pending supplier invoices in mail", "ingest invoices from email", "draft Dolibarr entry from this email", "audit cohort supplier docs from mail". Depends on `dolibarr` for the shared `.env`. SKIP for write-side Dolibarr operations (V9 candidate), for non-Zoho mailboxes (use IMAP fallback in a future skill if needed), and for attachments that aren't PDFs (only PDF text extraction is wired today).
requires:
  bins: ["curl", "jq", "python3", "pdftotext"]
  auth: true
---
# arcodange-email-ingest — supplier-invoice emails → Dolibarr draft
Close the inbound side of the accounting loop: bills land in `books@arcodange.fr`, this skill turns them into Dolibarr-ready draft entries for the operator to validate + create.
Depends on the [dolibarr](../dolibarr/SKILL.md) base skill (shared `.env`).
**CLI shortcuts:** `bin/arcodange email list | inspect | curl`
## Architecture choice — Zoho API, not IMAP
We chose the Zoho Mail OAuth API over IMAP because:
- **Richer metadata** — folder paths, attachment IDs, search operators, threads.
- **One account covers everything** — `books@arcodange.fr` is an alias of `gabrielradureau@arcodange.fr`. One refresh_token + the `/accounts` endpoint exposes both, plus all the other aliases (`contact@`, `bonjour@`, etc.).
- **Gmail folded in via forwarding** — `arcodange@gmail.com` forwards incoming to `books@` (configured in Gmail UI). No Google API setup, no app-passwords, no second OAuth flow.
- **Token-only auth** — no app-password fragility, no SCA dance (unlike Wise).
The single canonical inbox path: **`/Inbox/books`** — Zoho's auto-filter routes incoming mail to the `books@` alias into this sub-folder. Scan it first; widen with `--all-folders` only if needed.
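A minimal sketch of how a folder path such as `/Inbox/books` could be resolved to a `folderId`. It assumes the `/accounts/{accountId}/folders` response is a JSON object whose `data` list holds entries with `path` and `folderId` keys, which is the shape the scripts in this skill read. The helper name `resolve_folder_id` and the sample ids are illustrative, not part of the skill.

```python
import json


def resolve_folder_id(folders_payload, target_path):
    """Return the folderId whose 'path' equals target_path, or None if absent."""
    for folder in folders_payload.get("data") or []:
        if folder.get("path") == target_path:
            return folder.get("folderId")
    return None


# Hypothetical payload: the ids below are made up for the example.
sample = json.loads(
    '{"data": ['
    '{"folderId": "101", "path": "/Inbox"},'
    '{"folderId": "102", "path": "/Inbox/books"}'
    "]}"
)

print(resolve_folder_id(sample, "/Inbox/books"))
print(resolve_folder_id(sample, "/Inbox/unknown"))
```

The match is exact, so a typo in `--folder` yields no id rather than a near match; the scripts treat that case as an error.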
## Prerequisites
1. Base skill set up ([dolibarr/README.md](../dolibarr/README.md)).
2. Zoho OAuth Self-Client created and a refresh_token generated. The `.env` extension:
```
ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
ZOHO_CLIENT_SECRET=<same>
ZOHO_REFRESH_TOKEN=<exchanged from one-time code>
ZOHO_DC=eu # domain suffix appended to zoho.: eu | com | in | com.au
```
Setup walkthrough is in the V8 prep section of the cohort review notes.
3. Gmail forwarding to `books@arcodange.fr` enabled (Gmail Settings → Forwarding and POP/IMAP).
4. `pdftotext` (`brew install poppler` on macOS).
## Workflows
### 1. List candidates
```bash
bin/arcodange email list # default: /Inbox/books, last 30 msgs, no filter
bin/arcodange email list --candidates-only # keep messages whose subject matches a supplier pattern, or that have an attachment
bin/arcodange email list --folder /Inbox/contact --limit 50
bin/arcodange email list --all-folders --candidates-only # scan everything (slower, more API calls)
```
Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment. The `[*]` column marks candidates, `[Y]` marks emails with attachments.
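The candidate rule can be sketched as below. The pattern and the `hasAttachment == "1"` check follow the description above; the sample messages are invented for the illustration.

```python
import re

# Subject keywords, French and English, matched case-insensitively.
CANDIDATE_PATTERN = re.compile(
    r"facture|invoice|receipt|re[cç]u|payment|paiement|abonnement"
    r"|subscription|order|commande|bill",
    re.IGNORECASE,
)


def is_candidate(message):
    """A message is a candidate if it has an attachment OR a matching subject."""
    if str(message.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(message.get("subject") or ""))


# Hypothetical messages, shaped like the fields the list script reads.
samples = [
    {"subject": "Votre facture nº 123", "hasAttachment": "0"},
    {"subject": "Hello", "hasAttachment": "1"},
    {"subject": "Lunch on Friday?", "hasAttachment": "0"},
    {"subject": "Border update", "hasAttachment": "0"},
]
for msg in samples:
    print(is_candidate(msg), msg["subject"])
```

Because the pattern is an unanchored substring match, a subject such as "Border update" is flagged through "order". The filter errs on the side of false positives, and the operator makes the final call.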
### 2. Inspect one email + draft Dolibarr entry
```bash
bin/arcodange email inspect 1775141901205014300
bin/arcodange email inspect 1775141901205014300 --folder /Inbox/books # default
bin/arcodange email inspect 1775141901205014300 --save-pdf ~/Documents/factures-2026-Q2/
bin/arcodange email inspect 1775141901205014300 --json # machine-readable
```
The script:
1. Fetches the email metadata (subject / from / date) via `/messages/view`.
2. Lists attachments via `/messages/{mid}/attachmentinfo`.
3. Downloads each attachment via `/messages/{mid}/attachments/{attachmentId}`.
4. For each `.pdf`, runs `pdftotext -layout`, applies regex heuristics to extract:
   - Supplier name guess (first non-empty PDF line — often the supplier letterhead).
   - Invoice reference (`facture/invoice n° XXX`).
   - Invoice date.
   - Total HT / TVA / TTC + VAT rate %.
5. Emits a draft JSON record per attachment — paste into the Dolibarr UI manually.
Heuristics are intentionally conservative (regex-based, no LLM dependency). For PDF templates where the heuristic fails, the raw `pdftotext` output is on disk in the work dir; rerun with `--save-pdf` to grab the PDF for manual entry.
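To illustrate why the extraction is best-effort, here is a trimmed sketch of the amount heuristic. It assumes a French-style number format (space as thousands separator, comma as decimal mark). The regex is a shortened variant of the total-TTC pattern used by `email-inspect.sh`, and the sample lines are invented.

```python
import re

# Label, then any non-digit filler, then a run of digits, spaces, dots, commas.
TOTAL_TTC = re.compile(
    r"(?:total\s*ttc|amount\s*due|grand\s*total)[^\d-]*([\d \.,]+)",
    re.IGNORECASE,
)


def parse_amount(raw):
    """Turn '1 234,56' into 1234.56; return None when the text is not a plain amount."""
    clean = raw.replace(",", ".").replace(" ", "")
    try:
        value = float(clean)
    except ValueError:
        return None
    # Same sanity bound as the script: reject values that cannot be invoice totals.
    return value if 0 <= value < 1_000_000 else None


def total_ttc(line):
    """Return the total found on one line of pdftotext output, or None."""
    match = TOTAL_TTC.search(line)
    return parse_amount(match.group(1)) if match else None


for line in (
    "Total TTC        120,00 €",
    "Total TTC 1 234,56 €",
    "Amount due 1,234.56 USD",  # English thousands separator: not parsed
):
    print(repr(line), "->", total_ttc(line))
```

The third line shows a typical miss: `1,234.56` becomes `1.234.56` after the comma replacement, which is not a valid float, so the field stays empty and the operator falls back to the PDF.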
Captured at [examples/email-inspect.txt](examples/email-inspect.txt) for the V8 baseline (Mistral AI invoice).
## What it doesn't do (V8.0 scope)
- **Does not write to Dolibarr.** The supplier invoice is still created manually in the Dolibarr UI from the draft JSON. V9 candidate: automate via `/supplierinvoices` POST.
- **Does not mark emails as ingested.** Each run re-emits the same candidates. V8.1 candidate: set the IMAP `\Flagged` flag or add a Zoho label `ingested` after the operator confirms.
- **No body extraction yet.** We only parse PDF attachments. Inline-HTML invoices (rare — most suppliers send PDFs) would need body fetch via `/content`.
- **Heuristic extraction is best-effort.** Different supplier PDF templates yield different field-extraction reliability. The draft JSON is a starting point, not ground truth.
## Token cache
`zoho-curl.sh` caches the OAuth access_token in `$TMPDIR/zoho-access-$USER` (mode 600, TTL 50 min). Avoids hitting Zoho's OAuth refresh rate-limit on every invocation. On 401, the wrapper auto-refreshes once and retries.
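The caching behaviour can be sketched in Python as follows. This is an illustration of the pattern, not the wrapper's code: `refresh` and `do_call` are hypothetical callables standing in for the OAuth refresh and the API request, and the cache path is passed in by the caller. The key detail is that the retry after a 401 bypasses the cache, since the cached token is the one that was just rejected.

```python
import os
import tempfile
import time

CACHE_TTL_SECONDS = 50 * 60  # same 50 min TTL as described above


def read_cached_token(path, now=None):
    """Return the cached token if the file is younger than the TTL, else None."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:  # no cache file yet
        return None
    if age >= CACHE_TTL_SECONDS:
        return None
    with open(path) as fh:
        return fh.read().strip() or None


def write_cached_token(path, token):
    """Write the token to a file created with mode 600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)


def get_token(path, refresh, force=False):
    """Reuse the cached token unless force is set; otherwise call refresh()."""
    if not force:
        cached = read_cached_token(path)
        if cached:
            return cached
    token = refresh()  # hypothetical callable performing the OAuth refresh
    write_cached_token(path, token)
    return token


def call_with_retry(path, refresh, do_call):
    """Call the API once, and once more with a forced refresh on HTTP 401."""
    status, body = do_call(get_token(path, refresh))
    if status == 401:
        status, body = do_call(get_token(path, refresh, force=True))
    return status, body


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "zoho-access-demo")
    status, body = call_with_retry(
        cache,
        refresh=lambda: "demo-token",  # stand-in for the OAuth call
        do_call=lambda token: (200, f"called with {token}"),  # stand-in for curl
    )
    print(status, body)
```

Using the file's modification time as the token age keeps the cache to a single file with no extra metadata, at the cost of trusting the local clock.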
## API endpoints used (Zoho Mail)
| Endpoint | Purpose |
|---|---|
| `POST /oauth/v2/token` (accounts.zoho.{dc}) | Refresh access_token from refresh_token |
| `GET /accounts` | Discover accountId + aliases on the account |
| `GET /accounts/{aid}/folders` | List folders (with paths like `/Inbox/books`) |
| `GET /accounts/{aid}/messages/view?folderId=&limit=&start=` | List messages in a folder |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachmentinfo` | List attachments metadata |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachments/{attachmentId}` | Download attachment bytes |
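As an illustration of how the paths in the table combine with the data-centre setting, the sketch below builds two of the URLs. The base `https://mail.zoho.<dc>/api` is the one the scripts use; the helper names and the ids in the example are assumptions made for the sketch.

```python
from urllib.parse import quote, urlencode


def mail_base(dc="eu"):
    """Base URL of the Zoho Mail API for a given data-centre suffix."""
    return f"https://mail.zoho.{dc}/api"


def messages_view_url(account_id, folder_id, limit=30, start=1, dc="eu"):
    """URL listing messages of one folder (GET /accounts/{aid}/messages/view)."""
    query = urlencode({"folderId": folder_id, "limit": limit, "start": start})
    return f"{mail_base(dc)}/accounts/{quote(str(account_id), safe='')}/messages/view?{query}"


def attachment_url(account_id, folder_id, message_id, attachment_id, dc="eu"):
    """URL downloading one attachment; every id is percent-encoded defensively."""
    a, f, m, att = (
        quote(str(part), safe="")
        for part in (account_id, folder_id, message_id, attachment_id)
    )
    return f"{mail_base(dc)}/accounts/{a}/folders/{f}/messages/{m}/attachments/{att}"


# Made-up ids, for illustration only.
print(messages_view_url("1", "2"))
print(attachment_url("1", "2", "3", "4"))
```

Percent-encoding the ids is not needed for numeric Zoho ids; it only guards against an unexpected value changing the shape of the path.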
## Out of scope
- **Writing to Dolibarr** (V9 candidate — would lift the read-only constraint on the API key, or use a separate write-scoped key).
- **Marking ingested emails** (V8.1 trivial follow-up).
- **Non-PDF attachments** (heuristics are PDF-specific).
- **Body-text extraction** (would need `/content` endpoint, deferred).
- **IMAP fallback** for non-Zoho mailboxes (deferred — Gmail forwarding to books@ covers the only known external mailbox today).
- **LLM-based extraction** (deferred: the regex heuristics are the V8.0 baseline, even though they miss fields on some supplier templates, as the Mistral example shows).

@@ -0,0 +1,31 @@
================================================================================
Email 1775141901205014300
================================================================================
subject : Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
from : no-reply@mistral.ai
date : 2026-04-02
attached : True
-- Attachment 1: invoice-MSTRL-API-814045-001.pdf (74377 bytes, 1771 chars extracted) --
pdf_top_line = 'Facture'
invoice_ref = 'API-814045-001'
invoice_date_raw = None
total_ht = None
total_tva = None
total_ttc = None
vat_rate_pct = '20.0'
Suggested Dolibarr supplier-invoice draft entries:
[
    {
        "supplier_hint": "Facture",
        "invoice_ref": "API-814045-001",
        "invoice_date": null,
        "total_ht": null,
        "total_tva": null,
        "total_ttc": null,
        "vat_rate_pct": "20.0",
        "source_email": "1775141901205014300",
        "source_attachment": "invoice-MSTRL-API-814045-001.pdf"
    }
]

@@ -0,0 +1,7 @@
date cand att messageId folder from subject
----------------------------------------------------------------------------------------------------------------------------------
2026-04-12 [*] [Y] 1776017238960014300 /Inbox/books arcodange@gmail.com Fwd: Your receipt from Anthropic Ireland, Limited #2109
2026-04-02 [*] [Y] 1775141901205014300 /Inbox/books no-reply@mistral.ai Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
2026-01-09 [*] [ ] 1767989744791004400 /Inbox/books gabrielradureau@gmail.com Fwd: INPI - Votre paiement pour la commande Réf. 181876
----------------------------------------------------------------------------------------------------------------------------------
# 3 message(s) (candidates only)

@@ -0,0 +1,256 @@
#!/usr/bin/env bash
# Inspect one email by id and propose a Dolibarr supplier-invoice draft.
#
# Usage:
# email-inspect.sh <messageId> [--folder PATH] # default folder: /Inbox/books
# [--save-pdf DIR] # save PDF attachments under DIR/
# [--json] # emit a single JSON object on stdout
#
# Pipeline (read-only):
# 1. Find the message (in the given folder, default /Inbox/books).
# 2. List attachments via /attachmentinfo.
# 3. For each PDF attachment: download, run pdftotext, extract supplier-side
# heuristics (name, totals, dates, ref).
# 4. Emit a draft "Dolibarr-ready" record per attachment so the operator can
# hand-create the supplier invoice in the Dolibarr UI.
#
# This skill DOES NOT write to Dolibarr. Auto-creation of supplier invoices is
# V9 candidate.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
if [[ $# -lt 1 ]]; then
echo "email-inspect.sh: missing <messageId>" >&2
echo " Hint: bin/arcodange email list to see candidate ids." >&2
exit 2
fi
MID="$1"; shift || true
FOLDER="/Inbox/books"; SAVE_PDF_DIR=""; FMT="text"
while [[ $# -gt 0 ]]; do
case "$1" in
--folder) FOLDER="$2"; shift 2 ;;
--save-pdf) SAVE_PDF_DIR="$2"; shift 2 ;;
--json) FMT="json"; shift ;;
-h|--help) sed -n '2,18p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) echo "email-inspect.sh: unknown arg: $1" >&2; exit 2 ;;
esac
done
command -v pdftotext >/dev/null || { echo "email-inspect.sh: pdftotext not found (brew install poppler)" >&2; exit 2; }
WORK="$(mktemp -d -t emailinspect.XXXXXX)"
trap 'rm -rf "${WORK}"' EXIT
# 1. accountId + folderId
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
[[ -z "${AID}" ]] && { echo "email-inspect.sh: no accountId in /accounts response" >&2; exit 1; }
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
FID=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
    if f.get('path') == target:
        print(f.get('folderId')); break" "${WORK}/folders.json" "${FOLDER}")
[[ -z "${FID}" ]] && { echo "email-inspect.sh: folder '${FOLDER}' not found" >&2; exit 2; }
# 2. Find the message in the folder listing (to grab metadata: subject, from, date)
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${FID}&limit=100&sortorder=false&start=1" > "${WORK}/folder_msgs.json"
python3 - "${WORK}/folder_msgs.json" "${MID}" > "${WORK}/meta.json" <<'PY'
import json, sys
d = json.load(open(sys.argv[1]))
mid = sys.argv[2]
for m in (d.get("data") or []):
    if str(m.get("messageId")) == mid:
        json.dump(m, sys.stdout); sys.exit(0)
sys.exit(f"messageId {mid} not found in this folder")
PY
# 3. Attachment metadata
"${ZOHO_CURL}" "/accounts/${AID}/folders/${FID}/messages/${MID}/attachmentinfo" > "${WORK}/attachinfo.json"
# 4. Download each attachment — needs raw bytes (Accept: */*), not the JSON
# wrapper's default. We bypass zoho-curl.sh for the attachment download but
# reuse the cached access_token it wrote.
set -a; source "${SCRIPT_DIR}/../../dolibarr/.env"; set +a
: "${ZOHO_DC:=eu}"
TOKEN_CACHE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
if [[ ! -s "${TOKEN_CACHE}" ]]; then
echo "email-inspect.sh: missing access token cache — run any zoho-curl call first to populate it" >&2
exit 2
fi
ACCESS_TOKEN=$(cat "${TOKEN_CACHE}")
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
mkdir -p "${WORK}/atts" "${WORK}/text"
ATT_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
data = d.get('data') or {}
for a in (data.get('attachments') or []):
    print(f\"{a.get('attachmentId')}|{a.get('attachmentName','-')}\")" "${WORK}/attachinfo.json")
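while IFS='|' read -r aid aname; do
  [[ -z "${aid}" ]] && continue
  # Attachment names come from the email itself: keep only the basename so a
  # crafted name cannot write outside ${WORK}/atts.
  aname="$(basename -- "${aname}")"
  outpath="${WORK}/atts/${aname}"
  curl -sS \
    -H "Authorization: Zoho-oauthtoken ${ACCESS_TOKEN}" \
    -H "Accept: */*" \
    --max-time 60 \
    -o "${outpath}" \
    "${MAIL_BASE}/accounts/${AID}/folders/${FID}/messages/${MID}/attachments/${aid}" || true
  # If pdf, extract text (bash 3.2 compatible — no ${var,,})
  aname_lc=$(echo "${aname}" | tr '[:upper:]' '[:lower:]')
  if [[ "${aname_lc}" == *.pdf ]]; then
    # Strip the extension whatever its case, so INVOICE.PDF yields INVOICE.txt,
    # which is the name the Python step looks up via os.path.splitext.
    pdftotext -layout "${outpath}" "${WORK}/text/${aname%.*}.txt" 2>/dev/null || true
  fi
done <<< "${ATT_IDS}"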
# Optional save
if [[ -n "${SAVE_PDF_DIR}" ]]; then
  mkdir -p "${SAVE_PDF_DIR}"
  # Match the extension case-insensitively so .PDF attachments are saved too
  cp "${WORK}/atts/"*.[pP][dD][fF] "${SAVE_PDF_DIR}/" 2>/dev/null || true
fi
# 5. Heuristic extract + render
python3 - "${WORK}" "${FMT}" <<'PY'
import json, sys, os, re, datetime, glob
work, fmt = sys.argv[1:3]
meta = json.load(open(os.path.join(work,"meta.json")))
ts = int(meta.get("sentDateInGMT") or meta.get("receivedTime") or 0) // 1000
mail_date = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else None
mail_from = (meta.get("fromAddress") or meta.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")
mail_subject = meta.get("subject") or "-"
# Heuristics on PDF text
def extract(text):
    out = {}
    # First non-empty line is often the supplier name (or the address block first line)
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    out["pdf_top_line"] = lines[0] if lines else None
    # Total TTC / HT / TVA — try multiple French/English patterns
    def first_match(*patterns):
        for p in patterns:
            for line in lines:
                m = re.search(p, line, re.IGNORECASE)
                if m: return m.group(1).replace(",", ".").replace(" ", "")
        return None
    def parse_amount(s):
        if not s: return None
        clean = s.replace(",", ".").replace(" ", "")
        try:
            v = float(clean)
        except ValueError:
            return None
        # Money amounts < 1M EUR; filters out VAT-number false positives (FR12345678901)
        return v if 0 <= v < 1_000_000 else None
    def first_amount(*patterns):
        for p in patterns:
            for line in lines:
                m = re.search(p, line, re.IGNORECASE)
                if m:
                    v = parse_amount(m.group(1))
                    if v is not None: return f"{v:.2f}"
        return None
    out["total_ht"] = first_amount(r'(?:total\s*ht|montant\s*ht|net\s*amount|subtotal)[^\d-]*([\d \.,]+)')
    # TVA: require currency suffix to avoid matching VAT-number digits
    out["total_tva"] = first_amount(r'(?:tva|vat)[^\d-]*([\d \.,]+)\s*(?:€|eur)\b')
    out["total_ttc"] = first_amount(r'(?:total\s*ttc|amount\s*due|total\s*due|grand\s*total|montant\s*total|amount\s*paid)[^\d-]*([\d \.,]+)')
    # Invoice ref — must contain a digit (filters "umber", "Invoice", etc.)
    m = re.search(r'(?:facture|invoice|receipt|reçu)\s*(?:n[°o]?|number|#|:)\s*([A-Za-z0-9][\w\d/-]{2,})', text, re.IGNORECASE)
    if m and any(c.isdigit() for c in m.group(1)):
        out["invoice_ref"] = m.group(1)
    else:
        # Fallback: any reasonable ref-shaped token after "Invoice" / "Facture" header
        m = re.search(r'\b([A-Z]{2,}[-/]?\d[\w\d/-]{2,})\b', text)
        out["invoice_ref"] = m.group(1) if m else None
    # Invoice date — try ISO, French DD/MM/YYYY, English MM/DD/YYYY, French long form
    out["invoice_date_raw"] = None
    for p in (
        r'\b(\d{4}-\d{2}-\d{2})\b',
        r'(?:date|émise\s*le|invoice\s*date|date\s*de\s*facturation)[:\s]*(\d{1,2}[\s/.-]\d{1,2}[\s/.-]\d{2,4})',
        r'(?:date|émise\s*le|invoice\s*date)[:\s]*(\d{1,2}\s+\w{3,9}\.?\s+\d{4})',
    ):
        m = re.search(p, text, re.IGNORECASE)
        if m: out["invoice_date_raw"] = m.group(1).strip(); break
    # VAT rate (e.g. "20%") — restrict to 0-25% so "100%" / page footers don't match.
    vrate = None
    for line in lines:
        m = re.search(r'\b(\d{1,2}([.,]\d+)?)\s*%', line)
        if m:
            v = float(m.group(1).replace(",", "."))
            if 0 <= v <= 25:
                vrate = m.group(1).replace(",", "."); break
    out["vat_rate_pct"] = vrate
    return out
pdfs = []
for pdf in sorted(glob.glob(os.path.join(work,"atts","*.pdf")) +
                  glob.glob(os.path.join(work,"atts","*.PDF"))):
    name = os.path.basename(pdf)
    txt_path = os.path.join(work,"text", os.path.splitext(name)[0] + ".txt")
    text = open(txt_path).read() if os.path.isfile(txt_path) else ""
    h = extract(text)
    h["attachment_name"] = name
    h["pdf_size_bytes"] = os.path.getsize(pdf)
    h["pdf_text_len"] = len(text)
    pdfs.append(h)
result = {
    "email": {
        "messageId": meta.get("messageId"),
        "subject": mail_subject,
        "from": mail_from,
        "date": mail_date,
        "hasAttachment": str(meta.get("hasAttachment","")) == "1",
    },
    "attachments": pdfs,
    "dolibarr_draft_suggestions": [
        {
            "supplier_hint": p.get("pdf_top_line"),
            "invoice_ref": p.get("invoice_ref"),
            "invoice_date": p.get("invoice_date_raw"),
            "total_ht": p.get("total_ht"),
            "total_tva": p.get("total_tva"),
            "total_ttc": p.get("total_ttc"),
            "vat_rate_pct": p.get("vat_rate_pct"),
            "source_email": meta.get("messageId"),
            "source_attachment": p.get("attachment_name"),
        } for p in pdfs
    ]
}
if fmt == "json":
    print(json.dumps(result, indent=2, ensure_ascii=False))
    sys.exit(0)
print("=" * 80)
print(f" Email {meta.get('messageId')}")
print("=" * 80)
print(f" subject : {mail_subject}")
print(f" from : {mail_from}")
print(f" date : {mail_date}")
print(f" attached : {result['email']['hasAttachment']}")
print()
if not pdfs:
    print(" (no PDF attachments — try inspecting body or other types)")
for i, p in enumerate(pdfs, 1):
    print(f" -- Attachment {i}: {p['attachment_name']} ({p['pdf_size_bytes']} bytes, {p['pdf_text_len']} chars extracted) --")
    for k in ("pdf_top_line","invoice_ref","invoice_date_raw","total_ht","total_tva","total_ttc","vat_rate_pct"):
        v = p.get(k)
        print(f" {k:<16} = {v!r}")
    print()
print(" Suggested Dolibarr supplier-invoice draft entries:")
print(json.dumps(result["dolibarr_draft_suggestions"], indent=4, ensure_ascii=False))
PY

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# List candidate supplier-invoice emails from the books@ Zoho mailbox.
#
# Usage:
# email-list.sh [--folder PATH] # default: /Inbox/books (the books@ alias-filtered folder)
# [--limit N] # default: 30
# [--candidates-only] # filter by subject pattern OR attachment
# [--all-folders] # scan every folder (slow, lots of API calls)
#
# Output: table with date, candidate flag, attachment flag, messageId, folder, from, subject.
# A "candidate" is a message whose subject matches a supplier-like pattern
# (facture/invoice/receipt/reçu/payment/paiement/abonnement/subscription/order/commande/bill)
# OR which has an attachment.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
FOLDER="/Inbox/books"
LIMIT=30
CANDIDATES_ONLY=0
ALL_FOLDERS=0
while [[ $# -gt 0 ]]; do
case "$1" in
--folder) FOLDER="$2"; shift 2 ;;
--limit) LIMIT="$2"; shift 2 ;;
--candidates-only) CANDIDATES_ONLY=1; shift ;;
--all-folders) ALL_FOLDERS=1; shift ;;
-h|--help) sed -n '2,12p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) echo "email-list.sh: unknown arg: $1" >&2; exit 2 ;;
esac
done
WORK="$(mktemp -d -t emailist.XXXXXX)"
trap 'rm -rf "${WORK}"' EXIT
# 1. Discover accountId
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
[[ -z "${AID}" ]] && { echo "email-list.sh: no accountId in /accounts response" >&2; exit 1; }
# 2. Resolve folder path → folderId
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
# Build list of (folderId, path) tuples to scan
if [[ "${ALL_FOLDERS}" == "1" ]]; then
  FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
for f in (d.get('data') or []):
    fid = f.get('folderId'); path = f.get('path') or f.get('folderName','-')
    # Skip noisy system folders
    if path in ('/Drafts','/Templates','/Snoozed','/Sent','/Spam','/Trash','/Outbox'): continue
    print(f\"{fid}|{path}\")" "${WORK}/folders.json")
else
  FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
    if f.get('path') == target:
        print(f\"{f.get('folderId')}|{f.get('path')}\")
        break" "${WORK}/folders.json" "${FOLDER}")
  if [[ -z "${FOLDER_IDS}" ]]; then
    echo "email-list.sh: folder '${FOLDER}' not found. Available:" >&2
    python3 -c "import json,sys; [print(f' {f.get(\"path\",\"-\")}') for f in json.load(open(sys.argv[1])).get('data',[])]" "${WORK}/folders.json" >&2
    exit 2
  fi
fi
# 3. Fetch messages per folder
mkdir -p "${WORK}/msgs"
COUNT=0
while IFS='|' read -r fid fpath; do
[[ -z "${fid}" ]] && continue
COUNT=$((COUNT+1))
out="${WORK}/msgs/$(printf '%03d' "${COUNT}").json"
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${fid}&limit=${LIMIT}&sortorder=false&start=1" > "${out}" 2>/dev/null || echo '{"data":[]}' > "${out}"
echo "${fpath}" > "${out}.path"
done <<< "${FOLDER_IDS}"
# 4. Render
python3 - "${WORK}/msgs" "${CANDIDATES_ONLY}" <<'PY'
import json, sys, os, re, datetime, glob
msgs_dir, candidates_only_str = sys.argv[1:3]
candidates_only = candidates_only_str == "1"
CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|re[cç]u|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)
def is_candidate(m):
    if str(m.get("hasAttachment","")) == "1": return True
    if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
    return False
rows = []
for f in sorted(glob.glob(os.path.join(msgs_dir, "*.json"))):
    fpath = open(f + ".path").read().strip()
    # Skip folders whose response is not the expected JSON object
    try: data = json.load(open(f)).get("data") or []
    except Exception: continue
    for m in data:
        if candidates_only and not is_candidate(m): continue
        ts = int(m.get("sentDateInGMT") or m.get("receivedTime") or 0) // 1000
        dt = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else "-"
        frm = (m.get("fromAddress") or m.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")[:36]
        subj = (m.get("subject") or "-")[:55]
        has = "Y" if str(m.get("hasAttachment","")) == "1" else " "
        cand = "*" if is_candidate(m) else " "
        rows.append((dt, fpath, cand, has, m.get("messageId","-"), frm, subj))
rows.sort(key=lambda r: r[0], reverse=True)
print(f"{'date':<10} {'cand':<4} {'att':<3} {'messageId':<22} {'folder':<22} {'from':<36} subject")
print("-" * 130)
for dt, fpath, cand, has, mid, frm, subj in rows:
    print(f"{dt:<10} [{cand}] [{has}] {mid:<22} {fpath[:22]:<22} {frm:<36} {subj}")
print("-" * 130)
print(f"# {len(rows)} message(s)" + (" (candidates only)" if candidates_only else ""))
PY

@@ -0,0 +1,126 @@
#!/usr/bin/env bash
# Read-only curl wrapper for the Zoho Mail API.
#
# Usage:
# zoho-curl.sh <path> # e.g. zoho-curl.sh /accounts
# zoho-curl.sh -i <path> # include curl's -i (response headers)
# zoho-curl.sh -o file.json <path> # write body to file
#
# Reads credentials from ../../dolibarr/.env (the shared canonical file).
# Required vars:
# ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC
#
# Token strategy: the access_token obtained from the refresh_token is cached in
# ${TMPDIR:-/tmp}/zoho-access-$(whoami) (mode 600) and reused for 50 min (Zoho
# access_tokens live 1h), so most invocations never touch the OAuth endpoint and
# stay clear of its rate limit. On 401 from the mail API we refresh once and retry.
#
# Exits non-zero on HTTP >= 400 and writes body to stdout + a short message
# to stderr — same shape as dol-curl.sh / bank-curl.sh.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ENV_FILE="${SCRIPT_DIR}/../../dolibarr/.env"
if [[ ! -f "${ENV_FILE}" ]]; then
echo "zoho-curl.sh: missing ${ENV_FILE}" >&2
echo " Required vars: ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC." >&2
echo " See arcodange-email-ingest/SKILL.md for the OAuth setup." >&2
exit 2
fi
set -a; source "${ENV_FILE}"; set +a
: "${ZOHO_CLIENT_ID:?zoho-curl.sh: ZOHO_CLIENT_ID not set in .env}"
: "${ZOHO_CLIENT_SECRET:?zoho-curl.sh: ZOHO_CLIENT_SECRET not set in .env}"
: "${ZOHO_REFRESH_TOKEN:?zoho-curl.sh: ZOHO_REFRESH_TOKEN not set in .env}"
: "${ZOHO_DC:=eu}"
ACCOUNTS_BASE="https://accounts.zoho.${ZOHO_DC}"
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
# Parse pass-through curl args (everything before the last positional)
PASSTHRU=()
while [[ $# -gt 1 ]]; do
PASSTHRU+=("$1"); shift
done
if [[ $# -lt 1 ]]; then
echo "zoho-curl.sh: missing API path. Example: zoho-curl.sh /accounts" >&2
exit 2
fi
API_PATH="$1"
# Cache the access_token in a temp file (under ${TMPDIR:-/tmp}) to avoid hitting OAuth
# rate limits on every zoho-curl invocation. Zoho access_tokens live 1h; we refresh after 50 min.
CACHE_FILE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
CACHE_TTL_SECONDS=$((50 * 60))
get_access_token() {
  if [[ -f "${CACHE_FILE}" ]]; then
    local age mtime
    # Try GNU stat first: on Linux, `stat -f` means "filesystem status" and prints
    # to stdout before failing, which would corrupt the arithmetic below.
    # BSD/macOS stat rejects -c without printing anything, then -f %m takes over.
    mtime=$(stat -c %Y "${CACHE_FILE}" 2>/dev/null || stat -f %m "${CACHE_FILE}")
    age=$(( $(date +%s) - mtime ))
    if [[ ${age} -lt ${CACHE_TTL_SECONDS} ]]; then
      cat "${CACHE_FILE}"
      return 0
    fi
  fi
  local token
  if ! token=$(curl -sS -X POST "${ACCOUNTS_BASE}/oauth/v2/token" \
      --max-time 15 \
      -d "grant_type=refresh_token" \
      -d "client_id=${ZOHO_CLIENT_ID}" \
      -d "client_secret=${ZOHO_CLIENT_SECRET}" \
      -d "refresh_token=${ZOHO_REFRESH_TOKEN}" \
      | python3 -c "
import json, sys
try: d = json.load(sys.stdin)
except ValueError: sys.exit('failed to parse OAuth response')
if 'access_token' not in d:
    sys.exit(f'OAuth refresh failed: {d}')
print(d['access_token'])"); then
    return 1
  fi
  if [[ -z "${token}" ]]; then
    return 1
  fi
  # Store cache only on success; the umask makes a new file mode 600 from the start
  ( umask 077; printf '%s' "${token}" > "${CACHE_FILE}" )
  chmod 600 "${CACHE_FILE}"
  printf '%s' "${token}"
}
do_call() {
local token="$1"
local body_file="$2"
local headers_file="$3"
curl -sS \
-H "Authorization: Zoho-oauthtoken ${token}" \
-H "Accept: application/json" \
--max-time 30 \
-o "${body_file}" \
-D "${headers_file}" \
-w "%{http_code}" \
${PASSTHRU[@]+"${PASSTHRU[@]}"} \
"${MAIL_BASE}${API_PATH}"
}
ACCESS_TOKEN=$(get_access_token)
[[ -z "${ACCESS_TOKEN}" ]] && { echo "zoho-curl.sh: empty access_token" >&2; exit 1; }
BODY_FILE="$(mktemp -t zohocurl.XXXXXX)"
HEADERS_FILE="$(mktemp -t zohohdr.XXXXXX)"
trap 'rm -f "${BODY_FILE}" "${HEADERS_FILE}"' EXIT
HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
# Retry once on 401 with a fresh token. The cached token is the one that was just
# rejected, so drop the cache first; otherwise get_access_token would return it again.
if [[ "${HTTP_CODE}" == "401" ]]; then
  rm -f "${CACHE_FILE}"
  ACCESS_TOKEN=$(get_access_token)
  HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
fi
cat "${BODY_FILE}"
if [[ "${HTTP_CODE}" -ge 400 ]]; then
echo "zoho-curl.sh: HTTP ${HTTP_CODE} on ${API_PATH}" >&2
exit 1
fi

@@ -19,6 +19,12 @@ WISE_API_TOKEN=<from wise.com/settings/api-tokens>
WISE_PROFILE_ID=<numeric id of the BUSINESS profile — bank probe prints it>
# Optional: only needed if Wise ever opens the EU statement endpoint
WISE_SCA_KEY_PATH=~/.config/arcodange-erp/wise-sca-private.pem
# Required by arcodange-email-ingest only
ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
ZOHO_CLIENT_SECRET=<same>
ZOHO_REFRESH_TOKEN=<exchanged from one-time code via /oauth/v2/token>
ZOHO_DC=eu # domain suffix appended to zoho.: eu | com | in | com.au
EOF
chmod 600 .claude/skills/dolibarr/.env
```

@@ -151,7 +151,8 @@ Not available on this account (intentionally): `/setup/modules` (admin-only), `/
- Workflow skill for supplier-side TVA déductible (CA3 lignes 19 / 20 / 17+24): [dolibarr-tva-deductible](../dolibarr-tva-deductible/SKILL.md).
- Workflow skill for composite CA3-ready TVA summary (collectée + déductible + net): [dolibarr-tva-summary](../dolibarr-tva-summary/SKILL.md).
- **Bank-side reconciliation** (Qonto + Wise ↔ Dolibarr matching): [arcodange-bank-reco](../arcodange-bank-reco/SKILL.md).
- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system, like bank reconciliation). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.
- **Email ingestion** (Zoho Mail → supplier-invoice draft for Dolibarr): [arcodange-email-ingest](../arcodange-email-ingest/SKILL.md).
- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.
## Out of scope

@@ -76,6 +76,11 @@ COMMANDS
balance Live balances + Dolibarr cross-check per fk_account
curl <qonto|wise> <path> Raw read-only curl through bank-curl.sh
email Supplier-invoice emails from the Zoho mailbox
list [--folder|--limit|--candidates-only|--all-folders] List candidates
inspect <messageId> [--folder|--save-pdf|--json] Parse PDFs + draft Dolibarr entry
curl <path> Raw read-only curl through zoho-curl.sh
whoami GET /users/info — confirm auth
ping GET /status — liveness + Dolibarr version
curl <path> Raw read-only curl through dol-curl.sh
@@ -236,6 +241,30 @@ EOF
esac
;;
email)
sub="${1:-help}"; shift || true
case "${sub}" in
list) exec "${SKILLS}/arcodange-email-ingest/scripts/email-list.sh" "$@" ;;
inspect) exec "${SKILLS}/arcodange-email-ingest/scripts/email-inspect.sh" "$@" ;;
curl) exec "${SKILLS}/arcodange-email-ingest/scripts/zoho-curl.sh" "$@" ;;
help|-h|--help)
cat <<'EOF'
arcodange email — supplier-invoice ingestion from the Zoho mailbox.
list [--folder PATH|--limit N|--candidates-only|--all-folders]
List messages (default: /Inbox/books)
inspect <messageId> [--folder PATH|--save-pdf DIR|--json]
Parse PDF attachments, propose Dolibarr supplier-invoice draft
curl <path> Raw read-only call through zoho-curl.sh
Requires ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC in .env.
See arcodange-email-ingest/SKILL.md for OAuth setup.
EOF
;;
*) echo "arcodange email: unknown subcommand '${sub}' (try 'arcodange email help')" >&2; exit 2 ;;
esac
;;
whoami)
exec "${DOLC}" /users/info
;;