add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts #9
Summary
V8: first inbound cross-system skill. Closes the loop from "bill arrives by email" to "ready to enter in the Dolibarr UI". 11th skill in the family, 2nd `arcodange-*` after `arcodange-bank-reco`.

What ships
- `zoho-curl.sh`: read-only OAuth wrapper for the Zoho Mail API. Caches the access token in `$TMPDIR/zoho-access-$USER` (mode 600, 50 min TTL) to avoid Zoho's aggressive OAuth refresh throttle, and retries once with a fresh token on 401.
- `email-list.sh`: lists candidate supplier-invoice emails. The default scope is `/Inbox/books` (the folder the books@ alias auto-routes to). `--candidates-only` keeps messages whose subject matches a supplier pattern or that carry attachments; `--all-folders` widens the scan.
- `email-inspect.sh`: downloads all attachments of one message id, runs `pdftotext` on each PDF, applies regex heuristics and emits one Dolibarr supplier-invoice draft JSON record per attachment. `--save-pdf <DIR>` keeps the PDFs for manual fallback when the heuristics miss.

Architecture choice
- Gmail is folded in via forwarding: `arcodange@gmail.com` → `books@arcodange.fr`. Zero Google API setup.

V8.0 baseline findings (in `/Inbox/books`)

3 candidates currently:
- Mistral AI: `invoice-MSTRL-API-814045-001.pdf`, 20 % French VAT
- Anthropic: `Invoice-9BF...` + `Receipt-2109-4005.pdf`, 180 €, reverse charge (autoliquidation), 0 % VAT

With `--all-folders --candidates-only` the scan widens to 27 candidates, including:

- … (`/Notification`, a supplier invoice not in `/Inbox/books`!)
- … (`/Inbox/abonnements`)
- … (`/clients/KissMetrics`)

Rate-limit pitfall documented
Zoho's OAuth `/token` endpoint has an aggressive throttle ("too many requests continuously" within a few seconds of repeated refreshes). The cache file at `$TMPDIR/zoho-access-$USER` (mode 600, 50 min TTL) prevents this entirely. We hit it during V8 development; it is documented here so the next operator knows.

V8.1+ candidates (out of scope here)
- Mark ingested emails (IMAP `\Flagged` or Zoho label `ingested`) to avoid re-processing them on the next run.
- … (`Invitation:` / `Updated invitation:` / sender domain matching newsletter patterns).
- … (`/content`).

Heuristic extraction notes
PDF text varies wildly by template. The V8.0 heuristics work for some fields on some templates.
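Label-anchored regex heuristics of this kind can be sketched as a few POSIX sh functions over `pdftotext` output. This is an illustration, not the code in `email-inspect.sh`: the label patterns (`Invoice number`, `Total TTC`, `TVA`, and so on) and the JSON field names are assumptions. It does show why extraction is template-dependent, because a PDF that uses a different label yields `null` for that field.

```sh
# Sketch: label-anchored regex heuristics over pdftotext output.
# Label patterns and JSON field names are illustrative assumptions,
# not the ones used by email-inspect.sh.

# Reference: first token after a known label such as "Invoice number".
extract_invoice_ref() {
  sed -nE 's/.*(Invoice number|Facture n°|Receipt number)[[:space:]:#]*([A-Za-z0-9-]+).*/\2/p' |
    head -n 1
}

# Total including VAT; a decimal comma is normalised to a dot.
extract_total_ttc() {
  sed -nE 's/.*(Total TTC|Amount paid|Amount due)[^0-9]*([0-9]+[.,][0-9]{2}).*/\2/p' |
    head -n 1 | tr ',' '.'
}

# VAT rate in percent, e.g. "TVA 20 %" -> 20, "VAT (0%)" -> 0.
extract_vat_rate() {
  sed -nE 's/.*(TVA|VAT)[^0-9]*([0-9]{1,2}([.,][0-9]+)?)[[:space:]]*%.*/\2/p' |
    head -n 1 | tr ',' '.'
}

# JSON value: null when the heuristic missed, a bare number, or a quoted string.
json_value() {
  if [ -z "$1" ]; then
    printf 'null'
  elif [ "$2" = number ]; then
    printf '%s' "$1"
  else
    printf '"%s"' "$1"
  fi
}

# PDF text on stdin -> one draft record on stdout.
draft_from_text() {
  text=$(cat)
  ref=$(printf '%s\n' "$text" | extract_invoice_ref)
  ttc=$(printf '%s\n' "$text" | extract_total_ttc)
  vat=$(printf '%s\n' "$text" | extract_vat_rate)
  printf '{"invoice_ref": %s, "total_ttc": %s, "vat_rate": %s}\n' \
    "$(json_value "$ref" string)" "$(json_value "$ttc" number)" "$(json_value "$vat" number)"
}
```

Typical use would be `pdftotext -layout invoice.pdf - | draft_from_text`. Every field is matched independently, so a partial hit still produces a usable draft that the operator completes by hand.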
The operator uses the draft JSON as a starting point and fills in the missing fields from the PDF (saved via `--save-pdf`).

Test plan
- `bin/arcodange email curl /accounts` → returns the Arcodange account + alias list
- `bin/arcodange email list --candidates-only` → 3 candidates in `/Inbox/books`
- `bin/arcodange email list --all-folders --candidates-only --limit 50` → 27+ candidates across folders
- `bin/arcodange email inspect 1775141901205014300` → Mistral PDF downloaded (74377 bytes), `invoice_ref=API-814045-001`, `vat=20.0`
- `bin/arcodange email inspect 1776017238960014300` → 2 Anthropic PDFs, both with `total_ht=180.00` / `total_ttc=180.00`
- `$TMPDIR/zoho-access-$USER` is mode 600 after the first call
- `git diff --cached | grep -F <ZOHO_REFRESH_TOKEN>` is empty (verified pre-commit)

V8: first inbound-side skill. Closes the loop from "bill arrives by email" to "ready to enter in Dolibarr UI". Read-only at every layer.

What ships:

- arcodange-email-ingest/scripts/zoho-curl.sh
  OAuth wrapper with token cache (50 min TTL, mode 600); avoids hitting the Zoho OAuth rate limit on every invocation.
- arcodange-email-ingest/scripts/email-list.sh
  Lists candidates in /Inbox/books (where the books@ alias auto-routes mail). --candidates-only filters on supplier patterns or attachments; --all-folders scans everything.
- arcodange-email-ingest/scripts/email-inspect.sh
  Pulls the message + attachments, runs pdftotext on each PDF, extracts heuristically (supplier, ref, dates, totals, VAT rate) and emits Dolibarr supplier-invoice draft JSON.

Architecture choice: Zoho API (not IMAP).

- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@): no Google API setup, no app passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.

V8.0 baseline (in /Inbox/books):

- 3 candidates: Mistral AI invoice, Anthropic Stripe receipt (Fwd Gmail), INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when the heuristic falls short.

Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle ("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER (mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once and retries.

V8.1+ ideas (out of scope, listed in SKILL.md):

- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes

CLI: bin/arcodange email {list|inspect|curl} integrated.

Base updates: dolibarr/SKILL.md cross-link; dolibarr/README.md env schema extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
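The cache-then-refresh behaviour described for `zoho-curl.sh` can be sketched in POSIX sh: a token file at `$TMPDIR/zoho-access-$USER` with mode 600 and a 50 min TTL, plus one forced refresh and retry on 401. This is not the script's source. The function names are hypothetical. The Zoho endpoints (`accounts.zoho.$ZOHO_DC/oauth/v2/token`, `mail.zoho.$ZOHO_DC/api`), the `Zoho-oauthtoken` header and the `jq` dependency are assumptions to check against Zoho's documentation.

```sh
# Sketch of a cached-token wrapper in the spirit of zoho-curl.sh; NOT its source.
# Assumed (verify against Zoho docs): token endpoint accounts.zoho.$ZOHO_DC/oauth/v2/token,
# API base mail.zoho.$ZOHO_DC/api, header "Authorization: Zoho-oauthtoken <token>", jq installed.

ZOHO_CACHE_FILE="${TMPDIR:-/tmp}/zoho-access-$USER"
ZOHO_CACHE_TTL_MIN=50

# The only call that hits the throttled /token endpoint: prints a fresh access token.
refresh_access_token() {
  curl -fsS "https://accounts.zoho.${ZOHO_DC}/oauth/v2/token" \
    --data-urlencode "refresh_token=${ZOHO_REFRESH_TOKEN}" \
    --data-urlencode "client_id=${ZOHO_CLIENT_ID}" \
    --data-urlencode "client_secret=${ZOHO_CLIENT_SECRET}" \
    --data-urlencode "grant_type=refresh_token" | jq -er '.access_token'
}

# Fresh = non-empty cache file modified less than TTL minutes ago (-mmin: GNU and BSD find).
cache_is_fresh() {
  [ -s "$ZOHO_CACHE_FILE" ] &&
    [ -n "$(find "$ZOHO_CACHE_FILE" -mmin "-$ZOHO_CACHE_TTL_MIN" 2>/dev/null)" ]
}

# Write the new token to a temp file created under umask 077 (mode 600), then mv it
# into place, so a failed or throttled refresh never leaves an empty cache behind.
renew_cache() {
  tmp="$ZOHO_CACHE_FILE.tmp"
  rm -f "$tmp"
  if (umask 077 && refresh_access_token > "$tmp") && [ -s "$tmp" ]; then
    mv -f "$tmp" "$ZOHO_CACHE_FILE"
  else
    rm -f "$tmp"
    return 1
  fi
}

get_access_token() {
  cache_is_fresh || renew_cache || return 1
  cat "$ZOHO_CACHE_FILE"
}

# One GET: writes the response body to $3 and prints the HTTP status code.
http_get() {
  curl -sS -o "$3" -w '%{http_code}' \
    -H "Authorization: Zoho-oauthtoken $2" \
    "https://mail.zoho.${ZOHO_DC}/api$1"
}

# GET an API path; on 401 force exactly one refresh and retry once.
zoho_get() {
  body=$(mktemp) || return 1
  status=$(http_get "$1" "$(get_access_token)" "$body")
  if [ "$status" = 401 ]; then
    if ! renew_cache; then rm -f "$body"; return 1; fi
    status=$(http_get "$1" "$(get_access_token)" "$body")
  fi
  cat "$body"
  rm -f "$body"
  [ "$status" -ge 200 ] 2>/dev/null && [ "$status" -lt 300 ]
}
```

With this shape, a call such as `zoho_get /accounts` reuses the cached token for 50 minutes. The throttled `/token` endpoint therefore sees at most one request per TTL window, plus one per 401, rather than one per invocation.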