add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts
V8 — first inbound-side skill. Closes the loop from "bill arrives by email"
to "ready to enter in Dolibarr UI". Read-only at every layer.
What ships:
- arcodange-email-ingest/scripts/zoho-curl.sh
  OAuth wrapper with token cache (50 min TTL, mode 600); avoids hitting the
  Zoho OAuth rate limit on every invocation.
- arcodange-email-ingest/scripts/email-list.sh
  List candidates in /Inbox/books (where the books@ alias auto-routes mail).
  --candidates-only filters on supplier patterns or attachments;
  --all-folders scans everything.
- arcodange-email-ingest/scripts/email-inspect.sh
  Pull message + attachments, run pdftotext on each PDF, heuristic extract
  (supplier, ref, dates, totals, VAT rate), emit Dolibarr supplier-invoice
  draft JSON.
Architecture choice — Zoho API (not IMAP):
- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth
refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@) — no Google
API setup, no app-passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.
V8.0 baseline (in /Inbox/books):
- 3 candidates: Mistral AI facture, Anthropic Stripe receipt (Fwd Gmail),
INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some
templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when
the heuristic falls short.
Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle
("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER
(mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once
and retries.
V8.1+ ideas (listed under "Out of scope" in SKILL.md):
- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes
CLI: bin/arcodange email {list|inspect|curl} integrated.
Base updates: dolibarr/SKILL.md cross-link, dolibarr/README.md env schema
extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
107
.claude/skills/arcodange-email-ingest/SKILL.md
Normal file
@@ -0,0 +1,107 @@
|
||||
---
|
||||
name: arcodange-email-ingest
|
||||
description: Scrape supplier-invoice emails from the Arcodange Zoho mailbox (`gabrielradureau@arcodange.fr` + its `books@arcodange.fr` alias + forwarded Gmail) via the Zoho Mail OAuth API, list candidates matching supplier patterns, download PDF attachments, run pdftotext + heuristic extract, and emit Dolibarr-ready supplier-invoice draft JSON for the operator to paste into the Dolibarr UI. Two workflows — (1) list candidates in a folder (default `/Inbox/books` where the alias auto-routes mail); (2) inspect one message by id, download + parse PDFs, propose draft entries. Surfaces concrete data: supplier name guess (first PDF line), invoice ref, invoice date, total HT/TVA/TTC, VAT rate. Read-only at every layer (Zoho scopes are READ-only; no write to Dolibarr). Use when the user asks "list pending supplier invoices in mail", "ingest invoices from email", "draft Dolibarr entry from this email", "audit cohort supplier docs from mail". Depends on `dolibarr` for the shared `.env`. SKIP for write-side Dolibarr operations (V9 candidate), for non-Zoho mailboxes (use IMAP fallback in a future skill if needed), and for attachments that aren't PDFs (only PDF text extraction is wired today).
|
||||
requires:
|
||||
bins: ["curl", "jq", "python3", "pdftotext"]
|
||||
auth: true
|
||||
---
|
||||
|
||||
# arcodange-email-ingest — supplier-invoice emails → Dolibarr draft
|
||||
|
||||
Close the inbound side of the accounting loop: bills land in `books@arcodange.fr`, this skill turns them into Dolibarr-ready draft entries for the operator to validate + create.
|
||||
|
||||
Depends on the [dolibarr](../dolibarr/SKILL.md) base skill (shared `.env`).
|
||||
|
||||
**CLI shortcuts:** `bin/arcodange email list | inspect | curl`
|
||||
|
||||
## Architecture choice — Zoho API, not IMAP
|
||||
|
||||
We chose the Zoho Mail OAuth API over IMAP because:
|
||||
- **Richer metadata** — folder paths, attachment IDs, search operators, threads.
|
||||
- **One account covers everything** — `books@arcodange.fr` is an alias of `gabrielradureau@arcodange.fr`. One refresh_token + the `/accounts` endpoint exposes both, plus all the other aliases (`contact@`, `bonjour@`, etc.).
|
||||
- **Gmail folded in via forwarding** — `arcodange@gmail.com` forwards incoming to `books@` (configured in Gmail UI). No Google API setup, no app-passwords, no second OAuth flow.
|
||||
- **Token-only auth** — no app-password fragility, no SCA dance (unlike Wise).
|
||||
|
||||
The canonical inbox path is **`/Inbox/books`**: a Zoho filter routes mail addressed to the `books@` alias into this sub-folder. Scan it first; widen with `--all-folders` only if needed.
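
As a rough illustration of how a folder path such as `/Inbox/books` is turned into the `folderId` the API calls need, here is a minimal sketch. It assumes the `/accounts/{aid}/folders` response has the `data[].path` / `data[].folderId` shape that the scripts in this commit read; the sample IDs and the helper name `resolve_folder_id` are invented for the example.

```python
def resolve_folder_id(folders_response, target_path):
    """Return the folderId whose path equals target_path, or None if absent."""
    for folder in folders_response.get("data") or []:
        if folder.get("path") == target_path:
            return folder.get("folderId")
    return None


# Hypothetical response body; the folder IDs are made up.
sample_folders = {
    "data": [
        {"folderId": "1001", "path": "/Inbox"},
        {"folderId": "1002", "path": "/Inbox/books"},
    ]
}

books_id = resolve_folder_id(sample_folders, "/Inbox/books")
missing_id = resolve_folder_id(sample_folders, "/Inbox/unknown")
print(books_id, missing_id)
```

The match is an exact string comparison, so `/inbox/books` or a trailing slash would not resolve.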
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Base skill set up ([dolibarr/README.md](../dolibarr/README.md)).
2. Zoho OAuth Self-Client created and a refresh_token generated. The `.env` extension:
   ```
   ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
   ZOHO_CLIENT_SECRET=<same>
   ZOHO_REFRESH_TOKEN=<exchanged from one-time code>
   ZOHO_DC=eu   # domain suffix after "zoho.": eu | com | in | com.au
   ```
   Setup walkthrough is in the V8 prep section of the cohort review notes.
3. Gmail forwarding to `books@arcodange.fr` enabled (Gmail Settings → Forwarding and POP/IMAP).
4. `pdftotext` (`brew install poppler` on macOS).
|
||||
|
||||
## Workflows
|
||||
|
||||
### 1. List candidates
|
||||
|
||||
```bash
bin/arcodange email list                                   # default: /Inbox/books, last 30 msgs, no filter
bin/arcodange email list --candidates-only                 # filter to subjects/attachments matching supplier patterns
bin/arcodange email list --folder /Inbox/contact --limit 50
bin/arcodange email list --all-folders --candidates-only   # scan everything (slower, more API calls)
```
|
||||
|
||||
Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment. The `[*]` column marks candidates, `[Y]` marks emails with attachments.
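
The candidate rule described above can be sketched in a few lines. This is a simplified illustration, assuming message records carry `subject` and a `hasAttachment` field equal to `"1"` when an attachment is present (the shape `email-list.sh` reads); the sample messages are invented.

```python
import re

# Subject keywords from the description above; the search is unanchored and
# case-insensitive, so substrings match too.
CANDIDATE_PATTERN = re.compile(
    r"facture|invoice|receipt|re[cç]u|payment|paiement|abonnement"
    r"|subscription|order|commande|bill",
    re.IGNORECASE,
)


def is_candidate(message):
    """True if the message has an attachment OR a supplier-like subject."""
    if str(message.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(message.get("subject") or ""))


# Invented sample messages.
samples = [
    {"subject": "Votre facture de mars", "hasAttachment": "0"},
    {"subject": "Hello", "hasAttachment": "1"},
    {"subject": "Hello", "hasAttachment": "0"},
    {"subject": "New border policy", "hasAttachment": "0"},
]
flags = [is_candidate(m) for m in samples]
print(flags)
```

Because the pattern is unanchored, a subject like "New border policy" counts as a candidate ("border" contains "order"). That errs on the side of listing too much, which suits a read-only triage step.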
|
||||
|
||||
### 2. Inspect one email + draft Dolibarr entry
|
||||
|
||||
```bash
bin/arcodange email inspect 1775141901205014300
bin/arcodange email inspect 1775141901205014300 --folder /Inbox/books   # default
bin/arcodange email inspect 1775141901205014300 --save-pdf ~/Documents/factures-2026-Q2/
bin/arcodange email inspect 1775141901205014300 --json                  # machine-readable
```
|
||||
|
||||
The script:
|
||||
1. Fetches the email metadata (subject / from / date) via `/messages/view`.
2. Lists attachments via `/messages/{mid}/attachmentinfo`.
3. Downloads each attachment via `/messages/{mid}/attachments/{aid}`.
4. For each `.pdf`, runs `pdftotext -layout`, applies regex heuristics to extract:
   - Supplier name guess (first non-empty PDF line, often the supplier letterhead).
   - Invoice reference (`facture/invoice n° XXX`).
   - Invoice date.
   - Total HT / TVA / TTC + VAT rate %.
5. Emits a draft JSON record per attachment; paste into the Dolibarr UI manually.
|
||||
|
||||
Heuristics are intentionally conservative (regex-based, no LLM dependency). For PDF templates where the heuristic fails, rerun with `--save-pdf <DIR>` to keep the PDF for manual entry; the raw `pdftotext` output lives only in a temporary work dir that is removed when the script exits.
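
To make the "regex heuristic" concrete, here is a minimal sketch of the amount extraction for the HT and TTC totals. The two patterns and the normalisation are taken from `email-inspect.sh`; the sample lines are invented, and the helper takes `lines` as an explicit argument for the sake of a standalone example.

```python
import re

TOTAL_HT = r"(?:total\s*ht|montant\s*ht|net\s*amount|subtotal)[^\d-]*([\d \.,]+)"
TOTAL_TTC = (
    r"(?:total\s*ttc|amount\s*due|total\s*due|grand\s*total"
    r"|montant\s*total|amount\s*paid)[^\d-]*([\d \.,]+)"
)


def parse_amount(raw):
    """Normalise a French-style amount ("1 234,56") to a float, else None."""
    if not raw:
        return None
    clean = raw.replace(",", ".").replace(" ", "")
    try:
        value = float(clean)
    except ValueError:
        return None
    # Amounts are expected below 1M EUR; larger numbers are treated as noise.
    return value if 0 <= value < 1_000_000 else None


def first_amount(lines, pattern):
    """Return the first parseable amount matching pattern, as a 2-decimal string."""
    for line in lines:
        match = re.search(pattern, line, re.IGNORECASE)
        if match:
            value = parse_amount(match.group(1))
            if value is not None:
                return f"{value:.2f}"
    return None


# Invented pdftotext-like lines.
sample_lines = ["Total HT 1 234,56 €", "Total TTC 1 481,47 €"]
draft = {
    "total_ht": first_amount(sample_lines, TOTAL_HT),
    "total_ttc": first_amount(sample_lines, TOTAL_TTC),
}
print(draft)
```

One limitation this makes visible: an English-formatted amount such as `1,234.56` becomes `1.234.56` after normalisation and is rejected, so the field stays `null` in the draft rather than carrying a wrong value.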
|
||||
|
||||
Captured at [examples/email-inspect.txt](examples/email-inspect.txt) for the V8 baseline (Mistral AI invoice).
|
||||
|
||||
## What it doesn't do (V8.0 scope)
|
||||
|
||||
- **Does not write to Dolibarr.** The supplier invoice is still created manually in the Dolibarr UI from the draft JSON. V9 candidate: automate via `/supplierinvoices` POST.
|
||||
- **Does not mark emails as ingested.** Each run re-emits the same candidates. V8.1 candidate: set the IMAP `\Flagged` flag or add a Zoho label `ingested` after the operator confirms.
|
||||
- **No body extraction yet.** We only parse PDF attachments. Inline-HTML invoices (rare — most suppliers send PDFs) would need body fetch via `/content`.
|
||||
- **Heuristic extraction is best-effort.** Different supplier PDF templates yield different field-extraction reliability. The draft JSON is a starting point, not ground truth.
|
||||
|
||||
## Token cache
|
||||
|
||||
`zoho-curl.sh` caches the OAuth access_token in `$TMPDIR/zoho-access-$USER` (mode 600, TTL 50 min). Avoids hitting Zoho's OAuth refresh rate-limit on every invocation. On 401, the wrapper auto-refreshes once and retries.
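
The cache logic is small enough to sketch. The following is an illustrative Python equivalent, not the shell implementation itself: the function names are invented, the 50-minute TTL and mode 600 come from the paragraph above, and a POSIX filesystem is assumed.

```python
import os
import stat
import tempfile
import time
from pathlib import Path

CACHE_TTL_SECONDS = 50 * 60  # refresh 10 min before the 1h token expiry


def read_cached_token(cache_file, now=None):
    """Return the cached token if the file is younger than the TTL, else None."""
    try:
        mtime = cache_file.stat().st_mtime
    except FileNotFoundError:
        return None
    age = (time.time() if now is None else now) - mtime
    if age >= CACHE_TTL_SECONDS:
        return None
    return cache_file.read_text() or None


def write_cached_token(cache_file, token):
    """Write the token to a file that is private (mode 600) from creation."""
    fd = os.open(cache_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.chmod(cache_file, 0o600)  # also tightens a pre-existing file


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "zoho-access-demo"
    write_cached_token(cache, "dummy-token")
    fresh = read_cached_token(cache)
    stale = read_cached_token(cache, now=time.time() + CACHE_TTL_SECONDS + 1)
    mode = stat.S_IMODE(cache.stat().st_mode)
print(fresh, stale, oct(mode))
```

For the retry on 401 to fetch a new token, the cached file has to be discarded (or bypassed) first; otherwise a cache entry that is still within its TTL would simply be returned again.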
|
||||
|
||||
## API endpoints used (Zoho Mail)
|
||||
|
||||
| Endpoint | Purpose |
|---|---|
| `POST /oauth/v2/token` (accounts.zoho.{dc}) | Refresh access_token from refresh_token |
| `GET /accounts` | Discover accountId + aliases on the account |
| `GET /accounts/{aid}/folders` | List folders (with paths like `/Inbox/books`) |
| `GET /accounts/{aid}/messages/view?folderId=&limit=&start=` | List messages in a folder |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachmentinfo` | List attachment metadata |
| `GET /accounts/{aid}/folders/{fid}/messages/{mid}/attachments/{attId}` | Download attachment bytes (`{attId}` = attachmentId, distinct from the account `{aid}`) |
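
As a quick illustration of how these paths compose into full URLs, here is a minimal sketch. The base URL `https://mail.zoho.{dc}/api` is the one the scripts build from `ZOHO_DC`; the helper names and the sample IDs are invented for the example.

```python
from urllib.parse import urlencode


def mail_base(dc="eu"):
    """Mail API base for a data centre suffix such as "eu" or "com"."""
    return f"https://mail.zoho.{dc}/api"


def messages_view_url(dc, account_id, folder_id, limit=30, start=1):
    """URL listing messages in one folder."""
    query = urlencode({"folderId": folder_id, "limit": limit, "start": start})
    return f"{mail_base(dc)}/accounts/{account_id}/messages/view?{query}"


def attachment_url(dc, account_id, folder_id, message_id, attachment_id):
    """URL returning the raw bytes of one attachment."""
    return (
        f"{mail_base(dc)}/accounts/{account_id}/folders/{folder_id}"
        f"/messages/{message_id}/attachments/{attachment_id}"
    )


# Invented IDs.
list_url = messages_view_url("eu", "111", "222")
att_url = attachment_url("eu", "111", "222", "333", "444")
print(list_url)
print(att_url)
```

Every request additionally needs the `Authorization: Zoho-oauthtoken <access_token>` header, as sent by `zoho-curl.sh`.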
|
||||
|
||||
## Out of scope
|
||||
|
||||
- **Writing to Dolibarr** (V9 candidate — would lift the read-only constraint on the API key, or use a separate write-scoped key).
|
||||
- **Marking ingested emails** (V8.1 trivial follow-up).
|
||||
- **Non-PDF attachments** (heuristics are PDF-specific).
|
||||
- **Body-text extraction** (would need `/content` endpoint, deferred).
|
||||
- **IMAP fallback** for non-Zoho mailboxes (deferred — Gmail forwarding to books@ covers the only known external mailbox today).
|
||||
- **LLM-based extraction** (deferred; regex heuristics are the V8.0 baseline, although they still miss fields on some supplier templates, as the captured Mistral example shows).
|
||||
@@ -0,0 +1,31 @@
|
||||
================================================================================
|
||||
Email 1775141901205014300
|
||||
================================================================================
|
||||
subject : Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
|
||||
from : no-reply@mistral.ai
|
||||
date : 2026-04-02
|
||||
attached : True
|
||||
|
||||
-- Attachment 1: invoice-MSTRL-API-814045-001.pdf (74377 bytes, 1771 chars extracted) --
|
||||
pdf_top_line = 'Facture'
|
||||
invoice_ref = 'API-814045-001'
|
||||
invoice_date_raw = None
|
||||
total_ht = None
|
||||
total_tva = None
|
||||
total_ttc = None
|
||||
vat_rate_pct = '20.0'
|
||||
|
||||
Suggested Dolibarr supplier-invoice draft entries:
|
||||
[
|
||||
{
|
||||
"supplier_hint": "Facture",
|
||||
"invoice_ref": "API-814045-001",
|
||||
"invoice_date": null,
|
||||
"total_ht": null,
|
||||
"total_tva": null,
|
||||
"total_ttc": null,
|
||||
"vat_rate_pct": "20.0",
|
||||
"source_email": "1775141901205014300",
|
||||
"source_attachment": "invoice-MSTRL-API-814045-001.pdf"
|
||||
}
|
||||
]
|
||||
@@ -0,0 +1,7 @@
|
||||
date cand att messageId folder from subject
|
||||
----------------------------------------------------------------------------------------------------------------------------------
|
||||
2026-04-12 [*] [Y] 1776017238960014300 /Inbox/books arcodange@gmail.com Fwd: Your receipt from Anthropic Ireland, Limited #2109
|
||||
2026-04-02 [*] [Y] 1775141901205014300 /Inbox/books no-reply@mistral.ai Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
|
||||
2026-01-09 [*] [ ] 1767989744791004400 /Inbox/books gabrielradureau@gmail.com Fwd: INPI - Votre paiement pour la commande Réf. 181876
|
||||
----------------------------------------------------------------------------------------------------------------------------------
|
||||
# 3 message(s) (candidates only)
|
||||
256
.claude/skills/arcodange-email-ingest/scripts/email-inspect.sh
Executable file
@@ -0,0 +1,256 @@
|
||||
#!/usr/bin/env bash
|
||||
# Inspect one email by id and propose a Dolibarr supplier-invoice draft.
|
||||
#
|
||||
# Usage:
|
||||
# email-inspect.sh <messageId> [--folder PATH] # default folder: /Inbox/books
|
||||
# [--save-pdf DIR] # save PDF attachments under DIR/
|
||||
# [--json] # emit a single JSON object on stdout
|
||||
#
|
||||
# Pipeline (read-only):
|
||||
# 1. Find the message (in the given folder, default /Inbox/books).
|
||||
# 2. List attachments via /attachmentinfo.
|
||||
# 3. For each PDF attachment: download, run pdftotext, extract supplier-side
|
||||
# heuristics (name, totals, dates, ref).
|
||||
# 4. Emit a draft "Dolibarr-ready" record per attachment so the operator can
|
||||
# hand-create the supplier invoice in the Dolibarr UI.
|
||||
#
|
||||
# This skill DOES NOT write to Dolibarr. Auto-creation of supplier invoices is
|
||||
# V9 candidate.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
|
||||
|
||||
if [[ $# -lt 1 ]]; then
|
||||
echo "email-inspect.sh: missing <messageId>" >&2
|
||||
echo " Hint: bin/arcodange email list to see candidate ids." >&2
|
||||
exit 2
|
||||
fi
|
||||
MID="$1"; shift || true
|
||||
FOLDER="/Inbox/books"; SAVE_PDF_DIR=""; FMT="text"
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--folder) FOLDER="$2"; shift 2 ;;
|
||||
--save-pdf) SAVE_PDF_DIR="$2"; shift 2 ;;
|
||||
--json) FMT="json"; shift ;;
|
||||
-h|--help) sed -n '2,18p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
|
||||
*) echo "email-inspect.sh: unknown arg: $1" >&2; exit 2 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
command -v pdftotext >/dev/null || { echo "email-inspect.sh: pdftotext not found (brew install poppler)" >&2; exit 2; }
|
||||
|
||||
WORK="$(mktemp -d -t emailinspect.XXXXXX)"
|
||||
trap 'rm -rf "${WORK}"' EXIT
|
||||
|
||||
# 1. accountId + folderId
|
||||
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
|
||||
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
# Same guard as email-list.sh: fail early instead of calling /accounts//folders
[[ -z "${AID}" ]] && { echo "email-inspect.sh: no accountId in /accounts response" >&2; exit 1; }
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
FID=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
    if f.get('path') == target:
        print(f.get('folderId')); break" "${WORK}/folders.json" "${FOLDER}")
[[ -z "${FID}" ]] && { echo "email-inspect.sh: folder '${FOLDER}' not found" >&2; exit 2; }
|
||||
|
||||
# 2. Find the message in the folder listing (to grab metadata: subject, from, date)
|
||||
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${FID}&limit=100&sortorder=false&start=1" > "${WORK}/folder_msgs.json"
|
||||
python3 - "${WORK}/folder_msgs.json" "${MID}" > "${WORK}/meta.json" <<'PY'
import json, sys
d = json.load(open(sys.argv[1]))
mid = sys.argv[2]
for m in (d.get("data") or []):
    if str(m.get("messageId")) == mid:
        json.dump(m, sys.stdout); sys.exit(0)
# Only the first 100 messages of the folder listing are searched (limit=100 above).
sys.exit(f"messageId {mid} not found in the first 100 messages of this folder")
PY
|
||||
|
||||
# 3. Attachment metadata
|
||||
"${ZOHO_CURL}" "/accounts/${AID}/folders/${FID}/messages/${MID}/attachmentinfo" > "${WORK}/attachinfo.json"
|
||||
|
||||
# 4. Download each attachment — needs raw bytes (Accept: */*), not the JSON
|
||||
# wrapper's default. We bypass zoho-curl.sh for the attachment download but
|
||||
# reuse the cached access_token it wrote.
|
||||
set -a; source "${SCRIPT_DIR}/../../dolibarr/.env"; set +a
|
||||
: "${ZOHO_DC:=eu}"
|
||||
TOKEN_CACHE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
|
||||
if [[ ! -s "${TOKEN_CACHE}" ]]; then
|
||||
echo "email-inspect.sh: missing access token cache — run any zoho-curl call first to populate it" >&2
|
||||
exit 2
|
||||
fi
|
||||
ACCESS_TOKEN=$(cat "${TOKEN_CACHE}")
|
||||
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
|
||||
|
||||
mkdir -p "${WORK}/atts" "${WORK}/text"
|
||||
ATT_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
data = d.get('data') or {}
for a in (data.get('attachments') or []):
    print(f\"{a.get('attachmentId')}|{a.get('attachmentName','-')}\")" "${WORK}/attachinfo.json")
while IFS='|' read -r aid aname; do
  [[ -z "${aid}" ]] && continue
  # Keep only the base name so a crafted attachment name cannot escape the work dir
  aname="${aname##*/}"
  outpath="${WORK}/atts/${aname}"
  curl -sS \
    -H "Authorization: Zoho-oauthtoken ${ACCESS_TOKEN}" \
    -H "Accept: */*" \
    --max-time 60 \
    -o "${outpath}" \
    "${MAIL_BASE}/accounts/${AID}/folders/${FID}/messages/${MID}/attachments/${aid}" || true
  # If pdf, extract text (bash 3.2 compatible: no ${var,,})
  aname_lc=$(echo "${aname}" | tr '[:upper:]' '[:lower:]')
  if [[ "${aname_lc}" == *.pdf ]]; then
    # ${aname%.*} strips the extension whatever its case, so "X.PDF" maps to
    # "X.txt", the name os.path.splitext() computes in the render step.
    pdftotext -layout "${outpath}" "${WORK}/text/${aname%.*}.txt" 2>/dev/null || true
  fi
done <<< "${ATT_IDS}"
|
||||
|
||||
# Optional save
|
||||
if [[ -n "${SAVE_PDF_DIR}" ]]; then
|
||||
mkdir -p "${SAVE_PDF_DIR}"
|
||||
cp "${WORK}/atts/"*.pdf "${SAVE_PDF_DIR}/" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# 5. Heuristic extract + render
|
||||
python3 - "${WORK}" "${FMT}" <<'PY'
|
||||
import json, sys, os, re, datetime, glob
|
||||
work, fmt = sys.argv[1:3]
|
||||
|
||||
meta = json.load(open(os.path.join(work,"meta.json")))
|
||||
ts = int(meta.get("sentDateInGMT") or meta.get("receivedTime") or 0) // 1000
|
||||
mail_date = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else None
|
||||
# Decode HTML-escaped angle brackets (&lt; / &gt;) if present, then drop them.
mail_from = (meta.get("fromAddress") or meta.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")
|
||||
mail_subject = meta.get("subject") or "-"
|
||||
|
||||
# Heuristics on PDF text
def extract(text):
    out = {}
    # First non-empty line is often the supplier name (or the address block first line)
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    out["pdf_top_line"] = lines[0] if lines else None

    # Total TTC / HT / TVA: try multiple French/English patterns
    def first_match(*patterns):
        for p in patterns:
            for line in lines:
                m = re.search(p, line, re.IGNORECASE)
                if m: return m.group(1).replace(",", ".").replace(" ", "")
        return None

    def parse_amount(s):
        if not s: return None
        clean = s.replace(",", ".").replace(" ", "")
        try:
            v = float(clean)
        except ValueError:
            return None
        # Money amounts < 1M EUR; filters out VAT-number false positives (FR12345678901)
        return v if 0 <= v < 1_000_000 else None

    def first_amount(*patterns):
        for p in patterns:
            for line in lines:
                m = re.search(p, line, re.IGNORECASE)
                if m:
                    v = parse_amount(m.group(1))
                    if v is not None: return f"{v:.2f}"
        return None

    out["total_ht"] = first_amount(r'(?:total\s*ht|montant\s*ht|net\s*amount|subtotal)[^\d-]*([\d \.,]+)')
    # TVA: require currency suffix to avoid matching VAT-number digits.
    # \b is applied to "eur" only: "€" is not a word character, so "€\b" would
    # match only when a letter or digit directly follows the sign.
    out["total_tva"] = first_amount(r'(?:tva|vat)[^\d-]*([\d \.,]+)\s*(?:€|eur\b)')
    out["total_ttc"] = first_amount(r'(?:total\s*ttc|amount\s*due|total\s*due|grand\s*total|montant\s*total|amount\s*paid)[^\d-]*([\d \.,]+)')

    # Invoice ref: must contain a digit (filters "umber", "Invoice", etc.)
    m = re.search(r'(?:facture|invoice|receipt|reçu)\s*(?:n[°o]?|number|#|:)\s*([A-Za-z0-9][\w\d/-]{2,})', text, re.IGNORECASE)
    if m and any(c.isdigit() for c in m.group(1)):
        out["invoice_ref"] = m.group(1)
    else:
        # Fallback: first ref-shaped token anywhere in the text (e.g. "API-814045-001")
        m = re.search(r'\b([A-Z]{2,}[-/]?\d[\w\d/-]{2,})\b', text)
        out["invoice_ref"] = m.group(1) if m else None

    # Invoice date: try ISO, French DD/MM/YYYY, English MM/DD/YYYY, French long form
    out["invoice_date_raw"] = None
    for p in (
        r'\b(\d{4}-\d{2}-\d{2})\b',
        r'(?:date|émise\s*le|invoice\s*date|date\s*de\s*facturation)[:\s]*(\d{1,2}[\s/.-]\d{1,2}[\s/.-]\d{2,4})',
        r'(?:date|émise\s*le|invoice\s*date)[:\s]*(\d{1,2}\s+\w{3,9}\.?\s+\d{4})',
    ):
        m = re.search(p, text, re.IGNORECASE)
        if m: out["invoice_date_raw"] = m.group(1).strip(); break

    # VAT rate (e.g. "20%"): restrict to 0-25% so "100%" / page footers don't match.
    vrate = None
    for line in lines:
        m = re.search(r'\b(\d{1,2}([.,]\d+)?)\s*%', line)
        if m:
            v = float(m.group(1).replace(",", "."))
            if 0 <= v <= 25:
                vrate = m.group(1).replace(",", "."); break
    out["vat_rate_pct"] = vrate

    return out
|
||||
|
||||
pdfs = []
for pdf in sorted(glob.glob(os.path.join(work,"atts","*.pdf")) +
                  glob.glob(os.path.join(work,"atts","*.PDF"))):
    name = os.path.basename(pdf)
    txt_path = os.path.join(work,"text", os.path.splitext(name)[0] + ".txt")
    text = open(txt_path).read() if os.path.isfile(txt_path) else ""
    h = extract(text)
    h["attachment_name"] = name
    h["pdf_size_bytes"] = os.path.getsize(pdf)
    h["pdf_text_len"] = len(text)
    pdfs.append(h)

result = {
    "email": {
        "messageId": meta.get("messageId"),
        "subject": mail_subject,
        "from": mail_from,
        "date": mail_date,
        "hasAttachment": str(meta.get("hasAttachment","")) == "1",
    },
    "attachments": pdfs,
    "dolibarr_draft_suggestions": [
        {
            "supplier_hint": p.get("pdf_top_line"),
            "invoice_ref": p.get("invoice_ref"),
            "invoice_date": p.get("invoice_date_raw"),
            "total_ht": p.get("total_ht"),
            "total_tva": p.get("total_tva"),
            "total_ttc": p.get("total_ttc"),
            "vat_rate_pct": p.get("vat_rate_pct"),
            "source_email": meta.get("messageId"),
            "source_attachment": p.get("attachment_name"),
        } for p in pdfs
    ]
}
|
||||
|
||||
if fmt == "json":
    print(json.dumps(result, indent=2, ensure_ascii=False))
    sys.exit(0)

print("=" * 80)
print(f" Email {meta.get('messageId')}")
print("=" * 80)
print(f" subject : {mail_subject}")
print(f" from : {mail_from}")
print(f" date : {mail_date}")
print(f" attached : {result['email']['hasAttachment']}")
print()
if not pdfs:
    print(" (no PDF attachments; try inspecting body or other types)")
for i, p in enumerate(pdfs, 1):
    print(f" -- Attachment {i}: {p['attachment_name']} ({p['pdf_size_bytes']} bytes, {p['pdf_text_len']} chars extracted) --")
    for k in ("pdf_top_line","invoice_ref","invoice_date_raw","total_ht","total_tva","total_ttc","vat_rate_pct"):
        v = p.get(k)
        print(f" {k:<16} = {v!r}")
    print()

print(" Suggested Dolibarr supplier-invoice draft entries:")
print(json.dumps(result["dolibarr_draft_suggestions"], indent=4, ensure_ascii=False))
|
||||
PY
|
||||
121
.claude/skills/arcodange-email-ingest/scripts/email-list.sh
Executable file
@@ -0,0 +1,121 @@
|
||||
#!/usr/bin/env bash
|
||||
# List candidate supplier-invoice emails from the books@ Zoho mailbox.
|
||||
#
|
||||
# Usage:
|
||||
# email-list.sh [--folder PATH] # default: /Inbox/books (the books@ alias-filtered folder)
|
||||
# [--limit N] # default: 30
|
||||
# [--candidates-only] # filter by subject pattern OR attachment
|
||||
# [--all-folders] # scan every folder (slow, lots of API calls)
|
||||
#
|
||||
# Output: table with date, candidate flag, attachment flag, messageId, folder,
# from, subject. A "candidate" is a message whose subject matches a supplier-like
# pattern (facture/invoice/receipt/reçu/payment/paiement/abonnement/subscription/
# order/commande/bill) OR which has an attachment.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
|
||||
|
||||
FOLDER="/Inbox/books"
|
||||
LIMIT=30
|
||||
CANDIDATES_ONLY=0
|
||||
ALL_FOLDERS=0
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--folder) FOLDER="$2"; shift 2 ;;
|
||||
--limit) LIMIT="$2"; shift 2 ;;
|
||||
--candidates-only) CANDIDATES_ONLY=1; shift ;;
|
||||
--all-folders) ALL_FOLDERS=1; shift ;;
|
||||
-h|--help) sed -n '2,13p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
|
||||
*) echo "email-list.sh: unknown arg: $1" >&2; exit 2 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
WORK="$(mktemp -d -t emailist.XXXXXX)"
|
||||
trap 'rm -rf "${WORK}"' EXIT
|
||||
|
||||
# 1. Discover accountId
|
||||
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
|
||||
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
|
||||
[[ -z "${AID}" ]] && { echo "email-list.sh: no accountId in /accounts response" >&2; exit 1; }
|
||||
|
||||
# 2. Resolve folder path → folderId
|
||||
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
|
||||
|
||||
# Build list of (folderId, path) tuples to scan
|
||||
if [[ "${ALL_FOLDERS}" == "1" ]]; then
  FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
for f in (d.get('data') or []):
    fid = f.get('folderId'); path = f.get('path') or f.get('folderName','-')
    # Skip noisy system folders
    if path in ('/Drafts','/Templates','/Snoozed','/Sent','/Spam','/Trash','/Outbox'): continue
    print(f\"{fid}|{path}\")" "${WORK}/folders.json")
else
  FOLDER_IDS=$(python3 -c "
import json, sys
d = json.load(open(sys.argv[1]))
target = sys.argv[2]
for f in (d.get('data') or []):
    if f.get('path') == target:
        print(f\"{f.get('folderId')}|{f.get('path')}\")
        break" "${WORK}/folders.json" "${FOLDER}")
  if [[ -z "${FOLDER_IDS}" ]]; then
    echo "email-list.sh: folder '${FOLDER}' not found. Available:" >&2
    python3 -c "import json,sys; [print(f' {f.get(\"path\",\"-\")}') for f in json.load(open(sys.argv[1])).get('data',[])]" "${WORK}/folders.json" >&2
    exit 2
  fi
fi
|
||||
|
||||
# 3. Fetch messages per folder
|
||||
mkdir -p "${WORK}/msgs"
|
||||
COUNT=0
|
||||
while IFS='|' read -r fid fpath; do
|
||||
[[ -z "${fid}" ]] && continue
|
||||
COUNT=$((COUNT+1))
|
||||
out="${WORK}/msgs/$(printf '%03d' "${COUNT}").json"
|
||||
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${fid}&limit=${LIMIT}&sortorder=false&start=1" > "${out}" 2>/dev/null || echo '{"data":[]}' > "${out}"
|
||||
echo "${fpath}" > "${out}.path"
|
||||
done <<< "${FOLDER_IDS}"
|
||||
|
||||
# 4. Render
|
||||
python3 - "${WORK}/msgs" "${CANDIDATES_ONLY}" <<'PY'
import json, sys, os, re, datetime, glob
msgs_dir, candidates_only_str = sys.argv[1:3]
candidates_only = candidates_only_str == "1"

CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|re[cç]u|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)

def is_candidate(m):
    if str(m.get("hasAttachment","")) == "1": return True
    if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
    return False

rows = []
for f in sorted(glob.glob(os.path.join(msgs_dir, "*.json"))):
    fpath = open(f + ".path").read().strip()
    # Skip folders whose listing could not be fetched or parsed
    try: data = json.load(open(f)).get("data") or []
    except Exception: continue
    for m in data:
        if candidates_only and not is_candidate(m): continue
        ts = int(m.get("sentDateInGMT") or m.get("receivedTime") or 0) // 1000
        dt = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else "-"
        # Decode HTML-escaped angle brackets (&lt; / &gt;) if present, then drop them.
        frm = (m.get("fromAddress") or m.get("sender") or "-").replace("&lt;","<").replace("&gt;",">").replace("<","").replace(">","")[:36]
        subj = (m.get("subject") or "-")[:55]
        has = "Y" if str(m.get("hasAttachment","")) == "1" else " "
        cand = "*" if is_candidate(m) else " "
        rows.append((dt, fpath, cand, has, m.get("messageId","-"), frm, subj))

# Newest first
rows.sort(key=lambda r: r[0], reverse=True)
print(f"{'date':<10} {'cand':<4} {'att':<3} {'messageId':<22} {'folder':<22} {'from':<36} subject")
print("-" * 130)
for dt, fpath, cand, has, mid, frm, subj in rows:
    print(f"{dt:<10} [{cand}] [{has}] {mid:<22} {fpath[:22]:<22} {frm:<36} {subj}")
print("-" * 130)
print(f"# {len(rows)} message(s)" + (" (candidates only)" if candidates_only else ""))
PY
|
||||
126
.claude/skills/arcodange-email-ingest/scripts/zoho-curl.sh
Executable file
@@ -0,0 +1,126 @@
|
||||
#!/usr/bin/env bash
|
||||
# Read-only curl wrapper for the Zoho Mail API.
|
||||
#
|
||||
# Usage:
|
||||
# zoho-curl.sh <path> # e.g. zoho-curl.sh /accounts
|
||||
# zoho-curl.sh -i <path> # include curl's -i (response headers)
|
||||
# zoho-curl.sh -o file.json <path> # write body to file
|
||||
#
|
||||
# Reads credentials from ../../dolibarr/.env (the shared canonical file).
|
||||
# Required vars:
|
||||
# ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC
|
||||
#
|
||||
# Token strategy: the access_token is cached in ${TMPDIR:-/tmp}/zoho-access-<user>
# (mode 600) and reused for 50 min (Zoho access_tokens live 1h), so the OAuth
# refresh endpoint is not hit on every call. On 401 from the mail API we
# refresh once and retry.
|
||||
#
|
||||
# Exits non-zero on HTTP >= 400 and writes body to stdout + a short message
|
||||
# to stderr — same shape as dol-curl.sh / bank-curl.sh.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
ENV_FILE="${SCRIPT_DIR}/../../dolibarr/.env"
|
||||
|
||||
if [[ ! -f "${ENV_FILE}" ]]; then
|
||||
echo "zoho-curl.sh: missing ${ENV_FILE}" >&2
|
||||
echo " Required vars: ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC." >&2
|
||||
echo " See arcodange-email-ingest/SKILL.md for the OAuth setup." >&2
|
||||
exit 2
|
||||
fi
|
||||
set -a; source "${ENV_FILE}"; set +a
|
||||
|
||||
: "${ZOHO_CLIENT_ID:?zoho-curl.sh: ZOHO_CLIENT_ID not set in .env}"
|
||||
: "${ZOHO_CLIENT_SECRET:?zoho-curl.sh: ZOHO_CLIENT_SECRET not set in .env}"
|
||||
: "${ZOHO_REFRESH_TOKEN:?zoho-curl.sh: ZOHO_REFRESH_TOKEN not set in .env}"
|
||||
: "${ZOHO_DC:=eu}"
|
||||
|
||||
ACCOUNTS_BASE="https://accounts.zoho.${ZOHO_DC}"
|
||||
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
|
||||
|
||||
# Parse pass-through curl args (everything before the last positional)
|
||||
PASSTHRU=()
|
||||
while [[ $# -gt 1 ]]; do
|
||||
PASSTHRU+=("$1"); shift
|
||||
done
|
||||
if [[ $# -lt 1 ]]; then
|
||||
echo "zoho-curl.sh: missing API path. Example: zoho-curl.sh /accounts" >&2
|
||||
exit 2
|
||||
fi
|
||||
API_PATH="$1"
|
||||
|
||||
# Cache access_token in the temp dir to avoid hitting OAuth rate limits on every
# zoho-curl invocation. Zoho access_tokens live 1h; we refresh after 50 min.
|
||||
CACHE_FILE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
|
||||
CACHE_TTL_SECONDS=$((50 * 60))
|
||||
|
||||
get_access_token() {
  if [[ -f "${CACHE_FILE}" ]]; then
    local age
    # GNU stat first (-c %Y), BSD/macOS stat as fallback (-f %m). The reverse
    # order breaks on Linux, where "stat -f" prints filesystem info to stdout.
    age=$(( $(date +%s) - $(stat -c %Y "${CACHE_FILE}" 2>/dev/null || stat -f %m "${CACHE_FILE}") ))
    if [[ ${age} -lt ${CACHE_TTL_SECONDS} ]]; then
      cat "${CACHE_FILE}"
      return 0
    fi
  fi
  local token
  if ! token=$(curl -sS -X POST "${ACCOUNTS_BASE}/oauth/v2/token" \
      --max-time 15 \
      -d "grant_type=refresh_token" \
      -d "client_id=${ZOHO_CLIENT_ID}" \
      -d "client_secret=${ZOHO_CLIENT_SECRET}" \
      -d "refresh_token=${ZOHO_REFRESH_TOKEN}" \
      | python3 -c "
import json, sys
try: d = json.load(sys.stdin)
except Exception: sys.exit('failed to parse OAuth response')
if 'access_token' not in d:
    sys.exit(f'OAuth refresh failed: {d}')
print(d['access_token'])"); then
    return 1
  fi
  if [[ -z "${token}" ]]; then
    return 1
  fi
  # Store cache (mode 600) only on success; the umask keeps a newly created
  # file private from the start, chmod covers a pre-existing one.
  ( umask 077; printf '%s' "${token}" > "${CACHE_FILE}" )
  chmod 600 "${CACHE_FILE}"
  printf '%s' "${token}"
}
|
||||
|
||||
do_call() {
  local token="$1"
  local body_file="$2"
  local headers_file="$3"
  curl -sS \
    -H "Authorization: Zoho-oauthtoken ${token}" \
    -H "Accept: application/json" \
    --max-time 30 \
    -o "${body_file}" \
    -D "${headers_file}" \
    -w "%{http_code}" \
    ${PASSTHRU[@]+"${PASSTHRU[@]}"} \
    "${MAIL_BASE}${API_PATH}"
}

ACCESS_TOKEN=$(get_access_token)
[[ -z "${ACCESS_TOKEN}" ]] && { echo "zoho-curl.sh: empty access_token" >&2; exit 1; }

BODY_FILE="$(mktemp -t zohocurl.XXXXXX)"
HEADERS_FILE="$(mktemp -t zohohdr.XXXXXX)"
trap 'rm -f "${BODY_FILE}" "${HEADERS_FILE}"' EXIT

HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")

# Retry once on 401 with a fresh token (handles edge cases of refresh-token rotation).
# Drop the cached token first; otherwise get_access_token would hand back the
# same rejected token for as long as the cache file is within its TTL.
if [[ "${HTTP_CODE}" == "401" ]]; then
  rm -f "${CACHE_FILE}"
  ACCESS_TOKEN=$(get_access_token)
  HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
fi

cat "${BODY_FILE}"
if [[ "${HTTP_CODE}" -ge 400 ]]; then
  echo "zoho-curl.sh: HTTP ${HTTP_CODE} on ${API_PATH}" >&2
  exit 1
fi
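Review note: the token cache is the part of this wrapper that is easiest to get subtly wrong, so here is a minimal, self-contained sketch of a TTL-checked cache that can be exercised without calling Zoho. The helper names (`file_mtime`, `cache_is_fresh`, `cached_token`) are hypothetical and not part of zoho-curl.sh; the fetcher is any command that prints a token, which is an assumption made so the logic can be tested offline.

```shell
# Hypothetical helpers for illustration; not part of zoho-curl.sh.

# Print a file's mtime in epoch seconds. GNU stat is tried first because
# `stat -f` means "filesystem info" there; BSD/macOS uses `stat -f %m`.
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Succeed when the cache file exists and is younger than the TTL (seconds).
cache_is_fresh() {
  local file="$1" ttl="$2" age
  [ -f "${file}" ] || return 1
  age=$(( $(date +%s) - $(file_mtime "${file}") ))
  [ "${age}" -lt "${ttl}" ]
}

# Print the cached token when fresh; otherwise run the fetcher command given
# after the first two arguments, store its output, and print it.
cached_token() {
  local file="$1" ttl="$2" token
  shift 2
  if cache_is_fresh "${file}" "${ttl}"; then
    cat "${file}"
    return 0
  fi
  token="$("$@")" || return 1
  [ -n "${token}" ] || return 1
  # A newly created cache file gets owner-only permissions via the umask.
  ( umask 077; printf '%s' "${token}" > "${file}" )
  printf '%s' "${token}"
}
```

Deleting the cache file (or passing a TTL of 0) forces the next call to invoke the fetcher again, which is the behavior a 401 retry needs.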
@@ -19,6 +19,12 @@ WISE_API_TOKEN=<from wise.com/settings/api-tokens>
WISE_PROFILE_ID=<numeric id of the BUSINESS profile — bank probe prints it>
# Optional: only needed if Wise ever opens the EU statement endpoint
WISE_SCA_KEY_PATH=~/.config/arcodange-erp/wise-sca-private.pem

# Required by arcodange-email-ingest only
ZOHO_CLIENT_ID=<from api-console.zoho.com self-client>
ZOHO_CLIENT_SECRET=<same>
ZOHO_REFRESH_TOKEN=<exchanged from one-time code via /oauth/v2/token>
ZOHO_DC=eu # domain suffix appended to accounts.zoho. / mail.zoho.: eu | com | in | com.au
EOF
chmod 600 .claude/skills/dolibarr/.env
```

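Because `ZOHO_DC` is pasted directly into both host names, a typo only surfaces later as a DNS or TLS error. The sketch below shows one way to validate the value up front and print the two base URLs the wrapper would build. `zoho_bases` is a hypothetical helper, and the accepted suffix list is an assumption for illustration (to my knowledge Zoho's Australian data center lives under `.com.au`); check it against the data centers your account actually uses.

```shell
# Hypothetical helper: validate a data-center suffix and print the base URLs
# built from it. The accepted list below is an assumption, not exhaustive.
zoho_bases() {
  dc="${1:-eu}"
  case "${dc}" in
    eu|com|in|com.au) ;;
    *) echo "unsupported ZOHO_DC '${dc}'" >&2; return 2 ;;
  esac
  printf 'ACCOUNTS_BASE=https://accounts.zoho.%s\n' "${dc}"
  printf 'MAIL_BASE=https://mail.zoho.%s/api\n' "${dc}"
}
```

Calling `zoho_bases "${ZOHO_DC}"` before the first request makes a bad suffix fail fast with exit status 2 instead of a curl timeout.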
@@ -151,7 +151,8 @@ Not available on this account (intentionally): `/setup/modules` (admin-only), `/
- Workflow skill for supplier-side TVA déductible (CA3 lignes 19 / 20 / 17+24): [dolibarr-tva-deductible](../dolibarr-tva-deductible/SKILL.md).
- Workflow skill for composite CA3-ready TVA summary (collectée + déductible + net): [dolibarr-tva-summary](../dolibarr-tva-summary/SKILL.md).
- **Bank-side reconciliation** (Qonto + Wise ↔ Dolibarr matching): [arcodange-bank-reco](../arcodange-bank-reco/SKILL.md).
- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system, like bank reconciliation). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.
- **Email ingestion** (Zoho Mail → supplier-invoice draft for Dolibarr): [arcodange-email-ingest](../arcodange-email-ingest/SKILL.md).
- Future workflow skills follow the `dolibarr-<topic>` convention (ERP-internal) or `arcodange-<topic>` (cross-system). Each one depends on this skill for connection + permissions + endpoint reference; each one keeps its triggers focused on its specific business workflow.

## Out of scope

@@ -76,6 +76,11 @@ COMMANDS
    balance                    Live balances + Dolibarr cross-check per fk_account
    curl <qonto|wise> <path>   Raw read-only curl through bank-curl.sh

  email                        Supplier-invoice emails from the Zoho mailbox
    list [--folder|--limit|--candidates-only|--all-folders]   List candidates
    inspect <messageId> [--folder|--save-pdf|--json]          Parse PDFs + draft Dolibarr entry
    curl <path>                                               Raw read-only curl through zoho-curl.sh

  whoami                       GET /users/info — confirm auth
  ping                         GET /status — liveness + Dolibarr version
  curl <path>                  Raw read-only curl through dol-curl.sh
@@ -236,6 +241,30 @@ EOF
    esac
    ;;

  email)
    sub="${1:-help}"; shift || true
    case "${sub}" in
      list)    exec "${SKILLS}/arcodange-email-ingest/scripts/email-list.sh" "$@" ;;
      inspect) exec "${SKILLS}/arcodange-email-ingest/scripts/email-inspect.sh" "$@" ;;
      curl)    exec "${SKILLS}/arcodange-email-ingest/scripts/zoho-curl.sh" "$@" ;;
      help|-h|--help)
        cat <<'EOF'
arcodange email — supplier-invoice ingestion from the Zoho mailbox.

  list [--folder PATH|--limit N|--candidates-only|--all-folders]
      List messages (default: /Inbox/books)
  inspect <messageId> [--folder PATH|--save-pdf DIR|--json]
      Parse PDF attachments, propose Dolibarr supplier-invoice draft
  curl <path>   Raw read-only call through zoho-curl.sh

Requires ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC in .env.
See arcodange-email-ingest/SKILL.md for OAuth setup.
EOF
        ;;
      *) echo "arcodange email: unknown subcommand '${sub}' (try 'arcodange email help')" >&2; exit 2 ;;
    esac
    ;;

  whoami)
    exec "${DOLC}" /users/info
    ;;
