V8 — first inbound-side skill. Closes the loop from "bill arrives by email"
to "ready to enter in Dolibarr UI". Read-only at every layer.
What ships:
- arcodange-email-ingest/scripts/zoho-curl.sh OAuth wrapper with token cache
(50 min TTL, mode 600) — avoids
hitting Zoho OAuth rate limit on
every invocation.
- arcodange-email-ingest/scripts/email-list.sh List candidates in /Inbox/books
(where the books@ alias auto-
routes mail). --candidates-only
filter on supplier patterns or
attachments. --all-folders to
scan everything.
- arcodange-email-ingest/scripts/email-inspect.sh Pull message + attachments,
pdftotext on each PDF, heuristic
extract (supplier, ref, dates,
totals, VAT rate), emit Dolibarr
supplier-invoice draft JSON.
Architecture choice — Zoho API (not IMAP):
- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth
refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@) — no Google
API setup, no app-passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.
V8.0 baseline (in /Inbox/books):
- 3 candidates: Mistral AI facture, Anthropic Stripe receipt (Fwd Gmail),
INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some
templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when
the heuristic falls short.
Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle
("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER
(mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once
and retries.
V8.1+ ideas in SKILL.md out-of-scope:
- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes
CLI: bin/arcodange email {list|inspect|curl} integrated.
Base updates: dolibarr/SKILL.md cross-link, dolibarr/README.md env schema
extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
122 lines
4.8 KiB
Bash
Executable File
122 lines
4.8 KiB
Bash
Executable File
#!/usr/bin/env bash
|
|
# List candidate supplier-invoice emails from the books@ Zoho mailbox.
|
|
#
|
|
# Usage:
|
|
# email-list.sh [--folder PATH] # default: /Inbox/books (the books@ alias-filtered folder)
|
|
# [--limit N] # default: 30
|
|
# [--candidates-only] # filter by subject pattern OR attachment
|
|
# [--all-folders] # scan every folder (slow, lots of API calls)
|
|
#
|
|
# Output: table with mid, date, from, subject, hasAttachment.
|
|
# A "candidate" is a message whose subject matches a supplier-like pattern
|
|
# (facture/invoice/receipt/reçu/payment/paiement/abonnement/order/commande)
|
|
# OR which has an attachment.
|
|
|
|
set -euo pipefail
|
|
|
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
ZOHO_CURL="${SCRIPT_DIR}/zoho-curl.sh"
|
|
|
|
FOLDER="/Inbox/books"
|
|
LIMIT=30
|
|
CANDIDATES_ONLY=0
|
|
ALL_FOLDERS=0
|
|
while [[ $# -gt 0 ]]; do
|
|
case "$1" in
|
|
--folder) FOLDER="$2"; shift 2 ;;
|
|
--limit) LIMIT="$2"; shift 2 ;;
|
|
--candidates-only) CANDIDATES_ONLY=1; shift ;;
|
|
--all-folders) ALL_FOLDERS=1; shift ;;
|
|
-h|--help) sed -n '2,12p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
|
|
*) echo "email-list.sh: unknown arg: $1" >&2; exit 2 ;;
|
|
esac
|
|
done
|
|
|
|
WORK="$(mktemp -d -t emailist.XXXXXX)"
|
|
trap 'rm -rf "${WORK}"' EXIT
|
|
|
|
# 1. Discover accountId
|
|
"${ZOHO_CURL}" /accounts > "${WORK}/accounts.json"
|
|
AID=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); print((d.get('data') or [{}])[0].get('accountId',''))" "${WORK}/accounts.json")
|
|
[[ -z "${AID}" ]] && { echo "email-list.sh: no accountId in /accounts response" >&2; exit 1; }
|
|
|
|
# 2. Resolve folder path → folderId
|
|
"${ZOHO_CURL}" "/accounts/${AID}/folders" > "${WORK}/folders.json"
|
|
|
|
# Build list of (folderId, path) tuples to scan
|
|
if [[ "${ALL_FOLDERS}" == "1" ]]; then
|
|
FOLDER_IDS=$(python3 -c "
|
|
import json, sys
|
|
d = json.load(open(sys.argv[1]))
|
|
for f in (d.get('data') or []):
|
|
fid = f.get('folderId'); path = f.get('path') or f.get('folderName','-')
|
|
# Skip noisy system folders
|
|
if path in ('/Drafts','/Templates','/Snoozed','/Sent','/Spam','/Trash','/Outbox'): continue
|
|
print(f\"{fid}|{path}\")" "${WORK}/folders.json")
|
|
else
|
|
FOLDER_IDS=$(python3 -c "
|
|
import json, sys
|
|
d = json.load(open(sys.argv[1]))
|
|
target = sys.argv[2]
|
|
for f in (d.get('data') or []):
|
|
if f.get('path') == target:
|
|
print(f\"{f.get('folderId')}|{f.get('path')}\")
|
|
break" "${WORK}/folders.json" "${FOLDER}")
|
|
if [[ -z "${FOLDER_IDS}" ]]; then
|
|
echo "email-list.sh: folder '${FOLDER}' not found. Available:" >&2
|
|
python3 -c "import json,sys; [print(f' {f.get(\"path\",\"-\")}') for f in json.load(open(sys.argv[1])).get('data',[])]" "${WORK}/folders.json" >&2
|
|
exit 2
|
|
fi
|
|
fi
|
|
|
|
# 3. Fetch messages per folder
|
|
mkdir -p "${WORK}/msgs"
|
|
COUNT=0
|
|
while IFS='|' read -r fid fpath; do
|
|
[[ -z "${fid}" ]] && continue
|
|
COUNT=$((COUNT+1))
|
|
out="${WORK}/msgs/$(printf '%03d' "${COUNT}").json"
|
|
"${ZOHO_CURL}" "/accounts/${AID}/messages/view?folderId=${fid}&limit=${LIMIT}&sortorder=false&start=1" > "${out}" 2>/dev/null || echo '{"data":[]}' > "${out}"
|
|
echo "${fpath}" > "${out}.path"
|
|
done <<< "${FOLDER_IDS}"
|
|
|
|
# 4. Render
|
|
python3 - "${WORK}/msgs" "${CANDIDATES_ONLY}" <<'PY'
|
|
import json, sys, os, re, datetime, glob
|
|
msgs_dir, candidates_only_str = sys.argv[1:3]
|
|
candidates_only = candidates_only_str == "1"
|
|
|
|
CANDIDATE_PATTERN = re.compile(
|
|
r'facture|invoice|receipt|re[cç]u|payment|paiement|abonnement|subscription|order|commande|invoice|bill',
|
|
re.IGNORECASE,
|
|
)
|
|
|
|
def is_candidate(m):
|
|
if str(m.get("hasAttachment","")) == "1": return True
|
|
if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
|
|
return False
|
|
|
|
rows = []
|
|
for f in sorted(glob.glob(os.path.join(msgs_dir, "*.json"))):
|
|
fpath = open(f + ".path").read().strip()
|
|
try: data = json.load(open(f)).get("data") or []
|
|
except: continue
|
|
for m in data:
|
|
if candidates_only and not is_candidate(m): continue
|
|
ts = int(m.get("sentDateInGMT") or m.get("receivedTime") or 0) // 1000
|
|
dt = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d") if ts else "-"
|
|
frm = (m.get("fromAddress") or m.get("sender") or "-").replace("<","<").replace(">",">").replace("<","").replace(">","")[:36]
|
|
subj = (m.get("subject") or "-")[:55]
|
|
has = "Y" if str(m.get("hasAttachment","")) == "1" else " "
|
|
cand = "*" if is_candidate(m) else " "
|
|
rows.append((dt, fpath, cand, has, m.get("messageId","-"), frm, subj))
|
|
|
|
rows.sort(key=lambda r: r[0], reverse=True)
|
|
print(f"{'date':<10} {'cand':<4} {'att':<3} {'messageId':<22} {'folder':<22} {'from':<36} subject")
|
|
print("-" * 130)
|
|
for dt, fpath, cand, has, mid, frm, subj in rows:
|
|
print(f"{dt:<10} [{cand}] [{has}] {mid:<22} {fpath[:22]:<22} {frm:<36} {subj}")
|
|
print("-" * 130)
|
|
print(f"# {len(rows)} message(s)" + (" (candidates only)" if candidates_only else ""))
|
|
PY
|