Files
Gabriel Radureau c2d8479f5e add arcodange-email-ingest — Zoho Mail → Dolibarr supplier-invoice drafts
V8 — first inbound-side skill. Closes the loop from "bill arrives by email"
to "ready to enter in Dolibarr UI". Read-only at every layer.

What ships:
- arcodange-email-ingest/scripts/zoho-curl.sh   OAuth wrapper with token cache
                                                (50 min TTL, mode 600) — avoids
                                                hitting Zoho OAuth rate limit on
                                                every invocation.
- arcodange-email-ingest/scripts/email-list.sh   List candidates in /Inbox/books
                                                (where the books@ alias auto-
                                                routes mail). --candidates-only
                                                filter on supplier patterns or
                                                attachments. --all-folders to
                                                scan everything.
- arcodange-email-ingest/scripts/email-inspect.sh   Pull message + attachments,
                                                pdftotext on each PDF, heuristic
                                                extract (supplier, ref, dates,
                                                totals, VAT rate), emit Dolibarr
                                                supplier-invoice draft JSON.

Architecture choice — Zoho API (not IMAP):
- books@arcodange.fr is an alias of gabrielradureau@arcodange.fr → one OAuth
  refresh_token covers everything.
- Gmail folded in via forwarding (arcodange@gmail.com → books@) — no Google
  API setup, no app-passwords, no second OAuth flow.
- Token-based auth, no SCA rabbit hole.

V8.0 baseline (in /Inbox/books):
- 3 candidates: Mistral AI facture, Anthropic Stripe receipt (Fwd Gmail),
  INPI payment receipt (Fwd Gmail).
- Heuristic extraction is best-effort: works on amounts/refs for some
  templates, misses others (Mistral PDF format, Stripe receipt layout).
- --save-pdf <DIR> lets the operator grab the PDFs for manual entry when
  the heuristic falls short.

Rate-limit pitfall documented: Zoho OAuth refresh has an aggressive throttle
("too many requests continuously"). The cache file at $TMPDIR/zoho-access-$USER
(mode 600, 50 min TTL) prevents this; on 401 the wrapper auto-refreshes once
and retries.

V8.1+ ideas in SKILL.md out-of-scope:
- mark ingested emails (IMAP flag or Zoho label)
- body text extraction (inline-HTML invoices)
- per-template parsers or LLM-based extraction
- IMAP fallback for non-Zoho mailboxes

CLI: bin/arcodange email {list|inspect|curl} integrated.
Base updates: dolibarr/SKILL.md cross-link, dolibarr/README.md env schema
extended with ZOHO_CLIENT_ID/SECRET/REFRESH_TOKEN/DC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 14:56:15 +02:00

127 lines
4.1 KiB
Bash
Executable File

#!/usr/bin/env bash
# Read-only curl wrapper for the Zoho Mail API.
#
# Usage:
# zoho-curl.sh <path> # e.g. zoho-curl.sh /accounts
# zoho-curl.sh -i <path> # include curl's -i (response headers)
# zoho-curl.sh -o file.json <path> # write body to file
#
# Reads credentials from ../../dolibarr/.env (the shared canonical file).
# Required vars:
# ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC
#
# Token strategy: each invocation refreshes a short-lived access_token from
# the refresh_token (Zoho access_tokens live 1h; the cost of refreshing on
# every call is ~150 ms and avoids state on disk). On 401 from the mail API
# we re-refresh once and retry (covers refresh-token rotation cases).
#
# Exits non-zero on HTTP >= 400 and writes body to stdout + a short message
# to stderr — same shape as dol-curl.sh / bank-curl.sh.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ENV_FILE="${SCRIPT_DIR}/../../dolibarr/.env"
if [[ ! -f "${ENV_FILE}" ]]; then
echo "zoho-curl.sh: missing ${ENV_FILE}" >&2
echo " Required vars: ZOHO_CLIENT_ID, ZOHO_CLIENT_SECRET, ZOHO_REFRESH_TOKEN, ZOHO_DC." >&2
echo " See arcodange-email-ingest/SKILL.md for the OAuth setup." >&2
exit 2
fi
set -a; source "${ENV_FILE}"; set +a
: "${ZOHO_CLIENT_ID:?zoho-curl.sh: ZOHO_CLIENT_ID not set in .env}"
: "${ZOHO_CLIENT_SECRET:?zoho-curl.sh: ZOHO_CLIENT_SECRET not set in .env}"
: "${ZOHO_REFRESH_TOKEN:?zoho-curl.sh: ZOHO_REFRESH_TOKEN not set in .env}"
: "${ZOHO_DC:=eu}"
ACCOUNTS_BASE="https://accounts.zoho.${ZOHO_DC}"
MAIL_BASE="https://mail.zoho.${ZOHO_DC}/api"
# Parse pass-through curl args (everything before the last positional)
PASSTHRU=()
while [[ $# -gt 1 ]]; do
PASSTHRU+=("$1"); shift
done
if [[ $# -lt 1 ]]; then
echo "zoho-curl.sh: missing API path. Example: zoho-curl.sh /accounts" >&2
exit 2
fi
API_PATH="$1"
# Cache access_token in tmpfs to avoid hitting OAuth rate limits on every
# zoho-curl invocation. Zoho access_tokens live 1h; we refresh after 50 min.
CACHE_FILE="${TMPDIR:-/tmp}/zoho-access-$(whoami)"
CACHE_TTL_SECONDS=$((50 * 60))
get_access_token() {
if [[ -f "${CACHE_FILE}" ]]; then
local age
age=$(( $(date +%s) - $(stat -f %m "${CACHE_FILE}" 2>/dev/null || stat -c %Y "${CACHE_FILE}") ))
if [[ ${age} -lt ${CACHE_TTL_SECONDS} ]]; then
cat "${CACHE_FILE}"
return 0
fi
fi
local token
if ! token=$(curl -sS -X POST "${ACCOUNTS_BASE}/oauth/v2/token" \
--max-time 15 \
-d "grant_type=refresh_token" \
-d "client_id=${ZOHO_CLIENT_ID}" \
-d "client_secret=${ZOHO_CLIENT_SECRET}" \
-d "refresh_token=${ZOHO_REFRESH_TOKEN}" \
| python3 -c "
import json, sys
try: d = json.load(sys.stdin)
except: sys.exit('failed to parse OAuth response')
if 'access_token' not in d:
sys.exit(f'OAuth refresh failed: {d}')
print(d['access_token'])"); then
return 1
fi
if [[ -z "${token}" ]]; then
return 1
fi
# Store cache (mode 600) only on success
printf '%s' "${token}" > "${CACHE_FILE}"
chmod 600 "${CACHE_FILE}"
printf '%s' "${token}"
}
do_call() {
local token="$1"
local body_file="$2"
local headers_file="$3"
curl -sS \
-H "Authorization: Zoho-oauthtoken ${token}" \
-H "Accept: application/json" \
--max-time 30 \
-o "${body_file}" \
-D "${headers_file}" \
-w "%{http_code}" \
${PASSTHRU[@]+"${PASSTHRU[@]}"} \
"${MAIL_BASE}${API_PATH}"
}
ACCESS_TOKEN=$(get_access_token)
[[ -z "${ACCESS_TOKEN}" ]] && { echo "zoho-curl.sh: empty access_token" >&2; exit 1; }
BODY_FILE="$(mktemp -t zohocurl.XXXXXX)"
HEADERS_FILE="$(mktemp -t zohohdr.XXXXXX)"
trap 'rm -f "${BODY_FILE}" "${HEADERS_FILE}"' EXIT
HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
# Retry once on 401 with a fresh token (handles edge cases of refresh-token rotation)
if [[ "${HTTP_CODE}" == "401" ]]; then
ACCESS_TOKEN=$(get_access_token)
HTTP_CODE=$(do_call "${ACCESS_TOKEN}" "${BODY_FILE}" "${HEADERS_FILE}")
fi
cat "${BODY_FILE}"
if [[ "${HTTP_CODE}" -ge 400 ]]; then
echo "zoho-curl.sh: HTTP ${HTTP_CODE} on ${API_PATH}" >&2
exit 1
fi