From 1d38f25c23c54e75b076712718f08ea897dccd13 Mon Sep 17 00:00:00 2001
From: Gabriel Radureau
Date: Sun, 31 May 2026 15:18:31 +0200
Subject: [PATCH] arcodange-email-ingest V8.1: filter calendar invites + newsletter senders
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

email-list.sh gains two hard-exclusion filters (applied before the
candidate test, regardless of attachments):

- EXCLUDE_PATTERN matches subjects starting with Invitation: /
  Updated invitation: / Canceled event: / Accepted: / Declined: /
  Tentative: / Maybe: (after stripping Re:/Fwd:/Tr: prefixes). This
  filters Google Calendar events, which always carry an .ics attachment.
- EXCLUDE_SENDER matches updates., noreply@*calendar, @calendar.,
  news@ and newsletter@. This filters newsletter blast traffic.

Effect on the --all-folders --candidates-only baseline: 27 noisy → 12
actionable (calendar invites and the staying-ahead.ai newsletter blast
removed). Real supplier docs are intact: Darnis F1042 in /Notification,
3 Free Mobile factures (two in /Inbox/abonnements, one in
/Inbox/helloworld), Mistral + Anthropic in /Inbox/books.

The originally planned --mark-ingested feature is deferred to V8.2:
setting a flag requires the Zoho OAuth scope ZohoMail.messages.UPDATE,
which our read-only refresh_token doesn't have. Documented in SKILL.md:
once the user opts in to the wider scope, --mark-ingested becomes a
one-line flag on email-inspect.sh and is_candidate() learns to skip
flag_info messages.

Captured the new --all-folders baseline at
examples/email-list-all-folders.txt.
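A minimal sketch for checking the two exclusion regexes in isolation, with the patterns copied verbatim from the email-list.sh hunk in this patch. The sample subjects and sender addresses (including every example.com address) are made up for illustration; in the skill itself the filter runs inside the Python embedded in email-list.sh, not as a standalone script.

```python
import re

# Patterns copied from the V8.1 email-list.sh hunk.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'  # strip Re:/Fwd:/Tr: prefixes
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)

# Illustrative subjects: True means the subject is hard-excluded.
subjects = {
    "Invitation: Weekly sync @ Mon 10:00": True,
    "Re: Fwd: Updated invitation: Weekly sync": True,
    "TR: Accepted: Kickoff": True,
    "Invitation to pay your invoice": False,  # no colon right after the keyword
    "Votre facture mobile Free est disponible": False,
}
for subject, expected in subjects.items():
    assert bool(EXCLUDE_PATTERN.match(subject.strip())) == expected, subject

# Illustrative senders: True means the sender is hard-excluded.
senders = {
    "noreply@hiway.fr": False,  # Darnis invoice sender in the baseline
    "no-reply@mistral.ai": False,
    "news@example.com": True,
    "newsletter@example.com": True,
    "team@updates.example.com": True,
    "noreply@calendar.example.com": True,
}
for sender, expected in senders.items():
    assert bool(EXCLUDE_SENDER.search(sender)) == expected, sender
```

Because the subject pattern is anchored with `^` and requires a colon right after the keyword, an ordinary subject such as "Invitation to pay your invoice" still falls through to the candidate test.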
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../skills/arcodange-email-ingest/SKILL.md | 10 +++++++--
 .../examples/email-list-all-folders.txt | 16 ++++++++++++++
 .../scripts/email-list.sh | 22 ++++++++++++++++++-
 3 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 .claude/skills/arcodange-email-ingest/examples/email-list-all-folders.txt

diff --git a/.claude/skills/arcodange-email-ingest/SKILL.md b/.claude/skills/arcodange-email-ingest/SKILL.md
index 38519c4..16bcb16 100644
--- a/.claude/skills/arcodange-email-ingest/SKILL.md
+++ b/.claude/skills/arcodange-email-ingest/SKILL.md
@@ -49,7 +49,13 @@ bin/arcodange email list --folder /Inbox/contact --limit 50
 bin/arcodange email list --all-folders --candidates-only # scan everything (slower, more API calls)
 ```
 
-Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment. The `[*]` column marks candidates, `[Y]` marks emails with attachments.
+Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment.
+
+**Hard exclusions** (V8.1) — applied before the candidate test, regardless of attachments:
+- Subjects starting with `Invitation:` / `Updated invitation:` / `Canceled event:` / `Accepted:` / `Declined:` / `Tentative:` / `Maybe:` (after stripping `Re:` / `Fwd:` / `Tr:` prefixes) → filters calendar events that always carry an `.ics` attachment.
+- Senders matching newsletter/marketing patterns (`updates.`, `noreply@*calendar*`, `news@`, `newsletter@`, etc.).
+
+The `[*]` column marks candidates, `[Y]` marks emails with attachments.
Compared to V8.0, V8.1 cuts the `--all-folders --candidates-only` baseline from ~27 noisy entries down to ~12 actionable ones.

 ### 2. Inspect one email + draft Dolibarr entry

@@ -78,7 +84,7 @@ Captured at [examples/email-inspect.txt](examples/email-inspect.txt) for the V8
 ## What it doesn't do (V8.0 scope)

 - **Does not write to Dolibarr.** The supplier invoice is still created manually in the Dolibarr UI from the draft JSON. V9 candidate: automate via `/supplierinvoices` POST.
-- **Does not mark emails as ingested.** Each run re-emits the same candidates. V8.1 candidate: set the IMAP `\Flagged` flag or add a Zoho label `ingested` after the operator confirms.
+- **Does not mark emails as ingested.** Each run re-emits the same candidates. Implementing this requires extending the OAuth scope: the current refresh_token only has READ scopes (`ZohoMail.messages.READ` etc.). The flag-set endpoint (`PUT /api/accounts/{aid}/updatemessage`) requires `ZohoMail.messages.UPDATE`, which would force the user to regenerate the refresh_token. **V8.2 candidate** — once the user opts in to the wider scope, `--mark-ingested` becomes a one-line flag on `email-inspect.sh` and `is_candidate()` in `email-list.sh` learns to skip messages with `flagid == flag_info`.
 - **No body extraction yet.** We only parse PDF attachments. Inline-HTML invoices (rare — most suppliers send PDFs) would need body fetch via `/content`.
 - **Heuristic extraction is best-effort.** Different supplier PDF templates yield different field-extraction reliability. The draft JSON is a starting point, not ground truth.
diff --git a/.claude/skills/arcodange-email-ingest/examples/email-list-all-folders.txt b/.claude/skills/arcodange-email-ingest/examples/email-list-all-folders.txt
new file mode 100644
index 0000000..678d16a
--- /dev/null
+++ b/.claude/skills/arcodange-email-ingest/examples/email-list-all-folders.txt
@@ -0,0 +1,16 @@
+date cand att messageId folder from subject
+----------------------------------------------------------------------------------------------------------------------------------
+2026-05-20 [*] [Y] 1779312401677014300 /clients/KissMetrics rsirvent@digitalocean.com Re: VM not running despite status=active, after volume
+2026-05-20 [*] [Y] 1779298419301014300 /clients/KissMetrics tdziuba@kissmetrics.io Re: VM not running despite status=active, after volume
+2026-05-20 [*] [Y] 1779285954272004400 /clients/KissMetrics tdziuba@kissmetrics.io Re: VM not running despite status=active, after volume
+2026-05-05 [*] [ ] 1777970798248014300 /Inbox/abonnements freemobile@free-mobile.fr Votre facture mobile Free est disponible
+2026-04-21 [*] [Y] 1776785469477004300 /Notification noreply@hiway.fr Darnis Operations - Facture F1042
+2026-04-12 [*] [Y] 1776017238960014300 /Inbox/books arcodange@gmail.com Fwd: Your receipt from Anthropic Ireland, Limited #2109
+2026-04-04 [*] [ ] 1775264759983014300 /Inbox/abonnements freemobile@free-mobile.fr Votre facture mobile Free est disponible
+2026-04-02 [*] [Y] 1775141901205014300 /Inbox/books no-reply@mistral.ai Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
+2026-03-05 [*] [ ] 1772689535069004400 /Inbox/helloworld freemobile@free-mobile.fr Votre facture mobile Free est disponible
+2026-02-08 [*] [Y] 1770582421208004400 /Inbox/bureaux ne-pas-repondre@portailpro.gouv.fr Valider votre espace personnel sur Portailpro.gouv
+2026-01-09 [*] [ ] 1767989744791004400 /Inbox/books gabrielradureau@gmail.com Fwd: INPI - Votre paiement pour la commande Réf. 181876
+2026-01-06 [*] [Y] 1767710535894005600 /Inbox gabrielradureau@gmail.com Statuts
+----------------------------------------------------------------------------------------------------------------------------------
+# 12 message(s) (candidates only)
diff --git a/.claude/skills/arcodange-email-ingest/scripts/email-list.sh b/.claude/skills/arcodange-email-ingest/scripts/email-list.sh
index 1faa38b..4083492 100755
--- a/.claude/skills/arcodange-email-ingest/scripts/email-list.sh
+++ b/.claude/skills/arcodange-email-ingest/scripts/email-list.sh
@@ -91,9 +91,29 @@ CANDIDATE_PATTERN = re.compile(
     re.IGNORECASE,
 )
 
+# Subjects that look like calendar invites / event updates / generic notifications
+# get filtered out of --candidates-only — they always have a .ics attachment so
+# the "has-attachment" heuristic alone catches them as false positives.
+EXCLUDE_PATTERN = re.compile(
+    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'  # strip Re:/Fwd:/Tr: prefixes
+    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
+    re.IGNORECASE,
+)
+
+# Senders that are pure noise — newsletter/marketing patterns.
+EXCLUDE_SENDER = re.compile(
+    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
+    re.IGNORECASE,
+)
+
 def is_candidate(m):
+    subj = m.get("subject","") or ""
+    sender = m.get("fromAddress","") or m.get("sender","") or ""
+    # Hard exclusions take precedence over inclusions
+    if EXCLUDE_PATTERN.match(subj.strip()): return False
+    if EXCLUDE_SENDER.search(sender): return False
     if str(m.get("hasAttachment","")) == "1": return True
-    if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
+    if CANDIDATE_PATTERN.search(subj): return True
     return False
 
 rows = []
-- 
2.49.1
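The V8.2 follow-up described in SKILL.md (skip messages whose `flagid` equals `flag_info`) could slot into the V8.1 `is_candidate()` as one extra early return. This is a hypothetical sketch: the `INGESTED_FLAG` constant and the assumption that Zoho returns the flag as the string `"flag_info"` are illustrative and unverified, and `CANDIDATE_PATTERN` is rebuilt from the keyword list quoted in SKILL.md rather than copied from the script, whose exact pattern is not shown in this patch. The message dicts are made up.

```python
import re

# Assumption: rebuilt from the keyword list in SKILL.md; the script's real pattern may differ.
CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)
# Copied from the V8.1 email-list.sh hunk.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)

# Hypothetical V8.2 marker: assumes an ingested message carries flagid == "flag_info".
INGESTED_FLAG = "flag_info"

def is_candidate(m):
    subj = m.get("subject", "") or ""
    sender = m.get("fromAddress", "") or m.get("sender", "") or ""
    # V8.2 sketch: already-ingested messages never come back as candidates.
    if m.get("flagid") == INGESTED_FLAG:
        return False
    # V8.1 hard exclusions take precedence over the inclusion rules.
    if EXCLUDE_PATTERN.match(subj.strip()):
        return False
    if EXCLUDE_SENDER.search(sender):
        return False
    if str(m.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(subj))

messages = [
    {"subject": "Invitation: Weekly sync", "fromAddress": "someone@example.com", "hasAttachment": "1"},
    {"subject": "Darnis Operations - Facture F1042", "fromAddress": "noreply@hiway.fr", "hasAttachment": "1"},
    {"subject": "Darnis Operations - Facture F1042", "fromAddress": "noreply@hiway.fr",
     "hasAttachment": "1", "flagid": "flag_info"},
    {"subject": "This week in AI", "fromAddress": "newsletter@example.com", "hasAttachment": "1"},
    {"subject": "Votre facture mobile Free est disponible", "fromAddress": "freemobile@free-mobile.fr",
     "hasAttachment": "0"},
]
print([is_candidate(m) for m in messages])  # → [False, True, False, False, True]
```

Putting the flag check first means an ingested message is skipped before any regex runs; the V8.1 exclusions still take precedence over the attachment and keyword inclusions, as in the patched function.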