arcodange-email-ingest V8.1: filter calendar invites + newsletter senders
email-list.sh gains two hard-exclusion filters, applied before the candidate test regardless of attachments:

- EXCLUDE_PATTERN matches subjects starting with Invitation: / Updated invitation: / Canceled event: / Accepted: / Declined: / Tentative: / Maybe: (after stripping Re:/Fwd:/Tr: prefixes). Filters Google Calendar events that always carry an .ics attachment.
- EXCLUDE_SENDER matches updates.<domain>, noreply@*calendar, news@, newsletter@. Filters newsletter blast traffic.

Effect on the --all-folders --candidates-only baseline: 27 noisy → 12 actionable (calendar invites + the staying-ahead.ai newsletter blast removed). Real supplier docs intact: Darnis F1042 in /Notification, 3 Free Mobile factures (2 in /Inbox/abonnements, 1 in /Inbox/helloworld), Mistral + Anthropic in /Inbox/books.

The originally planned --mark-ingested feature is deferred to V8.2: setting the flag requires the Zoho OAuth scope ZohoMail.messages.UPDATE, which our read-only refresh_token doesn't have. Documented in SKILL.md: once the user opts in to the wider scope, --mark-ingested becomes a one-line flag on email-inspect.sh and is_candidate() learns to skip flag_info messages.

Captured the new --all-folders baseline at examples/email-list-all-folders.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
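To make the prefix-stripping behaviour of the subject filter concrete, here is a minimal sketch that exercises the pattern outside the script. The regex is copied from the email-list.sh hunk in this commit; the helper name `is_calendar_subject` and the sample subjects are made up for illustration.

```python
import re

# Pattern copied from the email-list.sh hunk in this commit.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'   # optional, repeatable Re:/Fwd:/Tr: prefixes
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)


def is_calendar_subject(subject):
    """Hypothetical helper: True when the subject hits the hard exclusion."""
    return EXCLUDE_PATTERN.match((subject or "").strip()) is not None


# Made-up subjects, for illustration only.
SAMPLES = [
    "Invitation: Weekly sync @ Tue 10am",
    "Re: Fwd: Accepted: Weekly sync",
    "TR: Updated invitation: Kickoff",
    "Votre facture mobile Free est disponible",
    "Invitation to tender",
    "Fw: Invitation: Weekly sync",
]

for s in SAMPLES:
    print(is_calendar_subject(s), s)
```

The first three samples are excluded and the last three are not. "Invitation to tender" survives because the keyword must be followed by a colon. The last sample is worth noting: the prefix group only lists `re:`, `fwd:` and `tr:`, so a subject forwarded with the `Fw:` prefix would not be stripped and would slip past the filter as written.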
@@ -49,7 +49,13 @@ bin/arcodange email list --folder /Inbox/contact --limit 50
 bin/arcodange email list --all-folders --candidates-only # scan everything (slower, more API calls)
 ```

-Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment. The `[*]` column marks candidates, `[Y]` marks emails with attachments.
+Captured at [examples/email-list.txt](examples/email-list.txt). The candidate filter matches subjects against `facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill` OR any message with an attachment.
+
+**Hard exclusions** (V8.1), applied before the candidate test regardless of attachments:
+- Subjects starting with `Invitation:` / `Updated invitation:` / `Canceled event:` / `Accepted:` / `Declined:` / `Tentative:` / `Maybe:` (after stripping `Re:` / `Fwd:` / `Tr:` prefixes) → filters calendar events that always carry an `.ics` attachment.
+- Senders matching newsletter/marketing patterns (`updates.<domain>`, `noreply@*calendar*`, `news@`, `newsletter@`, etc.).
+
+The `[*]` column marks candidates, `[Y]` marks emails with attachments. Compared to V8.0, V8.1 cuts the `--all-folders --candidates-only` baseline from ~27 noisy entries down to ~12 actionable ones.

 ### 2. Inspect one email + draft Dolibarr entry

@@ -78,7 +84,7 @@ Captured at [examples/email-inspect.txt](examples/email-inspect.txt) for the V8
 ## What it doesn't do (V8.0 scope)

 - **Does not write to Dolibarr.** The supplier invoice is still created manually in the Dolibarr UI from the draft JSON. V9 candidate: automate via `/supplierinvoices` POST.
-- **Does not mark emails as ingested.** Each run re-emits the same candidates. V8.1 candidate: set the IMAP `\Flagged` flag or add a Zoho label `ingested` after the operator confirms.
+- **Does not mark emails as ingested.** Each run re-emits the same candidates. Implementing this requires extending the OAuth scope: the current refresh_token only has READ scopes (`ZohoMail.messages.READ` etc.). The flag-set endpoint (`PUT /api/accounts/{aid}/updatemessage`) requires `ZohoMail.messages.UPDATE`, which would force the user to regenerate the refresh_token. **V8.2 candidate**: once the user opts in to the wider scope, `--mark-ingested` becomes a one-line flag on `email-inspect.sh` and `is_candidate()` in `email-list.sh` learns to skip messages with `flagid == flag_info`.
 - **No body extraction yet.** We only parse PDF attachments. Inline-HTML invoices (rare; most suppliers send PDFs) would need body fetch via `/content`.
 - **Heuristic extraction is best-effort.** Different supplier PDF templates yield different field-extraction reliability. The draft JSON is a starting point, not ground truth.

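The V8.2 note above says `is_candidate()` would learn to skip messages with `flagid == flag_info`. A minimal sketch of what that check might look like follows. It assumes each message dict exposes a `flagid` field holding the string `flag_info` once ingested, as the note implies; the real Zoho payload is not shown in this commit, and the names `INGESTED_FLAG`, `is_ingested` and `pending` are hypothetical.

```python
# Assumption: "flagid" and the value "flag_info" come from the V8.2 note in
# SKILL.md; the actual Zoho response shape is not shown in this commit.
INGESTED_FLAG = "flag_info"


def is_ingested(m):
    """Hypothetical check: True when the message already carries the ingested flag."""
    return str(m.get("flagid", "") or "") == INGESTED_FLAG


def pending(messages):
    """Keep only messages that have not been marked as ingested."""
    return [m for m in messages if not is_ingested(m)]


# Made-up messages, for illustration only.
messages = [
    {"messageId": "1", "subject": "Facture A", "flagid": "flag_info"},
    {"messageId": "2", "subject": "Facture B", "flagid": ""},
    {"messageId": "3", "subject": "Facture C"},
]

print([m["messageId"] for m in pending(messages)])
```

In the script itself this would presumably be one more early `return False` in `is_candidate()`, placed alongside the hard exclusions so that an already-ingested invoice is dropped even though it has an attachment.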
@@ -0,0 +1,16 @@
+date        cand  att  messageId            folder                from                                subject
+----------------------------------------------------------------------------------------------------------------------------------
+2026-05-20  [*]   [Y]  1779312401677014300  /clients/KissMetrics  rsirvent@digitalocean.com           Re: VM not running despite status=active, after volume
+2026-05-20  [*]   [Y]  1779298419301014300  /clients/KissMetrics  tdziuba@kissmetrics.io              Re: VM not running despite status=active, after volume
+2026-05-20  [*]   [Y]  1779285954272004400  /clients/KissMetrics  tdziuba@kissmetrics.io              Re: VM not running despite status=active, after volume
+2026-05-05  [*]   [ ]  1777970798248014300  /Inbox/abonnements    freemobile@free-mobile.fr           Votre facture mobile Free est disponible
+2026-04-21  [*]   [Y]  1776785469477004300  /Notification         noreply@hiway.fr                    Darnis Operations - Facture F1042
+2026-04-12  [*]   [Y]  1776017238960014300  /Inbox/books          arcodange@gmail.com                 Fwd: Your receipt from Anthropic Ireland, Limited #2109
+2026-04-04  [*]   [ ]  1775264759983014300  /Inbox/abonnements    freemobile@free-mobile.fr           Votre facture mobile Free est disponible
+2026-04-02  [*]   [Y]  1775141901205014300  /Inbox/books          no-reply@mistral.ai                 Votre facture nº MSTRL-API-814045-001 de Mistral AI SAS
+2026-03-05  [*]   [ ]  1772689535069004400  /Inbox/helloworld     freemobile@free-mobile.fr           Votre facture mobile Free est disponible
+2026-02-08  [*]   [Y]  1770582421208004400  /Inbox/bureaux        ne-pas-repondre@portailpro.gouv.fr  Valider votre espace personnel sur Portailpro.gouv
+2026-01-09  [*]   [ ]  1767989744791004400  /Inbox/books          gabrielradureau@gmail.com           Fwd: INPI - Votre paiement pour la commande Réf. 181876
+2026-01-06  [*]   [Y]  1767710535894005600  /Inbox                gabrielradureau@gmail.com           Statuts
+----------------------------------------------------------------------------------------------------------------------------------
+# 12 message(s) (candidates only)
@@ -91,9 +91,29 @@ CANDIDATE_PATTERN = re.compile(
     re.IGNORECASE,
 )

+# Subjects that look like calendar invites / event updates / generic notifications
+# get filtered out of --candidates-only — they always have a .ics attachment so
+# the "has-attachment" heuristic alone catches them as false positives.
+EXCLUDE_PATTERN = re.compile(
+    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'   # strip Re:/Fwd:/Tr: prefixes
+    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
+    re.IGNORECASE,
+)
+
+# Senders that are pure noise: newsletter/marketing patterns.
+EXCLUDE_SENDER = re.compile(
+    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
+    re.IGNORECASE,
+)
+
 def is_candidate(m):
+    subj = m.get("subject","") or ""
+    sender = m.get("fromAddress","") or m.get("sender","") or ""
+    # Hard exclusions take precedence over inclusions
+    if EXCLUDE_PATTERN.match(subj.strip()): return False
+    if EXCLUDE_SENDER.search(sender): return False
     if str(m.get("hasAttachment","")) == "1": return True
-    if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
+    if CANDIDATE_PATTERN.search(subj): return True
     return False

 rows = []
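As a quick check of the precedence this hunk introduces (exclusions first, then attachment, then keyword), the following standalone sketch runs `is_candidate()` against a few messages. `EXCLUDE_PATTERN` and `EXCLUDE_SENDER` are copied from the hunk. `CANDIDATE_PATTERN` is an assumption rebuilt from the keyword list quoted in SKILL.md, since only the tail of the real definition is visible here. Apart from the Darnis row taken from the captured listing, the sample messages are made up.

```python
import re

# Assumption: rebuilt from the keyword list in SKILL.md; the script's real
# CANDIDATE_PATTERN is only partly visible in this hunk.
CANDIDATE_PATTERN = re.compile(
    r'facture|invoice|receipt|reçu|payment|paiement|abonnement|subscription|order|commande|bill',
    re.IGNORECASE,
)
# The two exclusion patterns are copied from the hunk.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)


def is_candidate(m):
    subj = m.get("subject", "") or ""
    sender = m.get("fromAddress", "") or m.get("sender", "") or ""
    # Hard exclusions take precedence over inclusions.
    if EXCLUDE_PATTERN.match(subj.strip()):
        return False
    if EXCLUDE_SENDER.search(sender):
        return False
    if str(m.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(subj))


# Calendar invite with an .ics attachment: excluded despite the attachment.
invite = {"subject": "Invitation: Kickoff", "fromAddress": "someone@example.com", "hasAttachment": "1"}
# Newsletter whose subject contains a keyword: excluded by sender.
newsletter = {"subject": "Your invoice tips", "fromAddress": "hello@updates.example.com", "hasAttachment": "0"}
# Supplier invoice row from the captured listing: kept.
supplier = {"subject": "Darnis Operations - Facture F1042", "fromAddress": "noreply@hiway.fr", "hasAttachment": "1"}
# No attachment, no keyword: not a candidate.
plain = {"subject": "Hello", "fromAddress": "someone@example.com", "hasAttachment": "0"}

for name, m in [("invite", invite), ("newsletter", newsletter), ("supplier", supplier), ("plain", plain)]:
    print(name, is_candidate(m))
```

Two properties of `EXCLUDE_SENDER` follow from it being an unanchored `search`. The `updates\.` alternative already matches anything `@updates\.` would, so the latter is redundant. And `news@` matches any local part that merely ends in "news" (for example `goodnews@example.com`), which may or may not be intended.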