arcodange-email-ingest V8.1: filter calendar invites + newsletter senders

email-list.sh gains two hard-exclusion filters (applied before the
candidate test, regardless of attachments):

- EXCLUDE_PATTERN matches subjects starting with Invitation: / Updated
  invitation: / Canceled event: / Accepted: / Declined: / Tentative: /
  Maybe: (after stripping Re:/Fwd:/Tr: prefixes). Filters Google Calendar
  events that always carry an .ics attachment.
- EXCLUDE_SENDER matches updates.<domain>, noreply@*calendar,
  @calendar.<domain>, news@, newsletter@. Filters calendar bots and
  newsletter blast traffic.
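
To make the two filters concrete, here is a minimal sketch that copies both
patterns from the diff and runs them against sample input. The helper names
subject_excluded / sender_excluded and every subject and address are
hypothetical, made up for illustration only.

```python
import re

# Both patterns are copied verbatim from the diff in this commit.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)

# Hypothetical helpers, not part of email-list.sh.
def subject_excluded(subject):
    # match() anchors at the start, so the keyword must lead the subject
    # (after any run of Re:/Fwd:/Tr: prefixes).
    return EXCLUDE_PATTERN.match(subject.strip()) is not None

def sender_excluded(sender):
    # search() is unanchored, so a fragment anywhere in the address counts.
    return EXCLUDE_SENDER.search(sender) is not None

# Made-up subjects, mapped to whether the filter excludes them.
SUBJECTS = {
    "Invitation: Kickoff @ Mon 2 Jun": True,
    "Re: Fwd: Accepted: Kickoff": True,
    "Your invitation: details inside": False,  # keyword is not at the start
    "Fw: Invitation: Kickoff": False,          # "Fw:" is not a listed prefix
    "Cancelled event: Kickoff": False,         # double-l spelling is not listed
    "Facture F1042": False,
}

# Made-up addresses, mapped to whether the filter excludes them.
SENDERS = {
    "billing@updates.example.com": True,
    "noreply@calendar.example.com": True,
    "newsletter@example.com": True,
    "goodnews@example.com": True,   # unanchored news@ also matches this
    "invoices@example.com": False,
}

for subject, expected in SUBJECTS.items():
    assert subject_excluded(subject) is expected, subject
for sender, expected in SENDERS.items():
    assert sender_excluded(sender) is expected, sender
```

The last three subjects and the goodnews@ address show where the patterns are
stricter or looser than the prose summary suggests: any change to prefixes
or spellings would have to be made in the regexes themselves.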

Effect on --all-folders --candidates-only baseline: 27 noisy → 12
actionable (calendar invites + the staying-ahead.ai newsletter blast
removed). Real supplier docs intact: Darnis F1042 in /Notification, 3 Free
Mobile invoices in /Inbox/abonnements, Mistral + Anthropic in /Inbox/books.

The originally-planned --mark-ingested feature is deferred to V8.2:
flag-set requires the Zoho OAuth scope ZohoMail.messages.UPDATE which our
read-only refresh_token doesn't have. Documented in SKILL.md: once the
user opts in to the wider scope, --mark-ingested becomes a one-line flag
on email-inspect.sh and is_candidate() learns to skip flag_info messages.

Captured the new --all-folders baseline at examples/email-list-all-folders.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 15:18:31 +02:00
parent 794aa18d2a
commit 1d38f25c23
3 changed files with 45 additions and 3 deletions


@@ -91,9 +91,29 @@ CANDIDATE_PATTERN = re.compile(
    re.IGNORECASE,
)
# Subjects that look like calendar invites / event updates / RSVP replies get
# filtered out of --candidates-only. They always have a .ics attachment, so
# the "has-attachment" heuristic alone would flag them as false positives.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'  # skip any leading Re:/Fwd:/Tr: prefixes
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
# Senders that are pure noise: calendar bots and newsletter/marketing patterns.
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)
def is_candidate(m):
    subj = m.get("subject","") or ""
    sender = m.get("fromAddress","") or m.get("sender","") or ""
    # Hard exclusions take precedence over inclusions
    if EXCLUDE_PATTERN.match(subj.strip()): return False
    if EXCLUDE_SENDER.search(sender): return False
    if str(m.get("hasAttachment","")) == "1": return True
    if CANDIDATE_PATTERN.search(subj): return True
    return False
rows = []
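
The ordering inside is_candidate() is what lets the exclusions win even when a message has an attachment. The sketch below reproduces the function from this hunk to show that precedence. CANDIDATE_PATTERN is defined above the hunk and is not visible here, so the one used below is a hypothetical stand-in, and all messages are made-up samples.

```python
import re

# Copied from this hunk.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)
# ASSUMPTION: stand-in only; the real CANDIDATE_PATTERN is not shown in the diff.
CANDIDATE_PATTERN = re.compile(r'facture|invoice', re.IGNORECASE)

def is_candidate(m):
    subj = m.get("subject", "") or ""
    sender = m.get("fromAddress", "") or m.get("sender", "") or ""
    # Hard exclusions run first, so they override the attachment heuristic.
    if EXCLUDE_PATTERN.match(subj.strip()):
        return False
    if EXCLUDE_SENDER.search(sender):
        return False
    if str(m.get("hasAttachment", "")) == "1":
        return True
    if CANDIDATE_PATTERN.search(subj):
        return True
    return False

# Hypothetical messages shaped like the dicts is_candidate() reads.
invite = {"subject": "Invitation: Kickoff",
          "fromAddress": "alice@example.com", "hasAttachment": "1"}
blast = {"subject": "Weekly digest",
         "fromAddress": "newsletter@example.com", "hasAttachment": "1"}
supplier = {"subject": "Your documents",
            "fromAddress": "billing@example.com", "hasAttachment": "1"}
invoice = {"subject": "Facture F1042",
           "fromAddress": "compta@example.com", "hasAttachment": "0"}
chatter = {"subject": "Lunch?", "fromAddress": "bob@example.com"}

results = {name: is_candidate(msg) for name, msg in [
    ("invite", invite), ("blast", blast), ("supplier", supplier),
    ("invoice", invoice), ("chatter", chatter),
]}
print(results)
```

With these samples, the invite and the newsletter are rejected despite carrying attachments, while the supplier mail (attachment) and the invoice (subject match under the stand-in pattern) stay in.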