arcodange-email-ingest V8.1: filter calendar invites + newsletter senders
email-list.sh gains two hard-exclusion filters, applied before the
candidate test regardless of attachments:

- EXCLUDE_PATTERN matches subjects starting with Invitation: /
  Updated invitation: / Canceled event: / Accepted: / Declined: /
  Tentative: / Maybe: (after stripping Re:/Fwd:/Tr: prefixes). This
  filters out Google Calendar events, which always carry an .ics
  attachment.
- EXCLUDE_SENDER matches updates.<domain>, noreply@*calendar,
  @calendar., news@ and newsletter@. This filters out newsletter blast
  traffic.

Effect on the --all-folders --candidates-only baseline: 27 noisy → 12
actionable (calendar invites and the staying-ahead.ai newsletter blast
removed). Real supplier documents are intact: Darnis F1042 in
/Notification, 3 Free Mobile invoices in /Inbox/abonnements, Mistral +
Anthropic in /Inbox/books.

The originally planned --mark-ingested feature is deferred to V8.2:
setting a flag requires the Zoho OAuth scope ZohoMail.messages.UPDATE,
which our read-only refresh_token does not have. This is documented in
SKILL.md: once the user opts in to the wider scope, --mark-ingested
becomes a one-line flag on email-inspect.sh, and is_candidate() learns
to skip flag_info messages.

Captured the new --all-folders baseline at
examples/email-list-all-folders.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
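A small harness makes the two filters easy to check by hand. The two regexes below are copied from the hunk in this commit; the `excluded()` helper and the sample subjects and senders are hypothetical, made up for illustration. Note the asymmetry: the subject test uses `match` (anchored at the start, after any Re:/Fwd:/Tr: prefixes), while the sender test uses `search` (any substring of the address).

```python
import re

# Copied from the email-list.sh hunk: subject prefixes that mark calendar traffic.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'  # strip Re:/Fwd:/Tr: prefixes
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)

# Copied from the same hunk: sender fragments treated as newsletter noise.
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)


def excluded(subject: str, sender: str) -> bool:
    """Hypothetical helper: True if either hard-exclusion filter fires."""
    # Subject: anchored match at the start; sender: substring search.
    return bool(EXCLUDE_PATTERN.match(subject.strip()) or EXCLUDE_SENDER.search(sender))


if __name__ == "__main__":
    samples = [
        # (subject, sender): illustrative values, not taken from a real mailbox
        ("Invitation: Weekly sync @ Mon 10:00", "alice@example.com"),
        ("Tr: Re: Updated invitation: Kickoff", "bob@example.com"),
        ("Invitation to tender: lot 3", "buyer@example.com"),
        ("Your monthly invoice", "billing@example.com"),
        ("This week in AI", "news@example.com"),
    ]
    for subject, sender in samples:
        print(excluded(subject, sender), repr(subject), sender)
```

The third sample shows why the keyword must be followed directly by a colon: a subject that merely starts with the word "Invitation" is not excluded.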
@@ -91,9 +91,29 @@ CANDIDATE_PATTERN = re.compile(
     re.IGNORECASE,
 )
 
+# Subjects that look like calendar invites / event updates / generic notifications
+# get filtered out of --candidates-only — they always have a .ics attachment so
+# the "has-attachment" heuristic alone catches them as false positives.
+EXCLUDE_PATTERN = re.compile(
+    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'   # strip Re:/Fwd:/Tr: prefixes
+    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
+    re.IGNORECASE,
+)
+
+# Senders that are pure noise — newsletter/marketing patterns.
+EXCLUDE_SENDER = re.compile(
+    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
+    re.IGNORECASE,
+)
+
 def is_candidate(m):
+    subj = m.get("subject","") or ""
+    sender = m.get("fromAddress","") or m.get("sender","") or ""
+    # Hard exclusions take precedence over inclusions
+    if EXCLUDE_PATTERN.match(subj.strip()): return False
+    if EXCLUDE_SENDER.search(sender): return False
     if str(m.get("hasAttachment","")) == "1": return True
-    if CANDIDATE_PATTERN.search(m.get("subject","") or ""): return True
+    if CANDIDATE_PATTERN.search(subj): return True
     return False
 
 rows = []
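A sketch of the before/after behavior of is_candidate(). The real CANDIDATE_PATTERN is defined above this hunk and its body is not shown, so the keyword regex here is a hypothetical stand-in; the message dicts are also illustrative and use only the keys the diff reads (subject, fromAddress/sender, hasAttachment). `is_candidate_old` is a hypothetical name for the logic on the removed side of the hunk.

```python
import re

# Hypothetical stand-in: the real CANDIDATE_PATTERN body is outside this hunk.
CANDIDATE_PATTERN = re.compile(r'(invoice|receipt|facture)', re.IGNORECASE)

# Copied from the hunk.
EXCLUDE_PATTERN = re.compile(
    r'^(?:re:\s*|fwd:\s*|tr:\s*)*'
    r'(?:invitation|updated\s+invitation|canceled\s+event|accepted|declined|tentative|maybe)\s*:',
    re.IGNORECASE,
)
EXCLUDE_SENDER = re.compile(
    r'(updates\.|noreply@.*calendar|@calendar\.|news@|newsletter@|@updates\.)',
    re.IGNORECASE,
)


def is_candidate(m: dict) -> bool:
    subj = m.get("subject", "") or ""
    sender = m.get("fromAddress", "") or m.get("sender", "") or ""
    # Exclusions run first, so an .ics attachment can no longer promote an invite.
    if EXCLUDE_PATTERN.match(subj.strip()):
        return False
    if EXCLUDE_SENDER.search(sender):
        return False
    # Inclusions: any attachment, or a subject keyword.
    if str(m.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(subj))


def is_candidate_old(m: dict) -> bool:
    """Logic before this commit: attachment or keyword, no exclusions."""
    if str(m.get("hasAttachment", "")) == "1":
        return True
    return bool(CANDIDATE_PATTERN.search(m.get("subject", "") or ""))


messages = [
    # Illustrative message dicts; field names follow the diff, values are made up.
    {"subject": "Invitation: Q3 review", "fromAddress": "boss@example.com", "hasAttachment": "1"},
    {"subject": "Newsletter #42", "fromAddress": "newsletter@example.com", "hasAttachment": "1"},
    {"subject": "Invoice F-001", "fromAddress": "billing@example.com", "hasAttachment": "0"},
    {"subject": "Lunch?", "fromAddress": "friend@example.com", "hasAttachment": "0"},
]

if __name__ == "__main__":
    for m in messages:
        print(f"{m['subject']!r}: before={is_candidate_old(m)} after={is_candidate(m)}")
```

With the exclusions checked first, the invite and the newsletter (both carrying attachments) drop out, while the invoice still qualifies through the subject keyword.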