sa-harvest

Summary

sa-harvest is a configurable script for training SpamAssassin. It combines the process from How To Train SpamAssassin with automatic generation of the whitelist, based on the contents of your ham inboxes and your outbox.

The goal is to let you type one simple command (sa-harvest) rather than a series of complex commands with varying flags.

Caveats

Details

In your ~/.spamassassin directory you will create six new files:

user_prefs.base addressbook addressbook.negative mail.spam mail.ham mail.sent

user_prefs.base
everything from your user_prefs file except the whitelist_from entries. Those are generated by the script from the addressbook files and your history of recent mail.
addressbook
a list of any addresses you want to let through
addressbook.negative
patterns of addresses you want kept out of the whitelist, e.g. your own email address, because you don't want to give a free pass to any spammer who knows to forge your address as the From address. This may be overly greedy: it's a substring match, so paypal.com behaves like *paypal.com*.
mail.spam
a list of paths, relative to your home directory, to mailboxes you consider spam, one mailbox per line, e.g.
Maildir/.Spam/cur
mail.ham
similarly, a list of paths to mailboxes you consider ham, e.g.
Maildir/cur
Maildir/.family/cur
mail.sent
a list of paths to mailboxes you consider sent mail, e.g.
Maildir/.Sent/cur
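
To make the interaction between addressbook and addressbook.negative concrete, here is a minimal Python sketch of the filtering step. It is not the sa-harvest code itself: the function names read_list and build_whitelist are hypothetical, and lowercasing and de-duplicating addresses are assumptions made for the illustration. It only shows how a substring match against the negative patterns decides which addresses end up as whitelist_from lines.

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty lines of a list file (one entry per line)."""
    p = Path(path)
    if not p.exists():
        return []
    return [line.strip() for line in p.read_text().splitlines() if line.strip()]


def build_whitelist(candidates, negative_patterns):
    """Turn candidate addresses into whitelist_from lines.

    An address is dropped if any negative pattern occurs anywhere inside it
    (substring match), which is why a pattern like paypal.com is greedy.
    Lowercasing and de-duplication are assumptions of this sketch.
    """
    patterns = [pat.lower() for pat in negative_patterns]
    kept = []
    for addr in sorted({a.lower() for a in candidates}):
        if any(pat in addr for pat in patterns):
            continue  # matched a negative pattern: no free pass
        kept.append(addr)
    return [f"whitelist_from {addr}" for addr in kept]


if __name__ == "__main__":
    # Hypothetical driver: uses only the two addressbook files, not harvested mail.
    base = Path.home() / ".spamassassin"
    addresses = read_list(base / "addressbook")
    negative = read_list(base / "addressbook.negative")
    for line in build_whitelist(addresses, negative):
        print(line)
```

With a negative pattern of paypal.com, an address such as billing@notpaypal.com would also be dropped, which is the greediness described above.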

What It Does

Setup

Future Work

One Last Note

If you're really feeling lucky, you can set this up with cron. Note that it can be fairly processor-intensive.

If you do that, you need to check your ham and spam when you log in so that you quickly catch any misidentification (and fix the resulting incorrect training). If you use Maildir, restricting your training to cur directories helps cut down on that problem, but it isn't perfect, and mbox and mbx don't have such an option.

Feedback

Please send me some: faisal AT faisal DOT com.