Phisher

This is a placeholder for the Phisher project. Everything here may move.

Summary

Phisher is a SpamAssassin plugin which looks for anchors whose text resembles a domain name but whose href does not match the text.

For example, these would be caught:
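As a rough illustration of the check, here is a Python sketch, not the plugin's Perl code. The `DOMAIN_LIKE` pattern, the function names, and the sample values are all invented for this example. An anchor is flagged when its visible text looks like a domain and the href's host is something else:

```python
import re
from urllib.parse import urlsplit

# Loose test for "text that resembles a domain name".
# This pattern is an assumption for the sketch, not the one Phisher.pm uses.
DOMAIN_LIKE = re.compile(
    r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(?:[/:?#].*)?$",
    re.IGNORECASE,
)

def href_host(href):
    """Return the lowercased host part of an href, or '' if it has none."""
    return (urlsplit(href).hostname or "").lower()

def is_mismatch(anchor_text, href):
    """True when the visible text looks like a domain and the href points elsewhere."""
    m = DOMAIN_LIKE.match(anchor_text.strip())
    if not m:
        return False  # ordinary link text, nothing domain-like to compare
    return m.group(1).lower() != href_host(href)
```

Under these assumptions, `is_mismatch("www.mybank.example", "http://203.0.113.9/login")` is true. Plain link text such as "Click here" is never flagged, because there is nothing domain-like to compare against the href.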

The plugin also normalizes URLs and domains, so anchors whose text and href differ only superficially should not be caught. For example:
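The actual normalization rules are in Phisher.pm. The Python sketch below only shows the kind of cleanup that is meant. The specific steps (lowercasing, dropping the scheme, port, path and a leading "www.") are assumptions, and `normalize_host` and `same_site` are hypothetical names:

```python
from urllib.parse import urlsplit

def normalize_host(value):
    """Reduce a URL or bare domain to a comparable host string (assumed rules)."""
    value = value.strip().lower()
    if "://" not in value:
        value = "http://" + value  # give bare domains a scheme so urlsplit finds the host
    host = urlsplit(value).hostname or ""  # hostname drops any port, path and query
    if host.startswith("www."):
        host = host[4:]
    return host.rstrip(".")

def same_site(anchor_text, href):
    """True when text and href agree once both are normalized."""
    return normalize_host(anchor_text) == normalize_host(href)
```

With these rules, "WWW.Example.com" linking to "http://example.com/" counts as the same site, while "www.example.com" linking to "http://example.com.evil.test/" does not.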

Downloading

Phisher.pm

Feeble instructions for using it are in the file’s header. If you are not comfortable screwing around with your SpamAssassin (SA) config, or if you do not have access to the site-wide config files (local.cf), you probably won’t be able to use this yet.

Why This Is A Bad Idea

This approach has been suggested before, usually as a regexp. Some people don’t like the general approach because it can lead to false positives:

My opinion is that this is a matter of setting an appropriate score and letting the presence of the mismatched anchor inform SA, rather than a reason to avoid the rule because it might be wrong (many SA rules produce false positives all the time). Further, I don’t think you can implement this as a single-line regexp: the string normalization becomes too hairy, and the pattern breaks down all over the place. I tried it that way at first and it was a mess.
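To make the single-regexp problem concrete, here is a deliberately naive one-line pattern in Python. It is written for this illustration and is not the regexp that was originally tried. It compares the href host to the anchor text with a backreference, and the invented sample anchors show it both over-matching and under-matching:

```python
import re

# Naive one-liner: capture the href host, then refuse a match only when the
# anchor text is exactly that host. Hypothetical pattern, for illustration only.
NAIVE = re.compile(
    r'<a href="https?://([^/"]+)(?:/[^"]*)?">(?!\1<)[a-z0-9-]+(?:\.[a-z0-9-]+)+</a>',
    re.IGNORECASE,
)

def naive_flag(html):
    """True when the naive pattern thinks the anchor text and href disagree."""
    return NAIVE.search(html) is not None

# Correctly flagged: the text names a bank, the href is a bare IP address.
caught = naive_flag('<a href="http://203.0.113.9/login">www.mybank.example</a>')

# Correctly ignored: the text and the host are identical.
clean = naive_flag('<a href="http://example.com/">example.com</a>')

# False positive: only a leading "www." differs, which normalization would absorb.
false_positive = naive_flag('<a href="http://www.example.com/">example.com</a>')

# Miss: an extra attribute and single quotes defeat the fixed tag layout.
missed = naive_flag("<a class=\"x\" href='http://203.0.113.9/'>www.mybank.example</a>")
```

Patching each of these cases into the pattern (optional "www.", either quote style, attributes in any order, trailing paths in the text) is where a one-liner gets out of hand.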

To Do

Validate on SA 3.1.x releases after 3.1 (3.1 itself works)
Replace regexp with HTML::Parser-based matching using uri_anchor_text.
Test with larger corpus (volunteers? Bueller?)
Figure out what a good target score should be (or let the scoring system figure it out?)
Edge cases
- what happens if there are nested html tags
- what happens if there are improperly nested html tags (<a><a></a>)
- what happens if there are spaces in the href
- what happens if there are spaces in the visible urls and %20s (etc) in the href
Get more people testing, and then, depending on feedback, either
- Submit to SA for inclusion in upcoming version, or
- Repackage as a real Perl module so people can easily install it and run it alongside SA
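
Several of the edge cases above can be probed with a parser-based collector. The sketch below uses Python's `html.parser` as a stand-in for HTML::Parser. It is not the plugin's code and does not use SA's uri_anchor_text. How it treats a second `<a>` inside an open anchor, and how it strips spaces and decodes %20, are assumptions made for the sake of the example:

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class AnchorCollector(HTMLParser):
    """Collect (href, visible_text) pairs, tolerating nested and unclosed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []    # finished (href, visible_text) tuples
        self._href = None  # href of the anchor currently open, if any
        self._text = []    # text fragments seen inside that anchor

    def _finish(self):
        if self._href is not None:
            # Assumption: drop all whitespace so "www. example.com" compares cleanly.
            text = "".join("".join(self._text).split())
            self.pairs.append((self._href, text))
        self._href, self._text = None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._finish()  # assumption: a new <a> implicitly closes an open one
            href = dict(attrs).get("href")
            if href is not None:
                # Decode %20 and friends, then drop spaces, mirroring the text cleanup.
                self._href = unquote(href).replace(" ", "")
        # Other tags (<b>, <span>, ...) are ignored; their text still accumulates.

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

def anchors(html):
    """Return every (href, visible_text) pair found in an HTML fragment."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    parser._finish()  # flush an anchor that was never closed
    return parser.pairs
```

Feeding it `<a href="http://example.com/"><b>www.</b><i>example.com</i></a>` yields one pair whose text is "www.example.com", so nested tags do not hide the visible domain. For `<a><a></a>`-style nesting, the outer anchor is recorded with whatever text preceded the inner one. That is one possible policy, not necessarily the right one.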