Sunday, November 12, 2006

More Anti-Spam Techniques

The current version of hMail has greylisting as a built-in option. I've enabled it and have easily seen a 90% drop in spam. Greylisting does introduce a small delay in e-mail delivery, but it's usually so small that it has no real effect (most mail servers retry within a few minutes).
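For anyone unfamiliar with the mechanics, greylisting works like this. The server temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet with a 4xx code, then accepts a retry of the same triplet once a delay has passed. Legitimate mail servers queue the message and retry; a lot of spamware doesn't. Below is a minimal Python sketch of that logic. It assumes an in-memory table and a 5-minute delay, and it is not hMail's implementation. Real greylisters also expire old entries and whitelist known senders, which this skips.

```python
import time
from typing import Callable, Dict, Tuple

# Classic greylisting key: (client IP, envelope sender, envelope recipient).
Triplet = Tuple[str, str, str]


class Greylist:
    """Minimal in-memory greylist (illustrative only; not hMail's code)."""

    def __init__(self, delay_seconds: int = 300,
                 clock: Callable[[], float] = time.time):
        self.delay = delay_seconds  # how long a new triplet must wait before a retry is accepted
        self.clock = clock          # injectable clock so the logic can be tested
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str) -> str:
        """Return a simplified SMTP verdict: '451' (try again later) or '250' (accept)."""
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        seen = self.first_seen.get(key)
        if seen is None:
            # First attempt from this triplet: remember it and temp-fail.
            self.first_seen[key] = now
            return "451"
        if now - seen < self.delay:
            # Retried too soon: still temp-fail.
            return "451"
        # The sending server came back after the delay: accept the mail.
        return "250"
```

A sending server that retries after the delay gets through on the second or third attempt, which is why the delay is rarely noticeable in practice.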

On top of greylisting I still have my e-mail validation script. I no longer redirect based on the results of this script, but I still add a results header. The header is now used as a weighting factor by SpamAssassin (SA).
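The header-stamping side of a validation script like this can be sketched in Python. The header name X-Sender-Verification and the pass/fail values are hypothetical placeholders, not the script's actual output. Deleting any existing copy of the header first keeps an outside sender from pre-seeding a "pass" result.

```python
from email import message_from_string, policy

# Hypothetical header name; the real script's header isn't named in the post.
RESULT_HEADER = "X-Sender-Verification"


def stamp_result(raw_message: str, verified: bool) -> str:
    """Add (or replace) the verification-result header and return the message text."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Drop every existing copy (no error if absent) so a forged header can't survive.
    del msg[RESULT_HEADER]
    msg[RESULT_HEADER] = "pass" if verified else "fail"
    return msg.as_string()
```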

Yes, I've finally implemented the big gun. One of the reasons it's taken so long is that SA isn't a quick install, since it was originally written for *nix. And while installation isn't overly difficult, tuning SA is the hardest part for the uninitiated.

Luckily there is plenty of good documentation out there and I was able to get things up and running over the course of a few days. So, here's a quick run-down on how to get SA running on Windows 2000:
  1. Download SA for Windows (version 3.1.7.0 as of this writing). I grab the SpamAssassin command line tools, sa-learn, and sa-update.
  2. Install (I chose to place it next to Apache in "Program Files\Apache Software Foundation").
  3. Configure SA (using local.cf). You can keep the defaults if you want, but I made a few changes: I wanted SA to add headers only without modifying the subject, I needed a test that checks sender validation based on the results of my custom script, I wanted to specify the Bayesian filter's data store path, and I needed to specify the internal/trusted network information.
  4. Set up spamd to run as a service. Use whatever software you like; I've been using XYNTService by Xiangyang Liu, available from The Code Project. It's simple and works well enough in this situation.
  5. Set up hMail to run SA. I do this via event handler scripting: in the OnDeliveryStart event subprocedure I run a batch file that scans the incoming message and writes the results to disk. The results are then copied over the original message, which hMail then uses for further processing.
  6. Now set up sa-update so that you can ensure your SA rules are up-to-date. I do this through a Windows Scheduled Task.
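Steps 5 and 6 boil down to piping each message through spamc (the client that talks to the spamd service) and periodically running sa-update. The Python sketch below assumes spamc and sa-update are on the PATH; scan_and_replace and run_sa_update are hypothetical helper names, not part of hMail or SA. It writes the scanned copy to a temporary file and then swaps it over the original, mirroring the copy-back approach in step 5.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def scan_and_replace(msg_path: str, scanner: Sequence[str] = ("spamc",)) -> None:
    """Pipe a message file through a scanner and overwrite it with the output.

    spamc reads the message on stdin, hands it to spamd, and writes the
    message (with SA's X-Spam-* headers added) to stdout.
    """
    with open(msg_path, "rb") as src:
        result = subprocess.run(list(scanner), stdin=src,
                                stdout=subprocess.PIPE, check=True)
    if not result.stdout:
        return  # nothing came back; keep the original rather than truncate it
    # Write to a temp file next to the original, then swap it in.
    folder = os.path.dirname(os.path.abspath(msg_path))
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(result.stdout)
    os.replace(tmp_path, msg_path)


def run_sa_update(command: Sequence[str] = ("sa-update",)) -> bool:
    """Run sa-update and report whether new rules were installed."""
    code = subprocess.run(list(command)).returncode
    if code == 0:
        return True   # updates were downloaded and installed
    if code == 1:
        return False  # no fresh updates available
    raise RuntimeError(f"sa-update failed with exit code {code}")
```

sa-update exits with 0 when it installed new rules and 1 when there was nothing new, so a True result is the cue to restart the spamd service so that it loads the new rules.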
As for specific configurations ... well, that's up to you. I make copies of all my important config files and store them centrally to make installs, re-installs, and upgrades easy.
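Because the config files are kept centrally anyway, it can be handy to generate local.cf from a script. Here's a Python sketch that renders a local.cf covering the step 3 changes. report_safe 0 makes SA add X-Spam-* headers instead of wrapping spam in a report message, and leaving out rewrite_header Subject keeps the subject untouched. The rule name LOCAL_SENDER_UNVERIFIED, the X-Sender-Verification header, the 2.0 score, and the example network and path are all hypothetical placeholders to adjust.

```python
from typing import Sequence


def render_local_cf(networks: Sequence[str], bayes_path: str,
                    result_header: str = "X-Sender-Verification",
                    fail_score: float = 2.0) -> str:
    """Build local.cf text for a headers-only, LAN-trusting SA setup."""
    nets = " ".join(networks)
    lines = [
        "# Add X-Spam-* headers only; don't wrap spam in a report message.",
        "report_safe 0",
        "# No rewrite_header Subject line, so the Subject is left alone.",
        f"bayes_path {bayes_path}",
        "# Only our own LAN is trusted/internal; every other relay is external.",
        f"trusted_networks {nets}",
        f"internal_networks {nets}",
        "# Score the result header written by the sender-verification script.",
        rf"header LOCAL_SENDER_UNVERIFIED {result_header} =~ /\bfail\b/i",
        "describe LOCAL_SENDER_UNVERIFIED Sender address failed verification",
        f"score LOCAL_SENDER_UNVERIFIED {fail_score}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder network and path; adjust to your own layout.
    print(render_local_cf(["192.168.1.0/24"], "C:/SpamAssassin/bayes/bayes"), end="")
```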

A note regarding internal/trusted IPs. I have only one public IP, so all the devices on my network use addresses from one of the private ranges. Unfortunately this can have an ill effect on SA when there's only one Received header (i.e., the e-mail came directly from the originating server). SA assumes that the first public IP must belong to a trusted MX, so pretty much every directly connecting, originating mail server triggers the ALL_TRUSTED rule. This significantly reduces the amount of spam SA catches. Specifying your trusted/internal networks in the configuration file fixes the problem. For the nitty-gritty, see the following

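The trust problem is easier to see with a rough model. The Python sketch below is a simplification of SA's trust-path logic, not its actual code. Relays are listed newest first, as in the Received headers. Without trusted_networks, private IPs are trusted and the first public IP is assumed to be our own MX. Trust stops at the first untrusted hop. all_trusted is a hypothetical stand-in for the ALL_TRUSTED condition.

```python
import ipaddress
from typing import List, Optional, Sequence


def trusted_relays(relay_ips: Sequence[str],
                   trusted_networks: Optional[Sequence[str]] = None) -> List[bool]:
    """Mark each relay (newest first) as trusted or not, in a simplified model."""
    nets = [ipaddress.ip_network(n) for n in (trusted_networks or [])]
    flags: List[bool] = []
    seen_public = False
    chain_broken = False
    for ip_text in relay_ips:
        ip = ipaddress.ip_address(ip_text)
        if chain_broken:
            trusted = False                      # nothing past an untrusted hop is trusted
        elif nets:
            trusted = any(ip in net for net in nets)
        elif ip.is_private:
            trusted = True                       # inferred: private ranges are ours
        elif not seen_public:
            seen_public = True
            trusted = True                       # inferred: first public IP is "our MX"
        else:
            trusted = False
        if not trusted:
            chain_broken = True
        flags.append(trusted)
    return flags


def all_trusted(relay_ips: Sequence[str],
                trusted_networks: Optional[Sequence[str]] = None) -> bool:
    """Rough analogue of ALL_TRUSTED: every relay in the chain is trusted."""
    return bool(relay_ips) and all(trusted_relays(relay_ips, trusted_networks))
```

With a single Received header from a public IP, all_trusted(["1.2.3.4"]) comes out True under inference, but False once the trusted networks are set to the LAN range, e.g. all_trusted(["1.2.3.4"], ["192.168.1.0/24"]).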
I really need to spend some more time figuring out the various tests and tuning my install of SA. But that's for later. It's working and catching a decent amount of spam out of the box.

I was thinking it might be a good idea to rewrite my sender verification script so that it runs as a native SA test. That might be more work than I'm willing to put in, though, considering what I'd have to account for, such as the Perl modules the script relies on and its disk-based cache.

It would be nice to provide an easy way to feed messages back through the Bayesian filter. I'm not sure what would work best, but I was thinking of adding a header containing a URL that, when visited, re-learns the message as spam if it was previously classified as not spam, and vice versa. This is something I'll need to research.
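One piece such a URL would need is a handler that re-teaches the Bayesian filter. Here's a minimal Python sketch, assuming the message file still carries the X-Spam-Status header SA added (it starts with "Yes" or "No") and that sa-learn is on the PATH. flip_command and relearn are hypothetical names, and the web side (the URL itself) is left out.

```python
import subprocess
from email import message_from_binary_file
from typing import List, Sequence


def flip_command(msg_path: str, sa_learn: Sequence[str] = ("sa-learn",)) -> List[str]:
    """Build an sa-learn command that teaches the opposite of SA's original verdict."""
    with open(msg_path, "rb") as fh:
        msg = message_from_binary_file(fh)
    # SA writes e.g. "X-Spam-Status: Yes, score=..." or "No, score=...".
    # A missing header is treated as "not spam".
    status = str(msg.get("X-Spam-Status", ""))
    was_spam = status.strip().lower().startswith("yes")
    # A click means "SA got it wrong", so learn the opposite class.
    return [*sa_learn, "--ham" if was_spam else "--spam", msg_path]


def relearn(msg_path: str, sa_learn: Sequence[str] = ("sa-learn",)) -> int:
    """Run sa-learn on the message and return its exit code."""
    return subprocess.run(flip_command(msg_path, sa_learn)).returncode
```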
