Saturday August 30 2008
Information Technology Central Services at the University of Michigan

SpamBox Frequently Asked Questions

This FAQ is designed for use by U-M IT staff, although others may also find it useful.

What software does SpamBox use?

SpamBox uses the open source DSPAM software in conjunction with a server-side filter. DSPAM tags incoming messages as either innocent or spam. The filter then puts spam messages into a SpamBox folder rather than into the Inbox.

Does SpamBox block mail from particular addresses?

No. SpamBox does not filter on individual addresses or IPs.

What parts of the message does SpamBox examine?

SpamBox (DSPAM) examines every part of the messages it analyzes, including the headers and the content.

How does SpamBox decide if a message is spam or not?

SpamBox (DSPAM) breaks up the entire message, both content and headers, into small parts called "tokens." It then compares these tokens to the ones in its dictionary of innocent and spam tokens. ITCS's implementation of DSPAM uses the standard Graham-Bayesian and Burton-Bayesian algorithms to determine the probability that a message is spam, based on the tokens it contains.
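The tokenize-and-score idea can be sketched in a few lines of Python. This is a simplified illustration of naive Bayesian combining, not DSPAM's actual code: the tokenizer, the dictionary values, and the 0.4 default for unseen tokens are all assumptions made for the example.

```python
import re
from math import prod


def tokenize(raw_message):
    """Split headers and body alike into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'!-]+", raw_message.lower())


def spam_probability(tokens, token_probs, unknown=0.4):
    """Combine per-token spam probabilities with the naive Bayes rule.

    token_probs maps a token to the probability (strictly between 0 and 1)
    that a message containing it is spam. Tokens missing from the
    dictionary get the neutral-ish `unknown` value (an assumption).
    """
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    spam = prod(probs)
    innocent = prod(1.0 - p for p in probs)
    return spam / (spam + innocent)


# Hypothetical shared dictionary; real values would come from training.
TOKEN_PROBS = {"viagra": 0.99, "free": 0.80, "meeting": 0.05, "agenda": 0.03}

spammy = spam_probability(tokenize("Subject: Free Viagra\n\nFree viagra now"), TOKEN_PROBS)
innocent = spam_probability(tokenize("Subject: meeting agenda\n\nagenda for the meeting"), TOKEN_PROBS)
print(f"spammy message:   {spammy:.3f}")
print(f"innocent message: {innocent:.3f}")
```

A production filter would weigh only the most informative tokens and guard against probabilities of exactly 0 or 1; the sketch leaves those refinements out.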

Some technical resources:

What factors identify a message as particularly "spammy"?

DSPAM identifies a message as spam if much of the mail received in the past with similar tokens has been spam (as verified by training). It tags messages as spam based on whether similar messages have been identified by U-M users as spam or innocent.

If, for example, spammers start inserting Outlook-like headers in the spam they send, to make it look legitimate, and people report these messages as spam, then DSPAM will start to identify messages with tokens representing Outlook-like headers as spam. This would also be balanced by innocent mail, however. If a great deal of innocent mail with similar tokens is received here at the University, DSPAM is far less likely to consider those tokens indicative of spam and will look to other tokens in the message to make its decision. DSPAM is very good at making fine distinctions once it is trained, because Bayesian filtering sorts out the relevant and irrelevant data.
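The balancing effect in the Outlook-like header example can be shown with a small calculation. The function below is a hedged sketch of how a single token's spam probability might be derived from training counts; the formula is a simplified Graham-style ratio, and the counts are invented for illustration rather than taken from U-M data.

```python
def token_spam_probability(spam_hits, innocent_hits, total_spam, total_innocent):
    """Estimate how strongly one token indicates spam.

    spam_hits / innocent_hits: number of trained spam / innocent messages
    containing the token. The result is clamped to [0.01, 0.99] so that no
    single token can decide the outcome on its own.
    """
    if spam_hits + innocent_hits == 0:
        raise ValueError("token has never been seen in training")
    spam_rate = spam_hits / total_spam
    innocent_rate = innocent_hits / total_innocent
    p = spam_rate / (spam_rate + innocent_rate)
    return min(0.99, max(0.01, p))


# Hypothetical counts for a token representing an Outlook-like header.
# At first it shows up almost only in reported spam ...
mostly_spam = token_spam_probability(900, 10, 1000, 1000)
# ... but plenty of innocent mail carries the same header too.
balanced = token_spam_probability(900, 800, 1000, 1000)

print(f"seen mostly in spam:        {mostly_spam:.3f}")
print(f"also common in innocent mail: {balanced:.3f}")
```

With the second set of counts the token lands close to 0.5, so it contributes little to the decision and other tokens in the message carry more weight.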

Mail with images, certain words, HTML formatting, lots of links, and so on has frequently turned out to be spam in the past, but spammers continue to adjust their messages as they work to thwart spam filters.

How do spam messages get to the SpamBox folder (tags and filters)?

SpamBox uses DSPAM to tag mail as either spam or innocent. For IMAP users, a server-side filter is then used to direct the mail to either the user's Inbox or SpamBox folder.

When users sign up for SpamBox, they are asked whether they use IMAP or POP. (Actually, they are asked which mail program they use. If they select a program that can do either POP or IMAP, then they are asked which they use.)

For IMAP users, SpamBox enables both DSPAM tagging and a server-side filter to put the mail in the SpamBox folder. For POP users, SpamBox just enables the DSPAM tagging. POP users are given instructions for setting up their own local filter for filtering the messages tagged as spam.
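The split between tagging and filing can be sketched as follows. This is an illustration only: the header name X-DSPAM-Result and its values are assumptions about how the tag appears in the message, and the folder names are placeholders, not the actual ITCS filter configuration.

```python
from email import message_from_string

TAG_HEADER = "X-DSPAM-Result"  # assumed name of the header carrying the tag


def destination_folder(raw_message):
    """Return the folder a server-side filter would file the message into.

    The tagging step has already run; this step only reads the tag.
    Messages without a tag are treated as innocent.
    """
    msg = message_from_string(raw_message)
    tag = (msg.get(TAG_HEADER) or "").strip().lower()
    return "SpamBox" if tag == "spam" else "INBOX"


tagged_spam = (
    "From: someone@example.com\n"
    "Subject: Act now\n"
    "X-DSPAM-Result: Spam\n"
    "\n"
    "Limited time offer\n"
)
tagged_innocent = tagged_spam.replace("X-DSPAM-Result: Spam", "X-DSPAM-Result: Innocent")

print(destination_folder(tagged_spam))
print(destination_folder(tagged_innocent))
```

A POP user's local mail filter would apply the same test on the client side, since for POP accounts only the tagging step is enabled on the server.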

How is SpamBox trained?

ITCS's implementation of DSPAM uses a shared dictionary of spam and non-spam tokens. This dictionary was created at U-M based on DSPAM training done by participants in the 2004 DSPAM pilot project. It is customized for U-M users based on mail received at U-M.

The shared dictionary is continually updated by U-M's SpamBox administrators. SpamBox users send reports of SpamBox errors. The SpamBox administrators then verify the error reports and update the shared dictionary as appropriate.

Manual verification is needed because of variations in how users identify spam. Some users identify all messages from University administration as spam. Others report messages they have opted into receiving (such as those from amazon.com or nytimes.com) as spam. Some users mistakenly report phishing messages as not-spam. Additionally, there is always the potential for deliberate abuse.

I use a POP mail client. Can I still train SpamBox?

Yes, but it will be a bit awkward. You could tell your POP client to leave messages on the server until you delete them. Then you would need to use mail.umich.edu to do the training. To report spam messages in your Inbox, you would select those messages and then click the This Is Spam link. To report innocent messages that were misidentified in your POP mail client (that is, messages mistagged by DSPAM), you would need to move those messages from your Inbox to your SpamBox folder on the server with mail.umich.edu, select them, and then click the This Is Not Spam link.

Why do users have to use web mail to report spam errors?

DSPAM requires the entire message as it was delivered. A passed-on copy of the message won't work. Different mail programs handle forwarding, redirecting, attachments, pasting, and other operations differently. The result of these operations is often a mangled and useless message as far as DSPAM is concerned.

The links in the web mail client provide the message in the original form as required by DSPAM.

How can I convince SpamBox that mail from my mom (or from any particular address) is not spam?

You can bypass SpamBox by using a server-side filter to catch all mail from a particular address and have that mail delivered immediately to your Inbox. See the Using Your Accept List section of Using Server-Side Filters to Manage/Organize Your E-Mail (Including Accept and Block Lists) [S4325] for instructions.

It is also helpful for everyone if you routinely report such errors.
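The accept-list bypass amounts to checking the sender before looking at the spam tag. The sketch below illustrates that ordering; the X-DSPAM-Result header name, the folder names, and the addresses are assumptions for the example and do not reflect the actual server-side filter syntax.

```python
from email import message_from_string
from email.utils import parseaddr


def route(raw_message, accept_list):
    """File a message, letting the accept list win over the spam tag.

    accept_list is a set of lowercase addresses whose mail always goes
    to the Inbox, whatever the tagging step decided.
    """
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in accept_list:
        return "INBOX"
    tag = (msg.get("X-DSPAM-Result") or "").strip().lower()
    return "SpamBox" if tag == "spam" else "INBOX"


from_mom = (
    "From: Mom <mom@example.com>\n"
    "Subject: FREE cookies this weekend!!!\n"
    "X-DSPAM-Result: Spam\n"
    "\n"
    "Come visit.\n"
)

print(route(from_mom, accept_list={"mom@example.com"}))
print(route(from_mom, accept_list=set()))
```

The bypass only changes where the message is filed. Reporting the error is still what corrects the shared dictionary for everyone.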

If I use SpamBox, do I need the Do Not Spam List as well?

Yes, you need both. The Do Not Spam List and SpamBox are designed to work together to reduce spam.

The shared dictionary is trained based on use of the Do Not Spam List. The list blocks messages from known spammers, and SpamBox assumes these messages are blocked before they can get to DSPAM.

Additional Resources

 

This page last verified November 20, 2006

 
