Improve the accuracy of your spam filter Apps
This is less a hint about improving the accuracy of your junk mail filter than one about not degrading it.

Most email spam filters classify spam based on word frequency. When you train the filter, you are in effect teaching it which words are bad. The more often a particular bad word comes up in a message, the more likely it is that the message is spam, and when an email comes in with enough bad words, it is classified as spam.
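
As a rough illustration of the word-frequency idea, here is a minimal Python sketch. It is a toy model, not the algorithm used by Mail or any other real filter; the class name `ToyWordFilter`, the add-one smoothing, and the averaging of per-word scores are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyWordFilter:
    """Toy word-frequency classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text, is_spam):
        """Record the words of one message as spam or as legitimate mail."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.spam_total += len(words)
        else:
            self.ham_counts.update(words)
            self.ham_total += len(words)

    def word_spamminess(self, word):
        """Estimate how strongly a single word points to spam (0.0 to 1.0)."""
        # Add-one smoothing keeps unseen words from producing 0 or 1.
        p_spam = (self.spam_counts[word] + 1) / (self.spam_total + 2)
        p_ham = (self.ham_counts[word] + 1) / (self.ham_total + 2)
        return p_spam / (p_spam + p_ham)

    def score(self, text):
        """Average the per-word spamminess; 0.5 means 'no evidence'."""
        words = tokenize(text)
        if not words:
            return 0.5
        return sum(self.word_spamminess(w) for w in words) / len(words)


f = ToyWordFilter()
f.train("cheap pills buy now", is_spam=True)
f.train("meeting notes attached for tomorrow", is_spam=False)
print(f.score("buy cheap pills"), f.score("meeting tomorrow"))
```

A message would be classified as spam when its score passes some threshold; the threshold itself is a tuning choice and is left out here.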

The .gif emails that are currently going around do not get filtered well by spam filters. These emails contain two parts. The first part, the .gif, contains the text that the filter would normally learn to trigger on; since this text is in the image, the filter can't see it. Instead, the filter sees the second part of the email: a list of random phrases and words. It picks up these words and calculates a spam score from them.

The best thing to do with these spams is simply to delete them and not try to train your filter with them. If you do have your filter learn the words in these messages, it will only be learning common words, which will skew its results. Use a rule to highlight these emails or to move them to your spam folder, as explained in this hint; doing so will maintain the integrity of your spam filter.
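
The skew described above can be shown with a small Python sketch. It assumes a toy scoring scheme (add-one smoothed word counts, averaged per message) that is not taken from any real filter, and the sample messages are invented for the example.

```python
from collections import Counter


def word_spam_probability(word, spam_counts, ham_counts):
    """Share of a word's (smoothed) occurrences that came from spam."""
    spam = spam_counts[word] + 1
    ham = ham_counts[word] + 1
    return spam / (spam + ham)


def score(text, spam_counts, ham_counts):
    """Average per-word spam probability; higher means more spam-like."""
    words = text.lower().split()
    if not words:
        return 0.5
    total = sum(word_spam_probability(w, spam_counts, ham_counts) for w in words)
    return total / len(words)


# Invented training data: one legitimate message and one ordinary spam.
ham_counts = Counter("please review the report before the meeting".split())
spam_counts = Counter("cheap pills buy now".split())

legit = "please send the report"
before = score(legit, spam_counts, ham_counts)

# Train on the filler text of an image spam: mostly everyday words.
filler = "the report please garden window the meeting"
spam_counts.update(filler.split())
after = score(legit, spam_counts, ham_counts)

print(before, after)
```

In this toy model the legitimate message looks more spam-like after the filler text has been learned, which is the effect the hint warns about. Whether a real filter behaves the same way depends on its design.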

[kirkmc adds: For what it's worth, I had not paid attention to the attachment spams that still reached my inbox recently, but while editing this hint, looked at a few and see that they contain png images now. (The gifs are all filtered.) So if you do use a rule, such as in the recent hint linked above, think about creating another one for png format graphics. It's only a matter of time before spammers start using jpgs and other formats, though...]
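
For readers who filter outside Mail, a check for several image formats at once might look like the Python sketch below. It uses the standard `email` package; the function names, the `WATCHED_SUBTYPES` set, and the sample message are assumptions for illustration and are not part of the rule from the linked hint.

```python
from email import message_from_string

# Formats to watch for; extend this set as spammers switch formats.
WATCHED_SUBTYPES = {"gif", "png", "jpeg"}


def image_subtypes(raw_message):
    """Return the set of image subtypes (e.g. {'png'}) found in a raw message."""
    msg = message_from_string(raw_message)
    return {
        part.get_content_subtype()
        for part in msg.walk()
        if part.get_content_maintype() == "image"
    }


def has_watched_image(raw_message, watched=WATCHED_SUBTYPES):
    """True if any MIME part is an image in one of the watched formats."""
    return bool(image_subtypes(raw_message) & set(watched))


# Hypothetical two-part message: filler text plus a (fake) PNG attachment.
SAMPLE = (
    "From: someone@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "random filler words\n"
    "--XX\n"
    "Content-Type: image/png\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAAA\n"
    "--XX--\n"
)

print(has_watched_image(SAMPLE))
```

Matching on the MIME type rather than the file extension means a single check covers every format in the set.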

Hint assumes too much
Authored by: cbiagini on Nov 02, '06 07:56:06AM
I think this "hint" may apply to a subset of spam filters, particularly older or naively-implemented ones, but it would be harmful if followed on a large scale, especially in the case of filters that apply to many users, like GMail's.

GMail and similar systems are striving to emulate what is essentially a horde of human spam filters: us. Their effectiveness relies on having known examples of "spam" and "non-spam" emails, and all this hint accomplishes is teaching those filters that these attachment-style emails aren't spam.

Re: Hint assumes too much
Authored by: chaos215bar2 on Nov 16, '06 05:16:34AM

Indeed... I have now had about 20 of these messages to train my Apple Mail spam filter with, and while this did temporarily seem to generate two false positives, Mail now identifies at least one version of these gif mails perfectly and is back to generating no false positives.



Improve the accuracy of your spam filter
Authored by: Anonymous on Nov 02, '06 08:23:06AM
This would be a good hint, if not for the fact that the author has misunderstood how modern spam filters work: semantic tagging.

More information: http://db.tidbits.com/article/07677

---
Life only demands from you the strength you possess. Only one feat is possible - not to have run away.

Improve the accuracy of your spam filter
Authored by: mzajac on Nov 12, '06 11:13:10AM

Even if a spam filter's rules are simple, the picture it develops from a huge body of spam and non-spam email is a complex black box as far as most of us are concerned, and it's probably a mistake to try to second-guess it.



Improve the accuracy of your spam filter
Authored by: comodin on Nov 02, '06 03:29:37PM

I get tons of this spam every day, and I gave up trying to set up a filter in Apple's Mail.app.

The only thing that worked for me: I installed the latest versions of postfix and amavisd-new on my server and told them to use RBLs as a filter. Voila: no GIF spam since the new installation.

Note that Mac OS X Server 10.4 has all of this by default, but you can also install it on a FreeBSD box from ports (as I did).

I think the best approach is to look for a mail server that uses this feature.
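
For anyone curious what an RBL check involves, here is a minimal Python sketch of the usual DNS-based lookup: the sending IP address is reversed, the blocklist's zone is appended, and an answer in the 127.x.x.x range means the address is listed. This is not how postfix or amavisd-new are configured; the zone `dnsbl.example.org` is a placeholder, and the function names are made up for the example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the blocklist answers with a 127.x.x.x address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): treat as not listed.
        return False
    return answer.startswith("127.")


# Placeholder zone; substitute the blocklist your mail server actually uses.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Because the check looks at the sending server rather than the message body, it is unaffected by whether the spam text is hidden in an image.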

Improve the accuracy of your spam filter
Authored by: ihelp-mac on Nov 02, '06 10:52:00PM
I've also struggled with crafting a Mail.app filter that will handle an assortment of image-based spam. I came up with something that seems to work well at killing off JPG-, PNG-, and GIF-related junk.

They ALL seem to have the string "multipart/related" in their headers, so set the rule to require that ALL conditions be met, and then use:

"Sender is not in my address book"
AND
Content-Type CONTAINS "multipart/related"

This requires editing the header list in the Rules setup screen in Mail.app and adding "Content-Type" as a choice on the menu.

Seems to work like a charm: I haven't had a drug advert or stock report for several weeks now!

Of course, if your "Aunt Agatha" is sending you JPG images of her cat :-), and she's NOT in your address book, they'll get trashed too. But my mother used to tell me, "Don't worry about answering the phone during dinner. If it's important, they'll call back later." The same thing applies to most email! (I have enough pictures of her cat on file already.) ;-P
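
Outside of Mail.app, the same two-condition rule can be sketched in Python with the standard `email` package. This is only an approximation of the rule above, not Mail's own logic; the function name `is_suspect` and the sample addresses are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def is_suspect(raw_message, address_book):
    """True when the sender is unknown AND Content-Type contains multipart/related."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Name <addr>' into (name, addr); keep only the address.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    known = sender in {a.lower() for a in address_book}
    content_type = msg.get("Content-Type", "").lower()
    return (not known) and "multipart/related" in content_type


# Hypothetical message from a stranger with an inline image reference.
RELATED = (
    "From: Stranger <stranger@example.com>\n"
    'Content-Type: multipart/related; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    '<img src="cid:x">\n'
    "--b--\n"
)

print(is_suspect(RELATED, {"aunt.agatha@example.com"}))
```

Both conditions must hold, so mail from anyone in the address book passes regardless of its Content-Type.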

---
BeSeeingYou...
David A. Lewis

Improve the accuracy of your spam filter
Authored by: dan55304 on Nov 03, '06 07:04:11AM

Okay, I now have many ways to ID gif spam.

I have a different problem. I view email as text. My lovely Windows friends like to send emails with borders and other GIFs that come in as attachments in Mail. I would like to modify these scripts a bit to remove just the image attachments that are there only for show.

Do any of the email systems send PNG or JPG smilies and borders? How do I make sure I don't filter out a photo someone has sent?



Improve the accuracy of your spam filter
Authored by: Peganthyrus on Nov 03, '06 02:39:48PM
The spam filter in Apple's Mail refused to train on these things no matter how many I marked as spam. SpamSieve, however, is kicking ass. Once I got it running it's been catching all those stock-scam GIFs, as well as the completely blank ones I keep getting. And legitimate stuff with attachments makes it through with no problem.

I've been regularly checking to make sure it's not marking real mail as spam; it's been doing fine, so far. I'm an artist who does freelance work over the Internet - if I started marking email with attached GIFs as always spam, I'd be throwing away a significant chunk of my business-related correspondence!

Improve the accuracy of your spam filter
Authored by: heavyboots on Nov 03, '06 03:03:04PM

Well, as the IT guy I receive everything that our server-side ASSP mail filter thinks is spam, regardless of recipient, and my Mail is now correctly catching about 90-95% of it, so I wouldn't say it's impossible to train Mail, just tedious. At home, where I'm not able to cower behind a really good server-side spam filter like ASSP, I use an inverted junk filter: everything goes to the Junk box unless the address is in my previous recipients or address book list.
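
The inverted junk filter amounts to a whitelist. A minimal Python sketch of the idea follows; the function name `route`, the mailbox names, and the sample addresses are hypothetical, and this is not how Mail implements its rules.

```python
from email.utils import parseaddr


def route(from_header, address_book, previous_recipients):
    """Send mail to 'Inbox' only if the sender is already known; otherwise 'Junk'."""
    # parseaddr splits 'Name <addr>' into (name, addr); keep only the address.
    sender = parseaddr(from_header)[1].lower()
    trusted = {a.lower() for a in address_book} | {a.lower() for a in previous_recipients}
    return "Inbox" if sender in trusted else "Junk"


print(route("Unknown <someone@example.net>", {"friend@example.com"}, set()))
```

The trade-off is that first-time correspondents always land in Junk, so the Junk box still has to be checked by hand.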



Improve the accuracy of your spam filter
Authored by: AdrianB on Nov 12, '06 05:33:52AM
SpamSieve author Michael Tsai has this to say about this:

http://c-command.com/blog/2006/11/11/tell-spamsieve-the-truth/
