antispam is written in Python and distributed under the GNU GPL license.
The goal of antispam is to be light, fast, and easy to configure, and to block all spam (at the risk of blocking some non-spam).
Rules
Line rules
Line rules compute a score for each line of text. They use text patterns taken from a blacklist (positive score) and a whitelist (negative score). Examples:
Match text pattern (-5.0): debian
Match text pattern (-1.0): linux
Match text pattern (10.0): viagra
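The pattern scoring above can be sketched as follows. This is a minimal illustration, not antispam's actual code: the `PATTERNS` table, the `score_line` and `score_text` names, the word-boundary matching, and the choice to count each pattern at most once per line are all assumptions.

```python
import re

# Hypothetical pattern table: (compiled regex, score).
# Negative scores act as a whitelist, positive scores as a blacklist.
PATTERNS = [
    (re.compile(r"\bdebian\b", re.IGNORECASE), -5.0),
    (re.compile(r"\blinux\b", re.IGNORECASE), -1.0),
    (re.compile(r"\bviagra\b", re.IGNORECASE), 10.0),
]


def score_line(line):
    """Return (total, matches) for a single line of text.

    Assumption: each pattern counts at most once per line.
    """
    matches = []
    for regex, score in PATTERNS:
        found = regex.search(line)
        if found:
            matches.append((found.group(0).lower(), score))
    total = sum(score for _, score in matches)
    return total, matches


def score_text(text):
    """Sum the line scores over every line of the text."""
    return sum(score_line(line)[0] for line in text.splitlines())
```

With this table, a line that mentions both "viagra" and "debian" nets +5.0, so a whitelisted word softens a blacklisted one without cancelling it.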
URL rules
URL rules find all URLs. Whitelisted URLs get a negative score; all other URLs get the default score of 1. Examples:
Match URL (-5.0): http://software.inl.fr/trac/trac.cgi/wiki/
Match URL (-1.0): http://www.nufw.org/
Match URL (+1.0): http://el-diario-de-juarez.acdiplomf.cn
After the first filtering pass (URLs with a negative score are skipped), a domain rate is computed: the number of remaining URLs divided by the number of unique main domains. The main domain of "software.inl.fr" is "inl.fr". If the rate is greater than 3, it adds a score of +5. Example with 10 subdomains of acdiplomf.cn:
Match URL (+1.0): http://el-diario-de-juarez.acdiplomf.cn
Match URL (+1.0): http://aruba-teen-missing.acdiplomf.cn
Match URL (+1.0): http://lisa-raye-wedding.acdiplomf.cn
...
Match URL (+1.0): http://bach-pamela-picture.acdiplomf.cn
Domain rate (+5.0): 10.0 url/domain
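A rough sketch of the URL scoring and domain rate, under stated assumptions: the function names, the `(prefix, score)` whitelist format, and the "last two labels" definition of a main domain are hypothetical and chosen only to reproduce the figures shown here (10 URLs on one main domain give a rate of 10.0 and a +5 bonus).

```python
import re
from urllib.parse import urlsplit

# Simplified URL matcher; antispam's real pattern may differ.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def main_domain(host):
    """Naive main domain: the last two labels ("software.inl.fr" -> "inl.fr")."""
    return ".".join(host.lower().split(".")[-2:])


def score_urls(text, whitelist=(), default=1.0, max_rate=3.0, rate_score=5.0):
    """Score every URL in the text, then add the domain-rate bonus.

    whitelist is a sequence of (url_prefix, negative_score) pairs (assumed format).
    """
    total = 0.0
    hosts = []
    for url in URL_RE.findall(text):
        white = next((s for prefix, s in whitelist if url.startswith(prefix)), None)
        if white is not None:
            total += white  # whitelisted URL: excluded from the domain rate
            continue
        total += default
        hosts.append(urlsplit(url).hostname or "")
    if hosts:
        # URLs per unique main domain, as in "10.0 url/domain"
        rate = len(hosts) / len({main_domain(h) for h in hosts})
        if rate > max_rate:
            total += rate_score
    return total
```

Taking the last two labels is wrong for suffixes such as "co.uk"; a real implementation would need a public-suffix list or a similar table.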
Text rules
Rules applied to the whole text.
ShortText removes all links, HTML tags, and any characters other than letters, and then measures the length of the remaining text. Example input:
CumForCover! :) <a href="http://groups.google.com/group/cumforcover/web/">Cumforcover</a> | http://groups.google.com/group/cumforcover/web/
Message score:
Match URL (+1.0): http://groups.google.com/group/cumforcover/web/
Match URL (+1.0): http://groups.google.com/group/cumforcover/web/
Short text (+4.0): (len=11) "CumForCover"
-stdin- score: +6.00 ***SPAM***
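The ShortText steps can be approximated as below. This is a sketch under assumptions: the regular expressions, the function names, and the `min_length` threshold of 20 are hypothetical (the page does not state the real threshold). Whole `<a>` elements are dropped, anchor text included, because that is what yields the 11-letter result shown above.

```python
import re

# Whole <a ...>...</a> elements, including their anchor text.
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Bare URLs outside of any tag.
URL_RE = re.compile(r"https?://\S+")
# Any remaining HTML tag.
TAG_RE = re.compile(r"<[^>]+>")
# Everything that is not a (Unicode) letter: non-word chars, digits, underscore.
NON_LETTER_RE = re.compile(r"[\W\d_]+")


def strip_to_letters(text):
    """Remove links, HTML tags, and non-letter characters."""
    text = ANCHOR_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return NON_LETTER_RE.sub("", text)


def short_text_score(text, min_length=20, score=4.0):
    """Return `score` when the remaining text is shorter than min_length.

    min_length is an assumed value, not taken from antispam.
    """
    remaining = strip_to_letters(text)
    return score if len(remaining) < min_length else 0.0
```

Applied to the example input, `strip_to_letters` leaves "CumForCover" (11 letters), so the message is scored as short text.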
Email rules
Email rules find all email addresses in the text. DomainRateRule computes the email score depending on the domain. Example:
Match email domain (+1.0): gmail.com
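One way to picture the email rule is a per-domain score lookup. The sketch below is only an assumption about how such a rule might work: the regular expression, the function names, the `domain_scores` mapping, and the default of +1.0 per address are hypothetical and are not taken from DomainRateRule itself.

```python
import re
from collections import Counter

# Simplified address matcher; the single group captures the domain part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def email_domains(text):
    """Count email addresses per (lowercased) domain."""
    return Counter(domain.lower() for domain in EMAIL_RE.findall(text))


def score_emails(text, domain_scores=None, default=1.0):
    """Sum a score per address: whitelisted domains use their own score,
    every other domain uses the default (assumed to be +1.0)."""
    domain_scores = domain_scores or {}
    return sum(
        domain_scores.get(domain, default) * count
        for domain, count in email_domains(text).items()
    )
```

For instance, `score_emails("x@inl.fr y@gmail.com", {"inl.fr": -5.0})` returns -4.0: the whitelisted domain outweighs the unknown one.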
Configure
To configure antispam, define whitelists with the --whitelist and --domain options. To avoid false positives, use --default=SCORE with a negative score. For example, --default=-2 allows 2 external URLs.
Download
svn co http://haypo.hachoir.org/svn/antispam/trunk antispam
Why not use project xxx?
SpamAssassin and Bogofilter target email spam, which is different from blog or forum spam. Here we have little information about the sender (only the IP address), no attachments, no MIME encoding, etc. Other services such as Akismet are commercial and non-free (their source code is not available).
Similar projects
- rss_score: a similar project that sorts news instead of filtering spam
- Bayesian filter:
- Bogofilter
- (for DotClear) SpamClear
- SpamAssassin
- Bad behavior (GPLv2, written in PHP)
- Trac antispam
- WordPress: Spam Karma
- DotClear: Spamplemousse
Non-free: Akismet, ...
Links
- Combating Comment Spam (WordPress)
- Common Spam Words (WordPress)
- Nouveaux mots-clefs dans Spamplemousse (old keyword list, 2006)
- surbl.org
- Distributed Spam Harvester Tracking Network | Project Honey Pot
