2009-03-05
Damn spam. Don't get me wrong, I like Spam of the Hormel variety; I'm talking about internet blog spam.
Having imported blog postings as well as blog comments from the exported Blogspot XML file, it was time to create a way for readers to submit new comments. Within hours of uploading the new code, the first piece of blog spam was posted.
Honestly, I was somewhat happy. It had to happen eventually, and I was hoping to get some blog spam text with which to test some spam-detecting code. The basics go something like this:
- text has HTML, that's a flag
- text has bbcode, that's a flag
- more than 40% of the text is HTML and bbcode, that's a flag
- the percentage of unique words in the text is < 40%, that's a flag
If the text has two or more flags, the form will ask the commenter whether they are a spammer. Yeah, that usually takes care of the bots.
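For the curious, here is a rough Python sketch of those four checks. It is not the code running on this blog, just one way the rules could be written down. The regular expressions, the function names (`spam_flags`, `looks_like_spam`), and the choice to measure "percent of the text" by character count are all assumptions made for the example.

```python
import re

# Assumed patterns: the rules above do not say how HTML or bbcode is detected.
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
BBCODE_TAG = re.compile(r"\[/?[a-zA-Z*][^\]]*\]")
WORD = re.compile(r"[a-z0-9']+")


def spam_flags(text):
    """Return the names of the heuristics that the text trips."""
    flags = []
    html = HTML_TAG.findall(text)
    bbcode = BBCODE_TAG.findall(text)
    if html:
        flags.append("html")
    if bbcode:
        flags.append("bbcode")

    # Share of characters that sit inside HTML or bbcode tags
    # (measuring by characters is an assumption).
    markup_chars = sum(len(tag) for tag in html + bbcode)
    if text and markup_chars / len(text) > 0.40:
        flags.append("mostly_markup")

    # Share of distinct words, counted after stripping the markup.
    plain = BBCODE_TAG.sub(" ", HTML_TAG.sub(" ", text)).lower()
    words = WORD.findall(plain)
    if words and len(set(words)) / len(words) < 0.40:
        flags.append("repetitive")
    return flags


def looks_like_spam(text):
    """Two or more flags means the form should challenge the commenter."""
    return len(spam_flags(text)) >= 2
```

A comment with a single bold tag trips only one flag and goes straight through, while a link posted in both HTML and bbcode trips at least two and gets the "are you a spammer?" question.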
mmmmm spam