Blogs are a highly popular medium of Web 2.0. Spam blogs (splogs), however, disrupt normal information retrieval and waste network resources. Previous splog detection methods are not always effective against massively generated splogs. In this paper, we propose a new method for detecting splogs that aims to identify machine-generated posts and is based on relational properties between posts. The key idea is that a post in a splog is structurally similar in appearance to the blog's other posts and contains links that, taken together, point to a specific target page or pages. Structural similarity between posts and URL biasedness in the posts of a blog are used to decide whether or not the blog is a splog.
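The two signals named above can be illustrated with a minimal sketch. The abstract does not specify the actual measures, so everything here is an assumption made for illustration: structural similarity is approximated by the mean pairwise Jaccard similarity of HTML tag 3-grams, URL biasedness by the share of links pointing to the most common host, and the function names and thresholds (`structural_similarity`, `url_bias`, `looks_like_splog`, 0.8) are hypothetical rather than taken from the paper.

```python
import re
from collections import Counter
from itertools import combinations
from urllib.parse import urlparse

# Opening tag names only; closing tags such as </p> are not matched.
TAG_RE = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def tag_shingles(html, k=3):
    """Set of k-grams over the sequence of opening tag names in a post."""
    tags = [t.lower() for t in TAG_RE.findall(html)]
    if len(tags) < k:
        return {tuple(tags)} if tags else set()
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def structural_similarity(posts, k=3):
    """Mean pairwise Jaccard similarity of tag shingles (assumed measure)."""
    shingles = [tag_shingles(p, k) for p in posts]
    scores = []
    for a, b in combinations(shingles, 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def url_bias(posts):
    """Share of all links that point to the single most common host."""
    hosts = Counter()
    for post in posts:
        for url in HREF_RE.findall(post):
            host = urlparse(url).netloc.lower()
            if host:
                hosts[host] += 1
    total = sum(hosts.values())
    return max(hosts.values()) / total if total else 0.0


def looks_like_splog(posts, sim_threshold=0.8, bias_threshold=0.8):
    """Flag a blog when both signals exceed hypothetical thresholds."""
    return (structural_similarity(posts) >= sim_threshold
            and url_bias(posts) >= bias_threshold)
```

Under these assumptions, a blog whose posts share one template and all link to the same host scores high on both signals, while a blog with varied markup and varied link targets does not. Requiring both signals together is a design choice of this sketch, not a claim about the paper's decision rule.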