Abstract
Web spam techniques aim to mislead search engines so that web spam pages get ranked higher than they deserve. This leads to misleading search results as spam pages might appear in search results although the content of these spam pages might not be related to the search terms. Despite the effort of search engines to deploy various techniques to detect and filter out web spam pages from being listed in their search results, spammers continue to develop new tactics to evade search engines detection mechanisms. In this paper, we study the effectiveness and accuracy of newly developed anti-spamming techniques in Google search engine. Focusing on Arabic spam pages, our study results show that Google anti-spamming techniques are ineffective against spam pages with Arabic content. We explore various types of web spam detection features to obtain an appropriate set of detection features that yield a reasonable detection accuracy. In order to build and evaluate our classifier, we collect and manually label a dataset of Arabic web pages, including both benign and spam pages. We believe this Arabic web spam corpus helps researchers in conducting sound measurement studies. We also develop a browser plug-in that utilizes our classifier and warns the user about web spam pages before accessing them, upon clicking on a search term. The plug-in has also the ability to filter out search engine results.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.