A dynamic model for integrating simple web spam classification techniques

Jorge Fdez-Glez, David Ruano-Ordas, José Ramón Méndez, Florentino Fdez-Riverola, Rosalía Laza, Reyes Pavón

doi:10.1016/j.eswa.2015.06.043

Abstract

In recent years, spam content has spread widely across web sites, mainly due to the emergence of new web technologies oriented towards the online sharing of resources and information. In response, both academia and industry have sought to detect web spam accurately and control it effectively, resulting in a good number of anti-spam techniques currently available. However, the successful integration of different algorithms for web spam classification is still a challenge. In this context, the present study introduces WSF2, a novel web spam filtering framework specifically designed to take advantage of multiple classification schemes and algorithms. In detail, our approach encodes the life cycle of a case-based reasoning system, which allows it to use appropriate knowledge and dynamically adjust different parameters to ensure continuous improvement in filtering precision over time. To evaluate the effectiveness of the dynamic model, we designed a set of experiments involving a publicly available corpus, as well as different simple well-known classifiers and ensemble approaches. The results revealed that WSF2 performed well, taking advantage of each classifier and achieving better performance than the other alternatives. WSF2 is an open-source project licensed under the terms of the LGPL and is publicly available at https://sourceforge.net/projects/wsf2c/.

Full Text