Spam Content Research Articles

In recent years, spam content has spread widely across web sites, mainly because of new web technologies oriented towards the online sharing of resources and information. Both academia and industry have therefore worked to detect web spam accurately and control it effectively, and a good number of anti-spam techniques are now available. However, successfully integrating different algorithms for web spam classification remains a challenge. In this context, the present study introduces WSF2, a novel web spam filtering framework specifically designed to take advantage of multiple classification schemes and algorithms. The approach encodes the life cycle of a case-based reasoning system, so it can use appropriate knowledge and dynamically adjust different parameters to keep improving filtering precision over time. To evaluate the effectiveness of the dynamic model, we designed a set of experiments involving a publicly available corpus, as well as different simple well-known classifiers and ensemble approaches. The results revealed that WSF2 performed well, taking advantage of each classifier and achieving better performance than the other alternatives. WSF2 is an open-source project licensed under the terms of the LGPL and is publicly available at https://sourceforge.net/projects/wsf2c/.
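The idea of combining several simple classifiers and adjusting parameters over time can be illustrated with a small sketch. This is not the WSF2 implementation and does not reproduce its case-based reasoning cycle; it swaps in plain weighted-majority voting as a stand-in. The class name `WeightedVoteFilter`, the `penalty` parameter, and the example classifiers are all hypothetical.

```python
from typing import Callable, Dict

# A classifier is any function that returns True when a page looks like spam.
Classifier = Callable[[str], bool]


class WeightedVoteFilter:
    """Hypothetical sketch: weighted-majority vote over simple classifiers.

    Assumption: this is NOT how WSF2 works internally; it only illustrates
    combining classifiers and adapting their influence from feedback.
    """

    def __init__(self, classifiers: Dict[str, Classifier], penalty: float = 0.5):
        self.classifiers = classifiers
        self.penalty = penalty  # factor applied to the weight of a wrong classifier
        self.weights = {name: 1.0 for name in classifiers}

    def is_spam(self, page: str) -> bool:
        # Sum the weights of the classifiers that vote "spam".
        spam_weight = sum(
            self.weights[name]
            for name, classify in self.classifiers.items()
            if classify(page)
        )
        # Flag the page when the spam votes hold a strict weighted majority.
        return spam_weight > sum(self.weights.values()) / 2

    def feedback(self, page: str, true_label: bool) -> None:
        # Shrink the weight of every classifier that disagreed with the label.
        for name, classify in self.classifiers.items():
            if classify(page) != true_label:
                self.weights[name] *= self.penalty


# Hypothetical toy classifiers, for illustration only.
example_classifiers: Dict[str, Classifier] = {
    "keyword": lambda page: "free money" in page.lower(),
    "many_links": lambda page: page.count("http") > 3,
    "never": lambda page: False,
}
```

Calling `feedback` with a labelled page after each decision gradually shifts influence towards the classifiers that have been right, which is one simple way to obtain the kind of improvement over time that the abstract describes.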

Purpose: The purpose of this paper is to discuss some of the problems that exist with Google Scholar, particularly regarding content spam and citation spam.

Design/methodology/approach: The paper provides an analysis of how Google Scholar has been duped by real but manipulated documents and reference lists, as well as by fake documents and references. Details of research regarding the duping of Google Scholar are presented and a possible solution is offered.

Findings: Researchers showed how easy it was to dupe Google Scholar. In one case, the researchers added invisible words to the first page of one of their conference papers (using the well-known white letter on white screen/paper technique), and modified the content and bibliography of some of their already published papers. They then posted them on the web to see if Google Scholar would bite, i.e. would improve their rank position and increase both the number of citations that the targeted papers received and the number of papers published by the authors. Google Scholar did bite. While the size of Google Scholar kept growing at an impressive rate, the intellectual growth of the Google Scholar software has been stunted.

Originality/value: The paper makes the point that the best move for Google Scholar would be to realise that the existing metadata created by competent human indexers, cataloguers, librarians and other information professionals for tens of millions of scholarly documents is far superior to the parser's results.

Articles published on Spam Content

A dynamic model for integrating simple web spam classification techniques

A Comparative Study of Email Forensic Tools

A link and Content Hybrid Approach for Arabic Web Spam Detection

The Study of Content Security for Mobile Internet

Content-based analysis to detect Arabic web spam

Google Scholar duped and deduped – the aura of “robometrics”

Ethical dimensions of spam

A Chronicle of a Journey

Knowledge Transfer
