Abstract

Search engines use several criteria to rank webpages and choose which pages to display when answering a request. Those criteria can be separated into two notions, relevance and popularity. The notion of popularity is calculated by the search engine and is related to links made to the webpage. Malicious webmasters want to artificially increase their popularity; the techniques they use are often referred to as Webspam. It can take many forms and is in constant evolution, but Webspam usually consists of building a specific dedicated structure of spam pages around a given target page. It is important for a search engine to address the issue of Webspam; otherwise, it cannot provide users with fair and reliable results. In this paper, the authors propose a technique to identify Webspam through the frequency language associated with random walks among those dedicated structures. The authors identify the language by calculating the frequency of appearance of k-grams on random walks launched from every node.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.