Abstract

The World Wide Web is an unregulated communication medium with very limited means of quality control. Quality assurance has therefore become a key issue for many information retrieval services on the Internet, such as Web search engines. This paper introduces methods for evaluating and assessing the quality of Web pages. The proposed quality evaluation mechanisms are based on a set of quality criteria extracted from a targeted user survey, and a weighted algorithmic interpretation of the most significant user-quoted criteria is proposed. In addition, the paper applies machine learning methods to predict the quality of Web pages before they are downloaded. The set of quality criteria allows us to implement a Web search engine with quality-based ranking schemes and Web crawlers that can directly target high-quality pages. The proposed approaches produce very promising results on a sizeable Web repository.
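
As a rough illustration of the weighted-criteria idea, a page's quality score can be formed as a weighted sum of per-criterion scores. This is only a minimal sketch: the criteria names, weights, and example values below are hypothetical and are not the ones derived from the paper's user survey.

```python
# Hypothetical sketch of a weighted quality score for a Web page.
# The criteria and weights are illustrative; the paper derives its own
# set of criteria and their relative importance from a targeted user survey.

CRITERIA_WEIGHTS = {
    "currency": 0.30,          # how recently the page was updated
    "authority": 0.25,         # credibility of the source or author
    "content_accuracy": 0.25,  # correctness and depth of the content
    "presentation": 0.20,      # layout, readability, working links
}

def quality_score(criterion_scores: dict[str, float]) -> float:
    """Combine per-criterion scores (each in [0, 1]) into one weighted score."""
    total = 0.0
    for criterion, weight in CRITERIA_WEIGHTS.items():
        total += weight * criterion_scores.get(criterion, 0.0)
    return total

# Example: a page judged current and well presented, but of middling authority.
page = {"currency": 0.9, "authority": 0.5, "content_accuracy": 0.7, "presentation": 0.8}
print(quality_score(page))  # 0.73
```

A score of this kind could then drive quality-based ranking in a search engine, or serve as the training target for a classifier that predicts page quality from cheaply available signals (e.g. the URL or anchor text) before the page is downloaded, which is the role machine learning plays in the paper's crawling scenario.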
