Automatically identifying the genre of web pages has become a promising research area in web page classification, since it can improve the quality of web search results and reduce search time. Many approaches to web page genre identification have been proposed. They differ in three main respects: the features used, the classification algorithm, and the list of genres used for evaluation. The main idea of this paper is to combine the predictions produced by different classifiers using the internal and external structures of a web page. To combine these predictions, we use several ordered weighted averaging (OWA) operators and the Dempster-Shafer (DS) combination rule. We also propose an improved DS combination method based on the ranks of the predicted genres. Experiments on two well-known datasets (KI-04 and SANTINIS) show that our approach achieves better results than other ensemble classifiers and than previous genre identification work.
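
As a rough illustration of the two combination schemes named above, the following Python sketch applies an OWA operator to per-classifier scores for one genre and fuses two classifiers' outputs with Dempster's rule. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: it assumes each classifier outputs a probability-like score per genre, it handles only the special case where all belief mass sits on single genres, and the function names (`owa`, `dempster_singletons`), genre labels, and scores are hypothetical. The rank-based DS variant is not shown, since the abstract does not specify it.

```python
from typing import Dict, List


def owa(scores: List[float], weights: List[float]) -> float:
    """Ordered weighted average: weights apply to the scores sorted in
    descending order, not to particular classifiers."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def dempster_singletons(m1: Dict[str, float], m2: Dict[str, float]) -> Dict[str, float]:
    """Dempster's rule for two mass functions whose focal elements are all
    single genres (an assumption made here to keep the sketch short)."""
    genres = set(m1) | set(m2)
    # Two singleton hypotheses intersect only when they are the same genre.
    agreement = {g: m1.get(g, 0.0) * m2.get(g, 0.0) for g in genres}
    total = sum(agreement.values())  # equals 1 - K, where K is the conflict
    if total == 0.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {g: v / total for g, v in agreement.items()}


# Hypothetical scores from two classifiers (illustrative values only).
clf_a = {"blog": 0.6, "shop": 0.3, "faq": 0.1}
clf_b = {"blog": 0.5, "shop": 0.4, "faq": 0.1}

blog_scores = [clf_a["blog"], clf_b["blog"]]
owa_max = owa(blog_scores, [1.0, 0.0])   # weights (1, 0) reduce OWA to max
owa_min = owa(blog_scores, [0.0, 1.0])   # weights (0, 1) reduce OWA to min
owa_mean = owa(blog_scores, [0.5, 0.5])  # equal weights give the mean

fused = dempster_singletons(clf_a, clf_b)
predicted = max(fused, key=fused.get)
```

Different OWA weight vectors recover familiar fusion rules (maximum, minimum, mean), which is one reason several operators can be compared within the same framework. Dempster's rule instead multiplies the agreeing masses and renormalizes by the non-conflicting mass.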