An Experimental Study of Convolutional Neural Networks for Functional and Subject Classification of Web Pages

Codruţ-Georgian Artene, Marius Nicolae Tibeică, Florin Leon, Dumitru-Daniel Vecliuc

doi:10.1142/s2196888822500245

Open Access

Abstract

Information filtering and information retrieval applications rely on web page classification methods. Web pages typically serve different functions or cover different topics or subjects. The diversity of web page content increases the need for automatic web page classification and, at the same time, makes it a challenging task. The main component of a web page's content is most often its text, and text classification has been studied intensively in recent years, with researchers reporting state-of-the-art results for various methods; applying these methods to the text extracted from web pages could therefore lead to important results. In this work, we revisit our experimental study on convolutional neural networks for multi-label, multi-language web page classification with a new approach that divides the classification problem into functional classification and subject classification of web pages. The experimental evaluation suggests that separating the functional and subject classification of web pages improves the overall results.
