Abstract

Information filtering and information retrieval applications rely on web page classification methods. Web pages typically serve different functions or cover different topics. This diversity of content increases the need for automatic web page classification and, at the same time, makes it a challenging task. Text is most often the main component of a web page's content, and text classification has been studied intensively in recent years, with researchers reporting state-of-the-art results for various methods; applying these methods to the text extracted from web pages could therefore lead to important results. In this work, we revisit our experimental study on convolutional neural networks for multi-label, multi-language web page classification with a new approach that divides the classification problem into functional classification and subject classification of web pages. The experimental evaluation indicates that separating the functional and subject classification of web pages improves the overall results.

