Abstract

Over the past decades, the World Wide Web has become an abundant, distributed repository of hyperlinked content spanning diverse information domains. Search engines locate information remarkably well, yet they remain inadequate for focused crawling of web content. Web page classification is pivotal for information retrieval and management, and it plays an important role in natural language processing for creating classified web document repositories and building indexed web directories. Conventional machine learning approaches extract the desired features from web pages in order to classify them, whereas deep learning algorithms learn those features as the network goes deeper. Pre-trained models based on transfer learning, such as BERT, attain impressive performance on text classification. In this study, we evaluate the effectiveness of adopting the pre-trained BERT model for classifying web pages into different categories. We propose an ensemble approach to web page classification that learns contextual representations with pre-trained bidirectional BERT and then applies deep Inception modelling with residual connections to fine-tune on the target task, exploiting parallel multi-scale semantics. Experimental evaluation shows that the proposed ensemble model outperforms the benchmark baselines and achieves better performance than the other transfer learning approaches evaluated on the web page classification task across different classification datasets.
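To make the phrase "parallel multi-scale semantics" concrete, the following is a minimal, dependency-free sketch of one Inception-style block with a residual connection applied to a sequence of token vectors. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the random vectors stand in for BERT outputs, and the kernel widths (1, 3, 5), the fixed smoothing weights, the projection matrix, and the function names `conv_same` and `inception_residual_block` are all hypothetical choices made for this example.

```python
import random

def conv_same(seq, weights):
    """Depthwise 1-D convolution over the token axis with zero padding."""
    half = len(weights) // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        row = [0.0] * dim
        for j, w in enumerate(weights):
            s = t + j - half  # neighbouring token covered by this kernel tap
            if 0 <= s < len(seq):
                for d in range(dim):
                    row[d] += w * seq[s][d]
        out.append(row)
    return out

def inception_residual_block(seq, kernels, proj):
    """Run parallel branches of different widths, concatenate, project, add skip."""
    branches = [conv_same(seq, w) for w in kernels]
    out = []
    for t, x in enumerate(seq):
        # Concatenate branch outputs: length = len(kernels) * dim
        concat = [v for b in branches for v in b[t]]
        # Linear projection back to the input dimension
        mixed = [sum(p * c for p, c in zip(row, concat)) for row in proj]
        # Residual connection followed by ReLU
        out.append([max(0.0, xi + mi) for xi, mi in zip(x, mixed)])
    return out

rng = random.Random(0)
T, D = 6, 4  # assumption: 6 tokens with 4-dim vectors standing in for BERT outputs
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
# Assumption: fixed kernels of width 1, 3 and 5; a real model would learn these
kernels = [[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]
proj = [[rng.uniform(-0.1, 0.1) for _ in range(len(kernels) * D)] for _ in range(D)]
features = inception_residual_block(tokens, kernels, proj)
```

Each branch sees a different context width around every token, and the skip connection keeps the original contextual representation available to later layers. In a trained model the kernels and the projection would be learned parameters, and a pooling and classification layer would follow.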
