The World Wide Web (WWW) hosts a wide range of information that supports the development of web applications such as social network analysis, personalized item recommendation, and web page classification and ranking. Among these applications, search engines and web page ranking are particularly important because they consistently index and store billions of web pages on the internet. The main objective of this paper is to develop a new framework for the classification and re-ranking of web pages using intelligent techniques. The framework has two phases: classification and re-ranking-based retrieval. In the classification phase, pre-processing removes HTML tags, punctuation, and stop words and applies stemming. The pre-processed text is then converted from words to vectors, and features are extracted using Principal Component Analysis (PCA). Because the large number of features present in web pages can compromise classification accuracy, this study employs a novel meta-heuristic algorithm, the Opposition Based-Tunicate Swarm Algorithm (O-TSA), for optimal feature selection, which is vital for the precise classification of web pages. The selected features are then processed by an Enhanced Convolutional-Recurrent Neural Network (E-CRNN), itself enhanced by O-TSA, which classifies the web pages into their categories. In the second phase, re-ranking is performed using O-TSA, with an objective function based on a similarity function (correlation) for URL matching, leading to optimal re-ranking of web pages.
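As a rough illustration of two steps named in the abstract, the following minimal sketch shows text pre-processing (HTML tag, punctuation, and stop-word removal plus stemming) and a correlation-based ordering of candidate pages. It is not the paper's implementation. The stop-word list, the crude suffix stripper standing in for a real stemmer, the function names, and the use of Pearson correlation on plain feature vectors are all assumptions made for illustration. The sketch also sorts candidates directly by correlation rather than running O-TSA, which the paper uses to optimize the re-ranking objective.

```python
import math
import re
import string

# Tiny illustrative stop-word list (assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def strip_suffix(token):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(html_text):
    """Remove HTML tags, punctuation, and stop words, then stem each token."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # drop HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = text.lower().split()
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rerank(query_vec, pages):
    """Order (url, feature_vector) pairs by descending correlation with the query."""
    scored = [(pearson(query_vec, vec), url) for url, vec in pages]
    return [url for _, url in sorted(scored, key=lambda s: s[0], reverse=True)]
```

In the pipeline described above, the tokens returned by `preprocess` would next be converted to vectors and reduced with PCA before feature selection. Those stages are omitted here to keep the sketch dependency-free.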