Semantic Features with Contextual Knowledge-Based Web Page Categorization Using the GloVe Model and Stacked BiLSTM

Amit Kumar Nandanwar, Jaytrilok Choudhary

doi:10.3390/sym13101772

Open Access

https://doi.org/10.3390/sym13101772

Journal: Symmetry
Publication Date: Sep 23, 2021
Citations: 16
License type: CC BY 4.0

Affiliation: Maulana Azad National Institute of Technology

Abstract

Internet technologies are advancing rapidly, and the number of web pages is growing exponentially as a result. Web page categorization is required for searching and exploring relevant web pages based on users’ queries, and it is a tedious task. The majority of web page categorization techniques ignore the semantic features and contextual knowledge of the web page. This paper proposes a method that categorizes web pages based on semantic features and contextual knowledge. First, the GloVe model is applied to capture the semantic features of the web pages. Then, a stacked bidirectional long short-term memory (BiLSTM) network with a symmetric structure is applied to extract the contextual and latent symmetry information from the semantic features for web page categorization. The performance of the proposed model has been evaluated on the publicly available WebKB dataset. The proposed model shows superiority over existing state-of-the-art machine learning and deep learning methods.
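The two stages named in the abstract (word vectors first, then a stacked bidirectional LSTM) can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the embedding table `TOY_GLOVE` holds made-up 4-dimensional values rather than real pretrained GloVe vectors, the weights are random and untrained, and the two-layer depth, mean pooling, and names such as `build_model` and `classify` are assumptions chosen only to show how data flows through the layers.

```python
import math
import random

# Toy stand-in for pretrained GloVe vectors (made-up values, not real GloVe).
TOY_GLOVE = {
    "course":   [0.9, 0.1, 0.0, 0.2],
    "syllabus": [0.8, 0.2, 0.1, 0.1],
    "faculty":  [0.1, 0.9, 0.2, 0.0],
    "research": [0.2, 0.7, 0.6, 0.1],
}
EMBED_DIM = 4

def embed(tokens):
    """Map tokens to vectors; unknown tokens become zero vectors."""
    return [TOY_GLOVE.get(t, [0.0] * EMBED_DIM) for t in tokens]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random weights for one LSTM direction: one (W, b) pair per gate."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {name: (matrix(), [0.0] * hidden_dim) for name in "ifog"}

def affine(W, b, z, act):
    """Apply act(W @ z + b) element-wise, using plain lists."""
    return [act(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(W, b)]

def lstm_pass(params, inputs, hidden_dim):
    """Run one LSTM direction; return the hidden state at every step."""
    h, c, outputs = [0.0] * hidden_dim, [0.0] * hidden_dim, []
    for x in inputs:
        z = x + h  # concatenate current input with previous hidden state
        i = affine(*params["i"], z, sigmoid)    # input gate
        f = affine(*params["f"], z, sigmoid)    # forget gate
        o = affine(*params["o"], z, sigmoid)    # output gate
        g = affine(*params["g"], z, math.tanh)  # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm_layer(fwd, bwd, inputs, hidden_dim):
    """Concatenate forward states with re-aligned backward states."""
    f_out = lstm_pass(fwd, inputs, hidden_dim)
    b_out = lstm_pass(bwd, inputs[::-1], hidden_dim)[::-1]
    return [fs + bs for fs, bs in zip(f_out, b_out)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_model(hidden_dim, num_classes, seed=0):
    """Two stacked BiLSTM layers plus a linear output layer (untrained)."""
    rng = random.Random(seed)
    layers = [
        (init_lstm(EMBED_DIM, hidden_dim, rng),
         init_lstm(EMBED_DIM, hidden_dim, rng)),
        (init_lstm(2 * hidden_dim, hidden_dim, rng),
         init_lstm(2 * hidden_dim, hidden_dim, rng)),
    ]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
             for _ in range(num_classes)]
    return layers, w_out

def classify(model, tokens, hidden_dim):
    """Return class probabilities for a non-empty token list."""
    layers, w_out = model
    seq = embed(tokens)
    for fwd, bwd in layers:  # second layer consumes the first layer's output
        seq = bilstm_layer(fwd, bwd, seq, hidden_dim)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]  # mean over time steps
    scores = [sum(w * v for w, v in zip(row, pooled)) for row in w_out]
    return softmax(scores)

model = build_model(hidden_dim=3, num_classes=4)
probs = classify(model, ["course", "syllabus", "unknownword"], hidden_dim=3)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; a practical version would load pretrained GloVe vectors and train the network with a deep learning library on labelled pages.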

Highlights

Information available on the World Wide Web (WWW) is growing exponentially, which makes finding user-relevant web pages challenging and tedious.
This paper proposes and implements a web page categorization model that uses GloVe and a stacked bidirectional long short-term memory (BiLSTM) network.
Feature extraction and classifier design are crucial to this task, and many machine learning models have performed well in this field.

Summary

Introduction

Information available on the World Wide Web (WWW) is growing exponentially, which makes finding user-relevant web pages challenging and tedious. A search engine either returns too many results or misinterprets the user query because of linguistic ambiguity [1]. Earlier research has addressed web page classification according to the user’s preferences [15]. This is treated as a simple document classification problem based on the textual content and features of web pages: the frequency of each text term is counted to form a term-frequency feature vector, and these feature vectors are used to train a classifier that assigns web pages to categories. Feature vectors extracted from the title and main text of the web pages were utilized by the naive Bayesian classifier
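The term-frequency approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited work: the training snippets and the labels `course` and `faculty` are invented, tokenization is a plain whitespace split, and the classifier is a multinomial naive Bayes with add-one smoothing written from scratch as the hypothetical class `NaiveBayes`.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase whitespace-separated term occurs."""
    return Counter(text.lower().split())

class NaiveBayes:
    """Multinomial naive Bayes over term counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = {label: Counter() for label in self.class_counts}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(term_frequencies(doc))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        tf = term_frequencies(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term, freq in tf.items():
                if term in self.vocab:  # terms never seen in training are skipped
                    score += freq * math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data: each string stands for a page's title plus main text.
train_docs = [
    "cs101 course syllabus lecture notes homework",
    "lecture schedule homework exam course",
    "professor research publications office hours",
    "professor teaching research interests publications",
]
train_labels = ["course", "course", "faculty", "faculty"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("homework and lecture schedule"))  # course
print(clf.predict("professor publications"))         # faculty
```

Such a model only sees term counts, so word order and the meaning shared by related words are lost.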
