Abstract
Automatic web page classification has become essential for web directories because of the sheer number of pages on the World Wide Web. This paper proposes an improved term weighting technique for automatic and effective classification of web pages. Web documents are represented as sets of features. The proposed method selects and extracts the most prominent features, which reduces the high-dimensionality problem faced by the classifier. Proper selection of features from the large candidate set improves classifier performance. The proposed algorithm is implemented and tested on a benchmark dataset. The results show better performance than most existing term weighting techniques.
Highlights
The rapid development of technology has led people and devices to connect to the internet and share data
The results show better performance than most existing term weighting techniques
Because web pages resemble one another while varying in their attributes, classifiers struggle to decide which category a page belongs to
Summary
The rapid development of technology has led people and devices to connect to the internet and share data. Ali Selamat and Sigeru Omatu [6] proposed an automatic categorization method that deals with the scaling problem of the World Wide Web. Their news web page classification method (WPCM) uses a neural network whose inputs are obtained from both principal components and class profile-based features. Behzad et al. [16] investigated two kinds of feature selection metrics (one-sided and two-sided) as the global component of term weighting schemes (called tffs) in scenarios with different complexities and imbalance ratios. They concluded that supervised term weighting methods based on one-sided term selection metrics are the best choice for SVM on imbalanced datasets, and that the k-NN algorithm usually performs well with tfidf. In this paper, feature vectors of the web pages are classified using a new term weighting scheme.
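For readers unfamiliar with term weighting, the tfidf baseline mentioned above can be sketched as follows. This is a minimal illustration, not the scheme proposed in the paper: it assumes the common textbook definition (term frequency normalized by document length, times the log of inverse document frequency), and the function names `tfidf_weights` and `top_features` as well as the three toy pages are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the textbook form tf * log(N / df); this is a baseline
    sketch, not the weighting scheme proposed in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weighted


def top_features(weights, k):
    """Keep the k highest-weighted terms of one document (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


# Hypothetical toy "web pages", already tokenized.
pages = [
    "football match score goal team".split(),
    "election vote party score".split(),
    "football team transfer news".split(),
]
weights = tfidf_weights(pages)
reduced = [top_features(w, 2) for w in weights]
```

Terms that occur in few pages receive higher weights than terms shared across pages, and keeping only the top-weighted terms per page is one simple way to reduce the dimensionality of the feature vectors passed to a classifier.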
More From: Journal of Intelligent Learning Systems and Applications