Abstract

Automatic web page classification has become indispensable for web directories because of the vast number of pages on the World Wide Web. In this paper, an improved term weighting technique is proposed for automatic and effective classification of web pages. The web documents are represented as sets of features. The proposed method selects and extracts the most prominent features, reducing the high dimensionality problem of the classifier. Proper selection of features from the large set improves the performance of the classifier. The proposed algorithm is implemented and tested on a benchmark dataset. The results show better performance than most existing term weighting techniques.

Highlights

  • The rapid development of technology leads people and devices to connect to the internet and share data

  • The results show better performance than most existing term weighting techniques

  • Because of the similarity between pages and their differing attributes, classifiers have difficulty deciding the category of a web page

Summary

Introduction

The rapid development of technology leads people and devices to connect to the internet and share data. Ali Selamat and Sigeru Omatu [6] proposed an automatic categorization method that deals with the scaling problem of the World Wide Web. The news web page classification method (WPCM) uses a neural network with inputs obtained from both the principal components and class profile-based features. Behzad et al. [16] investigated two kinds of feature selection metrics (one-sided and two-sided) as the global component of term weighting schemes (called tffs) in scenarios with different complexities and imbalance ratios. They concluded that supervised term weighting methods based on one-sided term selection metrics are the best choice for SVM on imbalanced datasets, and that the k-NN algorithm usually performs well with tfidf. In this paper, feature vectors of the web pages are classified using a new term weighting scheme.
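For readers unfamiliar with term weighting, the traditional tfidf baseline mentioned above can be sketched as follows. This is a minimal illustration only: it is not the scheme proposed in the paper, which is not specified in this section. The function name `tfidf_vectors`, the toy pages, and the particular variant used (length-normalized term frequency and an unsmoothed logarithmic idf) are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into tf-idf feature vectors.

    docs: list of token lists, one per web page (hypothetical input format).
    Returns (vocabulary, vectors), where each vector is aligned with the
    sorted vocabulary.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Document frequency: in how many pages each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))

    # Unsmoothed idf (an assumed variant): terms found in every page get 0.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty pages
        vectors.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, vectors


# Toy pages, invented for illustration only.
pages = [
    "football match score goal".split(),
    "election vote parliament score".split(),
    "football league goal goal".split(),
]
vocab, vectors = tfidf_vectors(pages)
for page_vector in vectors:
    print([round(weight, 3) for weight in page_vector])
```

Each page becomes one feature vector over the shared vocabulary. Terms that occur in a single page (such as "election") receive a higher weight than terms spread across several pages (such as "score"), which is the property that supervised weighting schemes try to improve on by also using class labels.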

Traditional and Proposed Term Weighting Scheme
Classifier and Its Training Method
Web Page Classification Method
Data Sets
Feature Extraction and Feature Selection
Neural Network Classifier
Evaluation
Conclusion
