Abstract

This paper proposes a modified chi-square-based feature selection algorithm, used in conjunction with a random vector functional link network-based text classifier, to improve classification performance on multi-labeled text documents with unbalanced class distributions. As an improvement over the original chi-square method, the proposed feature selection method selects the largest number of features from classes that have a large number of training and testing documents. To assess its effectiveness, the model is compared with three conventional feature selection techniques, namely chi-square, term frequency-inverse document frequency, and mutual information, on two benchmark datasets that are multi-labeled, multi-class, and unbalanced. Additionally, the proposed model is compared with four different classifiers. The study finds that the proposed model performs better in terms of precision, recall, F-measure, and Hamming loss, and that it is able to select the majority of true positive documents despite the unbalanced class distribution in both datasets.
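The idea of drawing more features from larger classes can be illustrated with a minimal sketch. The abstract does not state the exact allocation rule, so the code below assumes a per-class quota proportional to the number of documents carrying that label, and ranks terms within each class by the standard 2x2 chi-square statistic. The function names `chi_square`, `allocate_quotas`, and `select_features` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Standard chi-square statistic for a 2x2 term/class contingency table.

    n11: docs with the term and in the class
    n10: docs with the term, not in the class
    n01: docs without the term, in the class
    n00: docs without the term, not in the class
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def allocate_quotas(labels, total_k):
    """Assumed rule: split total_k across classes in proportion to class size."""
    sizes = Counter(c for doc_labels in labels for c in doc_labels)
    total = sum(sizes.values())
    return {c: max(1, round(total_k * n / total)) for c, n in sizes.items()}


def select_features(docs, labels, total_k):
    """Union of the top-scoring terms per class, with larger classes
    contributing more terms. docs and labels are parallel lists of sets."""
    vocab = sorted(set().union(*docs))
    selected = set()
    for c, quota in allocate_quotas(labels, total_k).items():
        scores = []
        for t in vocab:
            n11 = sum(1 for d, l in zip(docs, labels) if t in d and c in l)
            n10 = sum(1 for d, l in zip(docs, labels) if t in d and c not in l)
            n01 = sum(1 for d, l in zip(docs, labels) if t not in d and c in l)
            n00 = len(docs) - n11 - n10 - n01
            scores.append((chi_square(n11, n10, n01, n00), t))
        # Highest score first; ties broken alphabetically for determinism.
        scores.sort(key=lambda s: (-s[0], s[1]))
        selected.update(t for _, t in scores[:quota])
    return selected
```

Because a document may carry several labels, class sizes are counted per label rather than per document. The selected set is a union across classes, so it can be smaller than `total_k` when classes share top-ranked terms.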
