Abstract

The classification of large-scale textual datasets involves a huge number of instances and millions of features, which must be discriminated among a large number of categories. The task, known as Large Scale Hierarchical Text Classification (LSHTC), requires a defined hierarchy structure and tools that automatically classify instances within that hierarchy. Predicting instance labels is challenging for the employed classifiers because of the high number of features. Furthermore, existing Dimensional Reduction (DR) approaches used with the LSHTC framework are still quite inefficient. An effective hierarchical dimensional reduction approach can therefore improve the performance of LSHTC. In this paper, we propose a Multi-stage Hierarchical Dimensional Reduction (MHDR) approach based on Modified Feature Hashing (MFH) and a Hierarchical Bi-Filtering (HBF) method. In addition to alleviating bad collisions and result discrepancy, experimental results show that the proposed approach achieves the best performance in terms of micro-F1 and macro-F1, recording average scores of 58.47% and 54.77%, respectively, using TD-SVM, and 51.14% and 48.70%, respectively, using TD-LR. The method also achieved an 11% speed-up over the compared approaches.
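
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how micro-F1 and macro-F1 differ. It assumes single-label predictions (one category per instance), which may not match the evaluation protocol used in the paper; the function names `f1` and `micro_macro_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label predictions (assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for category t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the true category but was missed
    labels = set(y_true) | set(y_pred)
    # Micro-F1 pools the counts over all categories before computing F1,
    # so frequent categories dominate the score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 computes F1 per category and averages with equal weight,
    # so rare categories count as much as frequent ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy example with two categories (hypothetical data).
micro, macro = micro_macro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
# micro = 0.75, macro = (0.8 + 2/3) / 2, roughly 0.733
```

Because hierarchies in LSHTC typically contain many sparsely populated categories, the gap between the two scores indicates how well a classifier handles rare categories relative to frequent ones.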
