An improved multi-stage framework for large-scale hierarchical text classification problems using a modified feature hashing and bi-filtering strategy

Abubakar Ado, Abdulkadir Abubakar Bichi, Usman Haruna, Mohammed Almaiah, Yahaya Garba Shawai, Rommel Alali, Tayseer Alkhdour, Theyazn H.H Aldhyani, Mahmoad Al-Rawad, Rami Shehab

doi:10.5267/j.ijdns.2024.6.012

Open Access

Abstract

Classifying large-scale textual datasets involves a huge number of instances and millions of features, which must be discriminated among a large number of categories. This task, known as Large Scale Hierarchical Text Classification (LSHTC), requires a defined hierarchy structure and tools that automatically classify instances within that hierarchy. Predicting instance labels is challenging for the employed classifiers because of the high number of features. Furthermore, existing Dimensional Reduction (DR) approaches used together with the LSHTC framework are still quite inefficient. In this setting, an effective Hierarchical Dimensional Reduction approach can improve the performance of LSHTC. Therefore, in this paper, we enhance the performance of LSHTC by proposing a Multi-stage Hierarchical Dimensional Reduction (MHDR) approach based on Modified Feature Hashing (MFH) and a Hierarchical Bi-Filtering (HBF) method. In addition to alleviating bad collisions and result discrepancy, experimental results show that the proposed approach achieves the best performance in terms of micro-F1 and macro-F1, recording average scores of 58.47% and 54.77% using TD-SVM, and 51.14% and 48.70% using TD-LR, respectively. The method also achieved an 11% speed-up over the compared approaches.
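For readers unfamiliar with the collision problem mentioned above, the following is a minimal sketch of the standard signed feature hashing trick, which is the general technique that MFH modifies. It is not the authors' MFH; the abstract does not describe the modification. The function name `hashed_features`, the parameter `n_buckets`, and the choice of MD5 as the hash are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def hashed_features(tokens, n_buckets=16):
    """Map a bag of tokens to a fixed-length vector (standard hashing trick).

    Illustrative only: this is the generic technique, not the paper's MFH.
    """
    vec = [0.0] * n_buckets
    for token, count in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # First four bytes choose the bucket index.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # One further bit chooses the sign, so colliding tokens tend to
        # cancel out rather than always accumulate in the same direction.
        sign = 1.0 if digest[4] & 1 == 0 else -1.0
        vec[index] += sign * count
    return vec


doc = "hierarchical text classification of large scale text".split()
vector = hashed_features(doc, n_buckets=8)
```

With a small `n_buckets`, distinct tokens are more likely to land in the same bucket. This is the kind of collision that a dimensionality reduction scheme built on hashing has to manage, since the output length stays fixed no matter how large the vocabulary grows.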
