Feature selection for multi-label text data: An ensemble approach using geometric mean aggregation

Mohsen Miri, Amin Hashemi, Mohammad Bagher Dowlatshahi

doi:10.1109/cfis54774.2022.9756484

Abstract

Text datasets contain many terms, which reduces classification accuracy. The high dimensionality of text data poses further challenges for classification methods. Each method has its own strengths and weaknesses in how it selects features, so ensembling should be used to exploit those strengths and improve classification. In this paper, we present, for the first time, an ensemble multi-label (ML) feature selection method for text datasets that uses the Geometric-Mean (GM) aggregation approach. To this end, we use four multi-label feature selection (MLFS) algorithms with different structures. The performance of the GM method is then compared with that of the four algorithms on six classification criteria over three ML datasets from the text domain. The results show that the proposed method, GMA, exploits the strengths and ignores the weaknesses of the individual algorithms during feature selection, and is therefore more accurate. Overall, the experiments show the superiority of GMA over the other methods.
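As an illustration of the aggregation idea only, the following minimal sketch combines the feature rankings of several selectors by taking the geometric mean of each feature's ranks. It is not the authors' implementation: the function name `gm_aggregate`, the rank convention (1 = most relevant), and the toy rank matrix are assumptions made for this example, and the abstract does not name the four MLFS algorithms, so their outputs are stood in for by hand-written ranks.

```python
import numpy as np


def gm_aggregate(ranks):
    """Aggregate several feature rankings by the geometric mean of ranks.

    ranks: array of shape (n_methods, n_features); ranks[m, f] is the rank
    that method m gives feature f, with 1 = most relevant (assumed convention).
    Returns (scores, order): the per-feature geometric mean and the feature
    indices sorted from best (lowest score) to worst.
    """
    ranks = np.asarray(ranks, dtype=float)
    if ranks.ndim != 2 or np.any(ranks <= 0):
        raise ValueError("ranks must be a 2-D array of positive values")
    # Geometric mean computed in log space to avoid overflow of the product.
    scores = np.exp(np.log(ranks).mean(axis=0))
    # Stable sort keeps the original feature order when scores tie.
    order = np.argsort(scores, kind="stable")
    return scores, order


# Toy example: four hypothetical selectors ranking four features.
toy_ranks = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
    [4, 1, 2, 3],
]
toy_scores, toy_order = gm_aggregate(toy_ranks)
```

In this toy case feature 1 ends up first: three selectors rank it at or near the top, and its one weak rank (3) is not enough to outweigh them. Feature 0 follows, since its single rank of 4 pulls its aggregate score up despite two first-place ranks.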

Full Text