Abstract

Multi-label feature selection can effectively address the high, or even ultra-high, dimensionality of multi-label data. However, most existing multi-label feature selection algorithms can handle only a single data type, assume that all labels are equally significant, and rely on heuristic search strategies, which leads to low efficiency and unsatisfactory classification accuracy. To address these shortcomings, this paper proposes a new multi-label feature selection algorithm built on three procedures. First, a new similarity relation metric is proposed to handle hybrid data types. Second, a label enhancement algorithm is designed that transforms logical labels into a label distribution using the analytic hierarchy process (AHP) embedded with label correlation, so that the significance of each label is identified automatically. Third, the feature weighting evaluation used in the selection process is redesigned so that the optimal feature subset is obtained directly by feature ranking rather than by heuristic search. With these procedures, multi-label feature selection can exploit the rich semantic information carried by label significance and improve classification accuracy and computational efficiency simultaneously. Comparative experiments against seven state-of-the-art multi-label feature selection algorithms are conducted on 20 real-world multi-label datasets. The results show that the proposed algorithm outperforms the compared algorithms by about 5–10% on 80% of the datasets.
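The abstract names the three procedures but not their formulas, so the following is only a minimal sketch of how such a pipeline could fit together, not the paper's method. Every rule in it is an assumption: a neighborhood-style similarity relation (numeric values within a hypothetical threshold `delta`, categorical values equal), AHP label weights computed with Saaty's principal-eigenvector method from a comparison matrix that we build from label correlations, label enhancement by normalizing significance-weighted logical labels, and a hypothetical consistency score used to rank features directly. All function names (`hybrid_similarity`, `ahp_label_weights`, `enhance_labels`, `feature_score`, `rank_features`) are illustrative.

```python
import numpy as np


def hybrid_similarity(column, numeric, delta=0.2):
    """Boolean n x n similarity relation for one feature (hypothetical rule).

    Numeric feature: x ~ y when |x - y| <= delta (values assumed scaled to [0, 1]).
    Categorical feature: x ~ y when the two values are identical.
    """
    col = np.asarray(column)
    if numeric:
        col = col.astype(float)
        return np.abs(col[:, None] - col[None, :]) <= delta
    return col[:, None] == col[None, :]


def ahp_label_weights(Y, eps=1e-6):
    """Label significance from AHP's principal-eigenvector method.

    Hypothetical comparison matrix: label j's score s_j is its total absolute
    correlation with the other labels, and a_jk = s_j / s_k.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(Y, rowvar=False))  # constant labels -> 0
    s = np.abs(corr).sum(axis=1) - np.abs(np.diag(corr)) + eps
    A = s[:, None] / s[None, :]  # positive reciprocal comparison matrix
    vals, vecs = np.linalg.eig(A)
    w = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return w / w.sum()


def enhance_labels(Y, w):
    """Logical labels -> label distribution: weight relevant labels by w, normalize rows."""
    weighted = Y * w
    totals = weighted.sum(axis=1, keepdims=True)
    # Instances with no relevant label keep an all-zero row.
    return np.divide(weighted, totals, out=np.zeros_like(weighted), where=totals > 0)


def feature_score(R, D):
    """Hypothetical score in [0, 1]: 1 minus the mean half-L1 distance between each
    instance's label distribution and the average over its similarity class."""
    neigh = R.astype(float)  # R is reflexive, so every row sum is >= 1
    means = neigh @ D / neigh.sum(axis=1, keepdims=True)
    return 1.0 - 0.5 * np.abs(D - means).sum(axis=1).mean()


def rank_features(columns, numeric_flags, Y, delta=0.2):
    """Rank features by score (best first); no heuristic subset search."""
    D = enhance_labels(Y, ahp_label_weights(Y))
    scores = np.array([feature_score(hybrid_similarity(c, f, delta), D)
                       for c, f in zip(columns, numeric_flags)])
    return np.argsort(scores)[::-1], scores


# Toy hybrid data: 4 instances, 3 labels, 3 features (numeric, categorical, constant).
Y = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]], dtype=float)
columns = [np.array([0.1, 0.15, 0.9, 0.95]),
           np.array(["a", "b", "a", "b"]),
           np.full(4, 0.5)]
w = ahp_label_weights(Y)
D = enhance_labels(Y, w)
order, scores = rank_features(columns, [True, False, True], Y)
print("label weights:", w.round(3))
print("feature order:", order, "scores:", scores.round(3))
```

On this toy data the numeric feature whose similarity classes match the two label groups ranks first. Because the comparison matrix here is perfectly consistent (a_jk = s_j / s_k), the AHP weights reduce to scores proportional to s; a comparison matrix derived some other way would be needed to exercise AHP's consistency handling.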
