Attribute reduction in partially labeled data, also called semi-supervised attribute reduction, is an important issue. In recent years, the research on semi-supervised attribute reduction has attracted the attention of many scholars. Unfortunately, most existing semi-supervised attribute reduction methods do not handle the information loss caused by missing labels well. Meanwhile, these methods in general only consider the relevance between attributes and labels to measure attribute correlations, which ignores the irrelevant information contained in the attributes with respect to the labels. In view of this, this paper proposes a novel semi-supervised attribute reduction algorithm considering attribute relevance, redundancy and label irrelevance from the perspective of label distribution. Firstly, the membership degree of unlabeled objects relative to labels is defined by fuzzy similarity relation, which implements information restoration and converts partially labeled data into label distribution data. Secondly, some fuzzy uncertainty measures for label distribution are defined and related properties are investigated accordingly. Additionally, considering that irrelevant information brought by attributes may lead to over-fitting, label irrelevance criterion based on fuzzy uncertainty measures is constructed. Thirdly, a novel semi-supervised attribute reduction algorithm via the maximum relevance, minimum redundancy, and minimum irrelevance is proposed. Finally, compared with the representative semi-supervised attribute reduction algorithms and supervised attribute reduction algorithm, the effectiveness of the proposed algorithm is verified by various experiments.
Read full abstract