In many practical applications of machine learning, a large amount of categorical data is only partially labeled because labeling is costly. Semi-supervised learning algorithms are needed to deal with such data. This paper studies label prediction for partially labeled categorical data and considers semi-supervised attribute reduction in a partially labeled categorical decision information system (p-CDIS) with predicted labels. The labels of unlabeled data are first predicted by means of conditional probability. Then, uncertainty measurement for a p-CDIS with predicted labels is studied, and the dependence and conditional information entropy (CIE) are defined. Next, based on the dependence and CIE, two attribute reduction algorithms are designed. In addition, the effect of the label deletion rate (LDR) on the dependence, CIE, and reduction results is studied. Finally, the results of experiments and statistical tests on 16 categorical UCI datasets show that the designed algorithms are statistically better than some state-of-the-art algorithms in classification accuracy.
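To make the two uncertainty measures concrete, the following is a minimal sketch that assumes the classical rough-set definitions: dependence as the fraction of objects lying in equivalence classes with a single decision label, and CIE as the Shannon conditional entropy of the labels given the equivalence classes. The abstract does not state the exact formulas used for a p-CDIS with predicted labels, so these definitions, the function names (`partition`, `dependence`, `conditional_entropy`), and the toy data are all illustrative assumptions rather than the paper's method.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependence(rows, labels, attrs):
    """Assumed classical definition: share of objects in label-consistent classes."""
    blocks = partition(rows, attrs)
    consistent = sum(len(b) for b in blocks if len({labels[i] for i in b}) == 1)
    return consistent / len(rows)


def conditional_entropy(rows, labels, attrs):
    """Assumed Shannon form: H(D | B) = -sum_X P(X) sum_Y P(Y|X) log2 P(Y|X)."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        weight = len(block) / n
        for count in Counter(labels[i] for i in block).values():
            p = count / len(block)
            h -= weight * p * log2(p)
    return h


# Hypothetical toy data: two categorical attributes, labels already predicted.
rows = [("a", "x"), ("a", "x"), ("b", "x"), ("b", "y")]
labels = ["p", "q", "p", "p"]

print(dependence(rows, labels, [0]))           # 0.5
print(conditional_entropy(rows, labels, [0]))  # 0.5
```

Under these assumed definitions, an attribute subset is a candidate reduct when it preserves the dependence (or the CIE) of the full attribute set; in the toy data, attribute 0 alone yields the same values as attributes 0 and 1 together.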