Optimal scale combination (OSC) selection plays a crucial role in multi-scale decision systems for data mining and knowledge discovery. Its aim is to select an appropriate subsystem for classification or decision-making while preserving a given consistency criterion. Existing OSC selection methods must judge the consistency of all multi-scale attributes; however, judging consistency and selecting scales for unimportant multi-scale attributes needlessly increases the selection cost. Moreover, existing definitions of OSC apply only to rough set classifiers (RSCs), so the selected OSC performs poorly with other machine learning classifiers. To address these issues, this paper investigates multi-scale attribute subset selection and OSC selection applicable to any classifier in generalized multi-scale decision systems. First, a novel consistency criterion based on the multi-scale attribute subset, called the p-consistency criterion, is proposed. Second, the relevance and redundancy among multi-scale attributes are measured using information entropy, and an algorithm for selecting the multi-scale attribute subset is derived from these measures. Third, an extended definition of OSC, called the accuracy OSC, is proposed, which can be applied to classification tasks with any classifier. On this basis, an OSC selection algorithm based on a genetic algorithm is proposed. Finally, experimental results show that the proposed method can significantly improve classification accuracy and selection efficiency.
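The abstract does not give the exact entropy-based measures, so the following is only a minimal sketch of how relevance and redundancy between discrete attributes might be scored. It assumes relevance is the mutual information between an attribute and the decision, redundancy is the mean mutual information with already selected attributes, and selection is a greedy mRMR-style loop. That technique is substituted here for illustration and is not the paper's algorithm. Each attribute is treated at a single fixed scale, and the names `entropy`, `mutual_information`, `select_attributes`, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_attributes(columns, decision, k):
    """Greedy mRMR-style selection (an illustrative assumption, not the
    paper's method): repeatedly pick the attribute with the highest
    relevance to the decision minus mean redundancy with those chosen."""
    selected = []
    remaining = list(columns)

    def score(name):
        relevance = mutual_information(columns[name], decision)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(columns[name], columns[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: a2 duplicates a1 under different labels, a3 is independent of a1,
# and the decision is determined jointly by a1 and a3.
columns = {
    "a1": [0, 0, 1, 1],
    "a2": ["x", "x", "y", "y"],
    "a3": [0, 1, 0, 1],
}
decision = [0, 1, 2, 3]

selected = select_attributes(columns, decision, k=2)
print(selected)  # ['a1', 'a3']
```

In this toy case all three attributes are equally relevant on their own, but `a2` carries no information beyond `a1`, so the redundancy penalty steers the second pick to `a3`.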