Discretization of continuous attributes is an important task in rough set theory, and many discretization algorithms have been proposed. However, most current discretization algorithms are univariate, which may reduce the classification ability of a given decision table. To address this problem, we propose SMDNS, a supervised, multivariate discretization algorithm for rough sets derived from the traditional naive scaler algorithm (referred to as Naive). Given a decision table DT = (U, C, D, V, f), SMDNS uses both class information and the interdependence among the condition attributes in C to determine the discretization scheme. As a result, it produces far fewer cuts than Naive while leaving the classification ability of DT unchanged after discretization. Experimental results show that SMDNS is effective in terms of both classification accuracy and the number of generated cuts; in particular, it achieves a satisfactory compromise between the number of cuts and the classification accuracy.
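The abstract does not spell out either algorithm, so the following is only a rough illustration under stated assumptions, not SMDNS itself. It implements the baseline naive scaler in its commonly cited form (for each attribute, a cut at the midpoint between two consecutive values whenever objects taking those values have different decisions) and uses table consistency as a stand-in for "classification ability unchanged", which assumes the original table is consistent. The `greedy_prune` pass is a generic multivariate heuristic, plainly not the authors' method: it checks all attributes jointly and drops any cut whose removal keeps the table consistent. All function names and the toy table are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def naive_cuts(column, decisions):
    """Naive-scaler cuts for one attribute (assumed textbook formulation)."""
    # Collect the set of decision values seen for each distinct attribute value.
    by_value = defaultdict(set)
    for v, d in zip(column, decisions):
        by_value[v].add(d)
    values = sorted(by_value)
    cuts = []
    for lo, hi in zip(values, values[1:]):
        # Some object with value lo and some with value hi disagree on the
        # decision exactly when the union of their decision sets has > 1 element.
        if len(by_value[lo] | by_value[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts  # ascending, because values is sorted


def discretize(rows, cuts):
    """Replace each value by its interval index (number of cuts <= value)."""
    return [tuple(bisect_right(cuts[j], v) for j, v in enumerate(row)) for row in rows]


def is_consistent(rows, decisions):
    """True if no two objects share a condition vector but differ in decision."""
    seen = {}
    for key, d in zip(rows, decisions):
        if seen.setdefault(key, d) != d:
            return False
    return True


def greedy_prune(rows, decisions, cuts):
    """Generic greedy pass (NOT SMDNS): drop any cut whose removal keeps the
    discretized table consistent, judging all attributes jointly."""
    cuts = [list(c) for c in cuts]
    for j in range(len(cuts)):
        for c in list(cuts[j]):
            trial = [list(cs) for cs in cuts]
            trial[j].remove(c)
            if is_consistent(discretize(rows, trial), decisions):
                cuts = trial
    return cuts


# Hypothetical toy decision table: two condition attributes (a, b), decision d.
ROWS = [(0.8, 2.0), (1.0, 0.5), (1.3, 3.0), (1.4, 1.0), (1.4, 2.0), (1.6, 3.0), (1.3, 1.0)]
DECISIONS = [1, 0, 0, 1, 0, 1, 1]

naive = [naive_cuts(col, DECISIONS) for col in zip(*ROWS)]
pruned = greedy_prune(ROWS, DECISIONS, naive)
print("naive cuts:", sum(map(len, naive)), "pruned cuts:", sum(map(len, pruned)))
print("still consistent:", is_consistent(discretize(ROWS, pruned), DECISIONS))
```

On this toy table the naive scaler yields 7 cuts, and the greedy pass keeps 4 while the discretized table stays consistent. The greedy result depends on the order in which cuts are tried; for an inconsistent original table, a positive-region comparison would be needed instead of the simple consistency check.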