J-Measure Based Pruning for Advancing Classification Performance of Information Entropy Based Rule Generation

Han Liu, Mihaela Cocea, Weili Ding

doi:10.1109/icmlc.2018.8527063

Abstract

Learning classification rules is a popular approach in machine learning and can be achieved through two strategies: divide-and-conquer and separate-and-conquer. The former generates rules in the form of a decision tree, whereas the latter generates if-then rules directly from training data. For this reason, the two strategies are referred to as decision tree learning and rule learning, respectively. Both strategies can produce complex rule-based classifiers that overfit the training data, which has motivated researchers to develop pruning algorithms to reduce overfitting. In this paper, we propose a J-measure based pruning algorithm, referred to as Jmean-pruning. The proposed algorithm is used to improve the performance of the information entropy based rule generation method, which follows the separate-and-conquer strategy. An experimental study is reported to show how Jmean-pruning can effectively help this rule learning method avoid overfitting. The results show that Jmean-pruning improves the performance of the rule learning method, and that the improved performance is comparable to, or even considerably better than, that of C4.5.

Full Text