Abstract

Rule learning is one of the most popular areas of machine learning research, because the learned set of rules not only provides accurate predictions but also makes the mapping from inputs to outputs transparent. In general, rule learning approaches fall into two main types, namely ‘divide and conquer’ and ‘separate and conquer’. The former, also known as Top-Down Induction of Decision Trees, learns a set of rules represented in the form of a decision tree. This approach tends to produce a large number of complex rules (usually due to the replicated sub-tree problem), which lowers computational efficiency in both the training and testing stages and leads to overfitting of the training data. This problem has gradually motivated researchers to develop ‘separate and conquer’ rule learning approaches, also known as covering approaches, which learn a set of rules sequentially: a rule is learned, the instances covered by that rule are deleted from the training set, and the next rule is learned from the smaller remaining set. In this paper, we propose a new algorithm, GIBRG, which employs the Gini index to measure the quality of each rule being learned, in the context of ‘separate and conquer’ rule learning. Our experiments show that the proposed algorithm outperforms both decision tree learning algorithms (C4.5 and CART) and the ‘separate and conquer’ approach Prism. In addition, it produces fewer rules and rule terms, and is therefore more computationally efficient and less prone to overfitting.
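The covering procedure described in the abstract can be sketched in a few lines. The abstract does not specify GIBRG's exact term-selection or stopping criteria, so the greedy selection of the lowest-Gini term, the function names, and the attribute-value data representation below are illustrative assumptions only, not the authors' implementation:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def learn_rule(data):
    """Greedily append the (attribute, value) term whose covered subset
    has the lowest Gini impurity; stop when the subset is pure or no
    further term is available (an assumed, simplified selection scheme)."""
    rule, covered = [], data
    while gini([y for _, y in covered]) > 0.0:
        best = None
        for x, _ in covered:
            for attr, val in x.items():
                if (attr, val) in rule:
                    continue
                subset = [(xi, yi) for xi, yi in covered
                          if xi.get(attr) == val]
                score = gini([yi for _, yi in subset])
                if best is None or score < best[0]:
                    best = (score, (attr, val), subset)
        if best is None:
            break
        rule.append(best[1])
        covered = best[2]
    # The rule predicts the majority class among the instances it covers.
    majority = Counter(y for _, y in covered).most_common(1)[0][0]
    return rule, majority, covered

def separate_and_conquer(data):
    """Learn rules one at a time, deleting covered instances after each,
    so each subsequent rule is induced from a smaller training set."""
    rules, remaining = [], list(data)
    while remaining:
        rule, label, covered = learn_rule(remaining)
        if not rule:
            # No discriminating term left: emit a default majority rule.
            rules.append(([], Counter(y for _, y in remaining)
                          .most_common(1)[0][0]))
            break
        rules.append((rule, label))
        covered_ids = {id(x) for x, _ in covered}
        remaining = [(x, y) for x, y in remaining
                     if id(x) not in covered_ids]
    return rules
```

On a toy weather-style dataset this yields an ordered rule list, each rule a list of attribute-value terms plus a predicted class, with a trailing default rule for any leftover instances.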
