CLIP4: Hybrid inductive machine learning algorithm that generates inequality rules

K Cios

doi:10.1016/s0020-0255(03)00415-8

Abstract

The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes . The unique feature of the algorithm is generation of rules that involve inequalities. The algorithm works with the data that have large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. The benchmarking results in each instance show the CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features like tree pruning, methods for partitioning the data (for data with large number of examples and attributes, and for data containing noise), data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and the discretization schemes . CLIP4 generates model of data that consists of well-generalized rules, and ranks attributes and selectors that can be used for feature selection.

Full Text