Improving Association Rules by Optimizing Discretization Based on a Hybrid GA: A Case Study of Data from Forest Ecology Stations in China

Jianxin Wang, Fan Yang, Xiaoli Dong, Baojiang Cui, Ben Xu

doi:10.1109/eidwt.2013.113

Abstract

Association rule mining is one of the key techniques for data mining and knowledge discovery in databases. Before association rules can be mined from numerical data, however, the variable domains must first be partitioned into intervals (i.e., the data must be discretized), and this partitioning directly affects the quality of the rules generated. Finding the best combination of dividing points in polynomial time is infeasible, since the problem is NP-complete. We search for the optimal combination of dividing points over continuous intervals using a genetic algorithm (GA), in which the properties of the resulting strong association rules serve as the fitness function that guides the iterations. The GA operations, together with a sampling technique and a hill-climbing algorithm, are discussed in detail. Experimental results show that the generated association rules have good properties in terms of quantity, support, and confidence. The proposed approach is successfully applied to mining the massive data accumulated at forest ecological stations distributed widely across China. In addition, the methods and algorithms are general and can readily be adapted to produce high-quality association rules in other fields where the variable domains have yet to be partitioned precisely or completely.
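The idea of letting rule quality drive the choice of dividing points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic two-variable data, the fitness definition (the sum of confidences over strong rules), the thresholds, and all function names (`discretize`, `strong_rules`, `fitness`, `evolve`, `make_rows`) are assumptions made for illustration, and the sampling and hill-climbing steps mentioned above are omitted.

```python
import random
from bisect import bisect_right
from collections import Counter

def discretize(value, cuts):
    """Return the index of the interval that `value` falls into (cuts sorted)."""
    return bisect_right(cuts, value)

def strong_rules(rows, cuts_x, cuts_y, min_sup=0.1, min_conf=0.7):
    """Rules 'x in bin i => y in bin j' that meet the assumed thresholds."""
    n = len(rows)
    pair_counts, x_counts = Counter(), Counter()
    for x, y in rows:
        bx, by = discretize(x, cuts_x), discretize(y, cuts_y)
        pair_counts[(bx, by)] += 1
        x_counts[bx] += 1
    rules = []
    for (bx, by), count in pair_counts.items():
        support = count / n
        confidence = count / x_counts[bx]
        if support >= min_sup and confidence >= min_conf:
            rules.append((bx, by, support, confidence))
    return rules

def fitness(chrom, rows, k):
    """Assumed fitness: total confidence of all strong rules for this partition."""
    cuts_x, cuts_y = sorted(chrom[:k]), sorted(chrom[k:])
    return sum(conf for _, _, _, conf in strong_rules(rows, cuts_x, cuts_y))

def evolve(rows, k=2, pop_size=30, generations=40, lo=0.0, hi=1.0, seed=0):
    """Small GA: a chromosome holds k cut points for x followed by k for y."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(2 * k)] for _ in range(pop_size)]
    score = lambda c: fitness(c, rows, k)
    initial_best = max(score(c) for c in pop)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best partitions over unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            point = rng.randrange(1, 2 * k)          # one-point crossover
            child = p1[:point] + p2[point:]
            # Gaussian mutation, clipped to the variable domain [lo, hi]
            child = [min(hi, max(lo, g + rng.gauss(0, 0.05)))
                     if rng.random() < 0.2 else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best), initial_best

def make_rows(n=300, seed=1):
    """Synthetic stand-in data: y follows x with a little noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, min(1.0, max(0.0, x + rng.gauss(0, 0.05)))))
    return rows

if __name__ == "__main__":
    best, best_score, initial_best = evolve(make_rows())
    print("cut points:", [round(g, 3) for g in best])
    print("fitness: initial best", initial_best, "final best", best_score)
```

Because the two best chromosomes are carried over in every generation, the best fitness never decreases. A fitness this simple can reward trivial partitions (for example, a single interval for `y`), so a practical version would need additional terms or constraints.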

Full Text