Many interestingness measures have been proposed for mining meaningful association rules between two events in the form A→B, but their characteristics and semantic similarity relations have not been comprehensively investigated. This paper presents a scenario-based approach for characterizing sixty-one commonly used measures and revealing their relationships in three steps. The first step generates a set of 969 three-probability scenarios, S={s|s=(p(A),p(B),p(A,B))∧p(A),p(B),p(A|B)∈[0,1]∧p(A,B)⩽min(p(A),p(B))}, covering all possible situations in the range 0.0 to 1.0 with a step of 0.05 and excluding infinity and not-a-number cases. In the second step, 937,992 ordered pairs of scenarios (969 × 968) are enumerated, and for each scenario pair s1 and s2, the values of a measure M, i.e., M(s1) and M(s2), are compared, yielding one of three outcomes that characterize the measure: greater-than (M(s1)>M(s2)), smaller-than (M(s1)<M(s2)), or equal-to (M(s1)=M(s2)). The final step examines three types of relations: (1) behavior-based, (2) correlation-based, and (3) association-based similarity relations. The behavior of the measures is depicted using nine common algebraic/statistical properties and four special-condition properties, i.e., zero, min–max, infinity, and not-a-number. Similarities among the measures can then be examined by grouping the measures according to their properties. With three correlation functions, i.e., correlation coefficient, joint entropy, and mutual information, a correlation analysis is performed to discover relations among the interestingness measures in the form of dendrograms and clusters obtained by thresholding. Finally, the details of the relations among these interestingness measures are explored with association rule mining. Besides support, confidence, and lift, we propose five types of rules, i.e., the same-direction rule (S-rule), opposite-direction rule (O-rule), equal-both rule (E-rule), equal-left rule (EL-rule), and equal-right rule (ER-rule), for a five-gradient comparison of any two measures that outlines their similarities and dissimilarities in five directions.
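The first two steps, and one possible reading of the five rule types, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not say how infinity and not-a-number cases are excluded; the sketch assumes that all four contingency cells, p(A,B), p(A,¬B), p(¬A,B), and p(¬A,¬B), must be positive multiples of 0.05, a reading that happens to reproduce the stated count of 969 scenarios. The function names (`generate_scenarios`, `compare`, `rule_type`, `tally`) are hypothetical, confidence and lift stand in for the sixty-one measures, and the mapping of comparison outcomes to S-, O-, E-, EL-, and ER-rules is a guess based on the rule names only.

```python
import math
from collections import Counter
from itertools import permutations

STEPS = 20  # 1.0 / 0.05; integer counts avoid floating-point drift


def generate_scenarios(steps=STEPS):
    """List (p(A), p(B), p(A,B)) triples whose four contingency cells are all > 0.

    Assumption: this positivity constraint is how degenerate cases are excluded.
    """
    scenarios = []
    for n_ab in range(1, steps):              # units of p(A, B)
        for n_a_only in range(1, steps):      # units of p(A, not B)
            for n_b_only in range(1, steps):  # units of p(not A, B)
                n_neither = steps - n_ab - n_a_only - n_b_only
                if n_neither < 1:             # p(not A, not B) must stay positive
                    continue
                scenarios.append(((n_ab + n_a_only) / steps,
                                  (n_ab + n_b_only) / steps,
                                  n_ab / steps))
    return scenarios


def confidence(s):
    """Confidence of A -> B: p(A,B) / p(A)."""
    p_a, _, p_ab = s
    return p_ab / p_a


def lift(s):
    """Lift of A -> B: p(A,B) / (p(A) * p(B))."""
    p_a, p_b, p_ab = s
    return p_ab / (p_a * p_b)


def compare(measure, s1, s2):
    """Return '>', '<', or '=' for measure(s1) versus measure(s2)."""
    v1, v2 = measure(s1), measure(s2)
    if math.isclose(v1, v2, rel_tol=1e-9):
        return "="
    return ">" if v1 > v2 else "<"


def rule_type(left, right):
    """Map the outcomes of two measures on one scenario pair to a rule label.

    Assumed reading: 'left' is the first measure's outcome, 'right' the second's.
    """
    if left == "=" and right == "=":
        return "E"    # both measures rate the two scenarios equally
    if left == "=":
        return "EL"   # only the first (left) measure rates them equally
    if right == "=":
        return "ER"   # only the second (right) measure rates them equally
    return "S" if left == right else "O"  # same or opposite direction


def tally(m1, m2, scenarios):
    """Count rule labels for two measures over all ordered scenario pairs."""
    return Counter(rule_type(compare(m1, s1, s2), compare(m2, s1, s2))
                   for s1, s2 in permutations(scenarios, 2))


scenarios = generate_scenarios()
print(len(scenarios))                         # 969
print(len(scenarios) * (len(scenarios) - 1))  # 937992
print(tally(confidence, lift, scenarios))
```

Working in integer multiples of 0.05 keeps the enumeration exact, and the tolerance in `compare` guards the equal-to outcome against rounding in the divisions. Turning the tallied counts into support, confidence, and lift for each rule type would be the next step and is left out here because the abstract does not define those computations.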