Discovering frequent pattern pairs

Carlos Ordonez, Zhibo Chen

doi:10.3233/ida-130614

Abstract

Cubes and association rules discover frequent patterns in a data set, most of which are not significant. Thus, previous research has introduced search constraints and statistical metrics to discover significant patterns and reduce processing time. We introduce cube pairs, which compare two cube groups using a parametric statistical test, and rule pairs, which are built from two similar association rules; these are pattern pair generalizations of cubes and association rules, respectively. We introduce algorithmic optimizations to discover comparable pattern sets. We carefully study why the two techniques agree or disagree on the validity of specific pairs, considering the p-value for statistical tests as well as confidence for association rules. In addition, we analyze the probability distribution of target attributes given confidence thresholds. We also introduce a reliability metric based on cross-validation, which enables an objective comparison between the two kinds of patterns. We present an extensive experimental evaluation with real data sets to understand the significance and reliability of pattern pairs. We show that cube pairs generally produce more reliable results than rule pairs.
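
To make the two kinds of pattern pairs concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy data set with hypothetical attributes (`smoker`, `ldl`, `disease`), uses Welch's statistic with a normal approximation for the p-value as a stand-in for the parametric test (the abstract does not name the specific test), and treats a rule pair as two rules that differ only in one antecedent value. All function names are illustrative.

```python
import math
from statistics import mean, variance

# Hypothetical toy data: one dimension (smoker), one measure (ldl), one target (disease).
rows = [
    {"smoker": "yes", "ldl": 160, "disease": 1},
    {"smoker": "yes", "ldl": 150, "disease": 1},
    {"smoker": "yes", "ldl": 170, "disease": 1},
    {"smoker": "yes", "ldl": 140, "disease": 0},
    {"smoker": "no", "ldl": 110, "disease": 0},
    {"smoker": "no", "ldl": 120, "disease": 0},
    {"smoker": "no", "ldl": 100, "disease": 1},
    {"smoker": "no", "ldl": 130, "disease": 0},
]


def confidence(data, antecedent, consequent):
    """Confidence of the rule antecedent => consequent (both are dicts)."""
    matches = [r for r in data if all(r.get(k) == v for k, v in antecedent.items())]
    if not matches:
        return 0.0
    hits = sum(1 for r in matches if all(r.get(k) == v for k, v in consequent.items()))
    return hits / len(matches)


def welch_statistic(a, b):
    """Welch's two-sample statistic for the difference of two group means."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Rule pair: two rules differing only in the value of one antecedent attribute.
conf_yes = confidence(rows, {"smoker": "yes"}, {"disease": 1})
conf_no = confidence(rows, {"smoker": "no"}, {"disease": 1})

# Cube pair: two cube groups differing in one dimension value, compared on a measure.
group_yes = [r["ldl"] for r in rows if r["smoker"] == "yes"]
group_no = [r["ldl"] for r in rows if r["smoker"] == "no"]
t_stat = welch_statistic(group_yes, group_no)
p_value = two_sided_p_normal(t_stat)

print(f"rule pair confidences: {conf_yes:.2f} vs {conf_no:.2f}")
print(f"cube pair statistic: {t_stat:.3f}, approximate p-value: {p_value:.2g}")
```

In this sketch a rule pair would be judged by comparing the two confidences against a threshold, while a cube pair would be judged by the p-value; with samples this small, a t distribution would be more appropriate than the normal approximation used here.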
