A Theory of Evidence-based method for assessing frequent patterns

Francisco Guil, Roque Marín

doi:10.1016/j.eswa.2012.12.030

Abstract

Frequent itemset (or frequent pattern) mining is a central problem in the data mining field. Syntactic simplicity and descriptive potential are the key features of the itemset-based pattern, and they have led to its widespread use in a growing number of real-life domains. Some of the most representative algorithms for mining this kind of pattern are Apriori-like algorithms, and the number of patterns they produce under normal conditions is very large, which makes evaluation and interpretation quite difficult. This problem is compounded because knowledge discovery is an iterative process, and a change in the parameters of the preprocessing techniques or the mining algorithm can lead to significant changes in the result. In this paper, we propose a method based on Shafer's Theory of Evidence that uses two information measures to evaluate the quality of a set of frequent patterns. From a practical point of view, the main goal is to select, for a given database, the preprocessing technique that best leads to the discovery of useful knowledge. More fundamentally, the underlying idea is to propose a formal method to assess objectively sets of frequent patterns, seen as belief structures, in terms of the certainty of the information they represent.
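To make the notion of a set of frequent patterns "seen as a belief structure" more concrete, the following minimal Python sketch shows one possible reading. It is not taken from the paper. It assumes that each frequent itemset is a focal element over the frame of items, that masses are obtained by normalizing supports so they sum to 1, and that one of the information measures is the standard nonspecificity (generalized Hartley) measure from evidence theory. The function names `belief_structure` and `nonspecificity` and the toy supports are hypothetical.

```python
from math import log2


def belief_structure(supports):
    """Normalize itemset supports into masses that sum to 1.

    Assumption: mass is proportional to support; the paper may
    define the mass assignment differently.
    """
    total = sum(supports.values())
    if total <= 0:
        raise ValueError("supports must sum to a positive value")
    return {itemset: s / total for itemset, s in supports.items()}


def nonspecificity(masses):
    """Nonspecificity N(m) = sum over focal elements A of m(A) * log2(|A|)."""
    return sum(m * log2(len(a)) for a, m in masses.items())


# Hypothetical toy set of frequent itemsets with their supports.
supports = {
    frozenset({"a"}): 0.5,
    frozenset({"b"}): 0.25,
    frozenset({"a", "b"}): 0.25,
}

masses = belief_structure(supports)
n = nonspecificity(masses)
print(f"nonspecificity = {n:.4f}")
```

Under these assumptions, singleton itemsets contribute nothing to nonspecificity, so the value grows as more mass falls on larger itemsets. Comparing such values across pattern sets produced by different preprocessing choices is one way the selection goal described above could be approached.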

Full Text