Association Rules and Frequent Patterns

Giuseppe Di Fatta

doi:10.1016/b978-0-12-809633-8.20333-6

Abstract

Large datasets of transactional records in the form of co-occurring events, variables or features may contain interesting knowledge in terms of implicit relations and patterns. Association Rule Mining (ARM) is the systematic extraction of frequent patterns from data in the form of rules that expose and explicitly represent the relations between variables. It is a ‘descriptive’ data mining technique, which provides a compact and high-level description of interesting patterns found in historical data. ARM has been successfully employed in many application domains, such as market basket analysis, Web user behaviour, substructures of online social networks, intrusion detection in communication networks, co-expression of genes in bioinformatics and substructures of molecular compounds in chemoinformatics. The information represented by means of association rules can be used as the basis for decisions, to discover regularities in the data and, in general, to formulate new scientific hypotheses driven by the data. ARM is computationally hard, and practical applications require critical design choices in the data analysis workflow, including data pre-processing, the data layout, algorithm selection and tuning of the algorithm’s parameters. Many ARM algorithms have been proposed over more than two decades: some are suitable for specific data formats, properties or database layouts; others solve a reduced or an extended formulation of the ARM problem to improve efficiency or the interestingness of the results. In this article the ARM problem is introduced and its complexity discussed, the most important algorithms and the extended formulations are briefly described, and finally some applications are presented.

Full Text