Significant Frequent Item Sets Via Pattern Spectrum Filtering

Christian Borgelt, David Picado-Muiño

doi:10.1007/978-3-319-26986-3_4

Abstract

Frequent item set mining often suffers from the problem that the number of frequent item sets can be huge, even if the output is restricted to closed or maximal item sets: in some cases the output can even exceed the size of the transaction database being analyzed. To overcome this problem, several approaches have been suggested that reduce the output by statistical assessment, so that only significant frequent item sets (or association rules derived from them) are reported. In this paper we propose a new method along these lines, which combines data randomization with so-called pattern spectrum filtering, a technique developed for neural spike train analysis. The former serves to implicitly represent the null hypothesis of independent items, while the latter helps to cope with the multiple testing problem that results from statistically evaluating the found patterns.
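To make the combination of the two ingredients concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: surrogate data are generated by reassigning each item's occurrences to random transactions (which keeps item frequencies but destroys dependencies), a pattern "signature" is the pair (item set size, support), and an item set is kept only if its signature never occurs in any surrogate data set. All function names and parameters are hypothetical, and the brute-force miner is only suitable for toy-sized data.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_item_sets(transactions, min_size=2, min_support=2):
    """Brute-force miner: map each item set (sorted tuple) to its support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(min_size, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def randomize(transactions, rng):
    """Assumed surrogate scheme: keep each item's frequency, but place its
    occurrences in randomly chosen transactions (independent items)."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    surrogate = [set() for _ in range(n)]
    for item, c in item_counts.items():
        for idx in rng.sample(range(n), c):
            surrogate[idx].add(item)
    return surrogate


def pattern_spectrum(transactions, n_surrogates, rng, **kw):
    """Collect all (size, support) signatures seen in any surrogate data set."""
    signatures = set()
    for _ in range(n_surrogates):
        found = frequent_item_sets(randomize(transactions, rng), **kw)
        signatures.update((len(s), c) for s, c in found.items())
    return signatures


def significant_item_sets(transactions, n_surrogates=100, seed=0, **kw):
    """Keep only item sets whose signature never occurred in the surrogates."""
    rng = random.Random(seed)
    spectrum = pattern_spectrum(transactions, n_surrogates, rng, **kw)
    found = frequent_item_sets(transactions, **kw)
    return {s: c for s, c in found.items() if (len(s), c) not in spectrum}
```

Filtering on signatures rather than on individual item sets means that only one decision per (size, support) pair is made, instead of one test per candidate pattern, which is one way to read the abstract's remark about coping with the multiple testing problem.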
