Abstract

A common task in data mining is to find all rectangles of 1s in a Boolean matrix, where the order of the rows and columns does not matter. However, most algorithms developed for this task cannot be adapted to real data, which may contain noise. Noise shatters relevant item sets into many small, irrelevant item sets, causing an explosion in the number of resulting item sets. Recent algorithms proposed to address this problem suffer from several limitations: a large number of results, execution times that remain very high, and an inability to discover overlapping patterns. In this work, we propose a new heuristic approach, based on a graph algorithm, for the efficient extraction of item set patterns in noisy binary contexts. The method uses maximal flow/minimal cut algorithms to find dense subgraphs of 1s in the graph associated with the Boolean data matrix. To evaluate our approach, we performed experiments on both synthetic data and real datasets from bioinformatics applications. We compared our results on several synthetic datasets and one gene-expression dataset against various methods, and show that (i) our method is quite efficient and (ii) the patterns extracted by our algorithm are of better quality than those of the other methods.
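The shattering effect of noise mentioned above can be illustrated on a toy matrix. The sketch below is not the method proposed in this work: it is a brute-force enumeration of maximal all-ones rectangles, usable only on very small matrices, and the function name `maximal_rectangles` and the example data are assumptions made for illustration.

```python
from itertools import combinations


def maximal_rectangles(matrix):
    """Enumerate maximal all-ones rectangles of a small 0/1 matrix.

    Hypothetical helper for illustration only (exponential in the number
    of columns). Each rectangle is returned as a (rows, cols) pair of
    frozensets; row and column order is irrelevant.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Rows that contain a 1 in every chosen column.
            rows = frozenset(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            if not rows:
                continue
            # Extend the column set to every column that is 1 on all those rows,
            # which makes the rectangle maximal.
            closed = frozenset(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            found.add((rows, closed))
    return found


# A clean 4x4 block of 1s is a single pattern.
clean = [[1] * 4 for _ in range(4)]

# The same block with one cell flipped to 0 (assumed noise).
one_flip = [row[:] for row in clean]
one_flip[1][2] = 0

# A second flipped cell in a different row and column.
two_flips = [row[:] for row in one_flip]
two_flips[3][0] = 0

print(len(maximal_rectangles(clean)))      # 1
print(len(maximal_rectangles(one_flip)))   # 2
print(len(maximal_rectangles(two_flips)))  # 4
```

With exact (noise-free) pattern definitions, each flipped cell here doubles the number of reported rectangles while every one of them is a fragment of the original block, which is the behaviour a noise-tolerant method needs to avoid.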
