Distributed Mining of Contrast Patterns

David Savage, Xinghou Yu, Pauline Chou, Qingmai Wang, Xiuzhen Zhang

doi:10.1109/tpds.2016.2637914

Abstract

In this paper, we propose a novel algorithm for mining contrast patterns using a distributed, MapReduce-like framework. Contrast patterns describe differences between contrasted data sets and have previously been used to build highly accurate classifiers. However, mining contrast patterns is computationally expensive, and existing algorithms are designed to run sequentially on a single machine. Consequently, existing approaches cannot handle dense, high-volume, high-dimensional databases. Our algorithm addresses this problem by partitioning the search space for contrast patterns into small, independent units. These units can be mined in parallel, providing a scalable solution for mining large data sets. We test an implementation of our algorithm on a Spark cluster using three real-world data sets. The results indicate that our algorithm achieves a high degree of parallelism and scalability.
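To make the idea of independent search-space units concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a simple prefix-based partitioning (one unit per smallest item of an itemset), uses growth rate (support in one class divided by support in the other) as the contrast measure, and uses Python's built-in `map` as a sequential stand-in for a distributed map step. All function names, parameters, and thresholds are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_unit(prefix, items, positive, negative, min_support, min_growth, max_len):
    """Mine one unit: all itemsets whose smallest item is `prefix`.

    A unit needs only its prefix and the two data sets, so units never
    exchange intermediate results (assumed partitioning scheme).
    """
    larger = [i for i in items if i > prefix]
    found = []
    for k in range(max_len):  # k extra items, so itemset size is k + 1
        for rest in combinations(larger, k):
            itemset = frozenset((prefix,) + rest)
            sp = support(itemset, positive)
            if sp < min_support:
                continue
            sn = support(itemset, negative)
            growth = float("inf") if sn == 0 else sp / sn
            if growth >= min_growth:
                found.append((tuple(sorted(itemset)), growth))
    return found


def mine_contrast_patterns(positive, negative, min_support=0.5,
                           min_growth=2.0, max_len=3):
    """Enumerate the units and mine each one independently."""
    items = sorted(set().union(*positive, *negative))
    # Built-in map runs sequentially; in a cluster setting this step
    # would be a distributed map over the units.
    units = map(
        lambda p: mine_unit(p, items, positive, negative,
                            min_support, min_growth, max_len),
        items,
    )
    return sorted(pattern for unit in units for pattern in unit)


if __name__ == "__main__":
    pos = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
    neg = [{"b", "c"}, {"c"}, {"a", "c"}]
    for items_, growth_ in mine_contrast_patterns(pos, neg, min_growth=2.5):
        print(items_, growth_)
```

Because every itemset has exactly one smallest item, each candidate pattern belongs to exactly one unit, so no pattern is examined twice and the per-unit results can simply be concatenated. This brute-force enumeration does no pruning and is intended only to show the partitioning idea on toy data.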

Full Text