Abstract

In this paper we propose a novel algorithm for mining contrast patterns using a distributed, MapReduce-like framework. Contrast patterns describe differences between contrasted data sets and have previously been used to build highly accurate classifiers. However, mining contrast patterns is computationally expensive, and existing algorithms are designed to run sequentially on a single machine. Consequently, existing approaches cannot handle dense, high-volume, high-dimensional databases. Our algorithm addresses this problem by partitioning the search space for contrast patterns into small, independent units. These units can be mined in parallel, providing a scalable solution for mining large data sets. Using three real-world data sets, we test an implementation of our algorithm on a Spark cluster. The results indicate that our algorithm achieves a high degree of parallelism and scalability.
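The abstract does not specify how the search space is partitioned or which contrast measure is used. The sketch below is therefore only an illustration of the general idea, not the authors' algorithm, and it rests on three assumptions. First, the itemset search space is split by smallest item, a prefix-based partitioning of a set-enumeration tree, so that every unit is disjoint and independent. Second, the contrast criterion is absolute support difference between the two data sets, as in contrast-set mining. Third, a local thread pool stands in for cluster workers. The names `mine_unit`, `mine_contrast_patterns`, `min_diff`, and `max_len` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Pattern = Tuple[Itemset, float, float]  # (itemset, support in d1, support in d2)


def support(itemset: Itemset, data: Sequence[Itemset]) -> float:
    """Fraction of transactions in `data` that contain every item of `itemset`."""
    return sum(1 for t in data if itemset <= t) / len(data)


def mine_unit(prefix: str, items: List[str], d1, d2,
              min_diff: float, max_len: int) -> List[Pattern]:
    """Mine one independent unit: every itemset whose smallest item is `prefix`."""
    # Only items ordered after `prefix` may extend it, so no itemset is
    # enumerated in more than one unit (assumed set-enumeration-tree split).
    tail = [i for i in items if i > prefix]
    found = []
    for k in range(max_len):  # k extra items -> patterns of length 1..max_len
        for ext in combinations(tail, k):
            pattern = frozenset((prefix,) + ext)
            s1, s2 = support(pattern, d1), support(pattern, d2)
            # Hypothetical contrast criterion: absolute support difference.
            if abs(s1 - s2) >= min_diff:
                found.append((pattern, s1, s2))
    return found


def mine_contrast_patterns(d1, d2, min_diff: float = 0.5,
                           max_len: int = 3) -> List[Pattern]:
    items = sorted(set().union(*d1, *d2))
    # "Map": units share no state, so each can run on a separate worker.
    # Threads only illustrate independence here; the GIL limits real speedup.
    with ThreadPoolExecutor() as pool:
        units = pool.map(
            lambda p: mine_unit(p, items, d1, d2, min_diff, max_len), items)
        # "Reduce": concatenate the per-unit results.
        return [pat for unit in units for pat in unit]


if __name__ == "__main__":
    d1 = [frozenset("ab"), frozenset("ac"), frozenset("abc")]
    d2 = [frozenset("b"), frozenset("c"), frozenset("bc")]
    result = mine_contrast_patterns(d1, d2)
    print(sorted("".join(sorted(p)) for p, _, _ in result))  # → ['a', 'ab', 'ac']
```

Each unit is brute-force and does no pruning, so it grows exponentially with `max_len`. A real miner would prune within each unit. On Spark, the same map/reduce shape could plausibly be expressed by distributing `items` with `SparkContext.parallelize` and applying `mine_unit` through `RDD.flatMap`. This is offered only as an analogy, since the paper's Spark implementation is not described in the abstract.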
