Abstract

The plethora of analytic platforms escalates the difficulty of selecting the most appropriate analytic platform that fits the needed data mining task, the dataset as well as additional user-defined criteria. Especially analysts, who are rather focused on the analytics domain, experience difficulties to keep up with the latest developments. In this work, we introduce the ASAP-DM framework, which enables analysts to seamlessly use several platforms, whereas programmers can easily add several platforms to the framework. Furthermore, we investigate how to predict a platform based on specific criteria, such as lowest runtime or resource consumption during the execution of a data mining task. We formulate this task as an optimization problem, which can be solved by today’s classification algorithms. We evaluate the proposed framework on several analytic platforms such as Spark, Mahout, and WEKA along with several data mining algorithms for classification, clustering, and association rule discovery. Our experiments unveil that the automatic selection process can save up to 99.71% of the execution time due to automatically choosing a faster platform.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call