Abstract

Nowadays, machine learning plays a crucial role in harnessing the value of the massive amounts of data produced every day. Building a high-quality machine learning model is an iterative, complex and time-consuming process that requires solid knowledge of the various machine learning algorithms as well as experience in effectively tuning their hyper-parameters. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs of our digital world. Therefore, several automated machine learning (AutoML) frameworks have recently been developed to automate the process of Combined Algorithm Selection and Hyper-parameter tuning (CASH). However, a main limitation of these frameworks is that they are built on top of centralized machine learning libraries (e.g. scikit-learn) that can only run on a single node, and thus they do not scale to large data volumes. To tackle this challenge, we demonstrate D-SmartML, a distributed AutoML framework built on top of Apache Spark, a distributed data processing framework. Our framework is equipped with a meta-learning mechanism for automated algorithm selection and supports three automated hyper-parameter tuning techniques: distributed grid search, distributed random search and distributed Hyperband optimization. We will demonstrate the scalability of our framework in handling large datasets. In addition, we will show how our framework outperforms the state-of-the-art framework for distributed AutoML optimization, TransmogrifAI.
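
To make the CASH formulation mentioned above concrete, the following is a minimal single-machine sketch of random search over a joint space of algorithms and hyper-parameters. It is not D-SmartML's implementation: the search space, the names `SEARCH_SPACE`, `sample_config`, `toy_loss` and `random_search_cash`, and the toy loss function are all hypothetical stand-ins chosen for illustration. In a real system each trial would train and validate a model, and because the trials are independent they could in principle be evaluated in parallel on a cluster.

```python
import random

# Hypothetical joint search space: each algorithm name maps to the
# ranges of its own hyper-parameters (illustrative values only).
SEARCH_SPACE = {
    "ridge": {"alpha": (0.001, 10.0)},  # continuous range
    "knn": {"k": (1, 30)},              # integer range
}


def sample_config(rng):
    """Draw one (algorithm, hyper-parameters) pair from the joint space."""
    algo = rng.choice(sorted(SEARCH_SPACE))
    params = {}
    for name, (low, high) in SEARCH_SPACE[algo].items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return algo, params


def toy_loss(algo, params):
    """Stand-in for a validation loss (assumption: lower is better).

    A real AutoML system would train and evaluate a model here.
    """
    if algo == "ridge":
        return (params["alpha"] - 1.0) ** 2
    return abs(params["k"] - 7) / 10.0 + 0.05


def random_search_cash(n_trials, seed=0):
    """Return the best (loss, algorithm, params) found in n_trials samples."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algo, params = sample_config(rng)
        loss = toy_loss(algo, params)
        if best is None or loss < best[0]:
            best = (loss, algo, params)
    return best


if __name__ == "__main__":
    loss, algo, params = random_search_cash(n_trials=50, seed=0)
    print(algo, params, loss)
```

The key point of the sketch is that the algorithm choice is sampled together with its hyper-parameters, so a single search loop covers both decisions. Grid search would replace the sampling step with an exhaustive enumeration, and Hyperband would additionally allocate a growing training budget to the most promising configurations.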
