Abstract

SummaryCloud computing has become popular for training deep‐learning (DL) models, avoiding the costs of acquiring and maintaining on‐premise systems. SageMaker is a cloud service that automates the execution of DL workloads. Its features include automatic hyperparameter optimization and use of spot instances. Nonetheless, it does not assist in selecting the right instance type for a workload. In public clouds, rent price depends on the configuration of the chosen instance type. Advanced and faster instances are typically more expensive, but not always the best choice. To select the optimal instance type, users must compare the workload's relative performance (and hence cost) on several candidates. Building on the execution profiles of multiple DL applications, we model the performance and cost of training DL applications on SageMaker and propose a lightweight technique to estimate these at low temporal and monetary cost. This method is a performance proxy that can be used to replace more expensive performance measurement procedures. So, it could speed up any technique that relies on such measurements. We show how it can help cloud customers seeking suitable instance types to train DL models, and that it can accurately predict the performance of different instance types when training these models on SageMaker.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.