Abstract

Users and operators of cloud-based Spark clusters often require quick insights into how the execution time of an application is likely to be affected by the resources allocated to it, e.g., the number of Spark executor cores assigned, and the size of the data to be processed. Existing techniques typically require extensive prior executions of the application under various resource allocation settings and data sizes to obtain an accurate model. In this paper, we explore the accuracy of a model built with fewer prior executions of the application. Such a model is useful in situations where quick predictions are required and few cluster resources are available for model building. We use logs from two executions of an application with small sample data and different resource settings, and explore the accuracy of the resulting predictions for other resource allocation settings and input data sizes.
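To illustrate the general idea of extrapolating from only two profiled runs, here is a minimal, hypothetical sketch (not the paper's actual model): with two observations, one can fit a simple two-parameter Amdahl-style cost model, t = a + b * data_size / cores, and then predict runtimes for other executor-core counts and input sizes. The function names and the example numbers are assumptions for illustration only.

```python
def fit_two_point_model(run1, run2):
    """Fit t = a + b * d / c from two runs; each run is (cores, data_size_gb, runtime_s)."""
    (c1, d1, t1), (c2, d2, t2) = run1, run2
    x1, x2 = d1 / c1, d2 / c2      # per-core work proxy for each run
    b = (t1 - t2) / (x1 - x2)      # parallel cost per GB per core
    a = t1 - b * x1                # fixed (serial) overhead
    return a, b


def predict(a, b, cores, data_size_gb):
    """Predict runtime for a new resource allocation and input size."""
    return a + b * data_size_gb / cores


# Two hypothetical small-sample runs with different executor-core counts:
a, b = fit_two_point_model((2, 1.0, 130.0), (4, 1.0, 70.0))
# Extrapolate to 8 cores and a 4 GB input:
print(predict(a, b, 8, 4.0))
```

A real model would also need to account for effects such as shuffle costs and executor startup overhead, which do not scale linearly with cores or data size.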
