Abstract

Data analysis workflows are often composed by many concurrent and compute-intensive tasks that can be efficiently executed only on scalable computing infrastructures, such as HPC systems, Grids and Cloud platforms. The use of Cloud services for the scalable execution of data analysis workflows is the key feature of the Data Mining Cloud Framework (DMCF), which provides a Web interface to develop data analysis applications using a visual workflow formalism. In this paper we describe how we extended DMCF to support also the design and execution of script-based data analysis workflows on Clouds. We introduce a workflow language, named JS4Cloud, that extends JavaScript to support the implementation of Cloud-based data analysis tasks and the handling of data on the Cloud. We also describe how data analysis workflows programmed through JS4Cloud are processed by DMCF to make parallelism explicit and to enable their scalable execution on Clouds. Finally, we present a data analysis application developed with JS4Cloud, and the performance results obtained executing the application with DMCF on the Windows Azure platform.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call