Abstract

With the start of big data era, data stream computing has emerged as a well-known approach to optimize data-intensive workflows. Apache STORM is an open-source real-time distributed computation system for processing data streams and has been opted by famous organizations such as Twitter, Yahoo, Alibaba, Baidu, Groupon. The workflows are implemented as topologies in STORM. The main aspect that controls the execution performance of a workflow in STORM is the strategy of scheduling the topology components (spout and bolts). In this paper, we evaluate and analyze the performance of our algorithm Partition-based Data-intensive Workflow optimization Algorithm (PDWA) in Apache STORM using a use case workflow, EURExpressII. It is a real-world application-based workflow that builds a transcriptome-wide atlas of gene expression for the developing mouse embryo established by ribonucleic acid (RNA) in situ hybridization. Our proposed algorithm, PDWA, partitions the application task graph so that the data movement between partitions is minimum. Each partition is then mapped on one machine for the execution of tasks of that partition. It provides minimum execution time for that particular partition. Partial task duplication is also part of this algorithm that enhances the performance. A STORM-based computing cluster is developed in OpenStack cloud which is used as a computing environment. The performance of PDWA-based optimizer is evaluated with the data sets of different sizes. The achieved results show that PDWA performs with 21% improved average execution time for different sizes of data sets and varying execution nodes. In addition, the comparative results show that on average the efficiency of PDWA is 20.4% higher as compared to STORM default scheduler (SDS).

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.