Abstract

Big data analytic frameworks, such as MapReduce, Spark and Flink, have recently gained popularity for processing large-scale data. Flink is an open-source, Apache-hosted big data analytic framework for processing both batch and streaming data. For historical (batch) data processing, Flink's query optimiser is built on techniques used in parallel database systems. The optimiser translates queries into jobs that are repeatedly submitted with similar tasks, so exploiting the similarity between these tasks can avoid redundant computation. In this paper, a Flink multi-query optimisation system, Flink-MQO, is proposed and built on top of the Flink software stack. It acts as an add-on to Apache Flink that optimises multiple queries based on data sharing. Flink-MQO exploits data-sharing opportunities among selection operators to eliminate redundant and duplicated in-network data movement across multiple queries. Experimental results show that sharing selection operators across big data queries yields promising query execution times. Flink-MQO could therefore potentially be used in stream processing to improve the performance of real-time applications.
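The idea of sharing selection operators across queries can be illustrated with a minimal, framework-independent sketch. It does not use Flink's API and is not the Flink-MQO implementation. The record layout, the `(field, op, constant)` predicate encoding, and the functions `run_separately` and `run_shared` are all hypothetical. The sketch assumes that identical selection predicates issued by different queries can be detected by comparing their descriptions. In that case, each distinct predicate is evaluated once per record during a single scan, and its output is fanned out to every query that uses it.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: a record is a dict of integer fields, and a selection
# predicate is described by a hashable (field, operator, constant) tuple so that
# identical predicates issued by different queries can be detected.
Record = Dict[str, int]
Selection = Tuple[str, str, int]

OPS: Dict[str, Callable[[int, int], bool]] = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def evaluate(sel: Selection, rec: Record) -> bool:
    """Apply one selection predicate to one record."""
    field, op, const = sel
    return OPS[op](rec[field], const)


def run_separately(data: List[Record], queries: Dict[str, Selection]):
    """Naive plan: each query scans the input and evaluates its own selection.

    Returns the per-query results and the number of predicate evaluations."""
    results: Dict[str, List[Record]] = {}
    evaluations = 0
    for qid, sel in queries.items():
        results[qid] = []
        for rec in data:  # one full scan per query
            evaluations += 1
            if evaluate(sel, rec):
                results[qid].append(rec)
    return results, evaluations


def run_shared(data: List[Record], queries: Dict[str, Selection]):
    """Shared plan: a single scan in which each distinct selection is evaluated
    once per record and its output is fanned out to every query using it."""
    consumers: Dict[Selection, List[str]] = defaultdict(list)
    for qid, sel in queries.items():
        consumers[sel].append(qid)  # group queries by identical predicate
    results: Dict[str, List[Record]] = {qid: [] for qid in queries}
    evaluations = 0
    for rec in data:  # single pass over the input
        for sel, qids in consumers.items():
            evaluations += 1
            if evaluate(sel, rec):
                for qid in qids:
                    results[qid].append(rec)
    return results, evaluations


if __name__ == "__main__":
    data = [{"age": 25, "salary": 40}, {"age": 35, "salary": 60}, {"age": 45, "salary": 80}]
    queries = {"q1": ("age", ">", 30), "q2": ("age", ">", 30), "q3": ("salary", "<", 70)}
    naive, naive_evals = run_separately(data, queries)
    shared, shared_evals = run_shared(data, queries)
    print(naive == shared, naive_evals, shared_evals)  # prints: True 9 6
```

Both plans return the same per-query results. In the shared plan, however, `q1` and `q2` reuse a single evaluation of the `age > 30` predicate, so 6 evaluations are needed instead of 9. This single-process sketch only counts predicate evaluations. It does not model the network transfer savings that the abstract targets in a distributed engine.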