Abstract

Adoption of distributed stream processing (DSP) systems such as Apache Flink for real-time big data processing is increasing. However, DSP programs are prone to bugs, especially when a programmer neglects some DSP features (e.g., source data reordering), which motivates the development of approaches for testing and verification. In this paper, we focus on the test data generation problem for DSP programs. Currently, no approach generates test data for DSP programs that both achieves high path coverage and covers different stream reordering situations. We present a novel solution, SPOT (i.e., Stream Processing Program Test), that achieves these two goals simultaneously. First, SPOT uses symbolic execution to generate a set of individual test data, one representing each path of a DSP program. Then, SPOT composes these independent data into various time series data (a.k.a. streams) with diverse reorderings. Finally, we can perform a test by continuously feeding the DSP program with these streams. To support symbolic analysis automatically, we also developed JPF-Flink, a JPF (i.e., Java PathFinder) extension that coordinates the execution of Flink programs. We present four case studies to illustrate that: (1) SPOT can support symbolic analysis for the commonly used DSP operators; (2) test data generated by SPOT achieve high JDU (i.e., Joint Dataflow and UDF) path coverage more efficiently than two recent DSP testing approaches; (3) test data generated by SPOT trigger software failures more easily than those of the two DSP testing approaches; and (4) the data randomly generated by those two test techniques are highly skewed in terms of stream reordering, as measured by the entropy metric. In comparison, test data from SPOT are evenly distributed across reordering situations.
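The composition step and the entropy metric mentioned above can be illustrated with a minimal Python sketch. It is not SPOT's implementation and rests on stated assumptions: each per-path test record is a distinct, hashable value; composing streams means enumerating arrival orders (permutations) of those records; and a "reordering situation" is identified by its permutation. The names `compose_streams` and `reordering_entropy` are hypothetical.

```python
import itertools
import math
from collections import Counter


def compose_streams(records):
    """Enumerate every arrival order of the per-path test records.

    Each permutation is one candidate stream. The count grows as n!, so a real
    tool would sample orders instead of enumerating them for large n.
    """
    return [list(order) for order in itertools.permutations(records)]


def reordering_entropy(streams, records):
    """Shannon entropy (in bits) of the distribution of reordering situations.

    Assumption: a reordering situation is the permutation of `records`
    observed in a stream, encoded as a tuple of positions in `records`.
    Records must be distinct and hashable.
    """
    position = {record: i for i, record in enumerate(records)}
    counts = Counter(tuple(position[r] for r in stream) for stream in streams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


records = ["r1", "r2", "r3"]  # hypothetical per-path test records
even = compose_streams(records)  # all 3! = 6 orders, once each
skewed = [["r1", "r2", "r3"]] * 5 + [["r2", "r1", "r3"]]  # mostly in order
print(round(reordering_entropy(even, records), 3))    # 2.585 (= log2 6)
print(round(reordering_entropy(skewed, records), 3))  # 0.65
```

Covering every order equally reaches the maximum entropy of log2(n!), while a generator that mostly emits in-order streams scores much lower. This mirrors the intuition behind the skew comparison, although the paper's exact definition of a reordering situation may differ.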

Highlights

  • Massive volumes of data have been generated rapidly by IoT devices, e-commerce websites, mobile applications, et cetera

  • In the third case study (Section C), we demonstrate that test data generated by SPOT trigger software failures more efficiently than those generated by DiffStream and FlinkCheck

  • The second case study (how JDU path coverage can be achieved) illustrates how SPOT behaves in terms of coverage compared with property-based testing (PBT) approaches, of which DiffStream and FlinkCheck are the most representative solutions for testing distributed stream processing (DSP) programs


Summary

Introduction

Massive volumes of data have been generated rapidly by IoT devices, e-commerce websites, mobile applications, et cetera. Symbolic execution [14] is one of the most promising techniques for automatically generating test cases with high coverage guarantees, and it is the foundation of many popular testing tools: Java PathFinder [15] (JPF for short), CUTE [16], jCUTE [17], KLEE [18], et cetera. These solutions and tools are not directly applicable to DSP programs because of the large scale of DSP frameworks [13]. SPOT's symbolic-execution-based generation of per-path test data improves on the random generators used in PBT methods, leading to high-coverage test suites, while its composition of these data into reordered streams covers stream reordering situations that existing symbolic execution-based approaches do not provide.
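The core idea behind symbolic-execution-based test generation, one concrete input per feasible path, can be sketched in Python on a toy user-defined function. This is a simplified illustration, not how JPF or SPOT work internally: the function `udf` and its path conditions are hypothetical and written by hand (a symbolic executor derives them by exploring both outcomes of each branch), and a brute-force search over a bounded integer domain stands in for a constraint solver.

```python
from typing import Callable, Dict, Iterable, Optional


# Hypothetical UDF under test: a map function with three execution paths.
def udf(x: int) -> str:
    if x > 10:
        if x % 2 == 0:
            return "large-even"
        return "large-odd"
    return "small"


# Path conditions of `udf`, written by hand here; a symbolic executor would
# collect them automatically while exploring each branch outcome.
PATH_CONDITIONS: Dict[str, Callable[[int], bool]] = {
    "large-even": lambda x: x > 10 and x % 2 == 0,
    "large-odd": lambda x: x > 10 and x % 2 != 0,
    "small": lambda x: x <= 10,
}


def solve(cond: Callable[[int], bool], domain: Iterable[int]) -> Optional[int]:
    """Return the first value satisfying `cond`; stands in for a solver."""
    for value in domain:
        if cond(value):
            return value
    return None  # path infeasible within the searched domain


def generate_inputs(domain: Iterable[int] = range(-100, 101)) -> Dict[str, int]:
    """Produce one concrete test input per satisfiable path condition."""
    inputs = {}
    for path, cond in PATH_CONDITIONS.items():
        witness = solve(cond, domain)
        if witness is not None:
            inputs[path] = witness
    return inputs


print(generate_inputs())  # {'large-even': 12, 'large-odd': 11, 'small': -100}
```

Feeding each witness to the UDF exercises every path with a single record, which is why per-path data can reach high path coverage with few inputs; SPOT then arranges such records into streams.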

Overview
Motivating Example
DSP and Apache Flink
Reordering Metric
Solution Architecture
Symbolic Analysis Approach
DSP Model Classes Mocking Up
Maintaining Symbolic Expression
Implementing JPF-Flink
Modeling Stream
How Efficient in Triggering Failure
How Reordering Situations Can Be Covered
Related Work
Findings
Conclusions and Future Work
