Abstract
Software testing is an important stage of the software development life cycle. However, test execution is time-consuming and costly. To accelerate it, researchers have applied several methods to run tests in parallel. One such method uses a GPU to distribute test case inputs among many threads running in parallel. However, when a program is tested on a GPU, different test case inputs can follow different control flow paths. This causes branch divergence, which may increase test execution time. In addition, transferring data between the host and the device may increase test execution time when data are allocated statically instead of dynamically. Also, test case inputs of different sizes may increase the number of memory transactions. To address these challenges, three studies were conducted. The first study investigated three programming models for parallelizing test execution: CUDA Unified Memory, CUDA Non-Unified Memory, and OpenMP GPU Offloading. It used 11 benchmarks, parallelized their test suites with each model, and evaluated the models' performance in terms of execution time. The results showed that CUDA Unified Memory was the easiest and most productive of the three models for implementing a test suite on a GPU. The second study discussed three challenges (control flow, memory access pattern, and program input size) in parallelizing test execution on two major computing platforms: multi-core CPUs and GPUs. It analyzed 14 programs from different domains to evaluate parallel test execution on both platforms. The results showed that the limited cache size of a GPU and branch divergence significantly increase the execution time of the test suites of many programs. The last study proposed a novel algorithm to minimize branch divergence when testing an application on a GPU.
Essentially, the algorithm arranges test inputs according to the GPU's warp size: test inputs with similar control flow paths are grouped into the same warp so that they execute in parallel. The algorithm was validated and evaluated on benchmarks from six different domains (57 programs in total). The results showed that it speeds up test execution by up to 3.8x and improves warp execution efficiency by up to 15x. The algorithm thus reduces branch divergence by minimizing the number of inactive threads per warp.
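The grouping idea in the third study can be illustrated with a small Python sketch. It assumes that each test input's control flow path has already been recorded (for example, by profiling the program on the CPU) as a tuple of branch outcomes. The toy program `path_signature`, the helpers `chunk`, `arrange_by_path`, and `warp_efficiency`, and the efficiency model are all hypothetical illustrations, not the dissertation's algorithm. Sorting by path signature is a simple stand-in that makes inputs with identical paths contiguous before they are cut into warp-sized chunks. The efficiency model is deliberately simplified: it assumes that a warp whose threads follow k distinct paths runs k serialized passes of equal length, which only loosely approximates the warp execution efficiency reported by NVIDIA's profilers.

```python
from typing import Callable, List, Sequence, Tuple

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

# Assumption: a path signature is a tuple of branch outcomes for one input.
Signature = Callable[[int], Tuple[bool, ...]]


def path_signature(x: int) -> Tuple[bool, bool]:
    """Hypothetical program under test: the outcome of each of its two branches."""
    return (x < 0, x % 2 == 0)


def chunk(items: Sequence[int], size: int) -> List[List[int]]:
    """Cut a sequence into consecutive groups of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def arrange_by_path(inputs: Sequence[int], signature: Signature) -> List[List[int]]:
    """Place inputs with identical paths next to each other, then form warps."""
    return chunk(sorted(inputs, key=signature), WARP_SIZE)


def warp_efficiency(warps: List[List[int]], signature: Signature) -> float:
    """Mean fraction of active thread slots per warp under a simplified model:
    a warp whose threads follow k distinct paths runs k equal-length passes,
    so its efficiency is len(warp) / (k * WARP_SIZE)."""
    total = 0.0
    for warp in warps:
        k = len({signature(x) for x in warp})
        total += len(warp) / (k * WARP_SIZE)
    return total / len(warps)


inputs = list(range(-64, 64))              # 128 hypothetical test inputs
naive = chunk(inputs, WARP_SIZE)           # warps in original input order
grouped = arrange_by_path(inputs, path_signature)

print(warp_efficiency(naive, path_signature))    # 0.5
print(warp_efficiency(grouped, path_signature))  # 1.0
```

In this toy case, every warp in the original order mixes even and odd inputs and therefore runs two half-empty passes, while each sorted warp follows a single path. Real programs rarely split this cleanly, so a practical arrangement would also have to decide where to place path groups that do not fill a whole warp.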