Abstract

The <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">AAA</i> pattern (i.e., <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Arrange-Act-Assert</i> ) is a common and natural layout to create a test case. Following this pattern in test cases may benefit comprehension, debugging, and maintenance. The <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">AAA</i> structure of real-life test cases, however, may not be clear due to their high complexity. Manually labeling <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">AAA</i> statements in test cases is tedious. Thus, we envision that an automated approach for labeling <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">AAA</i> statements in existing test cases could benefit new developers and projects that practice collective code ownership and test-driven development. This paper contributes an automatic approach based on machine learning models. The “secret sauce” of this approach is a set of three learning features that are based on the semantic, syntax, and context information in test cases, derived from the manual tagging process. Thus, our approach mimics how developers may manually tag the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">AAA</i> pattern of a test case. We assess the precision, recall, and F-1 score of our approach based on 449 test cases, containing about 16,612 statements, across 4 Apache open source projects. To achieve the best performance in our approach, we explore the usage of six machine learning models; the contribution of the SMOTE data balancing technique; the comparison of the three learning features; and the comparison of five different methods for calculating the semantic feature. The results show our approach is able to identify <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Arrangement</i> , <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Action</i> , and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Assertion</i> statements with a precision upwards of 92%, and recall up to 74%. We also summarize some experience based on our experiments—regarding the choice of machine learning models, data balancing algorithm, and feature engineering methods—which could potentially provide some reference to related future research.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call