Abstract

Although static analysis tools can detect critical bugs in software, developers consider high false positive rates to be a key barrier to using them in practice. To improve the usability of these tools, researchers have recently begun to apply machine learning techniques to classify and filter false positive analysis reports. Although initial results have been promising, the long-term potential and best practices for this line of research are unclear due to the lack of detailed, large-scale empirical evaluation. To partially address this knowledge gap, we present a comparative empirical study of four machine learning techniques, namely hand-engineered features, bag of words, recurrent neural networks, and graph neural networks, for classifying false positives, using multiple ground-truth program sets. We also introduce and evaluate new data preparation routines for recurrent neural networks and node representations for graph neural networks, and show that these routines can have a substantial positive impact on classification accuracy. Overall, our results suggest that recurrent neural networks (which learn over a program's source code) outperform the other subject techniques, although interesting tradeoffs are present among all techniques. Our observations provide insight into the future research needed to speed the adoption of machine learning approaches in practice.
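To make the classification task concrete, the following is a minimal sketch (not the paper's implementation) of one of the subject techniques, a bag-of-words baseline that labels static-analysis reports as true or false positives. The report snippets, labels, and pipeline choices shown here are illustrative assumptions, not data or settings from the study.

    # Minimal bag-of-words false-positive classifier sketch (illustrative only).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Each "report" is the code or warning text tied to an analysis finding;
    # label 1 = true positive, 0 = false positive (hypothetical examples).
    reports = [
        "buf[i] = read_byte(); i++;",
        "if (p != NULL) { free(p); p = NULL; }",
    ]
    labels = [1, 0]

    # Tokenize the text as a bag of words and fit a linear classifier.
    clf = make_pipeline(CountVectorizer(token_pattern=r"\w+"), LogisticRegression())
    clf.fit(reports, labels)

    # Predict whether a new report is likely a true or false positive.
    print(clf.predict(["while (i < n) buf[i++] = 0;"]))

In the study's setting, the hand-engineered-feature, recurrent neural network, and graph neural network techniques would replace the bag-of-words featurization step with their respective representations of the reported program.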
