Abstract

Static code analysis in Continuous Integration (CI) environment can significantly improve the quality of a software system because it enables early detection of defects without any test executions or user interactions. However, being a conservative over-approximation of system behaviours, static analysis also produces a large number of false positive alarms, identification of which takes up valuable developer time. We present an automated classifier based on Convolutional Neural Networks (CNNs). We hypothesise that many false positive alarms can be classified by identifying specific lexical patterns in the parts of the code that raised the alarm: human engineers adopt a similar tactic. We train a CNN based classifier to learn and detect these lexical patterns, using a total of about 10K historical static analysis alarms generated by six static analysis checkers for over 27 million LOC, and their labels assigned by actual developers. The results of our empirical evaluation suggest that our classifier can be highly effective for identifying false positive alarms, with the average precision across all six checkers of 79.72%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call