Detecting Integer Overflow Errors in Java Source Code via Machine Learning

Yu Luo,Weifeng Xu,Dianxiang Xu

doi:10.1109/ictai52525.2021.00115

Abstract

Integer overflow is a common cause of software failure and security vulnerability. Existing approaches to detecting integer overflow errors rely on traditional static code analysis and dynamic testing. This paper presents a novel machine learning-based approach that predicts integer overflow errors by treating source code as text. It exploits text classifiers to determine whether each method in a given Java program contains an integer overflow error. As the training data is essential, we have constructed a comprehensive dataset to accounts for (a) integer overflow errors of all integer types and operations in Java (i.e., positive samples); (b) various programming techniques for preventing integer overflow errors (i.e., negative samples); and (c) malicious scenarios that may mislead text classifiers (i.e., adversarial samples). We have trained three classifiers, BERT, fastText, and NBSVM, that represent different text embedding techniques. BERT, as a representative deep-learning transformer, has achieved the highest performance scores and remained robust even when tested with the adversarial samples.

Full Text