Bugram: bug detection with n-gram language models

Song Wang, Devin Chollak, Dana Movshovitz-Attias, Lin Tan

doi:10.1145/2970276.2970341

Abstract

To improve software reliability, many rule-based techniques have been proposed to infer programming rules and detect violations of these rules as bugs. These approaches often rely on certain patterns appearing very frequently in a project in order to infer rules. If a pattern does not appear frequently enough, no rule is learned, and many bugs are missed. In this paper, we propose a new approach, Bugram, that leverages n-gram language models instead of rules to detect bugs. Bugram models program tokens sequentially using an n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low-probability sequences are marked as potential bugs. The assumption is that low-probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual or special uses of code of which developers may want to be aware. We evaluate Bugram in two ways. First, we apply Bugram to the latest versions of 16 open source Java projects. Results show that Bugram detects 59 bugs, 42 of which are manually verified as correct: 25 are true bugs and 17 are code snippets that should be refactored. Among the 25 true bugs, 23 cannot be detected by PR-Miner. We have reported these bugs to developers; 7 have already been confirmed (4 of which have already been fixed), while the rest await confirmation. Second, we compare Bugram with three additional graph- and rule-based bug detection tools: JADET, Tikanga, and GrouMiner. We apply Bugram to 14 Java projects evaluated in these three studies. Bugram detects 21 true bugs, at least 10 of which cannot be detected by these three tools. Our results suggest that Bugram is complementary to existing rule-based bug detection approaches.
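The core idea in the abstract (learn an n-gram model over token sequences, score each sequence by its probability, and flag the least probable ones) can be illustrated with a minimal sketch. This is not Bugram's implementation: the function names, the `<s>` start padding, the add-one smoothing, and the `bottom` cutoff are all assumptions chosen for illustration, and the abstract does not specify how tokens are extracted or how probabilities are estimated.

```python
from collections import Counter
import math


def train_ngram_counts(sequences, n):
    """Count n-grams and their (n-1)-token contexts over all sequences."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        # Pad with a hypothetical start symbol so early tokens have a context.
        padded = ["<s>"] * (n - 1) + list(seq)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts


def log_probability(seq, ngrams, contexts, n, vocab_size):
    """Log-probability of a token sequence, using add-one smoothing
    (an assumption made here so unseen n-grams do not yield log(0))."""
    padded = ["<s>"] * (n - 1) + list(seq)
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        count = ngrams[context + (padded[i],)]
        total += math.log((count + 1) / (contexts[context] + vocab_size))
    return total


def rank_suspicious(sequences, n=3, bottom=1):
    """Return the `bottom` lowest-probability sequences, least probable first."""
    ngrams, contexts = train_ngram_counts(sequences, n)
    vocab_size = len({tok for seq in sequences for tok in seq})
    scored = [
        (log_probability(seq, ngrams, contexts, n, vocab_size), seq)
        for seq in sequences
    ]
    scored.sort(key=lambda pair: pair[0])
    return scored[:bottom]


if __name__ == "__main__":
    # Toy corpus: five common call orders and one unusual order.
    corpus = [["open", "read", "close"]] * 5 + [["open", "close", "read"]]
    for score, seq in rank_suspicious(corpus):
        print(round(score, 3), seq)
```

In this toy corpus the single `open, close, read` sequence receives the lowest score because its n-grams are rare relative to the dominant `open, read, close` order. A flagged sequence is only a candidate: as the abstract notes, a low-probability sequence may be a bug, a bad practice, or simply an unusual use of code.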

