Abstract

An important issue faced during software development is to identify defects and the properties of those defects, if found, in a given source file. Determining defectiveness of source code assumes significance due to its implications on software development and maintenance cost. We present a novel system to estimate the presence of defects in source code and detect attributes of the possible defects, such as the severity of defects. The salient elements of our system are: (i) a dataset of newly introduced source code metrics, called PRO gramming CON struct (PROCON) metrics, and (ii) a novel M achine- L earning (ML)-based system, called D efect E stimator for S ource Co de (DESCo), that makes use of PROCON dataset for predicting defectiveness in a given scenario. The dataset was created by processing 30,400+ source files written in four popular programming languages, viz., C, C++, Java, and Python. The results of our experiments show that DESCo system outperforms one of the state-of-the-art methods with an improvement of 44.9%. To verify the correctness of our system, we compared the performance of 12 different ML algorithms with 50+ different combinations of their key parameters. Our system achieves the best results with SVM technique with a mean accuracy measure of 80.8%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call