Abstract

Industry practitioners assess software from a security perspective to reduce the risks of deploying vulnerable software. Besides following security best-practice guidelines during the software development life cycle, predicting vulnerabilities before roll-out is crucial. Software metrics are popular inputs for vulnerability prediction models. The objective of this study is to provide a comprehensive review of the source code-level security metrics presented in the literature. Our systematic mapping study started with 1451 studies obtained by searching four digital libraries: ACM, IEEE, ScienceDirect, and Springer. After applying our inclusion/exclusion criteria as well as the snowballing technique, we narrowed the set down to 28 studies for an in-depth analysis to answer four research questions pertaining to our goal. We extracted a total of 685 code-level metrics. For each study, we identified the empirical methods, quality measures, types of vulnerabilities targeted by the prediction models, and shortcomings of the work. We found that standard machine learning models, such as decision trees, regressions, and random forests, are most frequently used for vulnerability prediction. The most common quality measures are precision, recall, accuracy, and F-measure. Based on our findings, we conclude that the list of software metrics for measuring code-level security is not yet universal or generic. Nonetheless, the results of our study can serve as a starting point for future work aiming to improve existing security prediction models and as a catalog of metrics for vulnerability prediction for software practitioners.
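To make the reported setup concrete, the sketch below illustrates (it is not taken from the reviewed studies) how a random forest could be trained on hypothetical code-level metrics and evaluated with the quality measures named above: precision, recall, accuracy, and F-measure. The dataset, metric names, and labels are synthetic placeholders; only the scikit-learn calls are standard.

    # Illustrative sketch, not the paper's method: vulnerability prediction
    # from synthetic code-level metrics using a random forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # Hypothetical data: each row is a source file described by code-level
    # metrics (e.g., size, complexity, coupling); the label marks whether
    # the file was later reported as vulnerable.
    rng = np.random.default_rng(seed=42)
    X = rng.normal(size=(500, 4))                    # 500 files x 4 synthetic metrics
    y = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)    # synthetic vulnerability label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # The four quality measures most commonly reported in the surveyed studies.
    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred, zero_division=0))
    print("recall   :", recall_score(y_test, y_pred, zero_division=0))
    print("F-measure:", f1_score(y_test, y_pred, zero_division=0))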
