Using software metrics for predicting vulnerable classes in Java and Python based systems

Kazi Zakia Sultana, Vaibhav Anu, Tai-Yin Chong

doi:10.1080/19393555.2023.2240343

Abstract

[Context:] Failure to predict vulnerabilities at an early stage of development can allow vulnerable code to be written and deployed in the final software product. Vulnerability prediction that uses software metrics as features can support the discovery process by localizing vulnerable code. Existing studies have successfully employed metrics for vulnerability prediction on some platforms (C/C++ or Java projects). We propose that a comparative evaluation of how these metrics perform in projects written in different languages can help developers decide whether a metrics-based prediction approach can be effective in their own project’s context. [Objective:] The purpose of this research is to analyze and compare the performance of software metrics in vulnerability prediction across different programming-language contexts (Java vs. Python). [Method:] We conducted experiments on vulnerabilities reported for Apache Tomcat (releases 6 and 7), Apache CXF, and two Python projects (Django and Keystone). We applied machine learning to predict whether a particular type of code component (Java and Python classes) is vulnerable or non-vulnerable. [Results:] We found that metrics-based prediction identifies vulnerable Java classes with higher recall and precision than vulnerable Python classes. [Conclusion:] This class-level study will help developers predict vulnerable classes and will assist in secure coding in object-oriented programming.
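The abstract compares Java and Python results by recall and precision on the vulnerable class, where precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python below is a minimal sketch of that evaluation step, assuming a toy dataset. It is not the authors' pipeline: the abstract does not name the learner or the metrics used. The class names, the `cyclomatic_complexity` and `loc` values, the 20.0 threshold, and the `predict_vulnerable` rule are all hypothetical stand-ins for a trained classifier.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical class-level metrics; real studies use richer metric sets.
Metrics = Dict[str, float]


def predict_vulnerable(metrics: Metrics, cc_threshold: float = 20.0) -> int:
    """Toy stand-in for a trained classifier (hypothetical rule):
    flag a class as vulnerable (1) when its cyclomatic complexity
    exceeds an assumed threshold, otherwise non-vulnerable (0)."""
    return 1 if metrics["cyclomatic_complexity"] > cc_threshold else 0


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = vulnerable)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labelled classes: (name, metrics, 1 = vulnerable, 0 = not).
dataset = [
    ("SessionManager", {"cyclomatic_complexity": 35.0, "loc": 900.0}, 1),
    ("RequestParser", {"cyclomatic_complexity": 28.0, "loc": 650.0}, 1),
    ("StringUtils", {"cyclomatic_complexity": 22.0, "loc": 300.0}, 0),
    ("AuthFilter", {"cyclomatic_complexity": 12.0, "loc": 200.0}, 1),
    ("XmlHandler", {"cyclomatic_complexity": 15.0, "loc": 420.0}, 1),
    ("Constants", {"cyclomatic_complexity": 1.0, "loc": 40.0}, 0),
]

y_true = [label for _, _, label in dataset]
y_pred = [predict_vulnerable(m) for _, m, _ in dataset]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.67 recall=0.50"
```

Here two of the three flagged classes are truly vulnerable (precision 0.67), but two of the four vulnerable classes are missed (recall 0.50). A real study would replace the threshold rule with a model trained on the metric vectors and would report these scores per project and per language.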
