Abstract

Software metrics are widely used indicators of software quality, and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics can be at distinguishing vulnerable code units from non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-criticality or non-critical systems due to the high number of false positives (which bring an additional development cost that is frequently not affordable).

Highlights

  • Several research studies show that software defects/vulnerabilities (e.g., Buffer overflow, SQL injection) are a central and critical source of security breaches [1]–[3] in computer systems

  • This study considers several commonly used machine learning (ML) algorithms (Random Forest, Extreme Boosting, Decision Tree, Support Vector Machine (SVM) Linear, and SVM Radial) that are applied to software metrics of various types (e.g., Cyclomatic Complexity, Lines of Code, and Coupling Between Objects), collected at different levels from the source code of several widely used and representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc)

  • This paper presents a comprehensive study on the use of software metrics and machine learning algorithms for the detection/prediction of vulnerable code
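The approach the highlights summarize, feeding code-level software metrics to ML classifiers such as Random Forest, can be sketched as follows. This is a minimal illustration on synthetic data: the metric values, the label model, and the class weights are all hypothetical, not taken from the paper's experiment.

```python
# Hedged sketch: classifying code units as vulnerable/non-vulnerable from
# software metrics with a Random Forest. All data below is synthetic; the
# actual study extracts real metrics from projects such as Mozilla Firefox
# and the Linux Kernel, labeled using known vulnerability reports.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)
n = 1000
complexity = rng.integers(1, 50, n)    # McCabe cyclomatic complexity
loc = rng.integers(10, 2000, n)        # lines of code
coupling = rng.integers(0, 20, n)      # coupling between objects
X = np.column_stack([complexity, loc, coupling]).astype(float)

# Hypothetical ground truth: larger, more complex units are assumed more
# likely to be vulnerable (a simplification for illustration only).
logit = 0.08 * complexity + 0.002 * loc + 0.1 * coupling - 5.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# For highly critical systems the study's focus is recall (detect the
# maximum number of vulnerabilities, tolerating false positives), so the
# positive class is weighted more heavily here; a non-critical system
# would instead prioritize precision to keep review costs affordable.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight={0: 1, 1: 5},
                             random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
precision = precision_score(y_te, pred)
recall = recall_score(y_te, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tilting `class_weight` toward the vulnerable class mirrors the paper's observation that the usefulness of such models depends on the application scenario: the same classifier can be tuned for high recall (security-critical systems) or high precision (cost-sensitive systems).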


Introduction

Several research studies show that software defects/vulnerabilities (e.g., buffer overflow, SQL injection) are a central and critical source of security breaches [1]–[3] in computer systems. Organizations and critical infrastructures are backed by software systems that execute critical operations and transactions, provide services, and handle huge amounts of sensitive data to support effective decisions and constant business/system adaptation. This has tremendously increased security concerns, driving researchers and businesses to come up with tools, techniques, standards, and regulations that help developers ensure security in software systems [13], [14]. Sensei [30] is another example that tries to enforce secure coding guidelines in the integrated development environment. It is still very difficult for developers, if not impossible, to build software without vulnerabilities. This has led to many works trying to mitigate the damage that such vulnera-
