Hybrid Representation to Locate Vulnerable Lines of Code

Mohammed Zagane, Mustapha Kamel Abdi, Mamdouh Alenezi

doi:10.4018/ijsi.292020

Abstract

Locating vulnerable lines of code in large software systems requires considerable effort from human experts, which explains the high cost, in both budget and time, of correcting vulnerabilities. To reduce these costs, automatic vulnerability prediction solutions have been proposed. Existing machine learning (ML)-based solutions have two limitations that reduce their effectiveness: they predict vulnerabilities at a coarse granularity, and they struggle to define suitable code features. To address these limitations, the authors propose an improved ML-based approach that uses a slice-based code representation together with TF-IDF to extract effective features automatically. The results show that combining these two techniques with ML techniques makes it possible to build effective vulnerability prediction models (VPMs) that locate vulnerabilities at a finer granularity and with excellent performance: high precision (> 98%), a low false negative rate (FNR < 2%), and a low false positive rate (FPR < 3%). These results outperform software metrics and are equivalent to those of the best-performing recent deep learning-based approaches.
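
To make the feature-extraction idea concrete, the following is a minimal sketch of TF-IDF weighting applied to tokenized code slices. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the slicing procedure, the tokenizer, or the TF-IDF variant, so the `tokenize` and `tf_idf` helpers, the regular expression, the unsmoothed `log(N / df)` weighting, and the two example slices are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(code_slice):
    # Hypothetical tokenizer: identifiers, integer literals, and single
    # punctuation characters. The paper's tokenizer may differ.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code_slice)


def tf_idf(slices):
    # Term counts per slice (each slice plays the role of a "document").
    docs = [Counter(tokenize(s)) for s in slices]
    n_docs = len(docs)
    # Document frequency: number of slices in which each token occurs.
    df = Counter(tok for doc in docs for tok in doc)
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1  # avoid division by zero on empty slices
        # Assumed variant: relative term frequency times unsmoothed log IDF.
        vectors.append(
            [(doc[tok] / total) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, vectors


# Two made-up slices that differ only in the copy routine they call.
slices = [
    "char buf[10]; strcpy(buf, input);",
    "char buf[10]; strncpy(buf, input, 9);",
]
vocab, vectors = tf_idf(slices)
```

In this sketch, tokens shared by every slice (such as `buf`) receive a weight of zero, while tokens that distinguish one slice from another (such as `strcpy`) receive a positive weight. The resulting vectors could then be passed to any standard ML classifier as features.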

Full Text