CASMS: Combining clustering with attention semantic model for identifying security bug reports

Xiaoxue Ma, Yishu Li, Zhen Yang, Jacky Keung, Xiao Yu, Hao Zhang

doi:10.1016/j.infsof.2022.106906

Abstract

Inappropriate public disclosure of security bug reports (SBRs) is likely to attract malicious attackers to invade software systems; hence, detecting SBRs has become increasingly important for software maintenance. Existing techniques for identifying SBRs remain less than desirable because of the class imbalance problem (non-security bug reports, NSBRs, outnumber SBRs), insufficient training information, and weak performance robustness. This prompted us to address these challenges in the most advanced SBR detection methods. In this work, we propose the CASMS approach to efficiently alleviate the imbalance problem and predict bug reports. CASMS first converts bug reports into weighted word embeddings based on the tf-idf and word2vec techniques. Unlike previous studies, which select the NSBRs that are most dissimilar to SBRs, CASMS then automatically finds a certain number of diverse NSBRs via the Elbow method and the k-means clustering algorithm. Finally, the selected NSBRs and all SBRs are used to train an Attention CNN–BLSTM model that extracts contextual and sequential information. The experimental results show that CASMS is superior to the three baselines (i.e., FARSEC, SMOTUNED, and LTRWES) in overall performance (g-measure) and in correctly identifying SBRs (recall), with improvements of 4.09%–24.26% and 10.33%–36.24%, respectively. The best results are easily obtained within a limited range of two-class training-set ratios (1:1 to 3:1), with around 20 experiments for each project. When robustness is evaluated via the standard deviation, CASMS is more stable than LTRWES. Overall, CASMS can alleviate the data imbalance problem and extract more semantic information to improve performance and robustness. Therefore, CASMS is recommended as a practical approach for identifying SBRs.
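To make the NSBR selection step more concrete, the following is a minimal sketch of clustering-based undersampling, not the authors' implementation. It assumes each NSBR has already been turned into a numeric vector (toy 2-D tuples stand in for the tf-idf-weighted word2vec embeddings), runs a plain k-means, prints the within-cluster sum of squares for several values of k so that an elbow can be read off by inspection, and then picks NSBRs round-robin across clusters. The function names (`kmeans`, `inertia`, `select_diverse`), the toy data, and the round-robin rule are all illustrative assumptions; the abstract does not say how CASMS draws reports from the clusters.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances, the quantity an elbow plot shows."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def select_diverse(labels, k, n_select):
    """Pick indices round-robin over clusters (assumed rule, not from the paper)."""
    buckets = [[i for i, l in enumerate(labels) if l == c] for c in range(k)]
    chosen = []
    while len(chosen) < n_select and any(buckets):
        for bucket in buckets:
            if bucket and len(chosen) < n_select:
                chosen.append(bucket.pop(0))
    return chosen


# Toy NSBR vectors (hypothetical); real inputs would be report embeddings.
nsbr_vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1),
                (5.2, 4.9), (9.8, 0.2), (10.1, 0.0), (9.9, 0.4)]
n_sbr = 3  # assumed number of SBRs, so a 1:1 ratio keeps 3 NSBRs

# Elbow inspection: look for the k after which inertia stops dropping sharply.
for k in range(1, 5):
    cents, labs = kmeans(nsbr_vectors, k)
    print(k, round(inertia(nsbr_vectors, cents, labs), 3))

centroids, labels = kmeans(nsbr_vectors, 3)
chosen = select_diverse(labels, 3, n_select=n_sbr)
print([nsbr_vectors[i] for i in chosen])
```

Drawing one report per cluster before taking a second from any cluster is what keeps the retained NSBRs diverse. Raising `n_select` to two or three times the number of SBRs would correspond to the 2:1 and 3:1 ends of the ratio range mentioned in the abstract, assuming that ratio is NSBRs to SBRs.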

Journal: Information and Software Technology
Publication Date: Jul 1, 2022
Citations: 15

Similar Papers

Identifying security bug reports via text mining: An industrial case study
Michael Gegick ... Pete Rotella
01 May 2010

SAIS: Self-Adaptive Identification of Security Bug Reports
Shaikh Mostafa ... Xiaoyin Wang
IEEE Transactions on Dependable and Secure Computing
01 Jan 2019

CVE-assisted large-scale security bug report dataset construction method
Xiaoxue Wu ... Dejun Mu
The Journal of Systems & Software | VOL. 160
02 Nov 2019

Text Filtering and Ranking for Security Bug Report Prediction
Fayola Peters ... Yijun Yu
IEEE Transactions on Software Engineering | VOL. 45
01 Jun 2019
