Abstract

As the trend toward intelligent methods for automating software engineering processes continues, security requirements classification is rapidly becoming an important field for the software engineering community. Several models for classifying security requirements have been proposed in the literature. However, their adoption and use are constrained by the absence of substantial datasets that would allow studies to be replicated and generalized using more advanced machine-learning techniques. Furthermore, most researchers in this area consider maintainability to be purely a non-functional requirement with no relation to security, which has been identified as a major source of security concerns. The main objective of this study is to propose a software requirements classification approach that treats maintainability as a security requirement. The approach seeks to ensure that maintenance efforts do not introduce software vulnerabilities that were not present at deployment. A mixed research methodology is adopted: qualitative data is collected from students' project documentation, labelled, and transformed into quantitative form during analysis. The outcome of this process is a validated, original, publicly accessible, labelled dataset of student software project requirements (DOSSPRE), which is presented to support the approach. It contains 1317 software requirements, including security requirements, functional requirements, and other non-functional requirements. Two versions of the dataset are presented: one for binary classification of security vs. non-security requirements, and one for multi-class classification, which separates security requirements into more granular categories alongside non-security requirements. In both cases, well-known machine learning algorithms are used to verify the dataset.
Support Vector Machine (SVM) and Logistic Regression were the top performers in multi-class classification, each with an average accuracy of 86%. Multinomial Naive Bayes outperformed the other machine learning techniques in binary classification, with 91% precision, 69% recall, 78% F1-score, and 86% accuracy. The dataset is available at https://data.mendeley.com/datasets/23xtbvk6yp/1
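
The binary-classification figures reported above can be checked for internal consistency, since the F1-score is the harmonic mean of precision and recall. The short sketch below is illustrative only and is not taken from the study: the function name `f1_score` and the variable names are assumptions, and the only inputs are the precision and recall values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for Multinomial Naive Bayes (binary task).
reported_precision = 0.91
reported_recall = 0.69

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.78"
```

The result, about 0.78, agrees with the reported 78% F1-score. The gap between precision and recall indicates that the classifier rarely mislabels a non-security requirement as a security requirement, but misses roughly three in ten actual security requirements.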
