A BERT-based transfer learning approach to text classification on software requirements specifications

Derya Kici, Mucahit Cevik, Garima Malik, Devang Parikh, Ayşe Başar

doi:10.21428/594757db.a4880a62

Abstract

In a software development life cycle, software requirements specifications (SRS) written in an incomprehensible language might hinder the success of the project in later stages. In such cases, the subjective and ambiguous nature of natural languages can be considered a cause of the failure of the final product. Redundant and/or controversial information in SRS documents might also result in additional costs and time loss, reducing the overall efficiency of the project. With the recent advances in machine learning, there is an increased effort to develop automated solutions for a seamless SRS design. However, most vanilla machine learning approaches ignore the semantics of the software artifacts or fail to integrate domain-specific knowledge into the underlying natural language processing tasks, and therefore tend to generate inaccurate results. With such concerns in mind, we consider a transfer learning approach in our study, which is based on an existing pre-trained language model called DistilBERT. We specifically examine DistilBERT's ability in multi-class text classification on SRS data using various fine-tuning methods, and compare its performance with other deep learning methods such as LSTM and BiLSTM. We test the performance of these models using two datasets: the DOORS Next Generation dataset and the PROMISE-NFR dataset. Our numerical results demonstrate that DistilBERT performs well for various text classification tasks over the SRS datasets and shows significant promise for automating software development processes.
