Abstract

Software Requirement Specification (SRS) documents describe the requirements and expectations attributed to software products. The structured text in SRS documents guides developers in defining the various functions built during software development. Software-specific entity extraction is an important pre-processing step for various Natural Language Processing (NLP) tasks in the requirements engineering domain, such as entity-centric search systems, SRS document summarization, requirement classification, and requirement quality management. Recent advances in transformer-based models have significantly contributed to NLP and information retrieval, achieving state-of-the-art performance on domain-specific entity extraction tasks. In this study, we employ the transformer models BERT, RoBERTa, and ALBERT for software-specific entity extraction. For this purpose, we annotate three requirement datasets, namely DOORS, SRE, and RQA, with varied sets of software-specific entities. Our numerical study shows that transformer models outperform traditional approaches such as ML-CRF; in particular, BERT variants improve F1-scores by 4% and 5% on the DOORS and SRE datasets, respectively. We conduct entity-level error analysis to examine partial and exact matching of entities and their boundaries. Lastly, we experiment with few-shot learning to create sample-efficient NER systems with a template-based BART model.
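To make the setup concrete, the sketch below shows how a BERT-style encoder can be applied to token-level entity tagging with the Hugging Face transformers library. This is an illustrative sketch only, not the paper's exact pipeline: the tag set (B-/I-SOFTWARE_ENTITY), the bert-base-cased checkpoint, and the example requirement sentence are all assumptions for demonstration, and the classification head shown is untrained until fine-tuned on the annotated SRS data.

```python
# Illustrative sketch of token-classification NER with a BERT-style encoder.
# Assumptions (not from the paper): tag set, checkpoint, and example sentence.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-SOFTWARE_ENTITY", "I-SOFTWARE_ENTITY"]  # hypothetical tag set

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# Hypothetical requirement sentence; in practice each annotated SRS sentence
# would be tokenized and the model fine-tuned before inference.
sentence = "The system shall log every failed login attempt."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0]       # per-token label ids

# Print one predicted tag per wordpiece token (random until fine-tuned).
for token, pred in zip(
    tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions
):
    print(f"{token:15s} {labels[pred]}")
```

The same token-classification interface applies to RoBERTa and ALBERT checkpoints by swapping the model name, which is one reason encoder-based fine-tuning is a common baseline for domain-specific NER.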
