Abstract Systems are developed as solutions to the problem space defined by their requirements. The requirements are acquired during the elicitation process. The creative nature of the elicitation process, the proprietary nature of requirements, the need for extensive preprocessing and the diversity of analysis techniques restrict the development of a requirement dataset, and no standard method exists to create one. We therefore devise a semi-formal method to create a multi-purpose requirement dataset that harnesses the human knowledge in system requirement specification documents (SyRSDs) to facilitate the deployment of modern computing algorithms. Our dataset has three forms. (1) ReqList, a list of requirements from 86 distinct systems with their document structure in pure text form. Its 12701 requirements are ready for natural language processing and unsupervised machine learning techniques. (2) ReqNet, a large network of requirements consisting of 17375 nodes, on which graph-theoretic algorithms for requirement engineering can be deployed. ReqNet exhibits small-world network characteristics with an average distance of 9.5619 links. (3) ReqSim, a dataset consisting of 10933 pairs of requirements annotated with their similarity scores. ReqSim enables sentence-level supervised learning tasks that exploit the semantics of requirements. The similarity scores are coherent with human knowledge. Our dataset is grounded in the tree structure of SyRSDs: we devise a method to extract a tree from each SyRSD, and this tree structure mirrors the hierarchical nature of the requirement allocation process. The dataset, in its ReqList, ReqNet and ReqSim forms, is available at https://github.com/ChandanKSahu/ReqList_ReqNet_ReqSim.
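To make the link between a document tree and a network metric such as average distance concrete, the following is a minimal sketch rather than the authors' method. It assumes a hypothetical toy document whose items carry dotted section numbers, links each item to its parent, and computes the mean shortest-path length by breadth-first search. The names `toy_document`, `build_tree_edges`, `to_adjacency` and `average_distance` are illustrative only; the actual file formats and construction rules of ReqList and ReqNet are defined in the repository and may differ.

```python
from collections import deque

# Hypothetical toy SyRSD as (section number, text) pairs.
# These items are invented for illustration and are not taken from the dataset.
toy_document = [
    ("1", "System requirements"),
    ("1.1", "Power subsystem"),
    ("1.1.1", "The battery shall supply 12 V."),
    ("1.1.2", "The battery shall last 8 hours."),
    ("1.2", "Communication subsystem"),
    ("1.2.1", "The radio shall transmit at 2.4 GHz."),
]


def build_tree_edges(document):
    """Link each item to its parent, found by dropping the last number component."""
    known = {number for number, _ in document}
    edges = []
    for number, _ in document:
        parent = number.rpartition(".")[0]  # "" for top-level items
        if parent in known:
            edges.append((parent, number))
    return edges


def to_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def average_distance(adjacency):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        for target, distance in bfs_distances(adjacency, source).items():
            if target != source:
                total += distance
                pairs += 1
    return total / pairs


edges = build_tree_edges(toy_document)
adjacency = to_adjacency(edges)
print(f"{len(adjacency)} nodes, {len(edges)} links, "
      f"average distance {average_distance(adjacency):.4f}")
```

The all-pairs breadth-first search is adequate for a toy tree; for a network the size of ReqNet, a graph library or a sampled estimate would likely be more practical.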