Abstract

Software requirements traceability links are widely recognized as an essential means of supporting effective system evolution when requirements change. However, creating and maintaining traceability links is not easy in practice, especially under time pressure. An automatic and accurate traceability link recovery approach is therefore needed. Existing methods usually recover traceability links through information retrieval models [1], which calculate text similarity among software artifacts. Although text similarity plays an essential role in correlating software artifacts, we argue that the context of software artifacts also provides important clues for establishing traceability links among them. For example, the use cases that a given use case includes or extends can be seen as its context information, which helps to profile that use case comprehensively. A collection of software artifacts can be modeled as a graph through a variety of explicit relationships. Description-Embodied Knowledge Representation Learning (DKRL) [2] is a widely accepted method that can effectively capture both the structural information of explicit relationships and the description information of entities. By embedding such a graph effectively and precisely, the context information can be meaningfully represented, contributing to the identification of requirements traceability links. In this paper, we propose Traceability Link Recovery-Knowledge Representation Learning (TLR-KRL), an approach based on DKRL that recovers requirements traceability links between use cases and code. This work has been accepted at the 32nd International Conference on Software Engineering & Knowledge Engineering. TLR-KRL comprehensively characterizes software artifacts by embedding both text information and structural relationships; an overview is shown in Fig. 1.
Specifically, we follow a systematic process to extend the DKRL model, improving its negative sampling method so that it generates fewer false-negative samples than the original DKRL model. In this way, we obtain more precise embeddings of software artifacts. These embeddings are then used to train traceability link classifiers with supervised machine learning algorithms. All traceability link candidates produced by the classifier are further screened using triple classification in order to retrieve more correct traceability links. To verify the effectiveness of our method, we carried out experiments on four datasets: eTour, EAnci, Clinic, and ITrust. The evaluation results on all four datasets show that our approach outperforms existing work.
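
To make the false-negative problem concrete, the following is a minimal sketch of one common remedy, filtered negative sampling: a training triple is corrupted by replacing its head or tail, and any corrupted triple that is already a known positive is rejected. This is a generic illustration and not necessarily the sampling method used in TLR-KRL; the function `corrupt_triple`, the relation names, and the artifact names are all hypothetical.

```python
import random


def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a corrupted copy of `triple` that is not a known positive.

    Hypothetical helper: replaces either the head or the tail with a random
    entity and rejects any result that already appears in `known_triples`.
    Returns None if no valid negative is found within `max_tries` attempts.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (replacement, relation, tail)   # corrupt the head
        else:
            candidate = (head, relation, replacement)   # corrupt the tail
        if candidate not in known_triples:              # filter false negatives
            return candidate
    return None


# Toy artifact graph (all names are illustrative assumptions).
known_triples = {
    ("UC_Login", "includes", "UC_Authenticate"),
    ("UC_Login", "traces_to", "LoginController"),
    ("UC_Authenticate", "traces_to", "SessionManager"),
}
entities = sorted({e for h, _, t in known_triples for e in (h, t)})

rng = random.Random(0)  # fixed seed so the sketch is reproducible
negatives = {
    triple: corrupt_triple(triple, entities, known_triples, rng)
    for triple in known_triples
}
```

Each entry of `negatives` keeps the relation of its source triple but differs in the head or the tail, and none of them coincides with a known positive link.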
