Abstract
Requirement traceability is a crucial quality factor that strongly impacts the software evolution process and maintenance costs. Automated traceability link recovery techniques are required for a reliable and low-cost software development life cycle. Pre-trained language models have shown promising results on many natural language tasks; however, applying such models to requirement traceability requires large, high-quality traceability datasets and accurate fine-tuning mechanisms. This paper proposes code augmentation and fine-tuning techniques that prepare the MS-CodeBERT pre-trained language model for several types of requirement traceability prediction, including documentation-to-method, issue-to-commit, and issue-to-method links. Three program transformation operations, namely Rename Variable, Swap Operands, and Swap Statements, are designed to generate new high-quality samples that increase the diversity of the traceability datasets. Two- and three-stage fine-tuning mechanisms are proposed to fine-tune the language model for the three types of requirement traceability prediction on the provided datasets. Experiments on 14 Java projects demonstrate a 6.2% to 8.5% improvement in precision, a 2.5% to 5.2% improvement in recall, and a 3.8% to 7.3% improvement in F1 score over the best results of state-of-the-art methods.
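The three augmentation operations are only named in the abstract. As a rough illustration of the idea, the following Python sketch applies toy, string-based versions of them to a Java method; the function names and the regex approach are our assumptions, not the paper's implementation, and a real tool would transform the parse tree and verify that each edit preserves semantics (e.g., that swapped operands are commutative and swapped statements are independent).

import re

def rename_variable(code: str, old_name: str, new_name: str) -> str:
    """Rename Variable: replace whole-word occurrences of an identifier."""
    return re.sub(rf"\b{re.escape(old_name)}\b", new_name, code)

def swap_operands(code: str) -> str:
    """Swap Operands: flip the operands of a '+' between two simple
    identifiers (safe for numeric types; a real tool would check types,
    since Java String concatenation is not commutative)."""
    return re.sub(r"\b(\w+)\s*\+\s*(\w+)\b", r"\2 + \1", code)

def swap_statements(lines: list[str], i: int, j: int) -> list[str]:
    """Swap Statements: exchange two statements; only valid when no
    data or control dependence exists between them."""
    out = lines.copy()
    out[i], out[j] = out[j], out[i]
    return out

method = "int total(int a, int b) { int s = a + b; return s; }"
print(rename_variable(method, "s", "acc"))  # int acc = a + b; return acc;
print(swap_operands(method))                # int s = b + a; return s;

Each operation yields a syntactically different but behaviorally equivalent method, which is what allows the transformed code to serve as an additional, diverse sample linked to the same requirement.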