Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model

Lobna Ghadhab, Mohamed Wiem Mkaouer, Montassar Ben Messaoud, Ilyes Jenhani

doi:10.1016/j.infsof.2021.106566

Abstract

Analyzing software maintenance activities helps ensure cost-effective evolution and development. Categorizing commits into maintenance tasks supports practitioners in making decisions about resource allocation and in managing technical debt. In this paper, we propose to use a pre-trained neural language model, namely BERT (Bidirectional Encoder Representations from Transformers), to classify commits into three categories of maintenance tasks: corrective, perfective and adaptive. Using BERT is intended to help the classifier better understand the context of each word in the commit message. We built a balanced dataset of 1793 labeled commits collected from publicly available datasets. We used several popular code change distillers to extract fine-grained code changes, which we incorporated into our dataset as features in addition to BERT's word representation features. A deep neural network (DNN) classifier was used as an additional layer to fine-tune the BERT model on the commit classification task. Several models were evaluated to analyze in depth how code changes affect the classification performance for each commit category. Experimental results show that the DNN model trained on BERT's word representations and Fixminer code changes (DNN@BERT+Fix_cc) provided the best performance, achieving 79.66% accuracy and a macro-averaged F1 score of 0.8. Compared with the state-of-the-art model that combines keywords and code changes (…+CD_cc), our model achieved an improvement of approximately 8% in accuracy. Results also show that a DNN model using only BERT's word representation features achieved a 5% improvement in accuracy over the …+CD_cc model.
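To make the feature augmentation and the reported metric concrete, the following is a minimal, dependency-free sketch under stated assumptions; it is not the authors' implementation. It appends a fixed-order vector of code-change counts to a commit-message embedding, scores the three maintenance classes with a single dense layer and softmax, and computes the macro-averaged F1 score (the unweighted mean of per-class F1). The change-type names `INS`, `DEL`, `UPD`, `MOV`, the helper names, and the toy weights are hypothetical; a real pipeline would take a 768-dimensional BERT vector, the change types emitted by the chosen distiller (e.g. Fixminer), and a trained, multi-layer DNN rather than fixed weights.

```python
"""Hypothetical sketch: augment a commit-message embedding with code-change
counts, score three maintenance classes, and compute macro-averaged F1."""
import math
from typing import Dict, List, Sequence

LABELS = ["corrective", "perfective", "adaptive"]

# Hypothetical change-type vocabulary; real distillers define their own types.
CHANGE_TYPES = ["INS", "DEL", "UPD", "MOV"]


def build_features(text_embedding: Sequence[float],
                   change_counts: Dict[str, int]) -> List[float]:
    """Concatenate the text embedding with code-change counts in a fixed
    order; change types absent from the commit count as 0."""
    return list(text_embedding) + [float(change_counts.get(t, 0)) for t in CHANGE_TYPES]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(features: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """One fully connected layer (one weight row per label) plus softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    emb = [0.2, -0.1, 0.4]  # stand-in for a BERT sentence vector
    feats = build_features(emb, {"UPD": 2, "INS": 1})
    weights = [[0.0] * len(feats) for _ in LABELS]  # untrained toy weights
    probs = dense_softmax(feats, weights, [0.0] * len(LABELS))
    print(dict(zip(LABELS, probs)))
    y_true = ["corrective", "corrective", "perfective", "adaptive"]
    y_pred = ["corrective", "perfective", "perfective", "adaptive"]
    print(round(macro_f1(y_true, y_pred), 4))  # prints 0.7778
```

Macro averaging weights each maintenance category equally regardless of its frequency, which is why it is reported alongside accuracy; on a balanced dataset the two tend to move together, but a per-class view still shows which category the code-change features help most.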



Journal: Information and Software Technology
Publication Date: Mar 10, 2021
Citations: 27

Similar Papers

Fine-grained code changes and bugs: Improving bug prediction
01 Jan 2012

Bert model fine-tuning for text classification in knee OA radiology reports
L Chen ... V Pedoia
Osteoarthritis and Cartilage | VOL. 28
01 Apr 2020

Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
Rajesh Gupta
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3
02 Mar 2024

Classification of Fire Related Tweets on Twitter Using Bidirectional Encoder Representations from Transformers (BERT)
Jairus Mingua ... Dionis Padilla
28 Nov 2021
