Natural Language Processing Application on Commit Messages: A Case Study on HEP Software

Yue Yang, Elisabetta Ronchieri, Marco Canaparo

doi:10.3390/app122110773


Open Access


Abstract

Version Control and Source Code Management Systems, such as GitHub, contain a large amount of unstructured historical information about software projects. Recent studies have introduced Natural Language Processing (NLP) to help software engineers retrieve information from very large collections of unstructured data. In this study, we extended our previous work by enlarging our datasets and broadening the set of machine learning and clustering techniques. Our methodology consists of several steps. Starting from raw commit messages, we employed NLP techniques to build a structured database. We extracted the main features of the messages and used them as input to different clustering algorithms. Once each entry was labelled, we applied supervised machine learning techniques to build a prediction and classification model that automatically classifies the commit messages of a software project. Our model exploits a ground-truth dataset of commit messages obtained from various GitHub projects in the High Energy Physics (HEP) context. The contribution of this paper is two-fold: it proposes a ground-truth database, and it provides a machine learning prediction model that automatically identifies the most change-prone areas of code. Our model obtained a very high average accuracy (0.9590), precision (0.9448), recall (0.9382), and F1-score (0.9360).
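
The pipeline described in the abstract (NLP preprocessing of commit messages, feature extraction, clustering to assign labels, a supervised classifier trained on those labels, and evaluation with accuracy, precision, recall and F1) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the stopword list, bag-of-words features, k-means with farthest-point initialisation, multinomial Naive Bayes, macro averaging, the toy commit messages, and every function name (`preprocess`, `vectorize`, `kmeans`, `train_naive_bayes`, `predict`, `macro_scores`) are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "and", "on", "with"}

def preprocess(message):
    """Lowercase a commit message, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", message.lower()) if t not in STOPWORDS]

def vectorize(docs, vocab):
    """Bag-of-words count vectors over a fixed, ordered vocabulary."""
    counts = [Counter(doc) for doc in docs]
    return [[c[w] for w in vocab] for c in counts]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """k-means clustering with a deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing centroid is farthest away.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, trained on cluster labels."""
    vocab = {w for d in docs for w in d}
    model = {}
    for c in set(labels):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        counts = Counter(w for d in class_docs for w in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(len(class_docs) / len(docs))
        log_lik = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict(model, doc):
    """Return the highest-posterior class; tokens unseen in training are ignored."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik.get(w, 0.0) for w in doc)
    return max(model, key=score)

def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged (per-class mean) precision, recall and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical toy commit messages standing in for a real HEP repository history.
messages = [
    "fix bug in parser", "fix bug in reader", "fix crash bug",
    "add new feature to plotter", "add new test", "add new option",
]
docs = [preprocess(m) for m in messages]
vocab = sorted({w for d in docs for w in d})
labels = kmeans(vectorize(docs, vocab), k=2)  # unsupervised labelling step
model = train_naive_bayes(docs, labels)       # supervised step on those labels
print(labels)                                       # -> [0, 0, 0, 1, 1, 1]
print(predict(model, preprocess("fix bug in plotter")))  # -> 0
print(macro_scores([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because the classifier is trained on cluster assignments, it can only be as meaningful as those clusters; in this sketch the labels are simply whatever k-means produces on six toy messages, whereas a real study would validate them before treating them as ground truth.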


Journal: Applied Sciences	Publication Date: Oct 24, 2022
Citations: 1	License type: CC BY 4.0

Similar Papers

Using Natural Language Processing to Extract Information from Unstructured code-change version control data: lessons learned
Elisabetta Ronchieri ... Yue Yang
22 Oct 2021

BDCI: behavioral driven conflict identification
Fabrizio Pastore ... Leonardo Mariani
21 Aug 2017

Prediction of oil and gas pipeline failures through machine learning approaches: A systematic review
Abdulnaser M Al-Sabaeei ... Ajayshankar Jagadeesh
Energy Reports | VOL. 10
16 Aug 2023

Resume Classification System using Natural Language Processing and Machine Learning Techniques
Irfan Ali ... Javed Ahmed
Mehran University Research Journal of Engineering and Technology | VOL. 41
01 Jan 2021
