Abstract

In open-source software projects, developers fixing software faults sometimes also make other, non-fixing code changes, such as functionality enhancement, code restructuring or improvement, or documentation. They commit these non-fixing changes together with the fixing ones in the same transaction. We call such commits mixed-purpose fixing commits (MFCs). We conducted an empirical study of MFCs in several popular open-source projects. Our results show that MFCs account for about 11%-39% of all fixing commits. In 3%-41% of MFCs, developers performed other types of changes without indicating them in the commit logs. Our study also shows that mining software repositories (MSR) approaches that rely on recovering the history of fixed/buggy files are affected by noisy data in which non-fixing changes in MFCs are treated as fixing ones. These results motivated us to develop Cardo, a tool that identifies MFCs and filters out non-fixing changed files from the change sets of fixing commits. It uses natural language processing to analyze the sentences in commit logs and program analysis to cluster the changes in the change sets, in order to determine whether a changed file is a non-fixing change. Our empirical evaluation on several open-source projects shows that Cardo achieves 93% precision on average, and that existing MSR approaches can be improved by up to 32% in relative terms with data filtered by Cardo.
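To make the notion of an MFC concrete, the following is a minimal sketch of how a commit log might be scanned sentence by sentence for more than one change purpose. It is not Cardo's method: the keyword lists, the purpose labels, and the function names (`sentence_purposes`, `commit_purposes`, `is_mixed_purpose_fix`) are hypothetical and chosen only for illustration, and simple keyword matching stands in for the natural language processing described above.

```python
import re

# Hypothetical keyword lists (assumptions for illustration, not from the paper).
FIX_WORDS = {"fix", "fixes", "fixed", "bug", "fault", "defect"}
NON_FIX_WORDS = {
    "enhancement": {"add", "added", "support", "feature", "implement"},
    "restructuring": {"refactor", "refactored", "rename", "renamed", "cleanup", "simplify"},
    "documentation": {"doc", "docs", "javadoc", "comment", "comments", "readme"},
}


def sentence_purposes(sentence):
    """Return the set of change purposes whose keywords occur in one sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    purposes = set()
    if words & FIX_WORDS:
        purposes.add("fixing")
    for purpose, keywords in NON_FIX_WORDS.items():
        if words & keywords:
            purposes.add(purpose)
    return purposes


def commit_purposes(log):
    """Split a commit log into rough sentences and union their purposes."""
    purposes = set()
    for sentence in re.split(r"[.;\n]+", log):
        purposes |= sentence_purposes(sentence)
    return purposes


def is_mixed_purpose_fix(log):
    """A commit is flagged as mixed-purpose if it fixes something and does more."""
    purposes = commit_purposes(log)
    return "fixing" in purposes and len(purposes) > 1
```

A log such as "Fix NPE in parser. Also refactor the tokenizer." would be flagged by this sketch, while "Fix crash on empty input" would not. A log-only check of this kind cannot detect the MFCs whose non-fixing changes are not mentioned in the commit log, which is why the change sets themselves also need to be analyzed.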
