Comparative Analysis of Machine Learning Techniques for Splitting Identifiers within Source Code

Abeer Abdulsalam, Nazre Abdul Rashid

doi:10.14704/web/v17i2/web17066

Abstract

Feature location is the process of extracting identifiers within source code. In software engineering, it is common practice to upgrade software by adding new features. To facilitate this process for developers, feature location has been proposed to extract the significant components within the source code, namely the identifiers. One of the challenging issues facing the feature location task is handling multi-word identifiers, where developers may use different types of separation between the words. Research studies have applied various types of techniques to this problem. However, recent studies have shown interest in Machine Learning Techniques (MLTs) due to their substantial performance. Given the diversity of MLTs, there is a vital need to identify the most accurate one in terms of splitting identifiers correctly. Therefore, this study provides a comparative analysis of different MLTs, including Naïve Bayes, Support Vector Machine and J48. The dataset used in the experiment is a benchmark dataset that contains a vast amount of source code along with numerous identifiers. Results showed that the best accuracy was achieved by the J48 classifier, with an F-measure of 66%.
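
To make the splitting task concrete, the following is a minimal sketch of a rule-based baseline that breaks a multi-word identifier at explicit separators and case transitions, together with the standard F-measure computed from true-positive, false-positive and false-negative counts. It is not the Naïve Bayes, Support Vector Machine or J48 classifiers compared in this study; the function names `split_identifier` and `f_measure` are hypothetical, and the abstract does not state how the reported F-measure was counted, so the counting scheme here is an assumption.

```python
import re

# Assumption: identifiers are separated by underscores, hyphens, dots,
# or case transitions (camelCase / PascalCase), with digits as own tokens.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """Split a multi-word identifier into lower-cased words (rule-based baseline)."""
    words = []
    # First break on explicit separators, then on case transitions.
    for part in re.split(r"[_\-.\s]+", identifier):
        words.extend(_TOKEN.findall(part))
    return [w.lower() for w in words]


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts (F1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for name in ["getUserName", "parse_XMLFile2", "MAX_BUFFER_SIZE"]:
        print(name, split_identifier(name))
```

A rule-based splitter of this kind fails on identifiers with no separator at all (for example `username`), which is one reason learned classifiers are of interest for this task.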
