On the effectiveness of developer features in code smell prioritization: A replication study

Huiqun Yu, Zijie Huang, Ziyi Zhou, Mingchen Li, Guisheng Fan, Zhiqing Shao

doi:10.1016/j.jss.2024.111968

Abstract

Code smells are sub-optimal design and implementation choices that hinder software maintainability. Although significant progress has been achieved in code smell detection, developers perceive many of the detected results as trivial. In response, a prior study (MSR’20) proposed a code smell prioritization approach that captures developer features, and it outperformed a code metric baseline (KBS). The conclusion was validated on a dataset collected from original developers, which includes their comments on code smell priority. However, the low presence of developer aspects in the comments is inconsistent with the performance improvement obtained by involving such features. To explain the inconsistency, we replicate the two studies by exploiting different feature selection methods and a model explanation technique called SHAP. Our major findings are: (i) Correlation-based Feature Selection should not be used as a default method, since it could harm Krippendorff’s Alpha by up to 72%; (ii) if better feature selection is applied, pure code metrics from KBS outperform the MSR features in 3 smells by up to 45% in Alpha; and (iii) the behavior of code-metric-based models agrees more closely with developers’ comments. We suggest exploiting different feature selection methods and using code metrics to prioritize the 3 change-insensitive smells.

Full Text