Improving feature location accuracy via paragraph vector tuning

Allysson Costa E Silva, Marcelo De Almeida Maia

doi:10.1016/j.infsof.2019.106177

Abstract

Context: Feature location techniques are still not highly accurate despite advances in the field.

Objective: This paper investigates the impact of applying different tunings of paragraph vector to the feature location problem. It evaluates the influence of different artificial neural network (ANN) configurations for the learning rate and the negative sampling loss function in paragraph vector training.

Method: The suggested weight configuration relies on the search for an adequate ANN learning rate and an adequate calibration of the negative sampling skip-gram mode of the Doc2vec (DV) algorithm. A dataset of 633 feature descriptions, extracted from six open-source Java projects and organized at method granularity, is used for the empirical assessment.

Results: Our results suggest that feature location techniques benefit from the use of paragraph vector with systematic tuning. We show that an adequate update policy for ANN weights can increase feature location accuracy. An adequate calibration of negative sampling also improved accuracy, and this improvement was obtained with values other than the negative sampling defaults suggested in the literature. Moreover, an ensemble of learning rate policies combined with a tuned DV negative sampling option outperformed state-of-the-art approaches.

Conclusions: We show evidence of a relationship between hyper-parameter settings and accuracy gain. Modern paragraph vector approaches require adequate calibration to produce better results, and we have improved the accuracy of the feature location process with proper tuning.
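To make the tuning space concrete, the following is a minimal sketch of how candidate learning rate and negative sampling settings could be enumerated before training. It is not the authors' procedure. The key names `alpha`, `min_alpha`, and `negative` follow the parameter names of gensim's Doc2Vec constructor, the helper functions `linear_decay` and `build_grid` are hypothetical, and all numeric values are illustrative assumptions rather than the values studied in the paper.

```python
from itertools import product


def linear_decay(alpha, min_alpha, epoch, epochs):
    """Learning rate at a given epoch, falling linearly from alpha to min_alpha.

    This is one possible update policy, assumed here for illustration.
    """
    if epochs <= 1:
        return alpha
    step = (alpha - min_alpha) / (epochs - 1)
    return alpha - step * epoch


def build_grid(alphas, min_alphas, negatives):
    """Enumerate candidate configurations.

    Combinations whose final rate would exceed the initial rate are skipped.
    """
    return [
        {"alpha": a, "min_alpha": m, "negative": n}
        for a, m, n in product(alphas, min_alphas, negatives)
        if m <= a
    ]


# Illustrative values only (assumptions, not taken from the paper).
grid = build_grid(
    alphas=[0.025, 0.05],
    min_alphas=[0.0001, 0.025],
    negatives=[5, 10, 20],
)

for config in grid[:3]:
    # Show the first and last learning rate each configuration would use.
    first = linear_decay(config["alpha"], config["min_alpha"], 0, 10)
    last = linear_decay(config["alpha"], config["min_alpha"], 9, 10)
    print(config, round(first, 6), round(last, 6))
```

Each dictionary in `grid` could then be passed to a paragraph vector trainer and scored on a feature location benchmark. That training and evaluation step is left out here because it depends on the dataset and on the accuracy measure chosen.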

Full Text