Abstract

Context: Feature location techniques are still not highly accurate despite advances in the field.

Objective: This paper investigates the impact of applying different tunings of paragraph vectors to the feature location problem. It evaluates how different artificial neural network (ANN) configurations for the learning rate and the negative sampling loss function influence paragraph vector training.

Method: The proposed weight configuration relies on searching for an adequate ANN learning rate and on calibrating negative sampling in the skip-gram mode of the Doc2vec (DV) algorithm. The empirical assessment uses a dataset of 633 feature descriptions, extracted from six open-source Java projects and organized at method granularity.

Results: Our results suggest that feature location techniques benefit from paragraph vectors with systematic tuning. We show that an adequate update policy for ANN weights can increase feature location accuracy. An adequate calibration of negative sampling also improved accuracy, and this improvement was obtained with negative sampling values other than the defaults suggested in the literature. Moreover, an ensemble of learning rate policies combined with a tuned DV negative sampling option outperformed state-of-the-art approaches.

Conclusions: We show evidence of a relationship between hyper-parameter settings and accuracy gain. Modern paragraph vector approaches require adequate calibration to produce better results, and we improved the accuracy of the feature location process with proper tuning.
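To make the setting concrete, the following is a minimal sketch of the pieces the abstract refers to: a learning rate update policy, ranking methods against a feature description by vector similarity, and scoring the ranking. It is an illustration under stated assumptions, not the paper's implementation. The linear decay policy, the cosine ranking, the reciprocal rank metric, all function names, and the toy vectors are hypothetical; in practice the vectors would come from a trained paragraph vector model.

```python
import math


def linear_decay(alpha, min_alpha, step, total_steps):
    """One possible learning rate policy (assumed, not taken from the paper):
    decay linearly from alpha at step 0 to min_alpha at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return alpha - (alpha - min_alpha) * progress


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_methods(feature_vec, method_vecs):
    """Return method names ordered from most to least similar to the feature vector."""
    return sorted(
        method_vecs,
        key=lambda name: cosine(feature_vec, method_vecs[name]),
        reverse=True,
    )


def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant method in the ranking, or 0.0 if none appears."""
    for position, name in enumerate(ranking, start=1):
        if name in relevant:
            return 1.0 / position
    return 0.0


# Toy, hand-written vectors standing in for learned paragraph vectors (hypothetical).
feature = [1.0, 0.0]
methods = {
    "Editor.save": [0.9, 0.1],
    "Parser.parse": [0.0, 1.0],
    "Editor.open": [0.5, 0.5],
}
ranking = rank_methods(feature, methods)
score = reciprocal_rank(ranking, {"Editor.open"})
```

A tuning study in this spirit would repeat the ranking and scoring for each candidate learning rate policy and negative sampling value, then compare the averaged scores across all feature descriptions.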
