An analysis of correctness for API recommendation: are the unmatched results useless?

Xianglong Kong, Bixin Li, Weina Han, Li Liao

doi:10.1007/s11432-019-2929-9

Abstract

API recommendation is a promising approach that is widely used during software development. However, its evaluation has not been explored with sufficient rigor. Current evaluation of API recommendation focuses mainly on correctness, which is measured by matching recommended results against ground-truth results. In most cases, there is only one set of ground-truth APIs for each recommendation attempt, yet the target code can be implemented in dozens of ways. Neglecting this code diversity introduces a potential flaw in the evaluation. To address the problem, we invite 15 developers to analyze the unmatched results in a user study. The online evaluation confirms that some unmatched APIs can also benefit programming because they are functionally correlated with ground-truth APIs. We then measure API functional correlation based on relationships extracted from the API knowledge graph, API method names, and API documentation. Furthermore, we propose an approach that improves the measurement of correctness based on API functional correlation. Our measurement is evaluated on a dataset of 6141 requirements and historical code fragments from related commits. The results show that 28.2% of unmatched APIs can contribute to correctness in our experiments.
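To make the contrast in the abstract concrete, the following minimal sketch compares a strict exact-match score with a score that also gives credit to unmatched APIs that are functionally correlated with a ground-truth API. It is an illustration only and not the measurement proposed in the paper: the function names, the `correlation` callback, the `threshold` value, the partial-credit rule, and the toy correlation table are all assumptions introduced here.

```python
from typing import Callable, Dict, Iterable, Tuple


def exact_match_precision(recommended: Iterable[str],
                          ground_truth: Iterable[str]) -> float:
    """Fraction of recommended APIs that literally appear in the ground truth."""
    recommended = list(recommended)
    if not recommended:
        return 0.0
    truth = set(ground_truth)
    hits = sum(1 for api in recommended if api in truth)
    return hits / len(recommended)


def correlation_aware_precision(recommended: Iterable[str],
                                ground_truth: Iterable[str],
                                correlation: Callable[[str, str], float],
                                threshold: float = 0.5) -> float:
    """Like exact-match precision, but an unmatched API earns partial credit
    equal to its best correlation with any ground-truth API, provided that
    correlation reaches `threshold` (a hypothetical cut-off)."""
    recommended = list(recommended)
    if not recommended:
        return 0.0
    truth = set(ground_truth)
    credit = 0.0
    for api in recommended:
        if api in truth:
            credit += 1.0  # exact match: full credit
        else:
            best = max((correlation(api, t) for t in truth), default=0.0)
            if best >= threshold:
                credit += best  # correlated but unmatched: partial credit
    return credit / len(recommended)


# Toy correlation scores (made up for illustration). In the paper, correlation
# is derived from an API knowledge graph, method names, and documentation.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("java.nio.file.Files.readAllLines", "java.io.BufferedReader.readLine"): 0.8,
}


def toy_correlation(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; unknown pairs score 0.0."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


recommended = [
    "java.io.FileReader.read",
    "java.nio.file.Files.readAllLines",
    "java.lang.Math.abs",
]
ground_truth = [
    "java.io.FileReader.read",
    "java.io.BufferedReader.readLine",
]

strict = exact_match_precision(recommended, ground_truth)
relaxed = correlation_aware_precision(recommended, ground_truth, toy_correlation)
print(f"exact-match: {strict:.3f}, correlation-aware: {relaxed:.3f}")
```

In this toy case only one of the three recommendations matches the ground truth exactly, while a second one is an alternative way to read lines from a file. The strict score ignores it, whereas the relaxed score counts it partially, which is the kind of gap the study examines.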

Full Text