Abstract

Code summarization aims to generate short functional descriptions for source code to facilitate code comprehension. While Information Retrieval (IR) approaches that leverage similar code snippets and corresponding summaries have led the early research, Deep Learning (DL) approaches that use neural models to capture statistical properties between code and summaries are now mainstream. Although some preliminary studies suggest that IR approaches are more effective in some cases, it is currently unclear how effective the existing approaches can be in general, where and why IR/DL approaches perform better, and whether the integration of IR and DL can achieve better performance. Consequently, there is an urgent need for a comprehensive study of the IR and DL code summarization approaches to provide guidance for future development in this area. This article presents the first large-scale empirical study of 18 IR, DL, and hybrid code summarization approaches on five benchmark datasets. We extensively compare different types of approaches using automatic metrics, we conduct quantitative and qualitative analyses of where and why IR and DL approaches perform better, respectively, and we also study hybrid approaches for assessing the effectiveness of integrating IR and DL. The study shows that the performance of IR approaches should not be underestimated, that while DL models perform better in predicting tokens from method signatures and capturing structural similarities in code, simple IR approaches tend to perform better in the presence of code with high similarity or long reference summaries, and that existing hybrid approaches do not perform as well as individual approaches in their respective areas of strength. Based on our findings, we discuss future research directions for better code summarization.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call