Retrieval-based neural source code summarization

Jian Zhang,Hongyu Zhang,Xudong Liu,Hailong Sun,Xu Wang

doi:10.1145/3377811.3380383

Abstract

Source code summarization aims to automatically generate concise summaries of source code in natural language texts, in order to help developers better understand and maintain source code. Traditional work generates a source code summary by utilizing information retrieval techniques, which select terms from original source code or adapt summaries of similar code snippets. Recent studies adopt Neural Machine Translation techniques and generate summaries from code snippets using encoder-decoder neural networks. The neural-based approaches prefer the high-frequency words in the corpus and have trouble with the low-frequency ones. In this paper, we propose a retrieval-based neural source code summarization approach where we enhance the neural model with the most similar code snippets retrieved from the training set. Our approach can take advantages of both neural and retrieval-based techniques. Specifically, we first train an attentional encoder-decoder model based on the code snippets and the summaries in the training set; Second, given one input code snippet for testing, we retrieve its two most similar code snippets in the training set from the aspects of syntax and semantics, respectively; Third, we encode the input and two retrieved code snippets, and predict the summary by fusing them during decoding. We conduct extensive experiments to evaluate our approach and the experimental results show that our proposed approach can improve the state-of-the-art methods.

Full Text