Enhancing code summarization with action word prediction

Huiqun Yu, Mingchen Li, Guisheng Fan, Ziyi Zhou, Zijie Huang

doi:10.1016/j.neucom.2023.126777

Abstract

Code summarization refers to automatically generating a concise natural-language description of a code snippet. Good code summaries can effectively facilitate program comprehension and software maintenance. In recent years, various learning-based code summarization techniques have achieved impressive performance. Most of these models treat code summarization as an end-to-end task and generate the summaries directly, which ignores the fact that action words are crucial to code summaries. An essential characteristic of code summaries is that the distribution of action words is highly concentrated. For instance, in the Funcom dataset, the forty most common action words account for 72% of all samples. To incorporate this valuable prior domain knowledge into code summarization models, we develop a method that assists code summarization with an additional action word prediction module: an action predictor predicts the primary action in the code summary, and the predicted action is then used as a prompt to enhance the performance of the summary generation model. Our approach can be conveniently integrated into existing models. We evaluate our approach on two Java datasets and a C/C++ dataset. The results show that our approach can effectively improve the performance of code summarization models. Furthermore, our action word prediction module can enhance the performance of a large pre-trained language model by prompting it with the predicted action words. This work suggests that a precise action word prediction model can significantly improve the performance of code summarization through the proposed action word guidance mechanism.
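The two ideas in the abstract, the concentration of action words and the use of a predicted action word as a prompt, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the action word is taken to be the first token of a summary, and the prompt is formed by prepending the predicted word to the code with a separator. The names `action_word`, `top_k_coverage`, `build_prompted_input`, and the `<sep>` token are hypothetical and are not taken from the paper.

```python
from collections import Counter


def action_word(summary: str) -> str:
    """Assumption: the action word is the first token of the summary, lowercased."""
    tokens = summary.strip().lower().split()
    return tokens[0] if tokens else ""


def top_k_coverage(summaries: list[str], k: int) -> float:
    """Fraction of summaries whose action word is among the k most common ones."""
    if not summaries:
        return 0.0
    counts = Counter(action_word(s) for s in summaries)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(summaries)


def build_prompted_input(code: str, predicted_action: str, sep: str = " <sep> ") -> str:
    """Assumption: the predicted action word is prepended to the code as a prompt."""
    return f"{predicted_action}{sep}{code}"


# Toy data for illustration only; not drawn from any dataset in the paper.
toy_summaries = [
    "Returns the name",
    "returns the id",
    "Sets the value",
    "Creates a new list",
]
coverage_top1 = top_k_coverage(toy_summaries, 1)
prompted = build_prompted_input("int getId() { return id; }", "returns")
```

On a real corpus, a statistic like `top_k_coverage(summaries, 40)` is the kind of measurement behind the 72% figure quoted for Funcom, although the exact tokenization used there may differ from this sketch.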

Full Text