Summarizing source code with Heterogeneous Syntax Graph and dual position

Juncai Guo, Jin Liu, Xiao Liu, Yao Wan, Li Li

doi:10.1016/j.ipm.2023.103415

Abstract

Code summarization aims to capture the semantics of source code by automatically producing brief natural-language descriptions. Most existing work learns from the Abstract Syntax Tree (AST) and the plain text of source code to generate summaries. However, little attention has been paid to the structural heterogeneity and layout features of source code. In this paper, we present a novel framework, HetSum, to address these issues. Specifically, HetSum first builds a Heterogeneous Syntax Graph (HSG) by adding six types of augmented edges to the AST, so that the graph reflects the heterogeneous structure of source code. Meanwhile, each token in the source code is assigned a dual position that captures its layout information. HetSum then encodes the HSG with a heterogeneous graph neural network and extracts code layout features with a Transformer encoder. By assimilating the learned code token vectors into the HSG encoder, HetSum captures the relations between its two encoders and improves the code representation. To generate high-quality summaries, we integrate a copying mechanism into the decoding procedure and extend the Transformer decoding sublayer. Extensive experiments on Java and Python datasets show that HetSum outperforms seventeen state-of-the-art baselines. To support reproducibility, the implementation of HetSum is available at https://github.com/GJCEXP/HETSUM.
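To make the idea of a syntax graph with several edge types concrete, the following Python sketch parses a snippet with the standard `ast` module and groups edges by relation type. Per-relation edge lists of this kind are the usual input to a heterogeneous graph neural network, which can learn separate message-passing parameters for each relation. The four edge types used here (`child`, `parent`, `next_sibling`, `prev_sibling`) and the function name `build_typed_ast_graph` are illustrative assumptions; they are not the six augmented edge types that HetSum defines.

```python
import ast
from collections import defaultdict
from typing import Dict, List, Tuple


def build_typed_ast_graph(source: str) -> Tuple[List[str], Dict[str, List[Tuple[int, int]]]]:
    """Build a toy heterogeneous graph from Python source.

    Returns node labels (AST node class names) and a mapping from a
    hypothetical relation type to its list of (src, dst) edges.
    """
    tree = ast.parse(source)
    labels: List[str] = []
    edges: Dict[str, List[Tuple[int, int]]] = defaultdict(list)

    def visit(node: ast.AST) -> int:
        # Assign a fresh index on every visit, so AST objects that CPython
        # may share between positions still become separate graph nodes.
        idx = len(labels)
        labels.append(type(node).__name__)
        child_ids = [
            visit(child)
            for child in ast.iter_child_nodes(node)
            if not isinstance(child, ast.expr_context)  # skip Load/Store/Del markers
        ]
        # Parent-child relations, stored in both directions as two edge types.
        for c in child_ids:
            edges["child"].append((idx, c))
            edges["parent"].append((c, idx))
        # Sibling relations between consecutive children of the same node.
        for left, right in zip(child_ids, child_ids[1:]):
            edges["next_sibling"].append((left, right))
            edges["prev_sibling"].append((right, left))
        return idx

    visit(tree)
    return labels, dict(edges)


labels, edges = build_typed_ast_graph("def f(a):\n    return a + 1\n")
for relation, pairs in edges.items():
    print(relation, len(pairs))
```

Keeping each relation in its own list, rather than in one untyped adjacency matrix, is what makes the graph heterogeneous from the encoder's point of view.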
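The abstract does not spell out how the dual position is defined. One plausible reading, assumed in the sketch below, is a pair (line index, index of the token within its line), so that tokens at the same offset on different lines share a column coordinate. The regex tokenizer, the names `dual_positions`, `sinusoid` and `dual_encoding`, and the choice to concatenate two Transformer-style sinusoidal encodings are all hypothetical and may differ from HetSum's actual design.

```python
import math
import re
from typing import List, Tuple

# Hypothetical tokenizer: identifiers/keywords, integer literals, single punctuation chars.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")


def dual_positions(source: str) -> List[Tuple[str, int, int]]:
    """Return (token, line_index, column_index) for every token.

    Assumption: the dual position is (line index, index of the token within
    its line). Blank lines yield no tokens but still advance the line index.
    """
    result = []
    for line_idx, line in enumerate(source.splitlines()):
        for col_idx, token in enumerate(TOKEN_RE.findall(line)):
            result.append((token, line_idx, col_idx))
    return result


def sinusoid(pos: int, dim: int) -> List[float]:
    """Sinusoidal encoding of one integer position, with sin/cos interleaved."""
    values = []
    for i in range(dim // 2):
        angle = pos / (10000 ** (2 * i / dim))
        values.extend([math.sin(angle), math.cos(angle)])
    return values


def dual_encoding(line_idx: int, col_idx: int, d_model: int) -> List[float]:
    """Concatenate a line encoding and a column encoding, each of size d_model // 2."""
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return sinusoid(line_idx, half) + sinusoid(col_idx, half)


for token, line_idx, col_idx in dual_positions("def f(a):\n    return a\n"):
    print(token, (line_idx, col_idx), len(dual_encoding(line_idx, col_idx, 8)))
```

Unlike a single sequential index, the (line, column) pair keeps the information that `return` starts a new line, which a flattened token sequence loses.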
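A copying mechanism lets the decoder emit tokens that appear in the input code, such as rare identifiers, even when they are missing from the output vocabulary. The sketch below shows a common pointer-generator-style mixture, assumed here for illustration: the final distribution is `p_gen * P_vocab + (1 - p_gen) * attention`, with attention mass added to the source tokens it points at. HetSum's exact formulation may differ, and the function name `copy_mixture` is hypothetical.

```python
from typing import Dict, List


def copy_mixture(
    p_vocab: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
    p_gen: float,
) -> Dict[str, float]:
    """Mix a vocabulary distribution with a copy distribution over source tokens.

    p_vocab: decoder softmax over the output vocabulary (sums to 1).
    attention: attention weights over source positions (sums to 1).
    source_tokens: source token at each attended position.
    p_gen: probability of generating from the vocabulary instead of copying.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    # Generation path: scale the vocabulary distribution by p_gen.
    final = {tok: p_gen * p for tok, p in p_vocab.items()}
    # Copy path: add (1 - p_gen) * attention to each source token; repeated tokens accumulate.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


dist = copy_mixture({"return": 0.6, "the": 0.4}, [0.5, 0.5], ["getName", "return"], p_gen=0.8)
print(dist)
```

In this example `getName` is absent from `p_vocab`, yet it still receives probability through the copy path, which is how an out-of-vocabulary identifier can reach the generated summary.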
