Multi-grained contextual code representation learning for commit message generation

Chuangwei Wang, Li Zhang, Xiaofang Zhang

doi:10.1016/j.infsof.2023.107393

Journal: Information and Software Technology
Publication Date: Dec 19, 2023
Citations: 3

Abstract

Commit messages, which precisely describe the code changes of each commit in natural language, make it possible for developers and subsequent reviewers to understand those changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several automated techniques for generating commit messages have been proposed, but they do not sufficiently exploit the information in the code. In this paper, we propose COMU, a multi-grained contextual code representation learning approach for commit message generation. We extract multi-grained information from the changed code at the line level and the AST level (i.e., Code_Diff and AST_Diff). In Code_Diff, we construct global contextual semantic information about the changed code and use three different tokens to mark whether each line of code has changed. In AST_Diff, we extract the code structure from the source code changes and combine it with four types of editing operations to focus explicitly on the details of the changed part. In addition, because no sufficient publicly available dataset exists for this task, we build our own experimental dataset; its release should help advance research in this field. We perform extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and a human study show that our model outperforms the baseline model.
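To make the line-level marking in Code_Diff concrete, here is a minimal Python sketch of tagging each line of a change with one of three tokens. It is an illustration only, not the authors' implementation: the token names (`<add>`, `<del>`, `<keep>`), the function `mark_lines`, and the use of the standard-library `difflib` matcher are all assumptions, since the abstract states only that three different tokens are used.

```python
import difflib

# Hypothetical marker tokens; the abstract does not name the three tokens.
ADDED, DELETED, UNCHANGED = "<add>", "<del>", "<keep>"


def mark_lines(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Pair every line of a change with a token saying how it changed."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    marked: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are kept so the model still sees the context.
            marked.extend((UNCHANGED, line) for line in old_lines[i1:i2])
        else:
            # "delete", "insert" and "replace" all reduce to removed lines
            # followed by added lines (either slice may be empty).
            marked.extend((DELETED, line) for line in old_lines[i1:i2])
            marked.extend((ADDED, line) for line in new_lines[j1:j2])
    return marked


if __name__ == "__main__":
    before = "int a = 1;\nint b = 2;\nreturn a;"
    after = "int a = 1;\nint b = 3;\nreturn a;\nlog(b);"
    for token, line in mark_lines(before, after):
        print(token, line)
```

Keeping unchanged lines in the output, rather than only the added and deleted ones, is what lets a sequence like this carry the surrounding context of a change along with the change itself.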
