Abstract
Commit messages, which precisely describe the code changes of each commit in natural language, make it possible for developers and subsequent reviewers to understand those changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several researchers have proposed automated techniques to generate commit messages, but these techniques do not sufficiently exploit the information contained in the code. In this paper, we propose COMU, a multi-grained contextual code representation learning approach for commit message generation. We extract multi-grained information from the changed code at the line level and the AST level (i.e., Code_Diff and AST_Diff). In Code_Diff, we construct global contextual semantic information about the changed code and use three different tokens to mark whether each line of code has changed. In AST_Diff, we extract the code structure from the source code changes and combine it with four types of editing operations to focus explicitly on the details of the changed part. In addition, we build an experimental dataset, since no sufficiently large public dataset exists for this task; releasing it should help advance research in this field. We perform extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and a human study show that our model outperforms the baseline model.
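To make the Code_Diff idea concrete, the following is a minimal sketch of line-level marking, not the implementation used in COMU. It assumes the three tokens distinguish unchanged, added, and deleted lines; the token strings `<keep>`, `<add>`, and `<del>` and the function name `mark_lines` are hypothetical, since the abstract does not name them. The line alignment uses Python's standard `difflib`, which is a stand-in for whatever diff tool the authors used.

```python
from difflib import SequenceMatcher

# Hypothetical marker tokens: the abstract only states that three
# different tokens are used, not what they are called.
KEEP, ADD, DEL = "<keep>", "<add>", "<del>"


def mark_lines(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Return (token, line) pairs covering unchanged and changed lines."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    marked: list[tuple[str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged context lines are kept so the model sees the
            # surrounding code, not only the edited lines.
            marked.extend((KEEP, line) for line in old_lines[i1:i2])
        else:
            # "delete", "insert", and "replace" all reduce to removed
            # old lines followed by added new lines.
            marked.extend((DEL, line) for line in old_lines[i1:i2])
            marked.extend((ADD, line) for line in new_lines[j1:j2])
    return marked


if __name__ == "__main__":
    before = "a = 1\nb = 2\n"
    after = "a = 1\nb = 3\n"
    for token, line in mark_lines(before, after):
        print(token, line)
```

Keeping the unchanged lines alongside the marked ones is what gives the sequence its global context; a diff that retained only added and deleted lines would lose it.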