Automatic Generation of Large-Granularity Pull Request Description

Li Kuang, Ruyi Shi, Huan Zhang, Honghao Gao, Leihao Zhao

doi:10.21655/ijsi.1673-7288.00253

Abstract

On the GitHub platform, many project contributors neglect to write descriptions when submitting Pull Requests (PRs), which makes their PRs easy for reviewers to overlook or reject. It is therefore necessary to generate PR descriptions automatically to help raise the PR acceptance rate. The performance of existing PR description generation methods is usually affected by PR granularity, so these methods struggle to generate effective descriptions for large-granularity PRs. This work therefore focuses on generating descriptions for large-granularity PRs. First, the text in each PR is preprocessed, and word-sentence heterogeneous graphs are constructed in which words serve as secondary nodes that establish connections between PR sentences. Next, features are extracted from the heterogeneous graphs and fed into a graph neural network for graph representation learning, so that sentence nodes acquire richer content information through message passing between nodes. Finally, the sentences carrying key information are selected to form the PR description. In addition, because the dataset lacks manually annotated labels, supervised learning cannot be used for training; reinforcement learning is therefore adopted to guide the generation of PR descriptions. The training objective is to minimize the negative expected reward, which does not require ground-truth labels and directly improves the quality of the results. Experiments on a real-world dataset show that the proposed method outperforms existing methods in $F1$ and readability.
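The pipeline described above (graph construction, message passing, sentence extraction, and reinforcement-learning training) can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation. The following are all hypothetical choices: the stop-word list, the TF-IDF edge weights, the random word features, the parameter-free mean-aggregation message passing (standing in for the trained graph neural network), the Bernoulli sentence-selection policy, the unigram-overlap F1 reward computed against a reference description, and the helper names `build_graph`, `message_passing`, `reinforce_step`, and `extract`. The policy is trained with REINFORCE, which minimizes the negative expected reward through the score-function gradient estimator and needs no sentence-level extraction labels.

```python
import math
import random
from collections import defaultdict

# Hypothetical preprocessing: lower-casing, punctuation stripping, stop-word removal.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "for", "this", "it"}


def tokenize(sentence):
    words = (w.strip(".,;:!?()").lower() for w in sentence.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_graph(sentences):
    """Word-sentence bipartite graph: edges[i][w] is the TF-IDF weight of word w in sentence i."""
    docs = [tokenize(s) for s in sentences]
    df = defaultdict(int)
    for toks in docs:
        for w in set(toks):
            df[w] += 1
    n = len(docs)
    # Assumed edge weight: term frequency times smoothed IDF.
    return [{w: toks.count(w) / len(toks) * (math.log((1 + n) / (1 + df[w])) + 1)
             for w in set(toks)} for toks in docs]


def weighted_mean(pairs, dim):
    """Edge-weighted average of (vector, weight) pairs; zero vector if there are none."""
    total = sum(x for _, x in pairs)
    if total == 0:
        return [0.0] * dim
    return [sum(v[k] * x for v, x in pairs) / total for k in range(dim)]


def message_passing(edges, dim=8, rounds=2, seed=0):
    """Parameter-free stand-in for trained GNN layers: sentence -> word -> sentence."""
    rng = random.Random(seed)
    vocab = sorted({w for row in edges for w in row})
    word_h = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in vocab}  # assumed random features
    sent_h = [weighted_mean([(word_h[w], x) for w, x in row.items()], dim) for row in edges]
    for _ in range(rounds):
        incoming = defaultdict(list)  # each word collects the sentences that contain it
        for i, row in enumerate(edges):
            for w, x in row.items():
                incoming[w].append((sent_h[i], x))
        word_h = {w: weighted_mean(msgs, dim) for w, msgs in incoming.items()}
        new_sent_h = []
        for i, row in enumerate(edges):  # sentences read back from their updated words
            agg = weighted_mean([(word_h[w], x) for w, x in row.items()], dim)
            new_sent_h.append([0.5 * a + 0.5 * b for a, b in zip(sent_h[i], agg)])
        sent_h = new_sent_h
    return sent_h


def sigmoid(z):
    """Numerically stable logistic function."""
    return 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))


def unigram_f1(pred, ref):
    """Hypothetical reward: unigram-overlap F1 between selected tokens and a reference."""
    common = len(set(pred) & set(ref))
    if common == 0:
        return 0.0
    p, r = common / len(set(pred)), common / len(set(ref))
    return 2 * p * r / (p + r)


def reinforce_step(theta, sent_h, sentences, reference, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update of a Bernoulli sentence-selection policy."""
    probs = [sigmoid(sum(t * h for t, h in zip(theta, hs))) for hs in sent_h]
    actions = [1 if rng.random() < p else 0 for p in probs]  # sample an extraction
    chosen = [tok for s, a in zip(sentences, actions) if a for tok in tokenize(s)]
    reward = unigram_f1(chosen, tokenize(reference))
    adv = reward - baseline
    # loss = -(reward - baseline) * sum_i log pi(a_i); its gradient is -adv * (a_i - p_i) * h_i
    loss = -adv * sum(math.log(p if a else 1 - p) for a, p in zip(actions, probs))
    for hs, a, p in zip(sent_h, actions, probs):
        for k in range(len(theta)):
            theta[k] += lr * adv * (a - p) * hs[k]  # gradient-descent step on the loss
    return reward, loss


def extract(theta, sent_h, sentences, k=2):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = [sum(t * h for t, h in zip(theta, hs)) for hs in sent_h]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sentences = ["Fix null pointer in the parser.", "Bump version number.",
                 "Parser now handles empty input files.", "Update README typo."]
    reference = "Fix parser crash on empty input"  # hypothetical reference description
    rng, baseline = random.Random(1), 0.0
    sent_h = message_passing(build_graph(sentences))
    theta = [0.0] * len(sent_h[0])
    for _ in range(200):
        reward, _ = reinforce_step(theta, sent_h, sentences, reference, rng, baseline=baseline)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    print(extract(theta, sent_h, sentences))
```

The usage example subtracts a moving-average baseline from the reward. This is a common variance-reduction choice for REINFORCE and is likewise an assumption here, not a detail stated in the abstract.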
