Abstract

A question title serves to provide a readable summary of the problem that a piece of code encounters. Previous studies often used end-to-end sequence-to-sequence systems to generate question titles from source code. However, such systems find long-term dependencies difficult to capture, which can result in an incomplete representation of the source code. To address this issue, we propose a Transformer for Generating Code Title (hereinafter referred to as TGCT) model. Specifically, the TGCT model uses a positional encoding mechanism that models pairwise relationships between source-code tokens by applying relative position representations. Multiple self-attention components are also used to capture long-term dependencies in the code. Comprehensive experiments on datasets from five programming languages, namely Python, Java, JavaScript, C#, and SQL, show that TGCT generally outperforms state-of-the-art models as measured by BLEU and ROUGE. In addition, a cross-sectional comparison experiment verifies the effects of different model parameters, dataset sizes, the positional encoding mechanism, and the self-attention mechanism on the model's results.
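The abstract describes self-attention augmented with relative position representations, which matches the standard technique of Shaw et al. (2018). As a minimal single-head sketch (not the authors' released code; the class name, clipping distance, and single-head simplification are assumptions, and TGCT itself uses multiple attention components), the idea can be illustrated in PyTorch as follows:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelativeSelfAttention(nn.Module):
        """Single-head self-attention with learned relative position
        embeddings, in the style of Shaw et al. (2018). Illustrative only."""

        def __init__(self, d_model: int, max_rel_dist: int = 16):
            super().__init__()
            self.d_model = d_model
            self.max_rel_dist = max_rel_dist
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            # One learned embedding per clipped relative distance,
            # separately for the key side and the value side.
            self.rel_k = nn.Embedding(2 * max_rel_dist + 1, d_model)
            self.rel_v = nn.Embedding(2 * max_rel_dist + 1, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            b, n, d = x.shape
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

            # Clipped relative distances j - i, shifted to non-negative indices.
            pos = torch.arange(n, device=x.device)
            rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist,
                                                      self.max_rel_dist)
            rel = rel + self.max_rel_dist              # (n, n) in [0, 2*max]
            a_k = self.rel_k(rel)                      # (n, n, d)
            a_v = self.rel_v(rel)                      # (n, n, d)

            # Content score q.k plus pairwise relative-position score q.a_k:
            # this is how relative positions model pairwise token relations.
            scores = q @ k.transpose(-2, -1)           # (b, n, n)
            scores = scores + torch.einsum("bid,ijd->bij", q, a_k)
            attn = F.softmax(scores / d ** 0.5, dim=-1)

            # Weighted sum of values plus relative-position value term.
            out = attn @ v + torch.einsum("bij,ijd->bid", attn, a_v)
            return out

Usage is the same as any attention layer, e.g. RelativeSelfAttention(d_model=64)(torch.randn(2, 10, 64)) returns a tensor of shape (2, 10, 64). Because the attention scores connect every token pair directly, path length between distant tokens is constant, which is why this mechanism captures the long-term dependencies that recurrent sequence-to-sequence encoders tend to lose.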
