A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments

Xuejian Gao, Xue Jiang, Lei Lyu, Qiong Wu, Chen Lyu, Xiao Wang

doi:10.1109/access.2021.3055955

Abstract

Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, and thus the summaries they generate lack a logical description of the code fragment's holistic information. To this end, we propose a code summarization method based on multiple modules that generates natural language for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate the holistic natural language description of the code fragment. The experimental results demonstrate that our method can generate more concise and logical natural language descriptions than existing models.
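The abstract names a cosine similarity measure for ranking statement-level descriptions but this page does not say which vectors are compared. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes bag-of-words count vectors and a hypothetical `reference` text (for example, words taken from the method name) against which each description is scored. The names `cosine_similarity` and `rank_descriptions` are illustrative only.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_descriptions(descriptions: List[str], reference: str) -> List[Tuple[str, float]]:
    """Score each description against the reference text and sort by score.

    Assumption: whitespace tokenization and raw term counts; the paper may
    use a different representation and weighting.
    """
    ref_vec = Counter(reference.lower().split())
    scored = [
        (d, cosine_similarity(Counter(d.lower().split()), ref_vec))
        for d in descriptions
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rank_descriptions(
        [
            "open the input file",
            "increment the counter",
            "read each line of the file",
        ],
        reference="read file lines",  # hypothetical reference text
    )
    for text, score in ranked:
        print(f"{score:.3f}  {text}")
```

In this sketch, descriptions that share more words with the reference rise to the top, and descriptions with no overlap receive a score of zero and could be treated as candidates for pruning.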

Highlights

With the rapid development of computer technology, new software and applications are emerging continuously.
APPROACH: As illustrated in Fig. 1, we propose a code summarization method based on multiple modules: 1) Preprocessing module, which divides a given code fragment into statements according to division rules; 2) Internal processing module, which uses CamelCase splitting and the Software Word Usage Model (SWUM) to split the identifiers of each statement and mine source code characteristics, and combines the result with natural language templates to generate natural language descriptions of source code statements; 3) External processing module, which sorts the natural language descriptions generated by the internal processing module.
1) RQ1 (Accuracy Evaluation): Can the program description generated by our method accurately describe the functionality of the source code?
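The internal processing step described in the highlights (identifier splitting followed by template filling) can be sketched as follows. This is an illustrative sketch only: it covers CamelCase and snake_case splitting with a regular expression and does not implement SWUM. The template wording and the function names `split_identifier` and `describe_call` are hypothetical, and only statements of the form `receiver.methodName(args);` are handled.

```python
import re
from typing import List


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase words."""
    words: List[str] = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs (XML), capitalized or lowercase words (File, parse), digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def describe_call(statement: str) -> str:
    """Fill a toy template for a method-call statement.

    Assumption: the first word of the method name acts as the verb and the
    remaining words as its object; real templates would be richer.
    """
    match = re.fullmatch(r"\s*(\w+)\.(\w+)\((.*)\);?\s*", statement)
    if match is None:
        return "unsupported statement"
    receiver, method, args = match.groups()
    method_words = split_identifier(method)
    if not method_words:
        return "unsupported statement"
    verb, rest = method_words[0], method_words[1:]
    receiver_text = " ".join(split_identifier(receiver))
    if rest:
        sentence = f"{verb} the {' '.join(rest)} on {receiver_text}"
    else:
        sentence = f"{verb} the {receiver_text}"
    arg_texts = [" ".join(split_identifier(a)) for a in args.split(",") if a.strip()]
    if arg_texts:
        sentence += " using " + " and ".join(arg_texts)
    return sentence


if __name__ == "__main__":
    print(split_identifier("parseXMLFile"))
    print(describe_call("userList.addNewUser(name);"))
    print(describe_call("buffer.clear();"))
```

Each statement kind (assignment, loop, condition, and so on) would need its own template in a fuller version; the single call template here is only meant to show how split identifier words slot into a fixed sentence pattern.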

Summary

INTRODUCTION

With the rapid development of computer technology, new software and applications are emerging continuously. The keywords extracted in practical applications capture only surface-level semantic information of the code, which does not reflect the code's function well. For this reason, Brian et al. [2] proposed the hierarchical Pachinko Allocation Model (hPAM) to implement code summarization tasks. We propose a code summarization method based on multiple modules. This method does not depend on deep learning models or specific training datasets.
1) We propose a code summarization method for the overall source code fragment, which divides, optimizes, and sorts the code statements through a multi-module processing mechanism to take the code fragment's global information into account and reduce redundant information in the generated results.
2) We propose a weight calculation mechanism to prioritize the generated program description sentences so that the generated natural language is logically coherent.

PROBLEM STATEMENT

1: Algorithm Process

METRICS

THREATS TO VALIDITY

CONCLUSION


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 33	License type: CC BY 4.0

Similar Papers

Enhancing code summarization with action word prediction
Mingchen Li ... Zijie Huang
Neurocomputing | VOL. 563
16 Oct 2023

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning
Wei Ye ... Jinglei Zhang
20 Apr 2020

CoDesc: A Large Code–Description Parallel Dataset
03 Aug 2021

Utilizing Keywords in Source Code to Improve Code Summarization
Peng-Fei Liu ... Xiao-Meng Wang
11 Dec 2020
