Abstract
Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence that describes the functionality of a given code fragment. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most existing techniques focus on core code statements, so the summaries they generate lack a logical description of the code fragment's holistic information. To this end, we propose a code summarization method based on multiple modules that generates natural language for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt code statement partition rules and a cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate a holistic natural language description of the code fragment. The experimental results demonstrate that our method generates more concise and logical natural language descriptions than existing models.
Highlights
With the vigorous development of computer technology, various software and applications are emerging in an endless stream.
APPROACH: As illustrated in Fig. 1, we propose a code summarization method based on multiple modules:
1) Preprocessing module, which divides a given code fragment into various statements according to division rules;
2) Internal processing module, which utilizes CamelCase and the Software Word Usage Model (SWUM) to split the identifiers of the statements and mine source code characteristics, and combines natural language templates to generate natural language descriptions of source code statements;
3) External processing module, which sorts the natural language descriptions generated in the internal processing module.
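The identifier-splitting step of the internal processing module can be illustrated with a minimal sketch. It covers only the CamelCase part; SWUM is not reproduced here. The function names `split_identifier` and `describe_call`, and the sentence template itself, are hypothetical and are not taken from the paper.

```python
import re


def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase words."""
    words = []
    # First cut on underscores and any other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then cut on case transitions, keeping acronym runs such as "HTTP" together.
        words.extend(
            re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
        )
    return [w.lower() for w in words]


def describe_call(method_name: str) -> str:
    """Fill a hypothetical template (an assumption, not the paper's) with the split words."""
    return "This statement calls a method to " + " ".join(split_identifier(method_name)) + "."


if __name__ == "__main__":
    print(split_identifier("parseHTTPResponse"))
    print(describe_call("getUserName"))
```

A regular expression is used here only because it keeps the sketch short; the templates in the paper are presumably richer and depend on the statement type.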
1) RQ1 (Accuracy Evaluation): Can the program description generated by our method accurately describe the functionality of the source code?
Summary
With the vigorous development of computer technology, various software and applications are emerging in an endless stream. The keywords extracted in practical applications belong to the surface semantic information of the code, which does not reflect the code's function well. For this reason, Brian et al. [2] proposed the hierarchical Pachinko Allocation Model (hPAM) to implement code summarization tasks.
X. Gao et al.: Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments
... summarization automatically. We propose a code summarization method based on multiple modules. This method does not depend on deep learning models or specific training datasets.
1) We propose a code summarization method for the overall source code fragment, which divides, optimizes, and sorts the code statements through a multi-module processing mechanism to consider the code fragment's global information and reduce redundant information in the generated results.
2) We propose a weight calculation mechanism to prioritize the generated program description sentences, ensuring that the generated natural language is logically coherent.
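The abstract mentions a cosine similarity measure for ranking the generated descriptions. The following is a minimal sketch of one way such a ranking could work, assuming a bag-of-words representation and a reference text (for example, the words of the method name). Both are assumptions made for illustration. The source does not specify what is compared or how the weights are computed, and the names `cosine_similarity` and `rank_descriptions` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the words that the two sentences share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_descriptions(descriptions: list[str], reference: str) -> list[str]:
    """Order statement descriptions by similarity to a reference text (assumed input)."""
    return sorted(
        descriptions,
        key=lambda d: cosine_similarity(d, reference),
        reverse=True,
    )


if __name__ == "__main__":
    sentences = ["open the file", "print a message", "read user name from file"]
    for s in rank_descriptions(sentences, "read user name"):
        print(s)
```

Sentences that share no words with the reference receive a score of 0 and keep their original relative order, since Python's sort is stable.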