Abstract
Source code summarization refers to a natural language description of the source code’s function. It helps developers understand the semantics of the source code. The source code and its corresponding summarization can be thought of as symmetric. However, existing summaries are often mismatched with the source code, missing, or out of date. Writing summaries manually is inefficient and requires a lot of human effort. To address this, many studies have been conducted on Automatic Source Code Summarization (ASCS). Given a piece of source code, ASCS techniques automatically generate a summary in natural language. In this paper, we review the development of ASCS technology. Almost all ASCS techniques involve the following stages: source code modeling, code summarization generation, and quality evaluation. We categorize existing ASCS techniques according to these stages and analyze their advantages and shortcomings. We also draw a clear map of the development of existing algorithms.
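Source code modeling, the first of these stages, usually starts by turning code into a token sequence that a model can consume. The following is a minimal sketch under stated assumptions: the input is Python, only identifier and keyword tokens are kept, and identifiers are split on snake_case and camelCase boundaries. The helper names `split_identifier` and `code_tokens` are hypothetical and are not taken from any surveyed technique.

```python
import io
import keyword
import re
import tokenize


def split_identifier(name: str) -> list[str]:
    """Split a snake_case or camelCase identifier into lower-case subwords (illustrative helper)."""
    # Separate an acronym from a following capitalized word: "HTTPRequest" -> "HTTP Request"
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", name)
    # Separate a lower-case letter or digit from a following capital: "parseHTTP" -> "parse HTTP"
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", spaced)
    # Underscores and the inserted spaces become word boundaries; empty pieces are dropped
    return [w.lower() for w in re.split(r"[\s_]+", spaced) if w]


def code_tokens(source: str) -> list[str]:
    """Tokenize Python source, keeping keywords as-is and splitting identifiers (hypothetical sketch)."""
    tokens: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue  # this sketch ignores operators, literals, and layout tokens
        if keyword.iskeyword(tok.string):
            tokens.append(tok.string)
        else:
            tokens.extend(split_identifier(tok.string))
    return tokens


if __name__ == "__main__":
    snippet = "def getUserName(user_id):\n    return lookupName(user_id)\n"
    print(code_tokens(snippet))
```

Splitting identifiers exposes natural-language words such as `user` and `name`, which a summary generator can relate to the words in reference comments. Real systems may instead model code through abstract syntax trees or learned subword vocabularies.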
Highlights
A code summary, also called a code comment, is a textual description of the function and purpose of specific identifiers in computer programs
We conducted an in-depth analysis of Automatic Source Code Summarization (ASCS): (1) we outlined the core of the paper, which consists of the current challenges, and systematized ASCS along three dimensions: source code analysis, code summarization generation algorithms, and the methodologies used to evaluate them
(3) We summarized effective evaluation mechanisms for ASCS and analyzed recent evaluation methods
Summary
A code summary, also called a code comment, is a textual description of the function and purpose of specific identifiers in computer programs. Quality evaluation methods from natural language processing (NLP) are applied to code summarization, even though source code differs from natural language text. Following the development of the techniques, we summarize prior work from three aspects: source code modeling, automatic code summarization algorithms, and summarization quality evaluation. This survey makes several contributions to the field. Almost all source code modeling uses machine learning, and paper [45] can serve as a reference: it carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021, aiming to summarize current knowledge on applied machine learning for source code analysis. Quality evaluation measures the strengths and weaknesses of ASCS algorithms through the code summaries they generate.
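BLEU is one of the NLP metrics commonly borrowed to score a generated summary against a human-written reference comment. The following is a minimal sketch of sentence-level BLEU, assuming whitespace-tokenized summaries, uniform weights over 1- to 4-grams, and add-one smoothing for n ≥ 2. That smoothing is only one of several variants, and published results may use a different one. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: list[str], candidate: list[str], max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU of one candidate summary against one reference (sketch)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram is credited at most as often as it occurs in the reference
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if n > 1:
            # Add-one smoothing so a missing higher-order match does not zero the whole score
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # no unigram overlap at all
        # Geometric mean of the n-gram precisions with uniform weights 1 / max_n
        log_precision_sum += math.log(matches / total) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty discourages summaries that are shorter than the reference
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = "returns the name of the user".split()
    print(sentence_bleu(reference, "returns the user name".split()))
```

Because BLEU only counts surface n-gram overlap, a correct summary that paraphrases the reference can still score low. This is one limitation of transferring NLP metrics to code summarization.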