Abstract

Code comments are an integral part of software development. They improve program comprehension and software maintainability. The lack of code comments is a common problem in the software industry. Therefore, it is beneficial to generate code comments automatically. In this paper, we propose a general approach to generate code comments automatically by analyzing existing software repositories. We apply code clone detection techniques to discover similar code segments and use the comments from some code segments to describe the other similar code segments. We leverage natural language processing techniques to select relevant comment sentences. In our evaluation, we analyze 42 million lines of code from 1,005 open source projects from GitHub, and use them to generate 359 code comments for 21 Java projects. We manually evaluate the generated code comments and find that only 23.7% of the generated code comments are good. We report to the developers the good code comments, whose code segments do not have an existing code comment. Amongst the reported code comments, seven have been confirmed by the developers as good and committable to the software repository while the rest await for developers' confirmation. Although our approach can generate good and committable comments, we still have to improve the yield and accuracy of the proposed approach before it can be used in practice with full automation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call