A Review on Source Code Documentation

Sawan Rai,Ramesh Chandra Belwal,Atul Gupta

doi:10.1145/3519312

Abstract

Context: Coding is an incremental activity where a developer may need to understand a code before making suitable changes in the code. Code documentation is considered one of the best practices in software development but requires significant efforts from developers. Recent advances in natural language processing and machine learning have provided enough motivation to devise automated approaches for source code documentation at multiple levels. Objective: The review aims to study current code documentation practices and analyze the existing literature to provide a perspective on their preparedness to address the stated problem and the challenges that lie ahead. Methodology: We provide a detailed account of the literature in the area of automated source code documentation at different levels and critically analyze the effectiveness of the proposed approaches. This also allows us to infer gaps and challenges to address the problem at different levels. Findings: (1) The research community focused on method-level summarization. (2) Deep learning has dominated the past five years of this research field. (3) Researchers are regularly proposing bigger corpora for source code documentation. (4) Java and Python are the widely used programming languages as corpus. (5) Bilingual Evaluation Understudy is the most favored evaluation metric for the research persons.

Full Text