Source Code Comments Research Articles

Technical debt represents sub-optimal choices made in development that are beneficial in the short term but not in the long run. Consciously admitted debt that is marked with a keyword, e.g., TODO, is called keyword-labeled self-admitted technical debt (KL-SATD). KL-SATD can lead to adverse effects in software development, e.g., a rise in complexity within the developed software. We investigated the relationship between KL-SATD in source code comments and reports from the highly popular industrial program analysis tool SonarQube. The goal was to find which SonarQube metrics and issues are related to KL-SATD introduction and removal, and how many KL-SATD comments that appear in the context of an issue actually address that issue. We performed a study with 33 software repositories. We analyzed the changes in SonarQube reports (sqale index, reliability and security remediation metrics, and SonarQube issues) and their relationship to KL-SATD addition and removal with mixed model analysis. We manually annotated a sample to investigate how many KL-SATD comments are in the context of SonarQube issues and how many address them directly. KL-SATD is associated with a reduction in code maintainability measured with SonarQube’s sqale index. KL-SATD removal is associated with an increase in code maintainability (sqale index) and reliability measured with SonarQube’s reliability remediation effort. The introduction and removal of KL-SATD are predominantly related to code smells, and not to vulnerabilities and bugs. Manual annotation revealed that 36% of KL-SATD comments are in the context of a SonarQube issue, but only 15% of the comments address an issue. This means that despite the statistical relationship between KL-SATD comments and SonarQube reports, there is a large set of KL-SATD comments in areas that SonarQube reports as clean or free of maintainability issues.
KL-SATD introduction and removal are connected mainly to code smells, which ties them to maintainability rather than reliability or security. This is reinforced by the relationship with the sqale index, as well as the dominance of code smells in SonarQube issues. Many KL-SATD issues have characteristics that go beyond static analysis tools and call for future studies extending the capabilities of the current tools. The limited overlap between KL-SATD comments and SonarQube reports suggests that they are complementary and that both are needed for a comprehensive view of code maintainability. The study also presents rule violations developers should be aware of regarding KL-SATD introduction and removal.
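
To make the notion of a keyword-labeled comment concrete, the following is a minimal sketch of how such comments might be located in source text. It is not the study's tooling: the keyword list (the abstract names only TODO as an example; FIXME is added here as an assumption), the function name `find_kl_satd`, and the restriction to `//` and `#` line comments are all hypothetical choices made for illustration.

```python
import re

# Hypothetical keyword list: the abstract gives only TODO as an example.
KEYWORDS = ("TODO", "FIXME")

# A line comment opened by // or # that contains one of the keywords as a whole word.
# This is deliberately naive: it does not parse string literals or block comments.
PATTERN = re.compile(
    r"(?://|#).*?\b(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_kl_satd(source: str) -> list[tuple[int, str]]:
    """Return (line_number, keyword) pairs for lines whose comment carries a keyword."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, match.group(1).upper()))
    return hits
```

A identifier such as `todo` outside a comment is ignored because the pattern requires a comment opener first; a real detector would need a language-aware comment parser.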

High-quality source code comments are valuable for software development and maintenance; however, code often contains low-quality comments or lacks them altogether. We call such source code comments suboptimal comments. Suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on low-quality source code comments, empirical knowledge about the commenting practices that produce suboptimal comments and the reasons that lead to them is lacking. We help bridge this knowledge gap by investigating (1) independent comment changes (ICCs), i.e., comment changes committed independently of code changes, which likely address suboptimal comments; (2) commenting guidelines; and (3) comment-checking tools and comment-generating tools, which are often employed to help commenting practice, especially to prevent suboptimal comments. We collect 24M+ comment changes from 4,392 open-source GitHub Java repositories and find that ICCs widely exist. The ICC ratio (the proportion of ICCs among all comment changes) is ~15.5%, with 98.7% of the repositories having ICCs. Our thematic analysis of 3,533 randomly sampled ICCs provides a three-dimensional taxonomy for what is changed (four comment categories and 13 subcategories), how it changed (six commenting activity categories), and what factors are associated with the change (three factors). We investigate 600 repositories to understand the prevalence, content, impact, and violations of commenting guidelines. We find that only 15.5% of the 600 sampled repositories have any commenting guidelines. We provide the first taxonomy for elements in commenting guidelines: where and what to comment are particularly important. The repositories without such guidelines have a statistically significantly higher ICC ratio, indicating the negative impact of the lack of commenting guidelines. However, commenting guidelines are not strictly followed: 85.5% of checked repositories have violations.
We also systematically study how developers use two kinds of tools, comment-checking tools and comment-generating tools, in the 4,392 repositories. We find that the use of the Javadoc tool is negatively correlated with the ICC ratio, while the use of Checkstyle has no statistically significant correlation; the use of comment-generating tools leads to a higher ICC ratio. To conclude, we reveal issues and challenges in current commenting practice, which help explain how suboptimal comments are introduced. We propose potential research directions on comment location prediction, comment generation, and comment quality assessment; suggest how developers can formulate commenting guidelines and enforce rules with tools; and recommend how to enhance current comment-checking and comment-generating tools.
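
The ICC ratio defined in this abstract (ICCs divided by all comment changes) can be illustrated with a short sketch. The record type `CommentChange`, its `touches_code` flag, and the function `icc_ratio` are hypothetical names introduced here; how the authors actually decided whether a comment change accompanies a code change is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class CommentChange:
    commit: str          # commit identifier (illustrative)
    touches_code: bool   # hypothetical flag: True if code changed alongside the comment


def icc_ratio(changes: list[CommentChange]) -> float:
    """Proportion of comment changes committed independently of code changes."""
    if not changes:
        return 0.0  # assumption: an empty history is reported as ratio 0
    independent = sum(1 for change in changes if not change.touches_code)
    return independent / len(changes)
```

For example, a history with four comment changes of which one has `touches_code=False` yields a ratio of 0.25.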

Articles published on Source Code Comments

Keyword-labeled self-admitted technical debt and static code analysis have significant relationship but limited overlap

18 million links in commit messages: purpose, evolution, and decay

Completing Function Documentation Comments Using Structural Information

Automatic identification of self-admitted technical debt from four different sources

Self-Admitted Technical Debt in the Embedded Systems Industry: An Exploratory Case Study

Self-admitted technical debt classification using natural language processing word embeddings

Suboptimal Comments in Java Projects: From Independent Comment Changes to Commenting Practices

Investigating Novice Developers’ Code Commenting Trends Using Machine Learning Techniques

Code2tree: A Method for Automatically Generating Code Comments

Comments or Issues: Where to Document Technical Debt

Self-admitted technical debt in R: detection and causes

Self-Admitted Technical Debt and comments’ polarity: an empirical study

Identifying self-admitted technical debt in issue tracking systems using machine learning

FIXME: synchronize with database! An empirical study of data access self-admitted technical debt

Deep neural network ensembles for detecting self-admitted technical debt

A Study of Vulnerability Identifiers in Code Comments: Source, Purpose, and Severity

SATDBailiff-mining and tracking self-admitted technical debt

Detecting and Classifying Self-Admitted of Technical Debt with CNN-BiLSTM

An empirical study on the co-occurrence between refactoring actions and Self-Admitted Technical Debt removal

Leveraging machine learning for software redocumentation—A comprehensive comparison of methods in practice
