Source Code Patterns Research Articles

Detecting Bug Inducing Commits (BIC), or Just in Time (JIT) defect prediction, using Machine Learning (ML) based models requires tabulated feature values extracted from the source code or historical maintenance data of a software system. Existing studies have utilized meta-data from source code repositories (which we name GitHub Statistics, or GS), n-gram-based source code text processing, and developer information (e.g., the experience of a developer) as the feature values in ML-based bug detection models. However, these feature values do not represent the source code syntax styles or patterns that a developer might prefer over the valid alternatives a programming language provides. This investigation proposes a method to extract features from source code syntax patterns to represent software commits and investigates whether they are helpful in detecting bug proneness in software systems. We utilize six manually and two automatically labeled datasets from eight open-source software projects written in the Java, C++, and Python programming languages. Our datasets contain 642 manually labeled and 4014 automatically labeled buggy and non-buggy commits from six and two subject systems, respectively. The subject systems contain a diverse number of revisions and come from various application domains. Our investigation shows that including the proposed features improves the performance of five different machine learning classification models in detecting buggy and non-buggy software commits. The proposed features also perform better in detecting buggy commits when used with Deep Belief Network generated features and a classification model. This investigation also implemented a state-of-the-art tool to compare the explainability of predicted buggy commits using our proposed and traditional features, and found that our proposed features provide better reasoning about buggy commit detection than the traditional features. The continuation of this study can help enhance software effectiveness by identifying, minimizing, and fixing software bugs during maintenance and evolution.
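To make the idea of syntax-pattern features concrete, here is a minimal sketch in Python. It is not the authors' extraction method, which the abstract does not describe. It assumes, purely for illustration, that a "syntax pattern" can be approximated by an AST node type and that a commit can be represented by the change in node-type counts between the old and new version of a file. The function names and the sample snippets are hypothetical.

```python
import ast
from collections import Counter


def syntax_pattern_counts(source):
    """Count how often each AST node type occurs in a Python snippet."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def commit_feature_vector(before, after, node_types):
    """Represent a commit as the change in selected node-type counts.

    Assumption for illustration only: one feature per node type,
    valued as (count after the commit) - (count before the commit).
    """
    old = syntax_pattern_counts(before)
    new = syntax_pattern_counts(after)
    return [new[name] - old[name] for name in node_types]


# Hypothetical commit: an explicit loop is replaced by a generator expression.
before = "total = 0\nfor x in items:\n    total += x\n"
after = "total = sum(x for x in items)\n"
node_types = ["For", "AugAssign", "GeneratorExp", "Call"]

print(commit_feature_vector(before, after, node_types))  # [-1, -1, 1, 1]
```

Rows of this kind, one per commit, could be placed alongside repository meta-data columns to form the tabulated input that a classifier expects.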


A code clone is a pair of similar code fragments within or between software systems. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been proposed and studied over the last decade. However, clone detection tools are not always perfect, and their reports often contain a number of false positives or clones that are irrelevant from a specific project management or user perspective. To detect all possible similar source code patterns in general, clone detection tools work at the syntax level and do not account for user-specific preferences. This often means the clones must be manually inspected before analysis in order to remove those false positives from consideration. This manual clone validation effort is very time-consuming and often error-prone, in particular for large-scale clone detection. In this paper, we propose a machine learning approach for automating the validation process. First, a training dataset is built by taking code clones from several clone detection tools for different subject systems and then manually validating those clones. Second, several features are extracted from those clones to train the machine learning model. The trained model is then used to automatically validate clones without human inspection. Thus the proposed approach can be used to remove false positive clones from the detection results, automatically evaluate the precision of any clone detector on any given datasets, evaluate existing clone benchmark datasets, or even build new clone benchmarks and datasets with minimum effort. In an experiment with clones detected by several clone detectors in several different software systems, we found our approach has an accuracy of up to 87.4% when compared against the manual validation by multiple expert judges. The proposed method also shows better results in several comparative studies with existing related approaches for clone classification.
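The validation steps described above (manually labeled clone pairs, feature extraction, a trained model, comparison against the manual labels) can be sketched as follows. This is only an illustration under stated assumptions. The two features and the single-threshold "model" are stand-ins chosen so the example needs nothing beyond the Python standard library. The abstract does not list the paper's actual features or learning algorithm, and the names `clone_features` and `fit_threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def clone_features(fragment_a, fragment_b):
    """Extract two illustrative features from a candidate clone pair."""
    lines_a = fragment_a.splitlines()
    lines_b = fragment_b.splitlines()
    longest = max(len(lines_a), len(lines_b), 1)  # avoid division by zero
    return {
        # Character-level similarity in [0, 1].
        "text_similarity": SequenceMatcher(None, fragment_a, fragment_b).ratio(),
        # Ratio of the shorter fragment's line count to the longer one's.
        "line_ratio": min(len(lines_a), len(lines_b)) / longest,
    }


def fit_threshold(scores, labels):
    """Pick the cut-off on one feature that best reproduces the manual labels.

    A decision stump standing in for the trained model: a pair is
    predicted to be a true clone when its score is >= the threshold.
    Returns (threshold, accuracy on the labeled pairs).
    """
    best_threshold, best_accuracy = 0.0, -1.0
    for threshold in sorted(set(scores)):
        correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


# Hypothetical similarity scores with manual validation labels.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, True, False, False]
threshold, accuracy = fit_threshold(scores, labels)
print(threshold, accuracy)  # 0.8 1.0
```

In practice the accuracy would be measured on pairs held out from training, and a learned classifier over many features would replace the single threshold.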


Articles published on Source Code Patterns

DetectVul: A statement-level code vulnerability detection for Python

Insecurity Refactoring: Automated Injection of Vulnerabilities in Source Code

Utilizing source code syntax patterns to detect bug inducing commits using machine learning models

Predicting input validation vulnerabilities based on minimal SSA features and machine learning

A Feature-Based Method for Detecting Design Patterns in Source Code

Discovering Sequential Source Code Patterns in Software Engineering

A Bacterial Foraging Algorithm with Random Forest Classifier for Detecting the Design Patterns in Source Code

A machine learning based framework for code clone validation

TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology

Assessing source code vulnerabilities in a cloud‐based system for health systems: OpenNCP

Applying learning-based methods for recognizing design patterns

An investigation of misunderstanding code patterns in C open-source software projects

Efficiently detecting structural design pattern instances based on ordered sequences

Does Python Smell Like Java? Tool Support for Design Defect Discovery in Python

Detecting design patterns in object-oriented program source code by using graph matching algorithm

Detecting Design Patterns in Object-Oriented Program Source Code by Using Metrics and Machine Learning

Recovering Runtime Structures of Software Systems from Static Source Code

An Algebraic Programming Style for Numerical Software and Its Optimization
