The price of precision: the cost of preprocessing for automated code revision in code review
Abstract: Code review is a widespread practice in software engineering in which developers examine each other’s source code changes to identify potential issues and improve code quality. Among the automated techniques proposed by researchers to reduce the manual workload of code review, Automated Code Revision (ACR) aims to automatically address reviewers’ feedback by producing a revised version of the code. Transformer-based language models have demonstrated state-of-the-art results in ACR. The performance of these models, however, is significantly influenced by the quality and preparation of the training and evaluation data. We present several systematic analyses of prevalent preprocessing steps, examined both cumulatively and in isolation, across three established preprocessing pipelines and two dataset-splitting strategies (time-level vs. project-level). Our study spans models of different scales: OpenNMT (small), T5 and CodeReviewer (mid-sized), LoRA-tuned CodeLLaMA-7B (large), and GPT-3.5-Turbo (large, black-box). Using datasets of up to 496k training records, we evaluate and statistically compare the models’ performance using exact match ratio (EXM), CodeBLEU, and Levenshtein ratio. Our findings show that preprocessing can be a significant component in the success of the different techniques: OpenNMT relies on heavy preprocessing; T5 benefits from light filtering (selective removal of records); CodeReviewer performs best when trained on larger, less aggressively filtered data; CodeLLaMA-7B and GPT-3.5-Turbo are largely indifferent to preprocessing. Overall, the effectiveness of ACR tools depends on aligning preprocessing with model scale and training setup. In general, small models need abstraction, mid-sized ones benefit from light filtering, and large-scale models perform best when trained on the original, unprocessed form of the code.
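Two of the evaluation metrics named above, exact match ratio and Levenshtein ratio, can be sketched in a few lines (a minimal illustration only; the paper's exact normalization of the Levenshtein ratio may differ):

```python
def exact_match(prediction: str, reference: str) -> bool:
    # EXM: the generated revision must reproduce the reference exactly
    # (here compared after stripping surrounding whitespace)
    return prediction.strip() == reference.strip()

def levenshtein_ratio(a: str, b: str) -> float:
    # Normalized Levenshtein similarity: 1 - edit_distance / max(len(a), len(b)).
    # Classic dynamic-programming recurrence, keeping only the previous row.
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))
```

A perfect revision scores 1.0 on both metrics; the Levenshtein ratio degrades gracefully with partial matches, which is why it is reported alongside the stricter EXM.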
- Conference Article
41
- 10.1109/saner48275.2020.9054794
- Feb 1, 2020
Code review is a common process used by developers, in which a reviewer provides useful comments or points out defects in the submitted source code changes via pull requests. Code review has been widely used in both industry and open-source projects due to its capacity for early defect identification, project maintenance, and code improvement. With rapid updates to project developments, code review becomes a non-trivial and labor-intensive task for reviewers. Thus, an automated code review engine can be beneficial and useful for project development in practice. Although there exist prior studies on automating the code review process by adopting static analysis tools or deep learning techniques, they often require external sources such as partial or full source code for accurate review suggestions. In this paper, we aim to automate the code review process based only on code changes and the corresponding reviews, but with better performance. The hinge of accurate code review suggestion is to learn good representations for both code changes and reviews. To achieve this with limited sources, we design a multi-level embedding (i.e., word embedding and character embedding) approach to represent the semantics provided by code changes and reviews. The embeddings are then trained through a proposed attentional deep learning model, as a whole named CORE. We evaluate the effectiveness of CORE on code changes and reviews collected from 19 popular Java projects hosted on GitHub. Experimental results show that our model CORE achieves significantly better performance than the state-of-the-art model (DeepMem), with an increase of 131.03% in terms of Recall@10 and 150.69% in terms of Mean Reciprocal Rank. A qualitative word analysis among project developers also demonstrates the performance of CORE in automating code review.
- Conference Article
3
- 10.1109/iwsc.2019.8665852
- Feb 1, 2019
Code review is key to ensuring the absence of potential issues in source code. Code reviewers spend a large amount of time manually checking submitted patches based on their knowledge. Since a number of patches sometimes have similar potential issues, code reviewers need to suggest similar source code changes to patch authors. If patch authors noticed similar code improvement patterns by themselves before submitting to code review, reviewers’ cost would be reduced. In order to detect similar code change patterns, this study employs sequential pattern mining to detect source code improvement patterns that frequently appear in code review history. In a case study using a code review dataset of the OpenStack project, we found that the patterns detected by our proposed approach included effective examples for improving patches without reviewers’ manual checks. We also found that the patterns change over time; our pattern mining approach is able to track the effective code improvement patterns in a timely manner.
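The core idea of mining recurring improvement patterns from review history can be sketched with a naive frequent-subsequence counter (a toy illustration only: the change-action names are hypothetical and the study's actual mining algorithm is more sophisticated):

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sequences, min_support=2, max_len=2):
    # Count how many sequences contain each ordered subsequence
    # (itertools.combinations preserves the original element order),
    # then keep the patterns that meet the minimum support.
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for n in range(1, max_len + 1):
            seen.update(combinations(seq, n))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}

# Hypothetical per-review sequences of code-improvement actions
history = [
    ["remove_unused_import", "rename_variable", "add_null_check"],
    ["remove_unused_import", "add_null_check"],
    ["rename_variable", "add_null_check"],
]
patterns = frequent_patterns(history, min_support=2)
```

Here the pattern `("remove_unused_import", "add_null_check")` would surface with support 2, i.e., it recurs across reviews and could be suggested to patch authors before submission.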
- Conference Article
48
- 10.1109/icsme.2014.46
- Sep 1, 2014
As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.
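The AUC figures quoted above have a direct probabilistic reading: the chance that the classifier ranks a randomly chosen build-co-changing commit above a randomly chosen non-co-changing one. A minimal pairwise sketch of the metric (not the paper's implementation, which uses random forest classifiers):

```python
def auc(labels, scores):
    # AUC = P(score of a random positive > score of a random negative),
    # ties counted as half. O(P*N) pairwise form, chosen for clarity.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, the C++ classifiers' AUC of 0.88 means a true build co-change outranks a non-co-change 88% of the time, while 0.5 would be chance level.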
- Conference Article
1
- 10.15514/syrcose-2008-2-23
- Jan 1, 2008
The software development process is a complex sequence of actions that results in the source code of a working system. All project participants should track changes in the source code during the work process to know what is happening. However, "manual" code review requires that everyone involved have the corresponding technical skills and a lot of time to spend. This work describes the use of automated classification of source code changes, aimed at controlling source code evolution. The method is based on statistical clustering of change metrics. We show how automatic classification of changes can be used to optimize code review and code change control in the final development stages. Building development process reports is also described.
- Conference Article
9
- 10.1145/2897586.2897605
- May 14, 2016
Code review is a common practice for improving the quality of source code changes and expediting knowledge transfer in a development community. In modern code review, source code changes or patches are expected to be assessed and approved for integration through multiple reviews. However, in our empirical study we found that some patches are reviewed by only one reviewer, and that some reviewers do not continue the review discussion, which can have negative effects on software quality. To understand these reviewers' behaviors, we model the code review situation based on the snowdrift game, which is used to analyze social dilemmas. With this game-theoretical modeling, we found that it explains reviewers' behaviors well.
- Conference Article
12
- 10.15439/2017f536
- Sep 24, 2017
Code review is a key tool for quality assurance in software development. It is intended to find coding mistakes overlooked during the development phase and to lower the risk of bugs in the final product. In large and complex projects, accurate code review is a challenging task. Because code review depends on individual reviewer predisposition, a certain fraction of source code changes is not checked as thoroughly as it should be. In this paper, we propose a machine learning approach for pointing out project artifacts that are significantly at risk of failure. Planning and adjusting quality assurance (QA) activities could strongly benefit from accurate estimation of the software areas endangered by defects; extended code review could be directed there. The proposed approach has been evaluated for feasibility on a large medical software project. Significant work was done to extract features from heterogeneous production data, leading to a good predictive model. Our preliminary research results were considered worthy of implementation in the company where the research was conducted, opening opportunities for the continuation of the studies.
- Dissertation
1
- 10.5167/uzh-61703
- Jan 1, 2012
Software development and, in particular, software maintenance are time consuming and require detailed knowledge of the structure and the past development activities of a software system. Limited resources and time constraints make the situation even more difficult. Therefore, a significant amount of research effort has been dedicated to learning software prediction models that allow project members to allocate and spend the limited resources efficiently on the (most) critical parts of their software system. Prominent examples are bug prediction models and change prediction models: bug prediction models identify the bug-prone modules of a software system that should be tested with care; change prediction models identify modules that change frequently and in combination with other modules, i.e., they are change coupled. By combining statistical methods, data mining approaches, and machine learning techniques, software prediction models provide a structured and analytical basis for making decisions. Researchers have proposed a wide range of approaches to build effective prediction models that take into account multiple aspects of the software development process. They achieved especially good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. For that, they rely on change data provided by version control systems (VCS). However, because current VCSs track code changes only at file level and on a textual basis, most of those approaches suffer from coarse-grained and rather generic change information. More fine-grained change information, for instance at the level of source code statements, and the type of changes, e.g., whether a method was renamed or a condition expression was changed, are often not taken into account.
Therefore, investigating the development process and the evolution of software at a fine-grained change level has recently received increasing attention in research. The key contribution of this thesis is to improve software prediction models by using fine-grained source code changes. Those changes are based on the abstract syntax tree structure of source code and allow us to track code changes at the fine-grained level of individual statements. We show, with a series of empirical studies using the change history of open-source projects, how prediction models can benefit in terms of prediction performance and prediction granularity from the more detailed change information. First, we compare fine-grained source code changes and code churn, i.e., lines modified, for bug prediction. The results with data from the Eclipse platform show that fine-grained source code changes significantly outperform code churn when classifying source files into bug- and not bug-prone, as well as when predicting the number of bugs in source files. Moreover, these results give more insight into the relation between individual types of code changes, e.g., method declaration changes, and bugs. For instance, in our dataset method declaration changes exhibit a stronger correlation with the number of bugs than class declaration changes. Second, we leverage fine-grained source code changes to predict bugs at method level. This is beneficial as files can grow arbitrarily large; hence, if bugs are predicted at the level of files, a developer needs to manually inspect all methods of a file one by one until a particular bug is located. Third, we build models using source code properties, e.g., complexity, to predict whether a source file will be affected by a certain type of code change.
Predicting the type of changes is of practical interest, for instance, in the context of software testing, as different change types require different levels of testing: while for small statement changes local unit tests are mostly sufficient, API changes, e.g., method declaration changes, might require system-wide integration tests, which are more expensive. Hence, knowing (in advance) which types of changes will most likely occur in a source file can help to better plan and develop tests and, in case of limited resources, prioritize among different types of testing. Finally, to assist developers in bug triaging, we compute prediction models based on the attributes of a bug report that can be used to estimate whether a bug will be fixed fast or whether it will take more time to resolve. The results and findings of this thesis give evidence that fine-grained source code changes can improve software prediction models to provide more accurate results.
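The correlation claim above (method declaration changes vs. bug counts) boils down to computing a correlation coefficient over per-file counts. A minimal Pearson sketch with made-up numbers (the thesis may well use a rank correlation such as Spearman instead; both the formula choice and the data here are illustrative assumptions):

```python
def pearson(xs, ys):
    # Pearson product-moment correlation of two equal-length samples
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-file counts: method-declaration changes vs. bugs
method_decl_changes = [0, 2, 3, 8, 10]
bug_counts          = [1, 1, 2, 5, 7]
r = pearson(method_decl_changes, bug_counts)
```

Comparing such an `r` across change types (method declaration vs. class declaration changes) is what supports statements like "method declaration changes correlate more strongly with bugs".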
- Conference Article
47
- 10.1145/3238147.3238219
- Sep 3, 2018
Analyzing and understanding source code changes is important in a variety of software maintenance tasks. To this end, many code differencing and code change summarization methods have been proposed. For some tasks (e.g., code review and software merging), however, those differencing methods generate too fine-grained a representation of code changes, and those summarization methods generate too coarse-grained a representation of code changes. Moreover, they do not consider the relationships among code changes. Therefore, the generated differences or summaries make it difficult to analyze and understand code changes in some software maintenance tasks. In this paper, we propose a code differencing approach, named CLDIFF, to generate concise linked code differences whose granularity lies between those of the existing code differencing and code change summarization methods. The goal of CLDIFF is to generate more easily understandable code differences. CLDIFF takes source code files before and after changes as inputs and consists of three steps. First, it pre-processes the source code files by pruning unchanged declarations from the parsed abstract syntax trees. Second, it generates concise code differences by grouping fine-grained code differences at or above the statement level and describing the high-level changes in each group. Third, it links the related concise code differences according to five pre-defined links. Experiments with 12 Java projects (74,387 commits) and a human study with 10 participants indicate the accuracy, conciseness, performance, and usefulness of CLDIFF.
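CLDIFF's first step, pruning unchanged declarations from the parsed ASTs, can be loosely illustrated with Python's stdlib `ast` module (CLDIFF itself targets Java ASTs; this sketch is an assumption-laden simplification that only keeps top-level functions whose trees differ):

```python
import ast

def changed_functions(src_before: str, src_after: str):
    # Parse both versions and index top-level function definitions by name.
    # ast.dump() omits position attributes by default, so a function that
    # merely moved compares equal and is pruned from the diff.
    def index(src):
        return {node.name: ast.dump(node)
                for node in ast.parse(src).body
                if isinstance(node, ast.FunctionDef)}
    before, after = index(src_before), index(src_after)
    # Keep only names whose declaration was added, removed, or modified
    return sorted(name for name in before.keys() | after.keys()
                  if before.get(name) != after.get(name))
```

Everything this filter discards never reaches the later grouping and linking steps, which is what keeps the resulting differences concise.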
- Research Article
9
- 10.47852/bonviewjcce2022010102
- Dec 17, 2021
- Journal of Computational and Cognitive Engineering
Detecting and removing hateful speech in various online social media is a challenging task. Researchers have tried to solve this problem using both classical and deep learning methods, which are found to have limitations in terms of the requirement for extensive hand-crafted features, model architecture design, and pretrained embeddings that are not very proficient at capturing semantic relations between words. Therefore, in this paper, we tackle the problem using Transformer-based pretrained language models, which are specially designed to produce contextual embeddings of text sequences. We have evaluated two such models, RoBERTa and XLNet, using four publicly available datasets from different social media platforms and compared them to the existing baselines. Our investigation shows that the Transformer-based models either surpass or match, by significant margins, all of the existing baseline scores obtained by previously used models such as the one-dimensional convolutional neural network (1D-CNN) and long short-term memory (LSTM). The Transformer-based models proved to be more robust by achieving native performance when trained and tested on two different datasets. Our investigation also revealed that variations in the characteristics of the data produce significantly different results with the same model. From the experimental observations, we establish that Transformer-based language models exhibit superior performance over their conventional counterparts at a fraction of the computation cost and with minimal need for complex model engineering.
- Conference Article
9
- 10.1109/cee-secr.2011.6188468
- Oct 1, 2011
The primary goal of software repositories is to store the source code of software during its development. Developers constantly store small parts of code (i.e., software modifications) in the repository and share those parts with others until the software is finished. In the process, software repositories accumulate a significant amount of information about the software and its development. With the appropriate tool, source code modifications can be identified. In this article, we introduce a tool for identifying structural source code changes from software repositories. With this tool, three open source projects were analyzed and different source code changes were identified during their development. We show that the tool can be used to identify source code changes from software repositories.
- Conference Article
1
- 10.1109/icsme.2019.00064
- Sep 1, 2019
In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained, AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities aimed at alleviating the processing of large datasets of fine-grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which tens of millions of fine-grained source code changes have been processed.
- Research Article
1
- 10.51594/csitrj.v5i8.1380
- Aug 3, 2024
- Computer Science & IT Research Journal
Achieving comprehensive code quality and ensuring software reliability are critical goals in modern software engineering. This paper delves into advanced strategies that encompass both technical and organizational practices, aiming to enhance code quality and boost software reliability. Key strategies include the implementation of rigorous code review processes, adoption of automated testing frameworks, and the utilization of static and dynamic code analysis tools. Firstly, rigorous code review processes are fundamental for maintaining high standards of code quality. By fostering a collaborative environment where peer reviews are routine, teams can catch potential issues early, ensure adherence to coding standards, and promote knowledge sharing. This practice not only improves code quality but also enhances team cohesion and expertise. Secondly, automated testing frameworks are indispensable in achieving comprehensive test coverage and reducing the incidence of bugs. Unit tests, integration tests, and end-to-end tests, when automated, provide continuous feedback and ensure that code changes do not introduce new defects. This continuous testing approach is crucial for maintaining software reliability, especially in agile and DevOps environments where rapid iterations are common. Furthermore, the integration of static and dynamic code analysis tools into the development pipeline offers additional layers of quality assurance. Static analysis tools inspect the codebase for potential vulnerabilities, code smells, and compliance with coding standards without executing the program. Dynamic analysis tools, on the other hand, evaluate the software during runtime to detect memory leaks, concurrency issues, and performance bottlenecks. Together, these tools provide a comprehensive assessment of the code’s health and reliability. Moreover, fostering a culture of continuous improvement and learning within the development team is essential. 
Regular training sessions, workshops, and knowledge-sharing activities help in keeping the team updated with the latest advancements in software engineering practices and tools. Encouraging a mindset of quality-first and reliability among developers ensures long-term benefits and sustainability. In conclusion, achieving comprehensive code quality and ensuring software reliability require a multifaceted approach that integrates rigorous code reviews, automated testing frameworks, and both static and dynamic analysis tools. Coupled with a strong culture of continuous learning and improvement, these strategies collectively contribute to the development of robust, reliable, and high-quality software systems. Keywords: Software Reliability, Code Quality, Advanced, Strategies, Comprehensive.
- Conference Article
1
- 10.1145/3425269.3425282
- Oct 19, 2020
Supporting video for the presentation of the article: An Approach to Identify and Classify State Machine Changes from Code Changes
- Conference Article
12
- 10.1109/scam.2019.00014
- Sep 1, 2019
Code review has been widely acknowledged as a key quality assurance process in both open-source and industrial software development. Due to the asynchronicity of the code review process, the system's codebase tends to incorporate external commits while a source code change is under review, which creates the need for rebasing operations. External commits have the potential to modify files currently under review, which causes re-work for developers and fatigue for reviewers. Since source code changes observed during code review may be due to external commits, rebasing operations may pose a severe threat to empirical studies that employ code review data. Yet, to the best of our knowledge, there is no empirical study that characterises and investigates rebasing in real-world software systems. Hence, this paper reports an empirical investigation aimed at understanding the frequency with which rebasing operations occur and their side effects on the reviewing process. To this end, we perform an in-depth, large-scale empirical investigation of the code review data of 11 software systems, 28,808 code reviews, and 99,121 revisions. Our observations indicate that developers need to perform rebasing operations in an average of 75.35% of code reviews. In addition, our data suggest that an average of 34.21% of rebasing operations tend to tamper with the reviewing process. Finally, we propose a methodology for handling rebasing in empirical studies that employ code review data. We show how an empirical study that does not account for rebasing operations may report skewed, biased, and inaccurate observations.
- Conference Article
35
- 10.1109/icsme.2017.40
- Sep 1, 2017
Code reviews are an important mechanism for assuring quality of source code changes. Reviewers can either add general comments pertaining to the entire change or pinpoint concerns or shortcomings about a specific part of the change using inline comments. Recent studies show that reviewers often do not understand the change being reviewed and its context. Our ultimate goal is to identify the factors that confuse code reviewers and understand how confusion impacts the efficiency and effectiveness of code review(er)s. As the first step towards this goal we focus on the identification of confusion in developers' comments. Based on an existing theoretical framework categorizing expressions of confusion, we manually classify 800 comments from code reviews of the Android project. We observe that confusion can be reasonably well-identified by humans: raters achieve moderate agreement (Fleiss' kappa 0.59 for the general comments and 0.49 for the inline ones). Then, for each kind of comment we build a series of automatic classifiers that, depending on the goals of the further analysis, can be trained to achieve high precision (0.875 for the general comments and 0.615 for the inline ones), high recall (0.944 for the general comments and 0.988 for the inline ones), or substantial precision and recall (0.696 and 0.542 for the general comments and 0.434 and 0.583 for the inline ones, respectively). These results motivate further research on the impact of confusion on the code review process. Moreover, other researchers can employ the proposed classifiers to analyze confusion in other contexts where software development-related discussions occur, such as mailing lists.
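The Fleiss' kappa values reported above (0.59 and 0.49) measure chance-corrected agreement among multiple raters. The statistic itself is straightforward to compute (a generic sketch of the standard formula, not the study's tooling):

```python
def fleiss_kappa(matrix):
    # matrix[i][j] = number of raters who assigned item i to category j;
    # every row must sum to the same number of raters.
    n_items = len(matrix)
    n_raters = sum(matrix[0])
    n_categories = len(matrix[0])
    # Marginal proportion of all assignments falling in each category
    p_j = [sum(row[j] for row in matrix) / (n_items * n_raters)
           for j in range(n_categories)]
    # Per-item observed agreement among rater pairs
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in matrix]
    p_bar = sum(p_i) / n_items     # mean observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

Kappa is 1.0 under perfect agreement and 0 when agreement equals chance; values around 0.4-0.6, as in the study, are conventionally read as moderate agreement.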