Replication Package of Augmenting Commit Classification by using Fine-Grained Source Code Changes and a Pre-trained Deep Neural Language Model
Replication Package of Augmenting Commit Classification by using Fine-Grained SourceCode Changes and a Pre-trained Deep Neural Language Model
- Research Article
43
- 10.1016/j.infsof.2021.106566
- Mar 10, 2021
- Information and Software Technology
Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model
- Supplementary Content
2
- 10.5167/uzh-61703
- Jan 1, 2012
- Zurich Open Repository and Archive (University of Zurich)
Software development and, in particular, software maintenance are time consuming and require detailed knowledge of the structure and the past development activities of a software system. Limited resources and time constraints make the situation even more difficult. Therefore, a significant amount of research effort has been dedicated to learning software prediction models that allow project members to allocate and spend the limited resources efficiently on the (most) critical parts of their software system. Prominent examples are bug prediction models and change prediction models: Bug prediction models identify the bug-prone modules of a software system that should be tested with care; change prediction models identify modules that change frequently and in combination with other modules, i.e., they are change coupled. By combining statistical methods, data mining approaches, and machine learning techniques software prediction models provide a structured and analytical basis to make decisions.Researchers proposed a wide range of approaches to build effective prediction models that take into account multiple aspects of the software development process. They achieved especially good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. For that, they rely on change data provided by version control systems (VCS). However, due to the fact that current VCS track code changes only on file-level and textual basis most of those approaches suffer from coarse-grained and rather generic change information. More fine-grained change information, for instance, at the level of source code statements, and the type of changes, e.g., whether a method was renamed or a condition expression was changed, are often not taken into account. Therefore, investigating the development process and the evolution of software at a fine-grained change level has recently experienced an increasing attention in research.The key contribution of this thesis is to improve software prediction models by using fine-grained source code changes. Those changes are based on the abstract syntax tree structure of source code and allow us to track code changes at the fine-grained level of individual statements. We show with a series of empirical studies using the change history of open-source projects how prediction models can benefit in terms of prediction performance and prediction granularity from the more detailed change information.First, we compare fine-grained source code changes and code churn, i.e., lines modified, for bug prediction. The results with data from the Eclipse platform show that fine grained-source code changes significantly outperform code churn when classifying source files into bug- and not bug-prone, as well as when predicting the number of bugs in source files. Moreover, these results give more insights about the relation of individual types of code changes, e.g., method declaration changes and bugs. For instance, in our dataset method declaration changes exhibit a stronger correlation with the number of bugs than class declaration changes.Second, we leverage fine-grained source code changes to predict bugs at method-level. This is beneficial as files can grow arbitrarily large. Hence, if bugs are predicted at the level of files a developer needs to manually inspect all methods of a file one by one until a particular bug is located.Third, we build models using source code properties, e.g., complexity, to predict whether a source file will be affected by a certain type of code change. Predicting the type of changes is of practical interest, for instance, in the context of software testing as different change types require different levels of testing: While for small statement changes local unit-tests are mostly sufficient, API changes, e.g., method declaration changes, might require system-wide integration-tests which are more expensive. Hence, knowing (in advance) which types of changes will most likely occur in a source file can help to better plan and develop tests, and, in case of limited resources, prioritize among different types of testing.Finally, to assist developers in bug triaging we compute prediction models based on the attributes of a bug report that can be used to estimate whether a bug will be fixed fast or whether it will take more time for resolution.The results and findings of this thesis give evidence that fine-grained source code changes can improve software prediction models to provide more accurate results.
- Conference Article
109
- 10.1145/1985441.1985456
- May 21, 2011
A significant amount of research effort has been dedicated to learning prediction models that allow project managers to efficiently allocate resources to those parts of a software system that most likely are bug-prone and therefore critical. Prominent measures for building bug prediction models are product measures, e.g., complexity or process measures, such as code churn. Code churn in terms of lines modified (LM) and past changes turned out to be significant indicators of bugs. However, these measures are rather imprecise and do not reflect all the detailed changes of particular source code entities during maintenance activities. In this paper, we explore the advantage of using fine-grained source code changes (SCC) for bug prediction. SCC captures the exact code changes and their semantics down to statement level. We present a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM. The results show that SCC outperforms LM for learning bug prediction models.
- Conference Article
1
- 10.1109/icsme.2019.00064
- Sep 1, 2019
In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions and more of records. In this work we focus on studies involving fine-grained AST level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities, aimed to alleviate the processing of large datasets of fine grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which dozens of millions of fine-grained source code changes have been processed.
- Conference Article
74
- 10.1109/msr.2012.6224284
- Jun 1, 2012
There exist many approaches that help in pointing developers to the change-prone parts of a software system. Although beneficial, they mostly fall short in providing details of these changes. Fine-grained source code changes (SCC) capture such detailed code changes and their semantics on the statement level. These SCC can be condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. In this paper, we explore prediction models for whether a source file will be affected by a certain type of SCC. These predictions are computed on the static source code dependency graph and use social network centrality measures and object-oriented metrics. For that, we use change data of the Eclipse platform and the Azureus 3 project. The results show that Neural Network models can predict categories of SCC types. Furthermore, our models can output a list of the potentially change-prone files ranked according to their change-proneness, overall and per change type category.
- Conference Article
53
- 10.5555/2664446.2664480
- Jun 2, 2012
There exist many approaches that help in pointing developers to the change-prone parts of a software system. Although beneficial, they mostly fall short in providing details of these changes. Fine-grained source code changes (SCC) capture such detailed code changes and their semantics on the statement level. These SCC can be condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. In this paper, we explore prediction models for whether a source file will be affected by a certain type of SCC. These predictions are computed on the static source code dependency graph and use social network centrality measures and object-oriented metrics. For that, we use change data of the Eclipse platform and the Azureus 3 project. The results show that Neural Network models can predict categories of SCC types. Furthermore, our models can output a list of the potentially change-prone files ranked according to their change-proneness, overall and per change type category.
- Research Article
5
- 10.1007/s10579-020-09525-1
- Feb 28, 2021
- Language Resources and Evaluation
Sense representations have gone beyond word representations like Word2Vec, GloVe and FastText and achieved innovative performance on a wide range of natural language processing tasks. Although very useful in many applications, the traditional approaches for generating word embeddings have a strict drawback: they produce a single vector representation for a given word ignoring the fact that ambiguous words can assume different meanings. In this paper, we explore unsupervised sense representations which, different from traditional word embeddings, are able to induce different senses of a word by analyzing its contextual semantics in a text. The unsupervised sense representations investigated in this paper are: sense embeddings and deep neural language models. We present the first experiments carried out for generating sense embeddings for Portuguese. Our experiments show that the sense embedding model (Sense2vec) outperformed traditional word embeddings in syntactic and semantic analogies task, proving that the language resource generated here can improve the performance of NLP tasks in Portuguese. We also evaluated the performance of pre-trained deep neural language models (ELMo and BERT) in two transfer learning approaches: feature based and fine-tuning, in the semantic textual similarity task. Our experiments indicate that the fine tuned Multilingual and Portuguese BERT language models were able to achieve better accuracy than the ELMo model and baselines.
- Research Article
26
- 10.1109/access.2020.3023421
- Jan 1, 2020
- IEEE Access
Massive textual content has enabled rapid advances in natural language modeling. The use of pre-trained deep neural language models has significantly improved natural language understanding tasks. However, the extent to which these systems can be applied to content generation is unclear. While a few informal studies have claimed that these models can generate ‘high quality’ readable content, there is no prior study on analyzing the generated content from these models based on sampling and fine-tuning hyperparameters. We conduct an in-depth comparison of several language models for open-ended story generation from given prompts. Using a diverse set of automated metrics, we compare the performance of transformer-based generative models – OpenAI’s GPT2 (pre-trained and fine-tuned) and Google’s pre-trained Transformer-XL and XLNet to human-written textual references. Studying inter-metric correlation along with metric ranking reveals interesting insights – the high correlation between the readability scores and word usage in the text. A study of the statistical significance and empirical evaluations between the scores (human and machine-generated) at higher sampling hyperparameter combinations ( $t=\{0.75, 1.0\}$ , $k=\{100, 150, 250\}$ ) reveal that the top pre-trained and fine-tuned models generated samples condition well on the prompt with an increased occurrence of unique and difficult words. The GPT2-medium model fine-tuned on the 1024 Byte-pair Encoding (BPE) tokenized version of the dataset along with pre-trained Transformer-XL models generated samples close to human written content on three metrics: prompt-based overlap, coherence, and variation in sentence length. A study of overall model stability and performance shows that fine-tuned GPT2 language models have the least deviation in metric scores from human performance.
- Research Article
2
- 10.1145/3468744.3468751
- Jul 14, 2021
- ACM SIGSOFT Software Engineering Notes
More than two decades ago, researchers started to mine the data stored in software repositories to help software developers in making informed decisions for developing and testing software systems. Bug prediction was one of the most promising and popular research directions that uses the data stored in software repositories to predict the bug-proneness or number of bugs in source files. On that topic and as part of Emanuel's PhD studies, we submitted a paper with the title Comparing fine-grained source code changes and code churn for bug prediction [8] to the 8th Working Conference on Mining Software Engineering, held 2011 in beautiful Honolulu, Hawaii. Ten years later, it got selected as one of the finalists to receive the MSR 2021 Most Influential Paper Award. In the following, we provide a retrospective on our work, describing the road to publishing this paper, its impact in the field of bug prediction, and the road ahead.
- Supplementary Content
1
- 10.5167/uzh-16421
- Nov 26, 2008
- Zurich Open Repository and Archive (University of Zurich)
Software systems have to evolve over their life-cycle or they become progressively less useful. The reasons of why software is continuously changed are manifold: Features are added or adapted because of changing requirements; bugs have to be fixed because of faults in the software; or the software has to be migrated because of modernization. One negative effect of the continuing change is the software aging phenomenon. As software is changed from people unaware of the initial design concepts and, mostly, under time-pressure software becomes larger, more complex, and less understandable. As a result, in the last decade, several techniques have been developed to understand the negative impact of continuing change by analyzing change in general and source code change in particular. The approaches developed so far suffer from the coarse-grained information available for changes. They rely on data provided by versioning systems, which keep track of changes by storing the text differences of a particular file. Changes at the level of source code entities are not considered. In addition, a precise definition and a classification of source code changes are still missing. Both are key to extract and analyze source code changes, and eventually understand the negative impact of continuing change. We therefore claim: Extracting, classifying, and analyzing finegrained source code changes from the history of software systems provide useful insights into problems of continuing change and can identify support mechanisms to reduce them. The key contribution of this dissertation is change distilling, a methodology to define, classify, extract, and analyze fine-grained source code changes. Change distilling provides a taxonomy of source code changes which defines source code change types according to tree edit operations in the abstract syntax tree. Our change distilling algorithm applies tree differencing pairwise on subsequent versions of abstract syntax trees to extract the tree edit operations. We provide three empirical experiments to show the benefits of extracting finegrained source code change types. First, we analyze the source code and comment co-change behavior in the evolution of eight software systems. We show that in cases where comments are adapted to source code changes, the related changes happen in the same revision. We also show that in half of these software systems API comments are adapted several revisions after the source code change happened. Second, we explore whether certain change types appear frequently together. For that we use hierarchical agglomerative clustering to discover change type patterns and present a catalogue of change type patterns. The results from a commercial software system show that certain control flow changes are due to source code cleanup activities, that exception flow is used differently in different system parts, and that API convention changes are spread over many releases. Third, we investigate whether methods exist whose invocations are significantly more affected by context and update changes than other methods, and whether we can reveal change patterns among these invocation changes. We develop an approach that ranks how often context and update changes were applied to invocations of a particular method and whether these changes were bug fixes. In addition, we extract patterns of context and update changes to assess whether they can be used to provide valuable change suggestions. The results of our three software evolution experiments provide enough evidence that the analysis of change types helps in understanding software evolution and provides means to support developers in their daily work.
- Conference Article
4
- 10.1145/2554850.2559925
- Mar 24, 2014
Source code change analysis in the project history is one of main issues in mining software repositories, which incorporates change prediction, API evolution and refactoring, etc. In the previous studies, fine-grained source code changes, that is, code level changes, were used to find source code change patterns. In this paper, we closely investigate IMPORT change type, which has not been unnoticed. At first, we performed modifying the existing change extraction tool, change distiller [3], to extract IMPORT change history. And then, we extracted commit history data from project repository of eclipse CDT and IDT. Change types for each change in commit history data have been determined using change types in [3] and IMPORT change types defined in this work. Finally, we analyzed the effect of IMPORT change using the frequency of each change type occurred in the commit history. Experimental result shows that the IMPORT change meaningfully affects other changes and it would be better to consider IMPORT change types in change analysis work.
- Conference Article
4
- 10.18653/v1/2021.deelio-1.9
- Jan 1, 2021
Contextual word representation models have shown massive improvements on a multitude of NLP tasks, yet their word sense disambiguation capabilities remain poorly explained. To address this gap, we assess whether contextual word representations extracted from deep pretrained language models create distinguishable representations for different senses of a given word. We analyze the representation geometry and find that most layers of deep pretrained language models create highly anisotropic representations, pointing towards the existence of representation degeneration problem in contextual word representations. After accounting for anisotropy, our study further reveals that there is variability in sense learning capabilities across different language models. Finally, we propose LASeR, a ‘Low Anisotropy Sense Retrofitting’ approach that renders off-the-shelf representations isotropic and semantically more meaningful, resolving the representation degeneration problem as a post-processing step, and conducting sense-enrichment of contextualized representations extracted from deep neural language models.
- Conference Article
19
- 10.18653/v1/2020.wnut-1.40
- Jan 1, 2020
Recent improvements in machine-reading technologies attracted much attention to automation problems and their possibilities. In this context, WNUT 2020 introduces a Name Entity Recognition (NER) task based on wet laboratory procedures. In this paper, we present a 3-step method based on deep neural language models that reported the best overall exact match F 1 -score (77.99%) of the competition. By fine-tuning 10 times, 10 different pretrained language models, this work shows the advantage of having more models in an ensemble based on a majority of votes strategy. On top of that, having 100 different models allowed us to analyse the combinations of ensemble that demonstrated the impact of having multiple pretrained models versus fine-tuning a pretrained model multiple times.
- Research Article
2
- 10.1038/s42004-025-01540-z
- May 6, 2025
- Communications Chemistry
Collision cross section (CCS) of peptide ions provides an important separation dimension in liquid chromatography/tandem mass spectrometry-based proteomics that incorporates ion mobility spectrometry (IMS), and its accurate prediction is the basis for advanced proteomics workflows. This paper describes experimental data and a prediction model for challenging CCS prediction tasks including longer peptides that tend to have higher charge states. The proposed model is based on a pretrained deep protein language model. While the conventional prediction model requires training from scratch, the proposed model enables training with less amount of time owing to the use of the pretrained model as a feature extractor. Results of experiments with the novel experimental data show that the proposed model succeeds in drastically reducing the training time while maintaining the same or even better prediction performance compared with the conventional method. Our approach presents the possibility of prediction on the basis of “greener” manner training of various peptide properties in proteomic liquid chromatography/tandem mass spectrometry experiments.
- Research Article
31
- 10.1523/jneurosci.1163-22.2023
- May 22, 2023
- The Journal of Neuroscience
A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: (1) the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and (2) this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and jabberwocky sentences (composed of meaningless pseudo words) and displayed them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous MEG and intracranial EEG. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than jabberwocky. Furthermore, multivariate decoding of normal versus jabberwocky confirmed three dynamic patterns: (1) a phasic pattern following each word, peaking in temporal and parietal areas; (2) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (3) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.SIGNIFICANCE STATEMENT Starting from general linguistic concepts, we make two sets of predictions in neural signals evoked by reading multiword sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition. We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text and performing very well on many natural language processing tasks. Then, using a unique combination of MEG and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.