PatchScope: LLM-Enhanced Fine-Grained Stable Patch Classification for Linux Kernel
Stable patch classification plays a crucial role in vulnerability management for the Linux kernel, significantly contributing to the stability and security of long-term support (LTS) versions. Although existing tools have effectively assisted in assessing whether patches should be merged into stable versions, they cannot determine which stable patches should be merged into which LTS versions. This process still requires the maintainers of the distribution community to manually screen patches based on the requirements of their respective versions. To address this issue, we propose PatchScope, which is designed to predict the specific merge status of patches. PatchScope consists of two components: patch analysis and patch classification. Patch analysis leverages large language models (LLMs) to generate detailed patch descriptions from the commit message and code changes, thereby deepening the model's semantic understanding of patches. Patch classification utilizes a pre-trained language model to extract semantic features of the patches and employs a two-stage classifier to predict the merge status of the patches. The model is optimized using a dynamic weighted loss function to handle data imbalance and improve overall performance. Given that the primary focus is maintaining Linux kernel versions 5.10 and 6.6, we have conducted comparative experiments based on these two versions. Experimental results demonstrate that PatchScope can effectively predict the merge status of patches.
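The abstract does not give the form of the dynamic weighted loss, so the following is only a minimal sketch of one common way to handle class imbalance: cross-entropy with inverse-frequency class weights that are recomputed from the labels seen so far. The function names, the `smoothing` parameter, and the weighting scheme itself are assumptions made for illustration, not PatchScope's actual method.

```python
import math
from collections import Counter


def class_weights(labels, num_classes, smoothing=1.0):
    """Inverse-frequency class weights, scaled so they average to 1.

    Hypothetical helper: `smoothing` avoids division by zero for
    classes that have not been observed yet.
    """
    counts = Counter(labels)
    raw = [1.0 / (counts.get(c, 0) + smoothing) for c in range(num_classes)]
    scale = num_classes / sum(raw)
    return [w * scale for w in raw]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], normalized by the total weight in the batch."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


if __name__ == "__main__":
    # Imbalanced toy batch: three samples of class 0, one of class 1.
    labels = [0, 0, 0, 1]
    # "Dynamic" here means the weights are refreshed from the labels seen
    # so far, e.g. once per epoch or per batch (an assumption of this sketch).
    weights = class_weights(labels, num_classes=2)
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    print(weights, weighted_cross_entropy(probs, labels, weights))
```

With the toy labels above, the minority class receives twice the weight of the majority class (roughly 1.33 versus 0.67), so errors on the rare class contribute more to the loss.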
- Research Article
272
- 10.1038/s42256-022-00458-8
- Mar 1, 2022
- Nature Machine Intelligence
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended the state of the art for many natural language processing tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerated and biased behaviour. While this is well established, we show here that recent LMs also contain human-like biases of what is right and wrong to do, reflecting existing ethical and moral norms of society. We show that these norms can be captured geometrically by a ‘moral direction’ which can be computed, for example, by a PCA, in the embedding space. The computed ‘moral direction’ can rate the normativity (or non-normativity) of arbitrary phrases without explicitly training the LM for this task, reflecting social norms well. We demonstrate that computing the ‘moral direction’ can provide a path for attenuating or even preventing toxic degeneration in LMs, showcasing this capability on the RealToxicityPrompts testbed. Large language models identify patterns in the relations between words and capture their relations in an embedding space. Schramowski and colleagues show that a direction in this space can be identified that separates ‘right’ and ‘wrong’ actions as judged by human survey participants.
- Research Article
36
- 10.1109/tse.2019.2952614
- Dec 18, 2019
- IEEE Transactions on Software Engineering
Linux kernel stable versions serve the needs of users who value stability of the kernel over new features. The quality of such stable versions depends on the initiative of kernel developers and maintainers to propagate bug fixing patches to the stable versions. Thus, it is desirable to consider to what extent this process can be automated. A previous approach relies on words from commit messages and a small set of manually constructed code features. This approach, however, shows only moderate accuracy. In this paper, we investigate whether deep learning can provide a more accurate solution. We propose PatchNet, a hierarchical deep learning-based approach capable of automatically extracting features from commit messages and commit code and using them to identify stable patches. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of commit code, making it distinctive from the existing deep learning models on source code. Experiments on 82,403 recent Linux patches confirm the superiority of PatchNet against various state-of-the-art baselines, including the one recently-adopted by Linux kernel maintainers.
- Conference Article
10
- 10.18653/v1/2024.findings-acl.787
- Jan 1, 2024
Advances in machine learning have made it possible to perform various text and speech processing tasks, such as automatic speech recognition (ASR), in an end-to-end (E2E) manner. E2E approaches utilizing pre-trained models are gaining attention for conserving training data and resources. However, most of their applications in ASR involve only one of either a pre-trained speech or a language model. This paper proposes integrating a pre-trained speech representation model and a large language model (LLM) for E2E ASR. The proposed model enables the optimization of the entire ASR process, including acoustic feature extraction and acoustic and language modeling, by combining pre-trained models with a bridge network and also enables the application of remarkable developments in LLM utilization, such as parameter-efficient domain adaptation and inference optimization. Experimental results demonstrate that the proposed model achieves a performance comparable to that of modern E2E ASR models by utilizing powerful pre-training models with the proposed integrated approach.
- Research Article
2
- 10.3390/e27040379
- Apr 2, 2025
- Entropy (Basel, Switzerland)
Natural Language Processing (NLP) stands as a forefront of artificial intelligence research, empowering computational systems to comprehend and process human language as used in everyday contexts. Language models (LMs) underpin this field, striving to capture the intricacies of linguistic structure and semantics by assigning probabilities to sequences of words. The trend towards large language models (LLMs) has shown significant performance improvements with increasing model size. However, the deployment of LLMs on resource-limited devices such as mobile and edge devices remains a challenge. This issue is particularly pronounced in languages other than English, including Korean, where pre-trained models are relatively scarce. Addressing this gap, we introduce a novel lightweight pre-trained Korean language model that leverages knowledge distillation and low-rank factorization techniques. Our approach distills knowledge from a 432 MB (approximately 110 M parameters) teacher model into student models of substantially reduced sizes (e.g., 53 MB ≈ 14 M parameters, 35 MB ≈ 13 M parameters, 30 MB ≈ 11 M parameters, and 18 MB ≈ 4 M parameters). The smaller student models further employ low-rank factorization to minimize the parameter count within the Transformer's feed-forward network (FFN) and embedding layer. We evaluate the efficacy of our lightweight model across six established Korean NLP tasks. Notably, our most compact model, KR-ELECTRA-Small-KD, attains over 97.387% of the teacher model's performance despite an 8.15× reduction in size. Remarkably, on the NSMC sentiment classification benchmark, KR-ELECTRA-Small-KD surpasses the teacher model with an accuracy of 89.720%. These findings underscore the potential of our model as an efficient solution for NLP applications in resource-constrained settings.
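The saving from low-rank factorization mentioned in this abstract can be illustrated with simple arithmetic: a dense weight matrix of shape d_in × d_out is replaced by two factors of shape d_in × r and r × d_out. The sketch below is illustrative only; the dimensions (768 and 3072, the hidden and feed-forward sizes of BERT-base) and the rank of 128 are assumptions chosen for the example and are not taken from the paper.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Parameter count of a dense weight matrix (bias terms ignored)."""
    return d_in * d_out


def factorized_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count after factorizing into (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


def compression_ratio(d_in: int, d_out: int, rank: int) -> float:
    """How many times smaller the factorized layer is than the dense one."""
    return dense_params(d_in, d_out) / factorized_params(d_in, d_out, rank)


if __name__ == "__main__":
    # Assumed, illustrative sizes: one FFN projection with hidden size 768,
    # intermediate size 3072, and a hypothetical rank of 128.
    d_in, d_out, rank = 768, 3072, 128
    print(dense_params(d_in, d_out))             # 2359296
    print(factorized_params(d_in, d_out, rank))  # 491520
    print(compression_ratio(d_in, d_out, rank))
```

The factorization only reduces the parameter count when the rank is smaller than d_in · d_out / (d_in + d_out), which is about 614 for the sizes assumed here.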
- Research Article
14
- 10.1145/1131322.1131325
- Apr 1, 2006
- ACM SIGOPS Operating Systems Review
The open source concept is a phenomenon of the past two decades in the computer development world. One of the most important characteristics of this concept is the Linux operating system that was started in the early 1990s by Linus Torvalds. The complexity of the code creates a challenging environment requiring highly skilled volunteers. This case study is part of a larger PhD research project on the evolution of the open source movement and the Linux operating system, which tracks and examines the kernel source code evolution over more than a decade by reviewing 534 different Linux kernel versions along various parameters, such as the growth of source code lines, number of participants, and size of the kernel, and by analyzing how the findings progress along the time axis. A major part of this research is a study of the kernel code evolution beginning with the first stable version. Furthermore, this study compares the stable kernel versions (140) and the unstable kernel or development versions (394). This reveals an interesting preference for the unstable kernel versions in a variety of growth data, such as average additional source code lines and kernel size.
- Book Chapter
5
- 10.1007/978-3-030-86340-1_43
- Jan 1, 2021
Recently, pre-trained language models have achieved extraordinary performance on numerous benchmarks. By learning general language knowledge from a large pre-training corpus, language models can fit a specific downstream task with a relatively small amount of labeled training data in the fine-tuning stage. More remarkably, GPT-3 with 175 B parameters performs well in specific tasks by leveraging natural-language prompts and few demonstrations of the task. Inspired by the success of GPT-3, we desire to know whether smaller language models could still have a similar few-shot learning ability. Unlike the various delicately designed tasks in previous few-shot learning research works, we do it more practically. We present a question-answering-based method to help the language model better understand the text classification task by concatenating a label-related question to each candidate sentence. By leveraging the label-related language knowledge, which the language model has learned during the pre-training stage, our QA model can outperform the traditional binary and multi-class classification approaches over both English and Chinese datasets. Afterward, we test our QA model by performing few-shot learning experiments on multiple pre-trained language models of different sizes that range from DistilBERT to RoBERTa-large. We are surprised to find that even DistilBERT, which is the smallest language model we tested with only 66 M parameters, still holds undeniable few-shot learning ability. Moreover, RoBERTa-large with 355 M parameters could achieve a remarkably high accuracy rate of 92.18% with only 100 labeled training samples. This result gives people a practical guideline: when a new category of labeled data is needed, as few as 100 samples need to be labeled. Combined with an appropriate pre-trained model and classification algorithm, reliable classification results can then be obtained. Even without any labeled training data, that is, under the zero-shot learning setup, RoBERTa-large still achieves a solid accuracy rate of 84.84%. Our code is available at https://github.com/ZhangYunchenY/BetterFs.
- Conference Article
4
- 10.1145/3691620.3694999
- Oct 27, 2024
Recent studies indicate that traditional techniques for understanding code changes are not as effective as techniques that directly prompt language models (LMs). However, current LM-based techniques heavily rely on expensive, large LMs (LLMs) such as GPT-4 and Llama-13b, which are either commercial or prohibitively costly to deploy on a wide scale, thereby restricting their practical applicability. This paper explores the feasibility of deploying small LMs (SLMs) while maintaining comparable or superior performance to LLMs in code change understanding. To achieve this, we created a small yet high-quality dataset called HQCM which was meticulously reviewed, revised, and validated by five human experts. We fine-tuned state-of-the-art 7b and 220m SLMs using HQCM and compared them with traditional techniques and LLMs with ≥70b parameters. Our evaluation confirmed HQCM's benefits and demonstrated that SLMs, after finetuning by HQCM, can achieve superior performance in three change understanding tasks: change summarization, change classification, and code refinement. This study supports the use of SLMs in environments with security, computational, and financial constraints, such as in industry scenarios and on edge devices, distinguishing our work from the others.
- Conference Article
63
- 10.1145/3442442.3451375
- Apr 19, 2021
Neural networks for language modeling have been proven effective on several sub-tasks of natural language processing. Training deep language models, however, is time-consuming and computationally intensive. Pre-trained language models such as BERT are thus appealing since (1) they yielded state-of-the-art performance, and (2) they offload practitioners from the burden of preparing the adequate resources (time, hardware, and data) to train models. Nevertheless, because pre-trained models are generic, they may underperform on specific domains. In this study, we investigate the case of multi-class text classification, a task that is relatively less studied in the literature evaluating pre-trained language models. Our work is further placed under the industrial settings of the financial domain. We thus leverage generic benchmark datasets from the literature and two proprietary datasets from our partners in the financial technological industry. After highlighting a challenge for generic pre-trained models (BERT, DistilBERT, RoBERTa, XLNet, XLM) to classify a portion of the financial document dataset, we investigate the intuition that a specialized pre-trained model for financial documents, such as FinBERT, should be leveraged. Nevertheless, our experiments show that the FinBERT model, even with an adapted vocabulary, does not lead to improvements compared to the generic BERT models.
- Video Transcripts
- 10.48448/c5m9-6p30
- May 11, 2022
- Underline Science Inc.
Human-like biases and undesired social stereotypes exist in large pretrained language models. Given the wide adoption of these models in real-world applications, mitigating such biases has become an emerging and important task. In this paper, we propose an automatic method to mitigate the biases in pretrained language models. Different from previous debiasing work that uses external corpora to fine-tune the pretrained models, we instead directly probe the biases encoded in pretrained models through prompts. Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups. Given the identified biased prompts, we then propose a distribution alignment loss to mitigate the biases. Experiment results on standard datasets and metrics show that our proposed Auto-Debias approach can significantly reduce biases, including gender and racial bias, in pretrained language models such as BERT, RoBERTa and ALBERT. Moreover, the improvement in fairness does not decrease the language models' understanding abilities, as shown using the GLUE benchmark.
- Conference Article
114
- 10.18653/v1/2022.acl-long.72
- Jan 1, 2022
Human-like biases and undesired social stereotypes exist in large pretrained language models. Given the wide adoption of these models in real-world applications, mitigating such biases has become an emerging and important task. In this paper, we propose an automatic method to mitigate the biases in pretrained language models. Different from previous debiasing work that uses external corpora to fine-tune the pretrained models, we instead directly probe the biases encoded in pretrained models through prompts. Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups. Given the identified biased prompts, we then propose a distribution alignment loss to mitigate the biases. Experiment results on standard datasets and metrics show that our proposed Auto-Debias approach can significantly reduce biases, including gender and racial bias, in pretrained language models such as BERT, RoBERTa and ALBERT. Moreover, the improvement in fairness does not decrease the language models' understanding abilities, as shown using the GLUE benchmark.
- Conference Article
7
- 10.1109/saner56733.2023.00038
- Mar 1, 2023
Code changes are at the very core of software development and maintenance. Deep learning techniques have been used to build a model from a massive number of code changes to solve software engineering tasks, e.g., commit message generation and bug-fix commit identification. However, existing code change representation learning approaches represent code change as lexical tokens or syntactical AST (abstract syntax tree) paths, limiting the capability to learn semantics of code changes. Besides, they mostly do not consider noisy or tangled code change, hurting the accuracy of solved tasks. To address the above problems, we first propose a slice-based code change representation approach which considers data and control dependencies between changed code and unchanged code. Then, we propose a pre-trained sparse Transformer model, named CCS2VEC, to learn code change representations with three pre-training tasks. Our experiments by fine-tuning our pre-trained model on three downstream tasks have demonstrated the improvement of CCS2VEC over the state-of-the-art CC2VEC.
- Research Article
2
- 10.1109/access.2025.3533554
- Jan 1, 2025
- IEEE Access
Quantum-inspired language models model finer-grained semantic interactions in higher-order Hilbert spaces. However, previous methods usually capture semantic features based on context-free word vectors such as Word2Vec and GloVe. Building on natural language encoding, incorporating quantum-inspired density matrix modeling can capture more fine-grained semantic interactions. However, when applied to large pre-trained language models like BERT, using quantum density matrices often leads to issues such as gradient explosion or vanishing. Therefore, how to effectively integrate the quantum-inspired language model and the pre-trained model, and make them function under the fine-tuning paradigm of the pre-trained model has become a key issue for the further development of the quantum-inspired language model. Therefore, in this paper, we propose the BERT-Residual quantum language model inspired by the multi-step method of ordinary differential equations (ODE), using the density matrix to capture the semantic high-order interaction features missing in the BERT modeling process, and obtain the sentence representation, and perform the first step Residuals. Then quantum measurement is performed on the sentence representation, and the second step of residual connection is performed with the BERT layer. This residual connection method based on the multi-step method can more effectively combine the advantages of BERT representation and quantum density matrix representation to enhance representation learning. Experiments show that in text classification benchmarks, our proposed method generally surpasses baseline models.
- Research Article
- 10.32362/2500-316x-2025-13-3-21-43
- Jun 5, 2025
- Russian Technological Journal
Objectives. Despite the recent success of large language models, which are now capable of solving a wide range of tasks, a number of practical issues remain unsolved. For example, users of systems providing question answering (QA) services may experience a lack of commonsense knowledge and reasoning proficiency. The present work considers knowledge injection methods as a means of providing functional enhancements to large language models by providing necessary facts and patterns from external sources. Methods. Knowledge injection methods leveraged in relevant QA systems are classified, analyzed, and compared. Self-supervised learning, fine-tuning, attention mechanism and interaction tokens for supporting information injection are considered along with auxiliary approaches for emphasizing the most relevant facts. Results. The reviewed QA systems explicitly show the accuracy increase on the CommonsenseQA benchmark compared to pretrained language model baseline due to knowledge injection methods exploitation. At the same time, in general the higher results are related to knowledge injection methods based on language models and attention mechanism. Conclusions. The presented systematic review of existing external knowledge injection methods for QA systems confirms the continuing validity of this research direction. Such methods are not only capable of increasing the accuracy of QA systems but also mitigating issues with interpretability and factual obsolescence in pretrained models. Further investigations will be carried out to improve and optimize different aspects of the current approaches and develop conceptually novel ideas.
- Book Chapter
14
- 10.1007/978-981-19-4453-6_2
- Jan 1, 2022
The remarkable progress in Natural Language Processing (NLP) brought about by deep learning, particularly with the recent advent of large pre-trained neural language models, is brought into scrutiny as several studies began to discuss and report potential biases in NLP applications. Bias in NLP is found to originate from latent historical biases encoded by humans into textual data which gets perpetuated or even amplified by NLP algorithm. We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which they occur in these models, and various ways in which these biases could be quantified and mitigated. Considering wide applicability of textual affective computing based downstream tasks in real-world systems such as business, healthcare, education, etc., we give a special emphasis on investigating bias in the context of affect (emotion) i.e., Affective Bias, in large pre-trained language models. We present a summary of various bias evaluation corpora that help to aid future research and discuss challenges in the research on bias in pre-trained language models. We believe that our attempt to draw a comprehensive view of bias in pre-trained language models, and especially the exploration of affective bias will be highly beneficial to researchers interested in this evolving field.
- Dissertation
- 10.63028/10067/2090950151162165141
- Jan 1, 2024
As natural language-based technologies continue to develop and play a prominent role in society, increasing attention is being paid to the ethical issues constraining their use, with bias being a prominent concern. There is a growing body of evidence highlighting biases, such as gender bias, within natural language models. Although considerable work has been done to understand and address this issue, significant challenges remain regarding how to detect, measure, and effectively mitigate bias. This thesis addresses two main themes. The first part primarily involves empirical investigations of various existing techniques and approaches for detecting, measuring, and mitigating bias in natural language processing (NLP). The second part focuses on developing solutions to mitigate bias in language-based technologies and human-generated biases. We first investigate existing techniques to measure bias in natural language models. Specifically, we review the literature on fairness metrics for pre-trained language models and empirically evaluate their consistency and compatibility. We investigate how various factors, such as templates, attribute and target seeds, and the choice of embeddings used by existing techniques, affect how bias is quantified. Secondly, we investigate the relationship between bias in pretrained language models and fine-tuned language models for downstream applications. We design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover the propensity for some intrinsic bias mitigation techniques to hide bias instead of resolving it and show inconsistencies in how bias measuring techniques measure bias with respect to certain mitigation techniques. We also find that bias inherent in a pretrained model has little material effect on downstream fairness. 
Thirdly, we develop an automated approach to generating parallel data for training counterfactual text generator models for counterfactual data augmentation (CDA) that limits the need for human intervention. Although CDA has been a widely used mitigation strategy in NLP, existing works have significant issues, which we also highlight in this thesis. Finally, we propose a text style transfer technique to automatically mitigate bias in textual data. Our text-style transfer model can be trained on non-parallel data. We demonstrate that our approach overcomes the limitations of many existing text style transfer techniques.