LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning.

Abstract

The rapid growth of model scale has necessitated substantial computational resources for fine-tuning. Existing approaches such as low-rank adaptation (LoRA) have sought to address the problem of handling the large number of updated parameters in full fine-tuning (FT). However, LoRA relies on random initialization and optimization of low-rank matrices to approximate the updated weights, which can result in suboptimal convergence and an accuracy gap compared to full fine-tuning. To address these issues, we propose low-rank LDU (LoLDU), a parameter-efficient fine-tuning (PEFT) approach that significantly reduces trainable parameters, by a factor of 2600 compared to regular PEFT methods, while maintaining comparable performance. LoLDU leverages lower-diag-upper (LDU) decomposition to initialize low-rank matrices for faster convergence and nonsingularity. We focus on optimizing the diagonal matrix for scaling transformations. To the best of our knowledge, LoLDU has the fewest parameters among all PEFT approaches. We conducted extensive experiments across four instruction-following datasets, six natural language understanding (NLU) datasets, eight image classification datasets, and image generation datasets with multiple model types [LLaMA2, RoBERTa, ViT, and stable diffusion (SD)], providing a comprehensive and detailed analysis. Our open-source code can be accessed at https://anonymous.4open.science/r/LoLDU-B5A6.
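
The idea of factoring a weight matrix as lower, diagonal, and upper parts and then tuning only the diagonal can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small square matrix, performs no pivoting, and skips the low-rank truncation that the abstract mentions. The helper names `ldu_decompose`, `matmul`, and `reconstruct` are hypothetical.

```python
def ldu_decompose(a):
    """Factor a square matrix as L * D * U without pivoting.

    L is unit lower triangular, U is unit upper triangular, and D is
    returned as a list of diagonal entries. Assumption: every pivot is
    nonzero, because this sketch does not permute rows.
    """
    n = len(a)
    u = [row[:] for row in a]  # work on a copy of the input
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if u[k][k] == 0:
            raise ValueError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    d = [u[i][i] for i in range(n)]
    # Divide each row of U by its pivot so that U has a unit diagonal.
    for i in range(n):
        u[i] = [u[i][j] / d[i] for j in range(n)]
    return l, d, u


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def reconstruct(l, d, u):
    """Rebuild the matrix L * diag(d) * U."""
    n = len(d)
    diag = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(l, diag), u)


weight = [[2.0, 1.0], [4.0, 5.0]]
lower, diag, upper = ldu_decompose(weight)
# In this sketch only `diag` would be trained; `lower` and `upper` stay frozen.
adapted = reconstruct(lower, [s * 1.1 for s in diag], upper)
```

With L and U frozen, the number of trainable values per factored matrix equals the length of the diagonal, which is what makes a diagonal-only update so small. A practical version would likely rely on a pivoted factorization from a numerical library.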

Similar Papers
  • Research Article
  • Citations: 4
  • 10.1360/n112018-00331
Protein function prediction based on zero-one matrix factorization
  • Sep 1, 2019
  • SCIENTIA SINICA Informationis
  • Jun Wang + 4 more

Accurately annotating the functions of proteins is one of the key tasks of functional genomics. A large portion of functional annotations of proteins is missing, and the functional label space is expansive. Label compression methods have been proposed and applied to predict protein function; however, such methods lack the interpretability of compressed labels and suffer from the inherent problem of thresholding labels in multi-label learning. To solve these problems, this paper proposes a protein function prediction method based on zero-one matrix factorization (ZOMF). ZOMF first factorizes the protein-function association matrix into two low-rank zero-one matrices and explores the inner latent relationship between proteins and labels. Subsequently, it defines two smoothness terms on these two low-rank matrices with respect to protein–protein interactions and the structural relationships between labels to guide the optimization of the low-rank matrices. Finally, to predict protein function, it reconstructs the association matrix using the two optimized low-rank matrices. Experimental results on four model species (yeast, Arabidopsis, mouse, and human) show that ZOMF can predict protein functions more accurately than existing algorithms. In addition, it does not need to threshold the reconstructed matrix, and the compressed zero-one labels have more than one intuitive explanation.

  • Research Article
  • 10.1109/tmm.2026.3668691
Unleashing the Power of Singular Values for Parameter-Efficient Fine-Tuning of Large Pre-Trained Models
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Chengwei Sun + 7 more

Large pre-trained models (LPMs) have achieved remarkable success across natural language processing and computer vision tasks. However, fully fine-tuning these models for downstream adaptation incurs high memory costs, posing challenges in resource-constrained settings. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, alleviate this by updating only a small subset of parameters. Despite their efficiency, these methods typically employ random initialization for low-rank matrices, which can lead to slower and less stable convergence during gradient descent, as well as diminished generalizability due to suboptimal starting points. In this paper, we present PiVot, a novel PEFT method that utilizes singular value decomposition (SVD) to initialize low-rank matrices, with critical singular values serving as trainable parameters. Specifically, PiVot performs SVD on the pre-trained weight matrix to obtain the best rank-$r$ approximation, focusing on the top-$r$ singular values that capture over 99% of the matrix's structural information. By treating these top-$r$ singular values as trainable parameters, PiVot effectively scales the fundamental subspaces of the pre-trained weight matrix, enabling efficient and targeted adaptation to new domains. Extensive experiments across various LPMs demonstrate that PiVot achieves superior performance compared to LoRA on tasks such as natural language understanding, text-to-image generation, and image classification while requiring 16 times fewer trainable parameters.

  • Research Article
  • Citations: 16
  • 10.1016/j.eswa.2021.115974
Novel regularization method for the class imbalance problem
  • Oct 2, 2021
  • Expert Systems with Applications
  • Bosung Kim + 2 more

  • Research Article
  • Citations: 326
  • 10.1109/tpami.2024.3447085
A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations.
  • Dec 1, 2024
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Hongrong Cheng + 2 more

Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models in resource-constrained environments and to accelerate inference, researchers have increasingly explored pruning techniques as a popular research direction in neural network compression. More than three thousand pruning papers were published from 2020 to 2024. However, there is a dearth of up-to-date comprehensive review papers on pruning. To address this issue, in this survey, we provide a comprehensive review of existing research works on deep neural network pruning in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning and other compression techniques. We then provide a thorough comparative analysis of eight pairs of contrast settings for pruning (e.g., unstructured/structured, one-shot/iterative, data-free/data-driven, initialized/pre-trained weights, etc.) and explore several emerging topics, including pruning for large language models, vision transformers, diffusion models, and large multimodal models, post-training pruning, and different levels of supervision for pruning, to shed light on the commonalities and differences of existing methods and lay the foundation for further method development. Finally, we provide some valuable recommendations on selecting pruning methods and outline several promising research directions for neural network pruning. To facilitate future research on deep neural network pruning, we summarize broad pruning applications (e.g., adversarial robustness, natural language understanding, etc.) and build a curated collection of datasets, networks, and evaluations on different applications. We maintain a repository at https://github.com/hrcheng1066/awesome-pruning that serves as a comprehensive resource for neural network pruning papers and corresponding open-source codes. We will keep updating this repository to include the latest advancements in the field.

  • Research Article
  • Citations: 42
  • 10.1109/tse.2015.2427831
Identifying Renaming Opportunities by Expanding Conducted Rename Refactorings
  • Sep 1, 2015
  • IEEE Transactions on Software Engineering
  • Hui Liu + 3 more

To facilitate software refactoring, a number of approaches and tools have been proposed to suggest where refactorings should be conducted. However, identification of such refactoring opportunities is usually difficult because it often involves difficult semantic analysis and it is often influenced by many factors besides source code. For example, whether a software entity should be renamed depends on the meaning of its original name (natural language understanding), the semantics of the entity (source code semantics), experience and preference of developers, and culture of companies. As a result, it is difficult to identify renaming opportunities. To this end, in this paper we propose an approach to identify renaming opportunities by expanding conducted renamings. Once a rename refactoring is conducted manually or with tool support, the proposed approach recommends to rename closely related software entities whose names are similar to that of the renamed entity. The rationale is that if an engineer makes a mistake in naming a software entity it is likely for her to make the same mistake in naming similar and closely related software entities. The main advantage of the proposed approach is that it does not involve difficult semantic analysis of source code or complex natural language understanding. Another advantage of this approach is that it is less influenced by subjective factors, e.g., experience and preference of software engineers. The proposed approach has been evaluated on four open-source applications. Our evaluation results show that the proposed approach is accurate in recommending entities to be renamed (average precision 82 percent) and in recommending new names for such entities (average precision 93 percent). Evaluation results also suggest that a substantial percentage (varying from 20 to 23 percent) of rename refactorings are expansible.

  • Research Article
  • 10.71465/csb162
Domain-Adapted Large Language Models for Industrial Applications: From Fine-Tuning to Real-Time Deployment
  • Dec 11, 2025
  • Computer Science Bulletin
  • Jie-Si Yang + 3 more

Large language models (LLMs) have emerged as transformative technologies across various domains, demonstrating remarkable capabilities in natural language understanding and generation. The adaptation of these models for industrial applications presents unique challenges and opportunities, particularly in sectors requiring domain-specific knowledge and real-time processing capabilities. This review examines the current state of domain-adapted LLMs in industrial contexts, focusing on fine-tuning methodologies, deployment strategies, and practical implementation challenges. The paper synthesizes recent advances in parameter-efficient fine-tuning (PEFT), retrieval-augmented generation (RAG), and continual learning approaches that enable effective domain adaptation while maintaining computational efficiency. Special attention is given to industrial sectors including manufacturing, healthcare, finance, and energy, where specialized knowledge and regulatory compliance are critical. The review also addresses key challenges in model deployment, including latency optimization, model compression, and edge computing integration. By analyzing recent developments from 2019 to 2024, this paper provides comprehensive insights into the technical approaches, performance metrics, and practical considerations for deploying domain-adapted LLMs in industrial environments. The findings highlight the importance of balancing model performance with operational constraints, emphasizing the need for hybrid approaches that combine domain-specific fine-tuning with efficient inference strategies. This review serves as a valuable resource for researchers and practitioners seeking to implement LLM-based solutions in industrial settings, offering guidance on methodology selection, deployment architecture, and future research directions in this rapidly evolving field.

  • Conference Article
  • Citations: 13
  • 10.1145/3632410.3632463
A Comprehensive Analysis of Adapter Efficiency
  • Jan 4, 2024
  • Nandini Mundra + 5 more

Adapters have been positioned as a parameter-efficient fine-tuning (PEFT) approach, whereby a minimal number of parameters are added to the model and fine-tuned. However, adapters have not been sufficiently analyzed to understand if PEFT translates to benefits in training/deployment efficiency and maintainability/extensibility. Through extensive experiments on many adapters, tasks, and languages in supervised and cross-lingual zero-shot settings, we clearly show that for Natural Language Understanding (NLU) tasks, the parameter efficiency in adapters does not translate to efficiency gains compared to full fine-tuning of models. More precisely, adapters are relatively expensive to train and have slightly higher deployment latency. Furthermore, the maintainability/extensibility benefits of adapters can be achieved with simpler approaches like multi-task training via full fine-tuning, which also provide relatively faster training times. We, therefore, recommend that for moderately sized models for NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than using adapters. Our code is available at https://github.com/AI4Bharat/adapter-efficiency.

  • Research Article
  • Citations: 3
  • 10.1016/j.jvcir.2022.103677
Isomorphic model-based initialization for convolutional neural networks
  • Oct 29, 2022
  • Journal of Visual Communication and Image Representation
  • Hong Zhang + 4 more

  • Research Article
  • 10.1109/tai.2026.3676747
FedDOT: Defending Federated Learning Against Overwhelming Targeted Attacks
  • Jan 1, 2026
  • IEEE Transactions on Artificial Intelligence
  • Priyesh Ranjan + 3 more

Federated Learning (FL), which facilitates collaborative model training and protects users’ privacy, has drawn great interest from the research community. With FL, participants train their models on local data and submit the corresponding updates for aggregation to a server. While concealing the identities of the participants, FL may attract adversaries in order to hamper the underlying model. In this paper, we propose an FL framework, FedDOT, to defend against adversaries performing targeted attacks. FedDOT incorporates two powerful defense algorithms, Maximum Spanning Tree based attacker detection (MSTAD) and Densest graph based attacker detection (Density-AD), which leverage correlation between weight updates and graph theory concepts, maximum spanning tree, and densest graph. With a goal to withstand an overwhelming number of attackers, our algorithms provide strong solutions to aid an FL server, even in overwhelming scenarios where adversaries constitute more than half of the participants. Along with theoretical bounds in correlation space, a rigorous experimental analysis using image classification datasets is carried out to validate the robustness of the FedDOT framework in non-iid settings, which ascertains the superiority of the models against the state-of-the-art methods using a variety of metrics evaluating the accuracy and attack detection rate. With an attack success rate of < 10% for targeted attacks like single-label flipping, multi-label flipping, and backdoor, FedDOT successfully defends against overwhelming adversaries with a marginal accuracy drop of less than 2%.

  • Conference Article
  • Citations: 3
  • 10.1109/icassp49357.2023.10094859
ACF: Aligned Contrastive Finetuning For Language and Vision Tasks
  • Jun 4, 2023
  • Wei Zhu + 4 more

Contrastive learning (CL) has achieved great success in various fields with self-supervised learning. However, CL under the supervised setting is not fully explored, especially how to utilize the class labels in CL. We propose a novel aligned contrastive finetuning (ACF) approach in this work. Specifically, we consider the label embeddings as labeled instances and put them in an InfoNCE loss objective together with the instance representations, thus aligning the label embeddings and instance representation in the same semantic space. In addition, we design a correlation-based regularization term to alleviate the anisotropy problem. Extensive experiments are conducted on language understanding and image classification tasks, demonstrating our ACF method's competitiveness. ACF is off-the-shelf and can be plugged into any pre-trained models without additional network architectures or computation overhead.
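
The alignment step described above, where label embeddings sit in an InfoNCE objective alongside instance representations, can be sketched roughly as follows. This is only an illustration of a standard InfoNCE loss under stated assumptions, not the ACF authors' code: the similarity measure (cosine), the temperature value, and the function names `cosine` and `label_info_nce` are all hypothetical choices, and the correlation-based regularization term is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_info_nce(instance, label_embeddings, target, temperature=0.1):
    """InfoNCE loss with the target label embedding as the positive.

    All other label embeddings act as negatives. The temperature is an
    assumed hyperparameter, not a value taken from the abstract.
    """
    logits = [cosine(instance, e) / temperature for e in label_embeddings]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[target]


labels = [[1.0, 0.0], [0.0, 1.0]]   # toy label embeddings
instance = [0.9, 0.1]               # toy instance representation
loss_correct = label_info_nce(instance, labels, target=0)
loss_wrong = label_info_nce(instance, labels, target=1)
```

Minimizing this loss pulls an instance toward its own label embedding and away from the others, which is one way to read "aligning the label embeddings and instance representation in the same semantic space".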

  • Research Article
  • Citations: 68
  • 10.1177/1757975909348111
Methodological consideration of story telling in qualitative research involving Indigenous Peoples
  • Dec 1, 2009
  • Global Health Promotion
  • Susan Bird + 4 more

The use of storytelling in qualitative research involving Inuit complements the oral tradition of Inuit culture. The objective of the research was to explore the use of qualitative methods to gain understanding of the experience of living with diabetes, with the ultimate goal of better formulating health care delivery and health promotion among Inuit. In-depth interviews were analyzed and interpreted using thematic analysis, open coding, and structured narrative analysis. Inuit community members acted as partners through all stages of the research. "Because the more we understand, the more we're gonna do a prevention on it ... What I want is use my, use my diabetes, what I have ... so that it can be used by other people for prevention because they'll have understanding about it" - an Inuk storyteller speaks to the value of education in health promotion. Key methodological issues found relevant to improving qualitative research with Indigenous Peoples include: (i) participatory research methods, grounded in principles of equity, through all phases of research; (ii) the presentation of narratives rather than only interpretations of narratives; (iii) understanding of culture, language, and place to frame the interpretation of the stories in the context within which storytellers experience living with their diabetes; and (iv) the value of multiple methods of analysis. This article comments on the challenges of conducting rigorous research in a cross-cultural setting and outlines methodologies that can improve qualitative narrative analysis research. The research highlighted experiences of living with diabetes and the ways in which storytellers coped and negotiated social support.

  • Preprint Article
  • Citations: 1
  • 10.2196/preprints.64954
Enhancing the Breast Cancer Screening Journey: Exploring Women's Perceptions of Traditional Mammography and Emerging AI driven Thermography (Preprint)
  • Aug 1, 2024
  • Kristýna Sirka Kacafírková + 4 more

BACKGROUND: Breast cancer is one of the most frequent causes of mortality among women. Early diagnosis is critical for successful treatment, but underscreening is frequent. Novel screening methods that are more convenient, such as thermography, are being developed. They could serve a wider group of screeners and contribute to better compliance with screening and thus to a decline in breast cancer mortality. OBJECTIVE: The study aims to explore screeners’ preferences for the screening process, specifically for a novel screening method that utilizes artificial intelligence (AI) and thermal imaging. Furthermore, we seek a better understanding of the barriers and facilitators associated with participation in breast cancer screening by currently used mammography. METHODS: One online focus group with experts and five focus groups with potential screeners on thermography were carried out. Potential screeners were recruited through an online survey (n=228) focused on barriers and motivations related to screening. Survey data were analyzed using SPSS software. Findings from the focus groups were examined by two researchers using open, axial, and selective coding in MAXQDA software. RESULTS: The focus groups showed that small changes during the procedure (tailored adjustments, such as film or music during the procedure, or dimmed light) were appreciated, especially by women without any mammography screening experience. Furthermore, the non-invasiveness of the procedure was viewed positively by all participants. Other important factors that influence the perception of the procedure, and can therefore affect the decision whether to attend, were how participants were treated by medical staff and waiting times. For certain women, the interaction between them and the clinicians is more important than the technology itself. Results from the online survey complemented these insights on motivation and barriers. Personal belief in breast cancer prevention was the most indicated motivator for women with mammography experience (44%, n=154), followed by an invitation from a screening program (29%, n=154). Barriers indicated by women without experience were mainly: no recommendation from a doctor (53%, n=74), no warning signals (36%, n=74), or no problem related to breasts (28%, n=74), followed by being too young for mammography (23%, n=74). CONCLUSIONS: Even though thermography was perceived mainly positively, women prioritized how they were treated by medical staff over the technique itself. This includes detailed information in understandable language, empathetic communication, and adjustments that fit personal preferences. As the survey results also showed, doctors play the leading role in the decision to attend screening, so a change in their approach can encourage greater participation in breast cancer screening initiatives.

  • Research Article
  • Citations: 4
  • 10.1007/s10994-022-06182-z
Distilling ensemble of explanations for weakly-supervised pre-training of image segmentation models
  • Jun 9, 2022
  • Machine Learning
  • Xuhong Li + 6 more

While fine-tuning pre-trained networks has become a popular way to train image segmentation models, such backbone networks for image segmentation are frequently pre-trained using image classification source datasets, e.g., ImageNet. Though image classification datasets could provide the backbone networks with rich visual features and discriminative ability, they are incapable of fully pre-training the target model (i.e., backbone+segmentation modules) in an end-to-end manner. The segmentation modules are left to random initialization in the fine-tuning process due to the lack of segmentation labels in classification datasets. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL), to enable the end-to-end pre-training for image segmentation models based on classification datasets. PSSL was inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, would be close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers to lower the bias caused by single models. With PSSL for every image of ImageNet, the proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse. Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and CityScapes, with significant improvements.

  • Research Article
  • Citations: 3
  • 10.1109/tnnls.2024.3380827
Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns.
  • Mar 1, 2025
  • IEEE Transactions on Neural Networks and Learning Systems
  • Christos Kyrkou

High-efficiency deep learning (DL) models are necessary not only to facilitate their use in devices with limited resources but also to improve resources required for training. Convolutional neural networks (ConvNets) typically exert severe demands on local device resources and this conventionally limits their adoption within mobile and embedded platforms. This brief presents work toward utilizing static convolutional filters generated from the space of local binary patterns (LBPs) and Haar features to design efficient ConvNet architectures. These are referred to as Structured Ternary Patterns (STePs) and can be generated during network initialization in a systematic way instead of having learnable weight parameters thus reducing the total weight updates. The ternary values require significantly less storage and with the appropriate low-level implementation, can also lead to inference improvements. The proposed approach is validated using four image classification datasets, demonstrating that common network backbones can be made more efficient and provide competitive results. It is also demonstrated that it is possible to generate completely custom STeP-based networks that provide good trade-offs for on-device applications such as unmanned aerial vehicle (UAV)-based aerial vehicle detection. The experimental results show that the proposed method maintains high detection accuracy while reducing the trainable parameters by 40%-80%. This work motivates further research toward good priors for nonlearnable weights that can make DL architectures more efficient without having to alter the network during or after training.

  • Conference Article
  • Citations: 2
  • 10.1109/tensymp50017.2020.9230585
Image Classification using DNN with an Improved Optimizer
  • Jan 1, 2020
  • 2020 IEEE Region 10 Symposium (TENSYMP)
  • Nazmus Saqib + 1 more

In deep learning, optimization techniques depend for the most part on gradient descent methods such as SGD and ADAM, which hold the leading place among optimization methods. Methods that rely on stochastic gradients are non-adaptive because the prescribed parameter values must be tuned for every application. However, the generalization performance of stochastic optimizers is far superior to that of adaptive methods, whereas Adam and its variants cannot maintain this without a fast convergence rate in deep neural networks. To improve this generalization performance, we need to diminish the oscillation of the weights, which is the general cause of the drop in accuracy. Along these lines, we propose Mean-ADAM, a variant of ADAM that extends the updated weights by an external weight to diminish the oscillation and achieve a higher accuracy rate than all other adaptive gradient methods through the end of training. As a result, we can substantially improve the generalization performance, allowing it to compete with SGD with momentum on image classification datasets such as MNIST, CIFAR10, CIFAR100, ImageNet, etc. We attained 82% at 150 epochs with CIFAR10 and 99.49% with MNIST, whereas ADAM achieved 76% and 99.43%, respectively.
