Articles published on fine-tuning-approaches
368 Search results
- Research Article
- 10.5121/ijci.2025.140301
- Jun 4, 2025
- International Journal on Cybernetics & Informatics
- Xia Li + 1 more
To improve the efficiency and code quality of modern software development, users typically reuse Application Programming Interfaces (APIs) provided by third-party libraries and frameworks rather than implementing functionality from scratch. However, due to time constraints in software development, API developers often refrain from providing detailed explanations or usage instructions for APIs, resulting in confusion for users. It is therefore important to categorize API reviews into different groups for ease of use. In this paper, we conduct a comprehensive study to evaluate the effectiveness of prompt-based API review classification based on various pre-trained models such as BERT, RoBERTa, and BERTOverflow. Our experimental results show that prompts with complete context achieve the best effectiveness and that RoBERTa outperforms the other two models due to the size of its training corpus. We also use the widely used fine-tuning approach LoRA to show that the training overhead can be significantly reduced (e.g., a 50% reduction) without loss of classification effectiveness.
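For readers unfamiliar with LoRA, the following is a minimal sketch of the general low-rank adaptation idea, not the setup used in the paper above. It assumes the standard formulation in which a frozen weight matrix W is augmented by a scaled product of two small matrices, W + (alpha / r) * B * A. All function names are hypothetical, and the parameter counts only illustrate why fewer weights are trained; they are not the 50% training-overhead figure reported in the abstract.

```python
# Minimal LoRA sketch in pure Python (no ML framework).
# All names here (matmul, lora_effective_weight, ...) are illustrative.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Compare trainable parameter counts: full fine-tuning vs. LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}

# Frozen pre-trained weight (2 x 3) and a rank-1 adapter.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
B = [[0.0], [0.0]]          # B starts at zero, so the adapter is initially a no-op
A = [[0.5, -0.5, 1.0]]

assert lora_effective_weight(W, B, A, alpha=2.0) == W

B_trained = [[1.0], [2.0]]  # pretend values after a few update steps
W_eff = lora_effective_weight(W, B_trained, A, alpha=2.0)
counts = trainable_params(d_out=768, d_in=768, r=8)
```

Only A and B would receive gradient updates in this scheme; W stays fixed, which is where the savings in trainable parameters come from.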
- Research Article
- 10.3390/app15116286
- Jun 3, 2025
- Applied Sciences
- Jakub Gajda + 1 more
Anomaly detection is a process in which outlier samples can be detected in a given dataset. The purpose of this study is to implement, test, and evaluate the possibility of using deep learning methods for outlier detection with the use of a fine-tuning approach. A Transformer Masked Autoencoder was fine-tuned for a custom satellite image dataset after being pre-trained on the ImageNet subset. The first process of training included building an internal representation of images from a normal class. After adjusting the model weights for this task, a custom dataset with normal and abnormal samples was used for the reconstruction error calculation. The results obtained in this study show that it is possible to distinguish between normal class representatives and outliers using the proposed approach. However, this is not sufficient for the model to be employed in real-life applications. With a given level of precision, the model requires additional knowledge about the subject to correctly classify the sample. To the best of our knowledge, this study is the first to apply ViTMAE for a custom satellite image database. An analysis of the misclassified samples shows that the model tends to generalize the image content and is not sufficiently robust for image noise. As a result of the analysis, a new anomaly indicator is proposed for further study.
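The reconstruction-error idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: images are represented as flat lists of pixel values, the error is a plain mean squared error, and the threshold rule (mean plus k standard deviations of errors on normal samples) is a common heuristic chosen here for illustration. All names and numbers are hypothetical.

```python
from statistics import mean, stdev

def reconstruction_error(original, reconstructed):
    """Mean squared error between two equally sized flattened images."""
    return mean((o - r) ** 2 for o, r in zip(original, reconstructed))

def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of errors measured on normal samples."""
    return mean(normal_errors) + k * stdev(normal_errors)

def is_outlier(error, threshold):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return error > threshold

# Hypothetical errors obtained on held-out normal-class images.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(normal_errors)

# A sample the autoencoder reconstructs poorly.
err = reconstruction_error([0.0, 1.0, 0.5, 0.2], [0.4, 0.5, 0.9, 0.6])
flag = is_outlier(err, threshold)
```

A model trained only on normal images tends to reconstruct them well, so a large error is treated as evidence that the input differs from the normal class.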
- Research Article
3
- 10.1016/j.eij.2025.100683
- Jun 1, 2025
- Egyptian Informatics Journal
- A.K Indira Kumar + 5 more
Multi-task detection of harmful content in code-mixed meme captions using large language models with zero-shot, few-shot, and fine-tuning approaches
- Research Article
- 10.52783/jisem.v10i52s.10779
- Jun 1, 2025
- Journal of Information Systems Engineering and Management
- Yorissa Silviana
Data management of laws and regulations is critical to support an efficient legal information system. Still, the complexity of legal language, the diversity of document structures, and the large volume of data are the main challenges in the automatic classification process. This research aims to optimize the DistilBERT model through a fine-tuning approach with a multi-task learning scheme to predict two labels simultaneously, namely regulation status (Applicable / Not Applicable) and type/form of regulation. The research stages include data collection, preprocessing, model training, and model evaluation. The model achieved high performance on the two classification tasks, with 96% accuracy, 94% precision, 96% recall, and a 94% F1-score for regulation status classification, as well as 100% on all evaluation metrics for regulation type/form classification, demonstrating the accuracy and reliability of the model in understanding and classifying legal documents as a whole. This finding confirms that the optimized model is highly reliable in the classification of the status of laws and regulations.
- Research Article
- 10.1016/j.compbiomed.2025.110235
- Jun 1, 2025
- Computers in Biology and Medicine
- Lasse Renz-Kiefel + 5 more
Inter-hospital transferability of AI: A case study on phase recognition in cholecystectomy.
- Research Article
6
- 10.1038/s41597-025-05243-x
- May 28, 2025
- Scientific Data
- Diani Sirimewan + 4 more
Efficient management of construction and demolition waste (CDW) is essential for enhancing resource recovery. The lack of publicly available, high-quality datasets for waste recognition limits the development and adoption of automated waste handling solutions. To facilitate data sharing and reuse, this study introduces ‘CDW-Seg’, a benchmark dataset for class-wise segmentation of CDW. The dataset comprises high-resolution images captured at authentic construction sites, featuring skip bins filled with a diverse mixture of CDW materials in-the-wild. It includes 5,413 manually annotated objects across ten categories: concrete, fill dirt, timber, hard plastic, soft plastic, steel, fabric, cardboard, plasterboard, and the skip bin, representing a total of 2,492,021,189 pixels. Each object was meticulously annotated through semantic segmentation, providing reliable ground-truth labels. To demonstrate the applicability of the dataset, an adapter-based fine-tuning approach was implemented using a hierarchical Vision Transformer, ensuring computational efficiency suitable for deployment in automated waste handling scenarios. The CDW-Seg has been made publicly accessible to promote data sharing, facilitate further research, and support the development of automated solutions for resource recovery.
- Research Article
3
- 10.1007/s11571-025-10264-8
- May 26, 2025
- Cognitive Neurodynamics
- Wenlong Ding + 5 more
The utilization of transfer learning (TL), particularly through pre-training and fine-tuning, in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) has substantially reduced the calibration efforts. However, commonly employed fine-tuning approaches, including end-to-end fine-tuning and last-layer fine-tuning, require data from target subjects that encompass all categories (stimuli), resulting in a time-consuming data collection process, especially in systems with numerous categories. To address this challenge, this study introduces a straightforward yet effective ShallOw Fine-Tuning (SOFT) method to substantially reduce the number of calibration categories needed for model fine-tuning, thereby further mitigating the calibration efforts for target subjects. Specifically, SOFT involves freezing the parameters of the deeper layers while updating those of the shallow layers during fine-tuning. Freezing the parameters of deeper layers preserves the model's ability to recognize semantic and high-level features across all categories, as established during pre-training. Moreover, data from different categories exhibit similar individual-specific low-level features in SSVEP-BCIs. Consequently, updating the parameters of shallow layers-responsible for processing low-level features-with data solely from partial categories enables the fine-tuned model to efficiently capture the individual-related features shared by all categories. The effectiveness of SOFT is validated using two public datasets. Comparative analysis with commonly used end-to-end and last-layer fine-tuning methods reveals that SOFT achieves higher classification accuracy while requiring fewer calibration categories. The proposed SOFT method further decreases the calibration efforts for target subjects by reducing the calibration category requirements, thereby improving the feasibility of SSVEP-BCIs for real-world applications.
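The layer-freezing scheme described above (update shallow layers, freeze deeper ones) can be sketched in a framework-free way. This is an illustrative toy under stated assumptions, not the SOFT implementation: the layer names, the choice of two shallow layers, and the plain SGD update are all hypothetical.

```python
def mark_trainable(layer_names, n_shallow):
    """Train the first n_shallow layers; freeze every deeper layer."""
    return {name: i < n_shallow for i, name in enumerate(layer_names)}

def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD update, leaving frozen layers unchanged."""
    updated = {}
    for name, weights in params.items():
        if trainable[name]:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen: copy weights as-is
    return updated

# Layers ordered from input (shallow) to output (deep); names are made up.
layers = ["conv1", "conv2", "fc1", "fc2"]
trainable = mark_trainable(layers, n_shallow=2)

params = {name: [1.0, 1.0] for name in layers}
grads = {name: [0.5, -0.5] for name in layers}
updated = sgd_step(params, grads, trainable)
```

Last-layer fine-tuning would be the mirror image of this toy: only the deepest layer is marked trainable and everything before it is frozen.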
- Research Article
1
- 10.1145/3730403
- May 22, 2025
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Mohan Zhou + 3 more
The ability to fine-tune generative models for text-to-image generation tasks is crucial, particularly when facing the complexity involved in accurately interpreting and visualizing textual inputs. While LoRA is efficient for language model adaptation, it often falls short in text-to-image tasks due to the intricate demands of image generation, such as accommodating a broad spectrum of styles and nuances. To bridge this gap, we introduce StyleInject, a specialized fine-tuning approach tailored for text-to-image models. StyleInject comprises multiple parallel low-rank parameter matrices, maintaining the diversity of visual features. It dynamically adapts to varying styles by adjusting the variance of visual features based on the characteristics of the input signal. This approach significantly minimizes the impact on the original model’s text-image alignment capabilities while adeptly adapting to various styles in transfer learning. StyleInject proves particularly effective in learning from and enhancing a range of advanced, community-fine-tuned generative models. Our comprehensive experiments, including both small-sample and large-scale data fine-tuning as well as base model distillation, show that StyleInject surpasses traditional LoRA in both text-image semantic consistency and human preference evaluation, all while ensuring greater parameter efficiency.
- Research Article
1
- 10.3390/math13101638
- May 16, 2025
- Mathematics
- Xiangang Cao + 5 more
The intelligent maintenance of coal mining equipment is crucial for ensuring safe production in coal mines. Despite the rapid development of large language models (LLMs) injecting new momentum into the intelligent transformation and upgrading of coal mining, their application in coal mining equipment maintenance still faces challenges due to the diversity and technical complexity of the equipment. To address the scarcity of domain knowledge and poor model adaptability in multi-task scenarios within the coal mining equipment maintenance field, a method for constructing a large language model based on multi-dimensional prompt learning and improved LoRA (MPL-LoRA) is proposed. This method leverages multi-dimensional prompt learning to guide LLMs in generating high-quality multi-task datasets for coal mining equipment maintenance, ensuring dataset quality while improving construction efficiency. Additionally, a fine-tuning approach based on the joint optimization of a mixture of experts (MoE) and low-rank adaptation (LoRA) is introduced, which employs multiple expert networks and task-driven gating functions to achieve the precise modeling of different maintenance tasks. Experimental results demonstrate that the self-constructed dataset achieves fluency and professionalism comparable to manually annotated data. Compared to the base LLM, the proposed method shows significant performance improvements across all maintenance tasks, offering a novel solution for intelligent coal mining maintenance.
- Research Article
- 10.4018/ijmcmc.376486
- May 12, 2025
- International Journal of Mobile Computing and Multimedia Communications
- Xinli Zhu + 2 more
Fine-tuning large language models (LLMs) for sports injury prevention and treatment in resource-constrained environments poses significant challenges due to memory demands and the growing size of data. This paper proposes an efficient full-parameter fine-tuning approach based on Gradient Low-Rank Projection (GaLore) to reduce memory usage. Further, a data augmentation strategy for sports injury prevention and treatment is used to fine-tune a question-and-answer (Q&A) model with 0.5B parameters on consumer GPUs with 24 GB of memory. Experimental results show that the proposed method enhanced by GaLore is superior to SOTA methods such as low-rank adaptation (LoRA) in terms of convergence accuracy, training time, memory consumption, and the BLEU-4 and ROUGE-2 metrics. Meanwhile, the empirical results on injury prevention Q&A cases indicate that Qwen2-0.5B-Instruct trained by the proposed method has clear advantages in professional knowledge understanding and overcoming hallucinations.
- Research Article
8
- 10.1038/s41746-025-01522-4
- May 8, 2025
- npj Digital Medicine
- Avisha Das + 4 more
Critical findings in radiology reports are life threatening conditions that need to be communicated promptly to physicians for timely management of patients. Although challenging, advancements in natural language processing (NLP), particularly large language models (LLMs), now enable the automated identification of key findings from verbose reports. Given the scarcity of labeled critical findings data, we implemented a two-phase, weakly supervised fine-tuning approach on 15,000 unlabeled Mayo Clinic reports. This fine-tuned model then automatically extracted critical terms on internal (Mayo Clinic, n = 80) and external (MIMIC-III, n = 123) test datasets, validated against expert annotations. Model performance was further assessed on 5000 MIMIC-IV reports using LLM-aided metrics, G-eval and Prometheus. Both manual and LLM-based evaluations showed improved task alignment with weak supervision. The pipeline and model, publicly available under an academic license, can aid in critical finding extraction for research and clinical use (https://github.com/dasavisha/CriticalFindings_Extract).
- Research Article
1
- 10.3390/agriculture15091006
- May 6, 2025
- Agriculture
- Hongjun Zhang + 3 more
Pest detection in agriculture faces the challenge of adapting to new pest species while preserving the ability to recognize previously learned ones. Traditional model fine-tuning approaches often result in catastrophic forgetting, where the acquisition of new classes significantly impairs the recognition performance of existing ones. Although knowledge distillation has been shown to effectively mitigate catastrophic forgetting, current research predominantly focuses on feature imitation, neglecting the extraction of potentially valuable information from responses. To address this issue, we introduce a response-based distillation method, called adaptive response distillation (ARD). ARD incorporates an adaptive response filtering strategy that dynamically adjusts the weights of classification and regression responses based on the significance of the information. This approach selectively filters and transfers valuable response data, ensuring efficient propagation of category and localization information. Our method effectively reduces catastrophic forgetting during incremental learning, enabling the student detector to maintain memory of old classes while assimilating new pest categories. Experimental evaluations on the large-scale IP102 pest dataset demonstrate that the proposed ARD method consistently outperforms existing state-of-the-art algorithms across various class-incremental learning scenarios, significantly narrowing the performance gap compared to fully trained models.
- Research Article
2
- 10.3390/computers14050173
- May 2, 2025
- Computers
- Malik Almaliki + 3 more
Cyberhate presents a multifaceted, context-sensitive challenge that existing detection methods often struggle to tackle effectively. Large language models (LLMs) exhibit considerable potential for improving cyberhate detection due to their advanced contextual understanding. However, detection alone is insufficient; it is crucial for software to also promote healthier user behaviors and empower individuals to actively confront the spread of cyberhate. This study investigates whether integrating large language models (LLMs) with persuasive technology (PT) can effectively detect cyberhate and encourage prosocial user behavior in digital spaces. Through an empirical study, we examine users’ perceptions of a self-monitoring persuasive strategy designed to reduce cyberhate. Specifically, the study introduces the Comment Analysis Feature to limit cyberhate spread, utilizing a prompt-based fine-tuning approach combined with LLMs. By framing users’ comments within the relevant context of cyberhate, the feature classifies input as either cyberhate or non-cyberhate and generates context-aware alternative statements when necessary to encourage more positive communication. A case study evaluated its real-world performance, examining user comments, detection accuracy, and the impact of alternative statements on user engagement and perception. The findings indicate that while most of the users (83%) found the suggestions clear and helpful, some resisted them, either because they felt the changes were irrelevant or misaligned with their intended expression (15%) or because they perceived them as a form of censorship (36%). However, a substantial number of users (40%) believed the interventions enhanced their language and overall commenting tone, with 68% suggesting they could have a positive long-term impact on reducing cyberhate. 
These insights highlight the potential of combining LLMs and PT to promote healthier online discourse while underscoring the need to address user concerns regarding relevance, intent, and freedom of expression.
- Research Article
7
- 10.1016/j.compbiomed.2025.110031
- May 1, 2025
- Computers in Biology and Medicine
- Jaeung Lee + 3 more
Benchmarking pathology foundation models: Adaptation strategies and scenarios.
- Research Article
1
- 10.3390/ai6050093
- May 1, 2025
- AI
- Xiao Liu + 2 more
Generating character-consistent and personalized dialogue for Non-Player Characters (NPCs) in Role-Playing Games (RPGs) poses significant challenges, especially due to limited memory retention and inconsistent character representation. This paper proposes a framework for generating personalized dialogues based on character-specific knowledge. By combining static knowledge fine-tuning and dynamic knowledge graph technology, the framework generates dialogue content that is more aligned with character settings and is highly personalized. Specifically, the paper introduces a protective static knowledge fine-tuning approach to ensure that the language model does not generate content beyond the character’s cognitive scope during conversations. Additionally, dynamic knowledge graphs are employed to store and update the interaction history between NPCs and players, forming unique “experience-response” patterns. During dialogue generation, the paper first parses player input into an Abstract Meaning Representation (AMR) graph, retrieves relevant memory nodes from the knowledge graph, and constructs a fused graph structure. This integrated graph is encoded via a graph neural network to generate high-dimensional semantic vectors, which are then used to retrieve and supplement knowledge from the vector database. Ultimately, the model generates personalized responses consistent with the NPC’s identity. Experimental results demonstrate that the framework significantly enhances the authenticity of NPC dialogues and player immersion and performs well on multiple large-scale language models.
- Research Article
1
- 10.3390/electronics14091850
- May 1, 2025
- Electronics
- Oded Milman + 2 more
Pupil segmentation in visible-light (RGB) images presents unique challenges due to variable lighting conditions, diverse eye colors, and poor contrast between iris and pupil, particularly in individuals with dark irises. While near-infrared (NIR) imaging has been the traditional solution for eye-tracking systems, the accessibility and practicality of RGB-based solutions make them attractive for widespread adoption in consumer devices. This paper presents a baseline for RGB pupil segmentation by adapting the Segment Anything Model (SAM). We introduce a multi-stage fine-tuning approach that leverages SAM’s exceptional generalization capabilities, further enhancing its elemental capacity for accurate pupil segmentation. The staged approach consists of SAM-BaseIris for enhanced iris detection, SAM-RefinedIris for improving iris segmentation with automated bounding box prompts, and SAM-RefinedPupil for precise pupil segmentation. Our method was evaluated on three standard visible-light datasets: UBIRIS.v2, I-Social DB, and MICHE-I. The results demonstrate robust performance across diverse lighting conditions and eye colors. Our method achieves near SOTA results for iris segmentation and attains mean mIOU and DICE scores of 79.37 and 87.79, respectively, for pupil segmentation across the evaluated datasets. This work establishes a strong foundation for RGB-based eye-tracking systems and demonstrates the potential of adapting foundation models for specialized medical imaging tasks.
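The two segmentation metrics named above can be computed as follows. This is a minimal sketch assuming binary masks flattened to lists of 0/1; the handling of two empty masks is a convention chosen here, and how the paper averages scores across images and datasets is not specified in the abstract.

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two binary masks given as flat lists of 0/1."""
    inter = sum(p and t for p, t in zip(pred, target))
    pred_area, target_area = sum(pred), sum(target)
    union = pred_area + target_area - inter
    if union == 0:            # both masks empty: treat as a perfect match
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_area + target_area)

# Toy 6-pixel masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred   = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
iou, dice = iou_and_dice(pred, target)
```

For a single pair of masks the two scores are related by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU. A mean IoU (mIoU) is then an average of such per-image or per-class values.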
- Conference Article
- 10.2118/224127-ms
- Apr 25, 2025
- SPE Western Regional Meeting
- S Ouzineb + 1 more
When analyzing the eligibility of unconventional reservoirs for CO2 Capture and Storage, it is crucial to have access to key wellbore log data. However, such data is not always acquired or may be of poor quality due to a challenging measurement environment. To facilitate reservoir interpretation and to account for varying geological contexts, our goal is to automatically predict and reconstruct well logs at scale using state-of-the-art machine learning models specialized to each well's geological environment. We introduce a novel AI fine-tuning approach for well log prediction which automatically finds the optimal analog wells for each well of interest. Foundation AI regression models are trained on public datasets and fine-tuned on the analog wells. For log reconstruction, we also introduce a novel segmentation-based algorithm that automatically identifies the inconsistent well intervals where the log prediction differs significantly from the existing log. In such intervals, we keep whichever of the measured and predicted logs is most consistent with its neighboring well log intervals. When fine-tuning a Foundation AI model using the wells automatically selected by our new method rather than all available wells in a study, we decreased the prediction error by 24.5%, as measured on a public wellbore dataset from Wyoming for the bulk density measurement. In the same study, compared with the approach of using a frozen foundation model, our method achieves a prediction error that is 39.4% smaller. These experiments show a significant increase in well log prediction quality when using our novel AI analogs-based method compared to classical log prediction methods. We also experiment with different types of AI regression models, including XGBoost and transformer-based models adapted for time-series imputation tasks.
We find that transformer-based models achieve the best performance among all compared models. In particular, the self-attention imputation for time-series model (SAITS) achieves a reconstruction error of the bulk density measurement that is 20% lower compared to XGBoost on a test set of 170 wells from the Netherlands. This model benchmarking allows us to further increase the quality of our well log prediction and reconstruction, which in turn facilitates petrophysical interpretation for wellbore interpreters. Our novel AI-powered log prediction method solves the well log covariate shift issue found in the literature by automatically finding the optimal analog wells to use for fine-tuning, thus leading to better quality predicted logs. We also introduce a new method to automatically detect and reconstruct inconsistent measurement intervals. This method is designed and implemented to perform log prediction and reconstruction automatically at scale, thus facilitating the petrophysical interpretation task.
- Research Article
4
- 10.1038/s41598-025-97802-w
- Apr 23, 2025
- Scientific Reports
- Mingyan Wang + 2 more
Colon cancer is a prevalent disease on a global scale, thus making its detection and prevention a critical area in the medical field. In addressing the challenges of high annotation costs and the need for improved accuracy in colon polyp detection, this study explores the segment anything model (SAM) application and fine-tuning strategies for colon polyp segmentation. Conventional full fine-tuning approaches frequently result in catastrophic forgetting, thereby compromising the model’s generalization capabilities. To address this challenge, this paper proposes an efficient fine-tuning method, PSF-SAM, which mitigates catastrophic forgetting while enhancing performance in few-shot scenarios. This is achieved by freezing most SAM parameters and optimizing only specific structures. The efficacy of PSF-SAM is substantiated by experimental evaluations on the Kvasir-SEG and CVC-ClinicDB datasets, which demonstrate its superior performance in metrics such as mDice coefficients and mIoU, as well as its notable advantages in few-shot learning scenarios when compared to existing fine-tuning methods.
- Research Article
6
- 10.1063/5.0256873
- Apr 22, 2025
- APL Machine Learning
- Markus J Buehler
We present an approach for modifying transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the transformer’s attention mechanism as a graph operation and propose graph-aware isomorphic attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GINs), to enrich the representation of relational structures. Our approach improves the model’s ability to capture complex dependencies and generalize across tasks, as evidenced by a reduced generalization gap and improved learning performance. We expand the concept of graph-aware attention to introduce sparse-GIN-attention, a fine-tuning approach that enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. We show that the sparse-GIN-attention framework leverages compositional principles from category theory to align relational reasoning with sparsified graph structures while modeling hierarchical representation learning that bridges local interactions and global task objectives across diverse domains. Our results demonstrate that graph-aware attention mechanisms outperform traditional attention in both training efficiency and validation performance. These insights bridge graph theory and transformer architectures and uncover latent graph-like structures within traditional attention mechanisms, offering a new lens through which transformers can be optimized. 
By evolving transformers as hierarchical GIN models, we reveal their implicit capacity for graph-level relational reasoning with profound implications for foundational model development and applications in bioinformatics, materials science, language modeling, and beyond, setting the stage for interpretable and generalizable modeling strategies.
- Research Article
- 10.21105/joss.07489
- Apr 12, 2025
- Journal of Open Source Software
- Sayak Chakrabarty + 1 more
Chakrabarty et al., (2025). ReadmeReady: Free and Customizable Code Documentation with LLMs - A Fine-Tuning Approach. Journal of Open Source Software, 10(108), 7489, https://doi.org/10.21105/joss.07489