Articles published on Pre-trained Model
14262 Search results
- New
- Research Article
- 10.1145/3800586
- Mar 3, 2026
- ACM Transactions on Internet of Things
- Tao Wang + 5 more
Below-ground biomass (BGB) of root tubers is an important phenotypic trait in crop monitoring and other agricultural applications. This paper proposes a novel tuber biomass sensing (TBS) framework that uses internet of things (IoT) devices to enable non-destructive estimation of below-ground root tuber biomass. Specifically, we perform extensive experiments to build a new BGB dataset with more than 700,000 received signal strength (RSS) measurements collected by our low-cost wireless network. Then, we propose a novel data-driven model that integrates convolutional neural networks, residual connections, and attention mechanisms to facilitate discriminative feature extraction from RSS data and achieve state-of-the-art (SOTA) performance in biomass estimation. In addition, to mitigate performance degradation caused by imbalanced training data, we propose a contrastive learning method that aligns feature representations of samples with similar biomass values while increasing the separation between those with significantly different values. This method reduces estimation bias toward high-frequency biomass labels, thereby improving the performance and generalizability of the data-driven model. Experimental results demonstrate the efficacy of the proposed TBS framework. Our dataset and pre-trained models are publicly available at https://zenodo.org/records/15000852.
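The label-aware contrastive idea above (attract pairs with close biomass values, repel pairs with distant ones) can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation; the label threshold `tau` and the repulsion `margin` are illustrative assumptions:

```python
import numpy as np

def regression_contrastive_loss(features, labels, tau=0.5, margin=1.0):
    """Pull together samples with close biomass labels, push apart
    samples with distant labels (hinge up to `margin`).
    features: (N, D) embeddings; labels: (N,) biomass values."""
    # L2-normalise embeddings so pairwise distances are comparable
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(f[i] - f[j])        # embedding distance
            if abs(labels[i] - labels[j]) < tau:   # similar biomass: attract
                loss += d ** 2
            else:                                  # dissimilar biomass: repel
                loss += max(0.0, margin - d) ** 2
            pairs += 1
    return loss / pairs
```

Under this loss, a pair of samples with near-identical biomass values is only penalised for being far apart in feature space, which is one plausible way to reduce the bias toward high-frequency labels that the abstract describes.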
- New
- Research Article
- 10.1177/08953996261419893
- Mar 2, 2026
- Journal of X-ray science and technology
- Simiao Yuan + 7 more
Domain adaptation for low-dose CT denoising via pretraining and self-supervised fine-tuning.
- New
- Research Article
- 10.1109/tpami.2025.3626772
- Mar 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Zanlin Ni + 7 more
Recent advances in image synthesis have been propelled by powerful generative models, such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. A common principle behind their success is the decomposition of complex synthesis tasks into multiple tractable steps. However, this introduces a proliferation of step-specific parameters to be configured for modulating the iterative generation process (e.g., mask ratio, noise level, or temperature at each step). Existing approaches typically rely on manually-designed scheduling rules to manage this complexity, demanding expert knowledge and extensive trial-and-error. Furthermore, these static schedules lack the flexibility to adapt to the unique characteristics of each individual sample, yielding sub-optimal performance. To address this issue, we present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, we formulate the scheduling problem as a Markov Decision Process, where a lightweight policy network is introduced to adaptively determine the most suitable parameters given the current generation state, and can be trained through reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, can be easily hacked and may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of the policy networks effectively. Finally, we introduce an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism to further enhance the performance and flexibility of AdaGen. Comprehensive experiments across five benchmark datasets (ImageNet-256 × 256 & 512 × 512, MS-COCO, CC3M, and LAION-5B) and four distinct generative paradigms validate the superiority of AdaGen. For example, AdaGen achieves better performance on DiT-XL with ∼3× lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible additional computational overhead.
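The sample-adaptive scheduling idea (a policy reads the current generation state and emits the next step's parameter) can be sketched as a toy rollout. This is a hypothetical illustration: the linear policy is untrained, the state summary (fraction of tokens still masked) and sigmoid squashing are assumptions, and the paper's reinforcement-learning training against an adversarial reward is omitted entirely:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(state, W, b):
    """Tiny linear policy: map the generation state to the next step's
    parameter (here, a mask-keep ratio in (0, 1)) via a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(state @ W + b)))

# Toy rollout over T refinement steps: the state is just the fraction
# of tokens still masked, and the policy decides how much to keep.
T = 8
W, b = rng.normal(size=1), 0.0
masked, schedule = 1.0, []
for t in range(T):
    ratio = float(policy(np.array([masked]), W, b))  # sample-adaptive parameter
    schedule.append(ratio)
    masked *= ratio                                  # the canvas fills in monotonically
```

Because the parameter at each step depends on the state produced by earlier steps, different samples would trace different schedules, which is the property a fixed hand-designed schedule lacks.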
- New
- Research Article
- 10.1109/tpami.2025.3626757
- Mar 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Wenqi Zhu + 5 more
Open-vocabulary semantic segmentation aims to partition an image into distinct semantic regions based on an open set of categories. Existing approaches primarily rely on image-level pre-trained vision-language models to perform this pixel-level task. In this paper, we propose SED, a simple yet effective encoder-decoder architecture for open-vocabulary semantic segmentation leveraging pre-trained vision-language models. SED consists of a hierarchical image encoder, a text encoder, and a gradual fusion decoder. The hierarchical image encoder and text encoder collaboratively generate a cost volume, which is progressively decoded by the gradual fusion decoder to produce segmentation results. In contrast to a plain encoder, the hierarchical encoder better captures image detail information while maintaining linear computational complexity with respect to input size. The gradual fusion decoder adopts a top-down structure to progressively integrate high-resolution features with the cost volume. Furthermore, a category early rejection strategy is introduced in the gradual fusion decoder to filter out non-existent categories at different layers, significantly improving inference efficiency. Based on SED, we further introduce two modules, including non-label text embedding and additional category early rejection in the encoder. Moreover, we extend our method with minimal decoder modification to open-vocabulary video semantic segmentation. Extensive experiments on multiple datasets validate the effectiveness and efficiency of our proposed method. With ConvNeXt-B, our method achieves an mIoU of 34.9% on ADE20K with 150 classes (i.e., A-150) at an inference speed of 69 ms per image on a single A6000 GPU, and an mIoU of 40.2% on the video segmentation dataset VSPW.
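The two core mechanisms named above, the image-text cost volume and category early rejection, can be sketched in a few lines. This is a minimal NumPy sketch under assumptions, not SED's actual implementation: the cosine-similarity scoring and the `keep_thresh` cutoff are illustrative choices:

```python
import numpy as np

def cost_volume(img_feats, txt_embs):
    """Cosine-similarity cost volume between per-pixel image features
    and per-class text embeddings: (H, W, D) x (C, D) -> (H, W, C)."""
    f = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    t = txt_embs / np.linalg.norm(txt_embs, axis=-1, keepdims=True)
    return np.einsum('hwd,cd->hwc', f, t)

def early_reject(cost, keep_thresh=0.2):
    """Category early rejection: drop classes whose best pixel score
    never clears the threshold, so later decoding touches fewer classes."""
    best = cost.max(axis=(0, 1))                # best score per class
    keep = np.flatnonzero(best >= keep_thresh)
    return cost[..., keep], keep
```

Rejecting categories early shrinks the channel dimension the decoder must process at every subsequent layer, which is where the claimed inference-efficiency gain would come from.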
- New
- Research Article
- 10.1016/j.neunet.2025.108261
- Mar 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Ziyuan Yan + 9 more
CNNCaps-DBP: Leveraging protein language models with attention-augmented convolution for DNA-binding protein prediction.
- New
- Research Article
- 10.1016/j.envres.2026.123763
- Mar 1, 2026
- Environmental research
- Fulin Shao + 6 more
Molreac-Oxi: An end-to-end deep learning-quantum chemistry platform for •OH reactivity (kOH), pathways, and active-site insight.
- New
- Research Article
- 10.1016/j.compmedimag.2026.102734
- Mar 1, 2026
- Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society
- Yeqi Liu + 4 more
Scale-consistent 3D reconstruction in monocular colonoscopy via camera-intrinsics-guided learning.
- New
- Research Article
- 10.1109/tpami.2025.3633657
- Mar 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Wenrui Li + 5 more
With the rapid growth of video content on social media, video summarization has become a crucial task in multimedia processing. However, existing methods face challenges in capturing global dependencies in video content and accommodating multimodal user customization. Moreover, temporal proximity between video frames does not always correspond to semantic proximity. To tackle these challenges, we propose a novel Language-guided Graph Representation Learning Network (LGRLN) for video summarization. Specifically, we introduce a video graph generator that converts video frames into a structured graph to preserve temporal order and contextual dependencies. By constructing forward, backward, and undirected graphs, the video graph generator effectively preserves the sequentiality and contextual relationships of video content. We design an intra-graph relational reasoning module with a dual-threshold graph convolution mechanism, which distinguishes semantically relevant frames from irrelevant ones between nodes. Additionally, our proposed language-guided cross-modal embedding module generates video summaries with specific textual descriptions. We model the summary generation output as a mixture of Bernoulli distributions and solve it with the EM algorithm. Experimental results show that our method outperforms existing approaches across multiple benchmarks. Moreover, our proposed LGRLN reduces inference time and model parameters by 87.8% and 91.7%, respectively.
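The dual-threshold graph convolution described above can be sketched as a thresholded edge-weighting rule followed by one message-passing step. This is a hypothetical NumPy sketch, not the authors' formulation; the thresholds `t_hi`/`t_lo` and the down-weighting factor for in-between edges are assumptions:

```python
import numpy as np

def dual_threshold_adjacency(sim, t_hi=0.7, t_lo=0.3):
    """Dual-threshold edge weighting on a frame-similarity matrix:
    edges above t_hi are kept as-is, edges below t_lo are cut
    (semantically irrelevant frames), and in-between edges are
    down-weighted. Thresholds here are illustrative."""
    A = np.where(sim >= t_hi, sim,
                 np.where(sim <= t_lo, 0.0, 0.5 * sim))
    np.fill_diagonal(A, 1.0)   # every frame keeps a self-loop
    return A

def graph_conv(A, X):
    """One row-normalised graph-convolution step: each frame's feature
    becomes a weighted average over its retained neighbours."""
    D = A.sum(axis=1, keepdims=True)
    return (A / D) @ X
```

The point of the two thresholds is that frames close in time but semantically unrelated get their edge cut entirely, so temporal adjacency alone does not force feature mixing.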
- New
- Research Article
- 10.1016/j.eti.2025.104673
- Mar 1, 2026
- Environmental Technology & Innovation
- Fatih Gurcan + 1 more
Automated waste classification for smart recycling: A multi-class CNN approach with transfer learning and pre-trained models
- New
- Research Article
- 10.1016/j.jenvman.2026.129078
- Mar 1, 2026
- Journal of environmental management
- Sireesha Mantena + 5 more
Advances in transfer learning for smart wastewater treatment plants: Learning frameworks and emerging pathways.
- New
- Research Article
- 10.11591/ijict.v15i1.pp93-101
- Mar 1, 2026
- International Journal of Informatics and Communication Technology (IJ-ICT)
- Haresh Rajkumar + 3 more
Plant disease is a significant challenge for agriculture, leading to reduced yield, economic loss, and environmental impact. Leveraging digital photos of plant leaves, convolutional neural networks (CNNs) have emerged as promising tools for disease detection. The methodology involves several steps, including image pre-processing, segmentation, and feature extraction using CNNs. Crucially, a diverse dataset comprising images of both healthy and diseased leaves under varying conditions is necessary for training accurate models. Transfer learning, particularly with models pre-trained on large-scale datasets such as ImageNet, can further enhance accuracy, allowing for better performance with fewer training samples. The proposed method demonstrates impressive results, achieving over 95% accuracy and outperforming existing state-of-the-art techniques. This system could serve as a valuable tool for farmers, facilitating timely disease identification and treatment, ultimately leading to increased agricultural yields, reduced financial losses, and the adoption of more sustainable farming practices. Additionally, beyond its practical applications, the proposed system holds promise for advancing sustainable agriculture by promoting environmentally friendly farming methods and contributing to the overall resilience and productivity of agricultural systems.
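The transfer-learning recipe mentioned above (freeze a pre-trained backbone, train only a new classification head on its features) can be sketched without any deep-learning framework. In this minimal sketch the random `X` stands in for embeddings that a frozen ImageNet-pre-trained backbone would produce, and the synthetic healthy/diseased labels are an assumption; only the logistic-regression head is trained:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for frozen-backbone features: (samples, feature_dim).
X = rng.normal(size=(200, 64))
# Synthetic binary labels (healthy vs. diseased), mostly driven by feature 0.
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

# Train only the new linear head with plain gradient descent on
# the logistic loss; the "backbone" that produced X stays untouched.
W, b, lr = np.zeros(64), 0.0, 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # predicted probabilities
    W -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
    b -= lr * float(np.mean(p - y))          # gradient step on bias

p_final = 1.0 / (1.0 + np.exp(-(X @ W + b)))
acc = float(np.mean((p_final > 0.5) == y))   # training accuracy of the head
```

Because the backbone's parameters are never updated, only a 64-dimensional weight vector is learned here, which is why transfer learning can perform well with far fewer labelled leaf images than training a CNN from scratch.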
- New
- Research Article
- 10.1016/j.neunet.2025.108227
- Mar 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Ben Wan + 3 more
Gradient flow-based iterative pruning for efficient and high-quality lightweight diffusion models.
- New
- Research Article
- 10.1016/j.eswa.2025.130068
- Mar 1, 2026
- Expert Systems with Applications
- Abid Ali Khan Danish + 1 more
Bridging the safety-specific language model gap: Domain-adaptive pretraining of transformer-based models across several industrial sectors for occupational safety applications
- New
- Research Article
- 10.1016/j.neunet.2025.108168
- Mar 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Feifan Gao + 6 more
LMcast: A pretrained language model guided long-term memory transformer for precipitation nowcasting.
- New
- Research Article
- 10.1016/j.compbiomed.2026.111514
- Mar 1, 2026
- Computers in biology and medicine
- Gurbaksh Lal + 2 more
Ollama-driven medical insights using LLMs with a federated learning approach.
- New
- Research Article
- 10.1016/j.atech.2025.101674
- Mar 1, 2026
- Smart Agricultural Technology
- Jiuyu Zhang + 5 more
Adapting pre-trained large model to robust individual tree crown detection from UAV optical images
- New
- Research Article
- 10.1016/j.cviu.2026.104668
- Mar 1, 2026
- Computer Vision and Image Understanding
- Zhenyu Yan + 3 more
AnomalySD: One-for-all few-shot anomaly detection via pre-trained diffusion models
- New
- Research Article
- 10.1016/j.mlwa.2025.100833
- Mar 1, 2026
- Machine Learning with Applications
- Chandramohan Abhishek + 1 more
Machine-interactive decision-assistance using a pre-trained natural language processing model for 4D printing technique selection
- New
- Research Article
- 10.1016/j.csl.2025.101900
- Mar 1, 2026
- Computer Speech & Language
- Jieun Choi + 2 more
Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition
- New
- Research Article
- 10.1016/j.bonr.2026.101898
- Mar 1, 2026
- Bone reports
- Wei Huang + 9 more
Intelligent identification of osteoporosis on hip X-rays using vision transformer.