Articles published on Zero-shot Learning (published in the last 50 years)
- New
- Research Article
- 10.3390/electronics14214341
- Nov 5, 2025
- Electronics
- Qin Li + 3 more
Zero-shot learning (ZSL) aims to categorize target classes with the aid of semantic knowledge and samples from previously seen classes. In this process, the alignment of visual and attribute modality features is key to successful knowledge transfer. Several previous studies have investigated the extraction of attribute-related local features to reduce visual-semantic domain gaps and mitigate domain shift. However, these techniques do not emphasize the commonality of features across different objects sharing the same attribute, which is critical for identifying and distinguishing the attributes of unseen classes. In this study, we propose a novel ZSL method, termed dual-contrastive attribute embedding (DCAE), for generalized zero-shot learning. This approach simultaneously learns both class-level and attribute-level prototypes and representations. Specifically, an attribute embedding module is introduced to capture attribute-level features, and an attribute semantic encoder is developed to generate attribute prototypes. Attribute-level and class-level contrastive loss terms are then used to optimize an attribute embedding space such that attribute features are compactly distributed around the corresponding prototypes. This dual contrastive learning mechanism facilitates the alignment of multimodal information along two dimensions. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed method over current state-of-the-art techniques.
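For readers unfamiliar with prototype-based contrastive objectives, the following is a minimal sketch of the kind of loss the abstract describes, assuming an InfoNCE-style formulation with learnable class and attribute prototypes; it illustrates the general idea only and is not the authors' exact DCAE objective.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, prototypes, labels, tau=0.1):
    """InfoNCE-style loss: pull each embedding toward the prototype of its
    label and push it away from all other prototypes."""
    features = F.normalize(features, dim=-1)      # (N, D)
    prototypes = F.normalize(prototypes, dim=-1)  # (K, D)
    logits = features @ prototypes.t() / tau      # (N, K) scaled cosine similarities
    return F.cross_entropy(logits, labels)

def dual_contrastive_loss(feat_cls, proto_cls, y_cls,
                          feat_attr, proto_attr, y_attr, lam=1.0):
    # Hypothetical combination of a class-level and an attribute-level term,
    # loosely mirroring the "dual" mechanism described in the abstract.
    return (prototype_contrastive_loss(feat_cls, proto_cls, y_cls)
            + lam * prototype_contrastive_loss(feat_attr, proto_attr, y_attr))
```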
- New
- Research Article
- 10.1037/met0000801
- Nov 3, 2025
- Psychological methods
- Benjamin Riordan + 6 more
Thanks to the popularity of smartphones with high-quality cameras and of social media platforms, an exceptional amount of image data is generated and shared daily. This visual data can provide unprecedented insights into daily life and can help answer research questions in psychology. However, the traditional methods used to analyze visual data are burdensome: they are either time-intensive (e.g., content analysis) or require technical training (e.g., developing and training deep learning models). Zero-shot learning, in which a pretrained model is used without any additional training, requires less technical expertise and may be a particularly attractive method for psychology researchers aiming to analyze image data. In this tutorial, we provide an overview and step-by-step guide on how to analyze visual data with zero-shot learning. Specifically, we demonstrate how to use two popular models (Contrastive Language-Image Pretraining and Large Language and Vision Assistant) to identify a beverage in an image, drawing on a data set in which we manipulated the type of beverage present, the setting, and the prominence of the beverage in the image (foreground, midground, background). To guide researchers through this process, we provide open code and data on GitHub and as a Google Colab notebook. Finally, we discuss how to interpret and report accuracy, how to create a validation data set, the steps needed to apply the models to new data, and the future challenges and limitations of the method.
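The tutorial's own code and data live on GitHub and Colab (referenced above); to give a flavor of the approach, the following is a minimal zero-shot classification sketch using the Hugging Face transformers implementation of CLIP, with a hypothetical image path and label set rather than the paper's dataset.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a beer", "a photo of a soda", "a photo of a coffee"]  # hypothetical label set
image = Image.open("drink.jpg")                                             # hypothetical image path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# CLIP scores the image against each text prompt; softmax gives per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```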
- New
- Research Article
- 10.1093/bioinformatics/btaf572
- Nov 1, 2025
- Bioinformatics (Oxford, England)
- Yangyang Chen + 5 more
Deep generative methods based on language models have the capability to generate new data that resemble a given distribution and have begun to gain traction in ligand design. However, existing models face significant challenges when it comes to generating ligands for unseen targets, a scenario known as zero-shot learning. The ability to effectively generate ligands for novel targets is crucial for accelerating drug discovery and expanding the applicability of ligand design. Therefore, there is a pressing need to develop robust deep generative frameworks that can operate efficiently in zero-shot scenarios. In this study, we introduce ZeroGEN, a novel zero-shot deep generative framework based on protein sequences. ZeroGEN analyzes extensive data on protein-ligand inter-relationships and incorporates contrastive learning to align known protein-ligand features, thereby enhancing the model's understanding of potential interactions between proteins and ligands. Additionally, ZeroGEN employs self-distillation to filter the initially generated data, retaining only the ligands deemed reliable by the model. It also implements data augmentation techniques to aid the model in identifying ligands that match unseen targets. Experimental results demonstrate that ZeroGEN successfully generates ligands for unseen targets with strong affinity and desirable drug-like properties. Furthermore, visualizations of molecular docking and attention matrices reveal that ZeroGEN can autonomously focus on key residues of proteins, underscoring its capability to understand and generate effective ligands for novel targets. The source code and data for this work are freely available at https://github.com/viko-3/ZeroGEN.
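As an illustration of the self-distillation filtering step described above, the sketch below keeps only the generated ligands that the model itself scores as most reliable; the `model.sample` and `model.score` interfaces are hypothetical stand-ins, not ZeroGEN's actual API.

```python
def self_distillation_filter(model, protein_seq, n_candidates=1000, keep_frac=0.2):
    """Generate candidate ligands for a target, score each with the model's own
    confidence, and keep only the top fraction for subsequent training rounds."""
    candidates = [model.sample(protein_seq) for _ in range(n_candidates)]  # e.g., SMILES strings
    scored = [(smiles, model.score(protein_seq, smiles)) for smiles in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most reliable first
    n_keep = max(1, int(keep_frac * len(scored)))
    return [smiles for smiles, _ in scored[:n_keep]]
```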
- New
- Research Article
- 10.1016/j.jbi.2025.104946
- Nov 1, 2025
- Journal of biomedical informatics
- Cheng Peng + 5 more
Scaling up biomedical vision-language models: Fine-tuning, instruction tuning, and multi-modal learning.
- New
- Research Article
- 10.1016/j.optlastec.2025.113036
- Nov 1, 2025
- Optics & Laser Technology
- Zhenmin Zhu + 4 more
Zero-shot and degradation aware learning for polarization super-resolution
- New
- Research Article
- 10.1016/j.engappai.2025.111633
- Nov 1, 2025
- Engineering Applications of Artificial Intelligence
- Wenlong Du + 5 more
A novel compositional zero-shot learning approach based on hierarchical multi-scale feature fusion
- New
- Research Article
- 10.1109/tpami.2025.3597668
- Nov 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
- Qingsheng Wang + 5 more
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositional concepts composed of seen single concepts. A central problem in CZSL is modeling how attributes interact with objects and how objects interact with attributes. In this work, we address this problem and propose the Dual-Stream Conditional Network (DSCNet), which learns dual-stream conditional concepts by deriving conditional visual and semantic embeddings of attributes and objects. First, we argue that the condition for an attribute should comprise the recognized object and the input image, and the condition for an object should comprise the recognized attribute and the input image. In the semantic stream, for each concept (attribute or object), we encode the semantic features of the recognized object or attribute together with the visual features of the input image as a condition, which a semantic cross encoder then injects into all concept semantic embeddings to obtain conditional semantic embeddings. In the visual stream, conditional attribute or object visual embeddings are obtained by injecting the semantic features of the recognized object or attribute into the mapped attribute or object visual features. Experimental results on CZSL benchmarks demonstrate the superiority of our proposed method.
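A toy sketch of the conditioning idea, injecting a condition built from the recognized counterpart's semantics and the image's visual features into concept embeddings via cross-attention, might look as follows; the dimensions and module choices are illustrative assumptions, not the DSCNet architecture itself.

```python
import torch
import torch.nn as nn

class ConditionalSemanticEmbedding(nn.Module):
    """Toy conditioning module: encode (counterpart semantics, image features)
    into a condition vector and inject it into every concept embedding."""
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.cond_encoder = nn.Linear(dim * 2, dim)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, concept_emb, counterpart_sem, image_feat):
        # concept_emb:     (B, K, D) semantic embeddings of all attributes (or objects)
        # counterpart_sem: (B, D)    semantics of the recognized object (or attribute)
        # image_feat:      (B, D)    visual features of the input image
        cond = self.cond_encoder(torch.cat([counterpart_sem, image_feat], dim=-1))
        cond = cond.unsqueeze(1)                       # (B, 1, D) condition token
        out, _ = self.cross_attn(concept_emb, cond, cond)
        return concept_emb + out                       # conditional semantic embeddings
```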
- New
- Research Article
- 10.1016/j.neucom.2025.131184
- Nov 1, 2025
- Neurocomputing
- Shuangyan Yin + 4 more
A shapelet-driven distillation generation method for generalized zero-shot learning in compound fault diagnosis
- New
- Research Article
- 10.1016/j.imavis.2025.105762
- Nov 1, 2025
- Image and Vision Computing
- Fuqin Deng + 6 more
GNN-based primitive recombination for compositional zero-shot learning
- New
- Research Article
- 10.1016/j.adhoc.2025.103984
- Nov 1, 2025
- Ad Hoc Networks
- Tuğçe Bilen
KDN-Driven zero-shot learning for intelligent self-healing in 6G small cell networks
- New
- Research Article
- 10.1016/j.pnucene.2025.105848
- Nov 1, 2025
- Progress in Nuclear Energy
- Ben Qi + 4 more
A transient detection framework in nuclear power plants using zero-shot learning based on digital twins
- New
- Research Article
- 10.1063/5.0283450
- Oct 29, 2025
- APL Machine Learning
- Alexey V Gulyuk + 3 more
Self-driving laboratories (SDLs) are transforming materials discovery by combining automation, machine learning, and real-time feedback. Yet their success depends on robust data integration and fusion methods capable of handling materials data that are heterogeneous, sparse, and multi-scale. Such data span theoretical models, simulations, and experimental techniques across diverse spatial and temporal scales, creating significant challenges for interoperability and analysis. This perspective reviews state-of-the-art techniques, including knowledge graphs, structured pipelines, multimodal machine learning, and physics-informed models, that enable materials science and SDLs to unify and learn from disparate data sources; it also identifies critical challenges and proposes forward-looking directions to enhance data readiness, interoperability, and predictive power in SDLs. We further highlight emerging methods such as transformer architectures, zero-shot learning, and real-time stream processing, and discuss the critical need for more scalable, interpretable, and adaptive solutions to fully realize autonomous materials innovation. By mapping out both the current landscape and future opportunities, we argue that next-generation data integration and fusion are not just enablers but essential pillars for achieving fully autonomous, adaptive, and intelligent SDL systems capable of addressing the complexities of hierarchical and multifunctional materials.
- New
- Research Article
- 10.4108/eetiot.9404
- Oct 28, 2025
- EAI Endorsed Transactions on Internet of Things
- Panagiotis Savvidis + 1 more
The convergence of decentralized architectures integrating Machine Learning, Computer Vision, and Low Power Wide Area Networks is increasingly becoming an integral part of daily life. The Internet of Things serves as a real-time data conduit, enhancing decision making via embedded technology and continuous data exchange. This paper explores the feasibility of Edge Computing as a foundational pillar in this evolving landscape. We experiment under real-world, dynamic conditions and evaluate the technological aspects, strategies, process flows, and key observations within the broader Edge Computing domain. Research pathways include Multi-access Edge topologies in future 6G networks, model quantization, and satellite-enhanced communication platforms. We also discuss advanced AI functionalities, including zero-shot learning, multimodal perception, and decentralized generative AI, thereby broadening the scope of intelligent applications across various domains. The significance and research objectives of this study are threefold: (1) evaluation of LoRaWAN and satellite IoT communication strategies, (2) analysis of CV workloads on edge hardware, and (3) identification of future research directions where Edge Computing can support low-latency, energy-efficient, and socially impactful IoT applications. By explicitly addressing these aspects, we aim to establish a clear link between technological feasibility and practical, socioeconomic relevance.
- New
- Research Article
- 10.1037/met0000801.supp
- Oct 27, 2025
- Psychological Methods
Supplemental Material for How to Analyze Visual Data Using Zero-Shot Learning: An Overview and Tutorial
- New
- Research Article
- 10.1177/10497315251389554
- Oct 27, 2025
- Research on Social Work Practice
- Jia-Lin Zhao + 2 more
Purpose: This study examined the validity of the 2024 entry-level Chinese Social Work Certification Examination and tested the social work knowledge of artificial intelligence (AI) by exploring the performance of two systems, ChatGPT 4.0 and ERNIE Bot 4.0. Method: We applied zero-shot learning to the tested models, determined the correctness of their answers against the reference books, and analyzed the reasoning behind them. Result: Both AIs passed the exam with scores around 80/100 and answered most questions correctly with adequate reasoning. The causes of incorrect answers included incorrect knowledge, logical fallacies, misunderstanding of question content, and neglect of the test instructions, reflecting the hallucinations seen in large language models. ChatGPT had better reasoning abilities, while ERNIE Bot was more familiar with local contexts and policies. Discussion: The findings support the validity of the exam and also confirm the ability of AI to apply social work knowledge.
- Research Article
- 10.1038/s43856-025-01116-x
- Oct 15, 2025
- Communications medicine
- Kai Zhang + 5 more
The vast amount of natural language clinical notes about patients with cancer presents a challenge for efficient information extraction, standardization, and structuring. Traditional NLP methods require extensive annotation by domain experts for each type of named entity and necessitate model training, highlighting the need for an efficient and accurate extraction method. This study introduces a tool based on a Large Language Model (LLM) for zero-shot information extraction from cancer-related clinical notes into structured data aligned with the minimal Common Oncology Data Elements (mCODE™) structure. We utilize the zero-shot learning capabilities of LLMs for information extraction, eliminating the need for data annotated by domain experts for training. Our methodology employs advanced hierarchical prompt engineering strategies to overcome common LLM limitations such as token hallucination and accuracy issues. We tested the approach on 1,000 synthetic clinical notes representing various cancer types, comparing its performance to a traditional single-step prompting method. Our hierarchical prompt engineering strategy (accuracy = 94%; misidentification and misplacement rate = 5%) outperforms the traditional prompt strategy (accuracy = 87%; misidentification and misplacement rate = 10%) in information extraction. By unifying staging systems (e.g., TNM, FIGO) and specific stage details (e.g., Stage II) into a standardized framework, our approach achieves improved accuracy in extracting cancer stage information. Our approach demonstrates that LLMs, when guided by structured prompting, can accurately extract complex clinical information without the need for expert-labeled data. This method has the potential to harness unstructured data for advancing cancer research.
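A hedged sketch of the hierarchical prompting idea (a coarse first pass followed by narrowly constrained follow-up prompts) using the OpenAI Python client is shown below; the prompts, model name, and output fields are illustrative assumptions, not the authors' pipeline.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt, model="gpt-4o-mini"):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def extract_stage_info(note):
    # Step 1: coarse pass - which elements appear in the note at all?
    present = ask(
        "List which of these elements appear in the note: cancer type, "
        "staging system, stage, biomarkers.\n\nNote:\n" + note
    )
    # Step 2: targeted follow-up prompt, constrained to a narrow output format
    # to limit hallucination, issued only for elements the first pass found.
    stage = ask(
        "From the note below, return only the cancer stage "
        "(e.g., 'Stage II' or 'T2N0M0'); return 'unknown' if absent.\n\n" + note
    )
    return {"elements_present": present, "stage": stage}
```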
- Research Article
- 10.2196/64723
- Oct 15, 2025
- JMIR Formative Research
- Augustine Annan + 6 more
Background: In the digital age, social media has become a crucial platform for public discourse on diverse health-related topics, including vaccines. Efficient sentiment analysis and hesitancy detection are essential for understanding public opinions and concerns. Large language models (LLMs) offer advanced capabilities for processing complex linguistic patterns, potentially providing valuable insights into vaccine-related discourse. Objective: This study aims to evaluate the performance of various LLMs in sentiment analysis and hesitancy detection related to vaccine discussions on social media and identify the most efficient, accurate, and cost-effective model for detecting vaccine-related public sentiment and hesitancy trends. Methods: We used several LLMs (generative pretrained transformer (GPT-3.5), GPT-4, Claude-3 Sonnet, and Llama 2) to process and classify complex linguistic data related to human papillomavirus; measles, mumps, and rubella; and vaccines overall from X (formerly known as Twitter), Reddit, and YouTube. The models were tested across different learning paradigms (zero-shot, 1-shot, and few-shot) to determine their adaptability and learning efficiency with varying amounts of training data. We evaluated the models’ performance using accuracy, F1-score, precision, and recall. In addition, we conducted a cost analysis focused on token usage to assess the computational efficiency of each approach. Results: GPT-4 (F1-score=0.85 and accuracy=0.83) outperformed GPT-3.5, Llama 2, and Claude-3 Sonnet across various metrics, regardless of the sentiment type or learning paradigm. Few-shot learning did not significantly enhance performance compared with the zero-shot paradigm. Moreover, the increased computational costs and token usage associated with few-shot learning did not justify its application, given the marginal improvement in model performance. The analysis highlighted challenges in classifying neutral sentiments and convenience, correctly interpreting sarcasm, and accurately identifying indirect expressions of vaccine hesitancy, emphasizing the need for model refinement. Conclusions: GPT-4 emerged as the most accurate model, excelling in sentiment and hesitancy analysis. Performance differences between learning paradigms were minimal, making zero-shot learning preferable for its balance of accuracy and computational efficiency. However, the zero-shot GPT-4 model is not the most cost-effective compared with traditional machine learning. A hybrid approach, using LLMs for initial annotation and traditional models for training, could optimize cost and performance. Despite reliance on specific LLM versions and a limited focus on certain vaccine types and platforms, our findings underscore the capabilities and limitations of LLMs in vaccine sentiment and hesitancy analysis, highlighting the need for ongoing evaluation and adaptation in public health communication strategies.
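The zero-shot versus few-shot comparison in this study comes down to whether labeled demonstrations are prepended to the prompt; a minimal sketch of that message construction, with hypothetical labels and example posts, is shown below.

```python
def build_messages(post, examples=None):
    """Zero-shot when `examples` is None; few-shot when (text, label) pairs are given."""
    system = ("Classify the vaccine-related post as positive, negative, or neutral, "
              "and state whether it expresses vaccine hesitancy.")
    messages = [{"role": "system", "content": system}]
    for text, label in (examples or []):   # few-shot demonstrations, if any
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": post})
    return messages

# Zero-shot: no demonstrations.
zero_shot = build_messages("Got my MMR booster today, feeling fine!")
# Few-shot: prepend a handful of labeled posts (hypothetical example).
few_shot = build_messages(
    "Not sure the new vaccine is worth the trip across town.",
    examples=[("Vaccines saved my kid's life.", "positive, not hesitant")],
)
```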
- Research Article
- 10.1002/mrm.70143
- Oct 14, 2025
- Magnetic resonance in medicine
- Yuting Chen + 10 more
This work aimed to develop a new sequence, MIMOSA, for highly efficient T1, T2, T2*, proton density (PD), and source-separation quantitative susceptibility mapping (QSM). MIMOSA was developed based on 3D quantification using an interleaved Look-Locker acquisition sequence with a T2 preparation pulse (3D-QALAS), combining 3D turbo Fast Low Angle Shot (FLASH) and multi-echo gradient echo acquisition modules with a spiral-like Cartesian trajectory to facilitate highly efficient acquisition. Simulations were performed to optimize the sequence. A multi-contrast/-slice zero-shot self-supervised learning algorithm was employed for reconstruction. The accuracy of quantitative mapping was assessed by comparing MIMOSA with 3D-QALAS and reference techniques in both ISMRM/NIST phantom and in vivo experiments. MIMOSA's acceleration capability was assessed at R = 3.3, 6.5, and 11.8 in in vivo experiments, with repeatability assessed through scan-rescan studies. Beyond the 3 T experiments, mesoscale quantitative mapping was performed at 750 μm isotropic resolution at 7 T. Simulations demonstrated that MIMOSA achieved improved parameter estimation accuracy compared to 3D-QALAS. Phantom experiments indicated that MIMOSA exhibited better agreement with the reference techniques than 3D-QALAS. In vivo experiments demonstrated that an acceleration factor of up to R = 11.8 can be achieved while preserving parameter estimation accuracy, with intra-class correlation coefficients of 0.998 (T1), 0.973 (T2), 0.947 (T2*), 0.992 (QSM), 0.987 (paramagnetic susceptibility), and 0.977 (diamagnetic susceptibility) in scan-rescan studies. Whole-brain T1, T2, T2*, PD, and source-separation QSM maps were obtained at 1 mm isotropic resolution in 3 min at 3 T and at 750 μm isotropic resolution in 13 min at 7 T. MIMOSA demonstrated potential for highly efficient and repeatable multi-parametric mapping.
- Research Article
- 10.1093/jas/skaf300.022
- Oct 4, 2025
- Journal of Animal Science
- Ye Bi + 4 more
Accurate individual tracking in group-housed pigs is critical for high-precision phenotyping, enabling novel trait development. Current approaches for automatic pig tracking rely on video recordings analyzed by computer vision models, which require labor-intensive manual video annotation and costly model optimization. Recent advancements in zero-shot learning, especially through the integration of large vision models and language models, enable automated annotation and precise context-aware tracking. Grounded-SAM2 is a state-of-the-art vision-language model that ensembles Grounding-DINO and the Segment Anything Model 2 (SAM2) to perform detection, segmentation, and tracking based on textual prompts. The objective of this study was to assess the merits of zero-shot artificial intelligence technology for the automatic tracking of individual pigs. Six top-view cameras were installed overlooking six pens of nursery pigs to record videos from 06:00 to 17:00 on each of two days. The pigs used in this study were crossbred white pigs (L42 x L337, PIC) with an initial body weight of 5.86 ± 1.01 kg. Pigs were housed at a density of ten animals per pen and were marked with painted numbers on their backs. From every 64-minute interval, we extracted the first 5 minutes of video, which were divided into 1-minute segments for tracking, resulting in a total of 550 segments. From each video, we extracted the first frame and used Grounded-SAM2 with the prompt “pig” to detect pig bodies with bounding boxes and segmentations, propagating them across subsequent frames to generate tracks. The tracks of individual pigs were recorded as auto-annotated videos (4,927 videos comprising 2,956,200 frames), with each video containing a single tracked pig highlighted by its assigned mask. A human evaluator watched every 1-minute auto-annotated video and recorded the presence of one or more of four errors: incorrect mask (incorrect segmentation), duplicated label (same pig detected multiple times), ID switch (track switch between pigs), and lost track (failure to maintain a track). Completely correct tracks were observed in 87.38% ± 0.91% of videos on the first day and 83.74% ± 1.20% on the second day. Error rates on the first day were 6.65% ± 0.69% (incorrect mask), 6.80% ± 0.69% (duplicated label), 2.27% ± 0.41% (ID switch), and 0.45% ± 0.18% (lost track). On the second day, errors increased slightly: 8.93% ± 0.93%, 8.40% ± 0.90%, 2.02% ± 0.46%, and 0.21% ± 0.15%, respectively. These results are impressive, considering that no training or fine-tuning was performed and the test videos constitute an entirely unseen dataset relative to the training data used for Grounded-SAM2. In conclusion, the zero-shot vision-language model proved to be a valuable tool for automatically segmenting and tracking pigs in group settings during the daytime, without the need for manual annotation or task-specific fine-tuning.
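The study relies on Grounded-SAM2 (Grounding-DINO plus SAM2); as a hedged stand-in that only illustrates text-prompted, zero-shot detection on a single frame without any fine-tuning, the sketch below uses the Hugging Face zero-shot-object-detection pipeline with OWL-ViT and a hypothetical frame path. It is not the authors' Grounded-SAM2 tracking pipeline.

```python
from PIL import Image
from transformers import pipeline

# Text-prompted, zero-shot detection on a single frame; OWL-ViT serves here as an
# illustrative substitute for the Grounding-DINO detector used in the study.
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

frame = Image.open("pen_frame.jpg")            # hypothetical top-view frame
detections = detector(frame, candidate_labels=["pig"], threshold=0.3)

for det in detections:
    box, score = det["box"], det["score"]      # box: xmin/ymin/xmax/ymax in pixels
    print(f"pig {score:.2f} at ({box['xmin']}, {box['ymin']}, {box['xmax']}, {box['ymax']})")
```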
- Research Article
- 10.62762/tis.2025.610574
- Oct 4, 2025
- ICCK Transactions on Intelligent Systematics
- Sheng Hong + 3 more
With the rapid development of multimodal large language models (MLLMs), the demand for structured event extraction (EE) in the field of scientific and technological intelligence is increasing. However, significant challenges remain in zero-shot multimodal and cross-language scenarios, including inconsistent cross-language outputs and the high computational cost of full-parameter fine-tuning. This study takes VideoLLaMA2 (VL2) and its improved version VL2.1 as the core models and builds a multimodal annotated dataset covering English, Chinese, Spanish, and Russian (comprising 5,728 EE samples). It systematically evaluates the performance differences between zero-shot learning and parameter-efficient fine-tuning (QLoRA). The experimental results show that, for EE, applying QLoRA fine-tuning to VL2 and VL2.1 raises trigger accuracy to 65.48% and argument accuracy to 60.54%. The study confirms that fine-tuning significantly enhances model robustness.
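For context on the parameter-efficient fine-tuning used in the study, the following is a generic QLoRA setup sketch (4-bit quantization plus LoRA adapters) with the transformers and peft libraries; the backbone model name and hyperparameters are illustrative assumptions, not the study's VideoLLaMA2 configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the backbone in 4-bit precision (the "Q" in QLoRA) ...
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative backbone, not VideoLLaMA2 itself
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# ... then attach low-rank adapters so only a small fraction of parameters is trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```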