Automatic underwater fish species classification with limited data using few-shot learning
- Research Article
1
- 10.18488/76.v10i2.3472
- Sep 15, 2023
- Review of Computer Engineering Research
In recent years, the utilization of deep learning techniques has been employed in the field of image recognition with the aim of improving performance. However, deep learning demands a substantial amount of labeled data for model training, a process that is both expensive and time-consuming. In order to tackle this particular difficulty, the approach of few-shot learning (FSL) has emerged as a viable alternative. FSL, or Few-Shot Learning, is a computational approach that aims to replicate the cognitive processes observed in humans. By using a small set of examples and experiences, FSL enables the acquisition of new concepts. Research in the field of FSL has investigated many approaches to extracting the highest amount of information from limited data or making use of affordable and easily accessible sources of information. Researchers have been incorporating outside data into FSL techniques more frequently. This paper conducts an in-depth exploration of leveraging semantic information to enhance few-shot learning. By reviewing papers from the last five years in WOS, IEEE, and Science Direct (some papers in arXiv are also used), this study delves into the strategies employed to bridge the gap between visual and semantic information. The review extends to encompass zero-shot learning, which is considered a subcategory of FSL, enriching the analysis. Moreover, this paper identifies the potential of employing semantic information to enhance fine-grained few-shot (FGFS) learning. Techniques such as direct projection and the application of generative adversarial networks (GANs) emerge as promising avenues to accomplish this enhancement.
- Conference Article
47
- 10.1117/12.2581496
- Feb 15, 2021
In the current COVID-19 pandemic situation, there is an urgent need to screen infected patients quickly and accurately. Using deep learning models trained on chest X-ray images can become an efficient method for screening COVID-19 patients in these situations. Deep learning approaches are already widely used in the medical community. However, they require a large amount of data to be accurate. The open-source community collectively has made efforts to collect and annotate the data, but it is not enough to train an accurate deep learning model. Few-shot learning is a sub-field of machine learning that aims to learn the objective with less data. In this work, we have experimented with well-known solutions for data scarcity in deep learning to detect COVID-19. These include data augmentation, transfer learning, few-shot learning, and unsupervised learning. We have also proposed a custom few-shot learning approach to detect COVID-19 using siamese networks. Our experimental results showcased that we can implement an efficient and accurate deep learning model for COVID-19 detection by adopting the few-shot learning approaches even with less data. Using our proposed approach, we were able to achieve 96.4% accuracy, an improvement from 83% using baseline models.
- Book Chapter
1
- 10.1007/978-3-031-18461-1_18
- Oct 13, 2022
Skin cancer is a severe condition that should be detected early. The two most prevalent types of skin cancer are melanoma and non-melanoma. Melanoma has been identified as the most dangerous skin cancer. Yet, discriminating melanoma lesions from non-melanoma lesions has proven challenging. Several artificial intelligence-based strategies have been introduced in the literature to handle skin cancer detection, including deep learning and few-shot learning strategies. According to the evidence in the literature, deep learning algorithms are reported to perform well when trained on large datasets. However, they are only effective when the target domain has enough labeled samples; they do not ensure adequate network activation variables to adjust to new target regions rapidly when the target domain has insufficient data. Consequently, few-shot learning paradigms have been presented in the literature to promote learning from such limited amounts of labeled data. A search on PubMed from inception to 7 June 2022 for studies investigating the review of the application of deep learning and few-shot learning in the detection of skin cancer was performed via the use of title terms “Deep Learning” AND “Few-Shot Learning” AND “Skin Cancer Detection” AND “Review,” combined with title terms or MeSH terms “Deep Learning” AND “Few-Shot Learning” AND “Skin Cancer Detection” AND “Review,” with no limits on language or date of publication. We found no paper that has reviewed the application of deep learning and few-shot learning in detecting skin cancer. This paper, therefore, presents a brief overview of some of the most critical applications of deep learning and few-shot learning schemes in the detection of skin cancer lesions from skin image data.
Keywords: Artificial intelligence, Deep learning, Few-shot learning, Melanoma, Skin cancer detection
- Research Article
6
- 10.3390/plants12234043
- Nov 30, 2023
- Plants
Grain filling is essential for wheat yield formation, but is very susceptible to environmental stresses, such as high temperatures, especially in the context of global climate change. Grain RGB images include rich color, shape, and texture information, which can explicitly reveal the dynamics of grain filling. However, it is still challenging to further quantitatively predict the days after anthesis (DAA) from grain RGB images to monitor grain development. Results: The WheatGrain dataset revealed dynamic changes in color, shape, and texture traits during grain development. To predict the DAA from RGB images of wheat grains, we tested the performance of traditional machine learning, deep learning, and few-shot learning on this dataset. The results showed that Random Forest (RF) had the best accuracy of the traditional machine learning algorithms, but it was far less accurate than all deep learning algorithms. The precision and recall of the deep learning classification model using Vision Transformer (ViT) were the highest, 99.03% and 99.00%, respectively. In addition, few-shot learning could realize fine-grained image recognition for wheat grains, and it had a higher accuracy and recall rate in the case of 5-shot, which were 96.86% and 96.67%, respectively. Materials and Methods: In this work, we proposed a complete wheat grain dataset, WheatGrain, which covers thousands of wheat grain images from 6 DAA to 39 DAA, which can characterize the complete dynamics of grain development. At the same time, we built different algorithms to predict the DAA, including traditional machine learning, deep learning, and few-shot learning, in this dataset, and evaluated the performance of all models. Conclusions: To obtain wheat grain filling dynamics promptly, this study proposed an RGB dataset for the whole growth period of grain development. 
In addition, detailed comparisons were conducted between traditional machine learning, deep learning, and few-shot learning, which provided the possibility of recognizing the DAA of the grain timely. These results revealed that the ViT could improve the performance of deep learning in predicting the DAA, while few-shot learning could reduce the need for a number of datasets. This work provides a new approach to monitoring wheat grain filling dynamics, and it is beneficial for disaster prevention and improvement of wheat production.
- Research Article
41
- 10.1038/s41598-021-87557-5
- Apr 14, 2021
- Scientific Reports
Deep learning is quickly becoming a standard approach to solving a range of materials science objectives, particularly in the field of computer vision. However, labeled datasets large enough to train neural networks from scratch can be challenging to collect. One approach to accelerating the training of deep learning models such as convolutional neural networks is the transfer of weights from models trained on unrelated image classification problems, commonly referred to as transfer learning. The powerful feature extractors learned previously can potentially be fine-tuned for a new classification problem without hindering performance. Transfer learning can also improve the results of training a model using a small amount of data, known as few-shot learning. Herein, we test the effectiveness of a few-shot transfer learning approach for the classification of electron backscatter diffraction (EBSD) pattern images to six space groups within the $\left(4/m\,\overline{3}\,2/m\right)$ point group. Training history and performance metrics are compared with a model of the same architecture trained from scratch. In an effort to make this approach more explainable, visualization of filters, activation maps, and Shapley values are utilized to provide insight into the model’s operations. The applicability to real-world phase identification and differentiation is demonstrated using dual phase materials that are challenging to analyze with traditional methods.
- Research Article
55
- 10.1088/2057-1976/ac53bd
- Feb 18, 2022
- Biomedical Physics & Engineering Express
Over the past few years, positron emission tomography/computed tomography (PET/CT) imaging for computer-aided diagnosis has received increasing attention. Supervised deep learning architectures are usually employed for the detection of abnormalities, with anatomical localization, especially in the case of CT scans. However, the main limitations of the supervised learning paradigm include (i) large amounts of data required for model training, and (ii) the assumption of fixed network weights upon training completion, implying that the performance of the model cannot be further improved after training. In order to overcome these limitations, we apply a few-shot learning (FSL) scheme. Contrary to traditional deep learning practices, in FSL the model is provided with less data during training. The model then utilizes end-user feedback after training to constantly improve its performance. We integrate FSL in a U-Net architecture for lung cancer lesion segmentation on PET/CT scans, allowing for dynamic model weight fine-tuning and resulting in an online supervised learning scheme. Constant online readjustments of the model weights according to the users’ feedback, increase the detection and classification accuracy, especially in cases where low detection performance is encountered. Our proposed method is validated on the Lung-PET-CT-DX TCIA database. PET/CT scans from 87 patients were included in the dataset and were acquired 60 minutes after intravenous 18F-FDG injection. Experimental results indicate the superiority of our approach compared to other state-of-the-art methods.
- Research Article
5
- 10.1021/acsomega.3c09735
- Feb 27, 2024
- ACS Omega
With the increasingly widespread application of deep learning technology in the field of coal mines, the image recognition of mine water inrush has become a hot research topic. Underground environments are complex, and images have a high noise and low brightness. Additionally, mine water inrush is accidental, and few actual image samples are available. Therefore, this paper proposes an algorithm that recognizes mine water inrush images based on few-shot deep learning. According to the characteristics of images with coal wall water seepage, a bilinear neural network was used to extract the image features and enhance the network's fine-grained image recognition. First, features were extracted using a bilinear convolutional neural network. Second, the network was pre-trained based on cosine similarity. Finally, the network was fine-tuned for the predicted image. For single-line feature extraction, the method is compared with big data and few-shot learning. According to the experimental results, the recognition rate reaches 95.2% for few-shot learning based on a bilinear neural network, thus demonstrating its effectiveness.
- Research Article
38
- 10.1002/agj2.21285
- Feb 17, 2023
- Agronomy Journal
Monitoring plant diseases is essential for farmers to secure crop quantity and quality. Deep learning has recently been applied to plant disease recognition to help farmers take prompt and proper actions to prevent reductions in crop quantity and quality. Generally, deep learning requires a large‐scale dataset with supervised information annotated often by specialists. However, because collecting plant disease images in natural environments is difficult and obtaining proper annotations from specialists is costly, deep learning is infeasible for plant disease recognition tasks. Few‐shot learning (FSL) is an alternative for plant disease recognition using prior knowledge. Although FSL has attracted considerable attention, comprehensive reports on the application of FSL methods for plant disease recognition are required. Here, we introduce FSL with its applications in plant disease recognition. We begin with an overview of computer vision tasks using machine learning and FSL. We provide practical examples of FSL applications. Utilizing these practical examples, we describe different approaches for data augmentation and FSL methods of embedding, multitask learning, transfer learning, and meta‐learning. Further, we summarize how models are optimized for performance with reference to existing studies. Finally, the advantages and disadvantages are discussed, along with potential challenges for FSL applications in plant disease recognition.
- Research Article
36
- 10.1002/nbm.5143
- Mar 24, 2024
- NMR in biomedicine
Magnetic resonance imaging (MRI) is a ubiquitous medical imaging technology with applications in disease diagnostics, intervention, and treatment planning. Accurate MRI segmentation is critical for diagnosing abnormalities, monitoring diseases, and deciding on a course of treatment. With the advent of advanced deep learning frameworks, fully automated and accurate MRI segmentation is advancing. Traditional supervised deep learning techniques have advanced tremendously, reaching clinical-level accuracy in the field of segmentation. However, these algorithms still require a large amount of annotated data, which is oftentimes unavailable or impractical. One way to circumvent this issue is to utilize algorithms that exploit a limited amount of labeled data. This paper aims to review such state-of-the-art algorithms that use a limited number of annotated samples. We explain the fundamental principles of self-supervised learning, generative models, few-shot learning, and semi-supervised learning and summarize their applications in cardiac, abdomen, and brain MRI segmentation. Throughout this review, we highlight algorithms that can be employed based on the quantity of annotated data available. We also present a comprehensive list of notable publicly available MRI segmentation datasets. To conclude, we discuss possible future directions of the field (including emerging algorithms, such as contrastive language-image pretraining, and potential combinations across the methods discussed) that can further increase the efficacy of image segmentation with limited labels.
- Research Article
2
- 10.3390/biomedicines12040741
- Mar 27, 2024
- Biomedicines
This study evaluated the utility of incorporating deep learning into the relatively novel imaging technique of wide-field optical coherence tomography angiography (WF-OCTA) for glaucoma diagnosis. To overcome the challenge of limited data associated with this emerging imaging, the application of few-shot learning (FSL) was explored, and the advantages observed during its implementation were examined. A total of 195 eyes, comprising 82 normal controls and 113 patients with glaucoma, were examined in this study. The system was trained using FSL instead of traditional supervised learning. Model training can be presented in two distinct ways. Glaucoma feature detection was performed using ResNet18 as a feature extractor. To implement FSL, the ProtoNet algorithm was utilized to perform task-independent classification. Using this trained model, the performance of WF-OCTA through the FSL technique was evaluated. We trained the WF-OCTA validation method with 10 normal and 10 glaucoma images and subsequently examined the glaucoma detection effectiveness. FSL using the WF-OCTA image achieved an area under the receiver operating characteristic curve (AUC) of 0.93 (95% confidence interval (CI): 0.912–0.954) and an accuracy of 81%. In contrast, supervised learning using WF-OCTA images produced worse results than FSL, with an AUC of 0.80 (95% CI: 0.778–0.823) and an accuracy of 50% (p-values < 0.05). Furthermore, the FSL method using WF-OCTA images demonstrated improvement over the conventional OCT parameter-based results (all p-values < 0.05). This study demonstrated the effectiveness of applying deep learning to WF-OCTA for glaucoma diagnosis, highlighting the potential of WF-OCTA images in glaucoma diagnostics. Additionally, it showed that FSL could overcome the limitations associated with a small dataset and is expected to be applicable in various clinical settings.
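The abstract above names ProtoNet (prototypical networks) as the few-shot classifier on top of ResNet18 features. As a rough illustration of that general idea only, and not the study's implementation, the sketch below averages the support embeddings of each class into a prototype and assigns a query to the nearest prototype by squared Euclidean distance. The 2-D embeddings, the function names, and the toy values are all assumptions made for the example; in practice the embeddings would come from a trained feature extractor.

```python
from collections import defaultdict


def class_prototypes(support):
    """Average the embeddings of each class in the support set."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-D embeddings standing in for feature-extractor outputs.
support = [
    ([0.0, 0.0], "normal"),
    ([0.2, 0.0], "normal"),
    ([1.0, 1.0], "glaucoma"),
    ([1.2, 1.0], "glaucoma"),
]
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.8], prototypes)
```

Because the class prototypes are recomputed from whatever support set is supplied, the same trained embedding can be reused with a different handful of labeled images without retraining a classification head.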
- Research Article
- 10.1007/s42452-025-08188-3
- Jan 11, 2026
- Discover Applied Sciences
Medical image analysis has benefited tremendously from advancements in deep learning algorithms, enabling accurate and efficient diagnoses of various medical conditions. However, the accuracy of these deep learning models depends on the size and diversity of the training dataset. In medical imaging, these datasets are often limited due to privacy constraints, infrequent conditions, and the substantial expense of expert annotation, resulting in models that exhibit poor generalization. To mitigate the limitation of data scarcity, privacy regulations, and lack of annotated data, Few-Shot Learning (FSL) has emerged as a viable option. FSL seeks to develop models capable of learning efficiently from minimal examples, employing methodologies, i.e., metric and meta-learning, to address the data limitation. FSL is especially beneficial for medical imaging applications related to rare diseases or personalized medicine. This research examines the current applications of FSL in medical image disease research. Therefore, potential research publications are identified using the databases IEEE Xplore, PubMed, the Association for Computing Machinery (ACM) Digital Library, and ScienceDirect during the period from 2011 to 2025. This study evaluated 332 publications and discovered that the FSL had been applied for brain tumor image analysis (101/332, 30.4%), followed by skin cancer (67/332, 20.2%) and breast cancer (66/332, 19.9%). Moreover, in medical image disease research, Siamese networks (307/332, 92.5%) were by far the most prevalent FSL model, and image data was the predominant data type (84.3%). Using FSL, the primary focus of medical disease research is diagnosis (95/332, 28.6%). Furthermore, data scarcity, class imbalance, difficulty in domain adaptation, and limited model generalization constitute 71% of the overall challenges.
Our research distinguishes itself by conducting a broader quantitative analysis across 332 papers to identify disease-specific application patterns. Moreover, the applications and challenges are mapped to provide a pragmatic path for future research. This analysis concludes with concrete, actionable recommendations for developing medical images analysis, i.e., the creation of defined benchmarks, clinically verified generative models for data augmentation, and designed frameworks for multimodal and federated FSL. These recommendations aim to explicitly address the challenges, i.e., data scarcity, model interpretability, and translational impact, offering clear directions for future study.
- Research Article
48
- 10.1109/joe.2022.3221127
- Jan 1, 2024
- IEEE Journal of Oceanic Engineering
Acoustic imaging sonar systems are widely used for long-range underwater surveillance in various civilian and military applications. They provide 2-D images of underwater objects, even in turbid water conditions where optical underwater imaging systems fail. Achieving high accuracy in automatic deep learning based underwater image classification remains an open problem due to insufficient data availability, poor image resolution, low signal-to-noise ratio surroundings, etc. In this study, we conduct a comparative analysis of different advanced deep learning approaches, i.e., transfer learning and few-shot learning, to address the problem of automatic object classification in sonar images, using a few samples of data. Specifically, two metric learning-based approaches, i.e., siamese network and triplet network as well as library-based approaches, are studied under the few-shot learning paradigm. Extensive experiments are conducted on a novel custom-made dataset developed in-house, along with the publicly available SeabedObjectsKLSG dataset. In addition, the effectiveness of the sampling technique in handling class imbalance during model training is also investigated in this work. Our experimental results highlight that the few-shot learning based approach is a promising direction for future research on underwater image classification with a few samples.
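The abstract above mentions siamese and triplet networks as metric-learning approaches. A triplet network is commonly trained with a margin-based triplet loss, which pulls an anchor toward a same-class (positive) embedding and pushes it away from a different-class (negative) one. The sketch below shows one common formulation using unsquared Euclidean distance; the margin value, the function names, and the toy vectors are assumptions for illustration, and the paper's actual loss and hyperparameters are not stated in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: an easy triplet (negative far away) and a hard one.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
```

Only triplets that violate the margin contribute to the loss, so in this sketch the easy triplet yields zero while the hard one yields a positive value.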
- Research Article
22
- 10.1155/2020/3152174
- Dec 17, 2020
- Shock and Vibration
In recent years, deep learning has become a popular topic in the intelligent fault diagnosis of industrial equipment. In practical working conditions, how to realize intelligent fault diagnosis in the case of the different mechanical components with a tiny labeled sample is a challenging problem. That means training with one component sample but testing with another component sample has not been resolved. In this paper, we propose a deep convolutional nearest neighbor matching network (DC-NNMN) based on few-shot learning. The 1D convolution embedding network is constructed to extract the high-dimensional fault feature. The cosine distance is merged into the K-Nearest Neighbor method to model the distance distribution between the unlabeled sample from the query set and labeled sample from the support set in high-dimensional fault features. The multiple few-shot learning fault diagnosis tasks as the testing dataset are constructed, and then the network parameters are optimized through training in multiple tasks. Thus, a robust network model is obtained to classify the unknown fault categories in different components with tiny labeled fault samples. We use the CWRU bearing vibration dataset, the bearing vibration data selected from the Lab-built experimental platform, and another gearing vibration dataset for across components experiment to prove the proposed method. Experimental results show that the proposed method can achieve fault diagnosis accuracy of 82.19% for gearing and 82.63% for bearings with only one sample of each fault category. The proposed DC-NNMN model provides a new approach to solve the across components fault diagnosis in few-shot learning.
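The abstract above describes comparing a query sample with labeled support samples by cosine distance inside a K-Nearest Neighbor scheme. The sketch below illustrates that matching step alone, on hand-written vectors; it is not the DC-NNMN model, it omits the 1D convolutional embedding network entirely, and the function names, class labels, and values are hypothetical. It also assumes no embedding is the zero vector, since cosine distance is undefined there.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_predict(query, support, k=1):
    """Vote among the k support samples with the smallest cosine distance to the query."""
    nearest = sorted(support, key=lambda item: cosine_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# One labeled embedding per (hypothetical) fault category: a one-shot support set.
support = [
    ([1.0, 0.0], "inner_race"),
    ([0.0, 1.0], "outer_race"),
    ([1.0, 1.0], "ball"),
]
prediction = knn_predict([0.9, 0.1], support, k=1)
```

With one sample per category, as in the one-shot setting the abstract reports, `k=1` reduces the vote to picking the single closest support sample.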
- Research Article
- 10.1158/1538-7445.sabcs22-p6-04-10
- Mar 1, 2023
- Cancer Research
Recurrence Prediction in Ductal Carcinoma In Situ (DCIS) Patients Using Generative Adversarial Network (GAN) Augmented Deep Learning Model
Background: DCIS patients have an excellent overall survival rate, and over-treatment is always a cause for concern due to potential side effects. Standard clinicopathological factors (age, growth pattern, tumor size, margin status and grade) have been shown to have limited value in predicting recurrence and in segregating high- and low-risk patients. Early and accurate recurrence prediction would facilitate a more aggressive treatment policy for high-risk patients (mastectomy or adjuvant radiation therapy), and simultaneously reduce over-treatment of low-risk patients. In this work, we have developed a deep learning (DL) classification framework that predicts recurrence in DCIS patients from hematoxylin and eosin (H&E) images of tissue microarrays (TMAs) using a generative adversarial network (GAN) augmented DL classification model. A GAN is a class of DL models in which two adversarial neural networks, a generator and a discriminator, compete with each other to generate high-quality images. During the adversarial training process, the generator learns to synthesize realistic images similar to those in the training set while the discriminator learns to distinguish between real and generated images. In recent years, high-quality medical images have been generated by GAN models. To the best of our knowledge, this is the first time a GAN model has been used to generate H&E images to train a DL classification model to predict recurrence in DCIS patients. Materials and methods: The cohort comprised 68 DCIS patients, aged 35-89 years, with lesion sizes of 5-90 mm and a mix of low (15%), intermediate (35%) and high (50%) grade cases. Patients were treated with mastectomy and/or a combination of lumpectomy, radiation and hormone therapy. 
TMAs were constructed from 2 mm cores (1-3 cores per patient) in consultation with a breast pathologist to create hematoxylin and eosin (H&E) images for further analysis. The cohort was split into independent training (n=50 patients, 10 with recurrences at 5 years) and validation groups (n=18 patients, 6 with recurrences at 5 years). TMA (H&E) images were divided into smaller image patches of size 256x256 to train a GAN to generate image patches. A DL classification network (Resnet-Inception v2) was trained using TMA image patches and aggressive image patches generated by GAN to predict recurrence. The ability to generate synthetic image patches of aggressive lesions permitted training of a large DL classification network and prediction of recurrence in DCIS patients. Importantly, manual annotation was not necessary for the process. Results: The DL classification model trained with both TMA and GAN-generated image patches predicted recurrence with an AUC of 0.87, sensitivity of 0.83 and specificity of 0.91 in the validation dataset. The DL classification model trained with image patches from TMAs only predicted recurrence with an AUC of 0.81. Conclusions: The use of a GAN model to generate H&E images circumvents the need for a large cohort and accurate, labor-intensive manual annotation of histopathological images, which is often required for training a large DL classification model. The use of GAN-generated aggressive image patches during training significantly improves recurrence prediction accuracy of the DL classification model. Validation in independent larger cohorts is ongoing, and if successful, could provide a novel assay for risk prediction that does not waste precious tissue samples. Citation Format: Ghose Soumya, Yesim Gokmen-Polar, Sanghee Cho, Elizabeth McDonough, Cynthia Davis, Jhimli Mitra, Zhanpan Zhang, Fiona Ginty, Sunil Badve. Recurrence Prediction in Ductal Carcinoma In Situ (DCIS) Patients from Tissue Microarrays (TMAs) [abstract]. 
In: Proceedings of the 2022 San Antonio Breast Cancer Symposium; 2022 Dec 6-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2023;83(5 Suppl):Abstract nr P6-04-10.
- Research Article
20
- 10.2196/44293
- May 4, 2023
- JMIR AI
Background: Natural language processing (NLP) has become an emerging technology in health care that leverages a large amount of free-text data in electronic health records to improve patient care, support clinical decisions, and facilitate clinical and translational science research. Recently, deep learning has achieved state-of-the-art performance in many clinical NLP tasks. However, training deep learning models often requires large, annotated data sets, which are normally not publicly available and can be time-consuming to build in clinical domains. Working with smaller annotated data sets is typical in clinical NLP; therefore, ensuring that deep learning models perform well is crucial for real-world clinical NLP applications. A widely adopted approach is fine-tuning existing pretrained language models, but these attempts fall short when the training data set contains only a few annotated samples. Few-shot learning (FSL) has recently been investigated to tackle this problem. Siamese neural network (SNN) has been widely used as an FSL approach in computer vision but has not been studied well in NLP. Furthermore, the literature on its applications in clinical domains is scarce. Objective: The aim of our study is to propose and evaluate SNN-based approaches for few-shot clinical NLP tasks. Methods: We propose 2 SNN-based FSL approaches, including pretrained SNN and SNN with second-order embeddings. We evaluate the proposed approaches on the clinical sentence classification task. We experiment with 3 few-shot settings, including 4-shot, 8-shot, and 16-shot learning. The clinical NLP task is benchmarked using the following 4 pretrained language models: bidirectional encoder representations from transformers (BERT), BERT for biomedical text mining (BioBERT), BioBERT trained on clinical notes (BioClinicalBERT), and generative pretrained transformer 2 (GPT-2). 
We also present a performance comparison between SNN-based approaches and the prompt-based GPT-2 approach. Results: In 4-shot sentence classification tasks, GPT-2 had the highest precision (0.63), but its recall (0.38) and F score (0.42) were lower than those of BioBERT-based pretrained SNN (0.45 and 0.46, respectively). In both 8-shot and 16-shot settings, SNN-based approaches outperformed GPT-2 in all 3 metrics of precision, recall, and F score. Conclusions: The experimental results verified the effectiveness of the proposed SNN approaches for few-shot clinical NLP tasks.
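The abstract above evaluates 4-shot, 8-shot, and 16-shot settings. One way to build such a setting from a larger labeled pool is to draw k examples per class, as sketched below. This assumes "k-shot" means k labeled examples per class, which the abstract does not spell out; the function name, the class labels, and the synthetic sentences are hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, seed=0):
    """Pick k labeled examples per class; raise if a class has fewer than k."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        if len(by_label[label]) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        subset.extend((text, label) for text in rng.sample(by_label[label], k))
    return subset


# Hypothetical pool of 40 labeled sentences, 20 per class.
pool = [(f"sentence {i}", "history" if i % 2 else "medication") for i in range(40)]
four_shot = sample_k_shot(pool, k=4)
```

Changing the seed draws a different subset; averaging results over several draws would reduce the variance that comes from any single choice of a few examples.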