SimCLR-enhanced MammalHairNet: advancing species identification through hair scale classification
Abstract The aim of this study is to enhance the classification accuracy of mammalian hair scale images using deep learning techniques, particularly SimCLR (Simple Framework for Contrastive Learning of Visual Representations) pretraining with unlabeled data, providing reliable technical support for species identification. We created a scanning electron microscope image dataset of 9,953 valid hair images spanning 33 mammal species. Four segmentation models—U-Net, SegNet, DeepLabV3+, and Segment Anything Model (SAM)—were evaluated for performance. SAM achieved the highest segmentation accuracy overall, although minor errors were still observed in the delineation of scales with clearly defined edges. SimCLR was pretrained on two datasets: one containing all 33 species, and a subset of 25 species, each with over 200 images. These models achieved Top-1 accuracies of 94.64% and 94.40%, exceeding ResNet-50, EfficientNet_b0, and ViT-B/16 trained from scratch, comparable to the transfer learning (ImageNet-pretrained) results of ResNet-50 and EfficientNet_b0, and superior to that of ViT-B/16. Score-CAM visualizations revealed that the network's attention focused on image regions corresponding to the morphological features traditionally used for species identification, demonstrating strong biological interpretability. Additionally, t-SNE visualizations confirmed the model's ability to effectively distinguish between species. Cosine distance–based clustering of species-specific scale features further highlighted interspecies similarity patterns and commonly misclassified species pairs. The results demonstrate that deep learning models, particularly with SimCLR pretraining, are highly effective at classifying mammalian hair scales, providing a reliable method for species identification.
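The core of SimCLR pretraining is the NT-Xent contrastive loss, which pulls the two augmented views of the same image together in embedding space while pushing all other images in the batch apart. The sketch below is an illustrative NumPy re-implementation, not the study's code; the batch layout (the two views of sample i sitting at rows i and i+N) and the temperature value are assumptions.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR.
    z: (2N, d) embeddings; rows i and i+N are the two augmented views of sample i."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    n = z.shape[0] // 2
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own positive
    # the positive for row i is its paired augmented view
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (-(sim[np.arange(2 * n), pos] - logsumexp)).mean()
```

In the study, a loss of this form would be minimized over augmented pairs of unlabeled hair scale images before the encoder is fine-tuned for the species classification task.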
- Research Article
- 10.1158/1538-7445.am2021-184
- Jul 1, 2021
- Cancer Research
Purpose: Although deep learning (DL) models have shown increasing ability to accurately classify diagnostic images in oncology, large amounts of well-curated data are often needed to match human-level performance. Given the relative paucity of imaging datasets for less prevalent cancer types, there is an increasing need for methods that can improve the performance of deep learning models trained on limited diagnostic images. Deep metric learning (DML) is a potential method for improving the accuracy of deep learning models trained on limited datasets. Leveraging a triplet-loss function, DML exponentially increases the effective training data compared to a traditional DL model. In this study, we investigated the utility of DML for improving the accuracy of DL models trained to classify cancerous lesions found on screening mammograms. Methods: Using a dataset of 2620 lesions found on routine screening mammograms, we trained both traditional DL and DML models to classify suspicious lesions as cancerous or benign. The VGG16 architecture was used as the basis for the DL and DML models. Model performance was compared by calculating model accuracy, sensitivity, and specificity on a blinded test set of 378 lesions. In addition to individual model performance, we also measured agreement accuracy when the DL and DML models were combined. Sub-analyses were conducted to identify phenotypes best suited for each model type. Both models underwent hyperparameter optimization to identify the ideal batch size, learning rate, and regularization to prevent overfitting. Results: We found that the combination of the traditional DL model with the DML model resulted in the highest overall accuracy (78.7%), representing a 7.1% improvement compared to the traditional DL model (p<.001). Alone, the traditional DL model had improved accuracy compared to the DML model (71.4% vs 66.4%).
The traditional DL model had a higher sensitivity (94.8% vs 73.6%) but lower specificity (34.7% vs 55.1%) compared to the DML model. Sub-analyses suggested the traditional DL model was more accurate on higher-density breasts, whereas the DML model was more accurate on lower-density breasts. Additionally, the traditional DL model had the highest accuracy on oval-shaped lesions, compared to the DML model, which was most accurate on irregularly shaped breast lesions. Conclusion: Our study suggests that combining DML models with traditional DL models can improve diagnostic image classification performance in cancer. Our results suggest DML models may provide increased specificity and help with classification of unique populations often misclassified by traditional DL models. Further studies investigating the utility of DML on other cancer imaging tasks are necessary to build more robust DL models in cancer imaging. Citation Format: Justin Du, Sachin Umrao, Enoch Chang, Marina Joel, Aidan Gilson, Guneet Janda, Rachel Choi, Yongfeng Hui, Sanjay Aneja. The utility of deep metric learning for breast cancer identification on mammographic images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 184.
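The triplet-loss objective the abstract leverages can be sketched as follows; this is a generic illustration (the margin value and squared-Euclidean distance are assumptions, not details from the study). Because every (anchor, positive, negative) combination forms a training example, the number of triplets grows combinatorially with dataset size, which is the sense in which DML expands the effective training set.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: pull each anchor toward its same-class positive and
    push it at least `margin` further from the negative, in embedding space."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # anchor-negative distance
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```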
- Research Article
- 10.21271/zjpas.34.2.3
- Apr 12, 2022
- ZANCO JOURNAL OF PURE AND APPLIED SCIENCES
Comprehensive Study for Breast Cancer Using Deep Learning and Traditional Machine Learning
- Research Article
- 10.1007/s11042-020-09997-x
- Oct 10, 2020
- Multimedia Tools and Applications
The pedestrian re-identification problem (i.e., re-id) is essential and prerequisite in multi-camera video surveillance studies, given that pedestrian targets need to be accurately re-identified across a network of multiple cameras with non-overlapping fields of view before other post-hoc high-level utilizations (i.e., tracking, behavior analyses, activity monitoring, etc.) can be carried out. Driven by recent developments in deep learning techniques, the important re-id problem is often tackled via either deep discriminant learning or deep generative learning techniques. However, most contemporary deep learning-based models with tremendously deep structures are not easy to train because of the notorious vanishing gradient problem. In this study, a novel full-scaled deep discriminant learning model is proposed. The novelty of the full-scale model is significant, as three crucial concepts in designing a deep learning model, namely depth, width, and cardinality, are all taken into consideration simultaneously. Therefore, the new model need not be tremendously deep and is easier to train. Moreover, based on the new model, a novel deep metric learning method is proposed to further solve the important re-id problem. Technically, two algorithms are derived, one based on the conventional SGD (stochastic gradient descent) and the other on a more efficient alternative, PGD (proximal gradient descent). For experimental analyses, the newly introduced full-scaled deep metric learning method has been comprehensively compared with dozens of popular re-id methods proposed from either deep learning or shallow learning perspectives. Several well-known public re-id datasets have been incorporated, and rigorous statistical analyses have been carried out to compare all methods regarding their re-id performance. The superiority of the novel full-scaled deep metric learning method has been substantiated from the statistical point of view.
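The abstract contrasts SGD with a more efficient PGD variant without giving details. The canonical proximal gradient step splits each update into a gradient step on the smooth loss followed by a closed-form proximal step on the penalty; the sketch below uses an assumed L1 regularizer (whose proximal operator is soft-thresholding) purely for illustration, not the paper's actual algorithm.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: the proximal operator of lam * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def pgd_step(w, grad, lr, lam):
    """One proximal gradient descent step: gradient step on the smooth part,
    then the prox of the (scaled) non-smooth penalty."""
    return prox_l1(w - lr * grad, lr * lam)
```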
- Research Article
- 10.1007/s00607-025-01485-0
- May 22, 2025
- Computing
This study comprehensively analyzes the application of innovative deep learning (DL) and machine learning (ML) techniques in smart energy management systems (EMSs), with an emphasis on load forecasting, demand response, and the development of smart energy sectors. The application of various ML and DL models in an electrical network's EMS was examined across over 200 studies from 2014 to 2024 to highlight the key benefits and advances made by each technology for sustainable management systems in the energy sector. The findings emphasize DL and ML models' enhanced precision and predictive capabilities in load forecasting, their efficacy in enabling efficient demand response mechanisms, and their significance in supporting the development of smart energy sectors. Furthermore, recommendations are made based on the survey results to assist in incorporating these techniques into EMS frameworks, such as investment in data infrastructure, model training and validation, and collaboration between researchers, industry experts, and policymakers. The study also discusses the limitations identified in the literature, such as limited real-world implementations, challenges regarding data quality and availability, and the need for enhanced ML and DL model interpretability. Addressing these limitations can increase the application and efficacy of ML and DL techniques in EMSs, enabling a more efficient and sustainable energy landscape. Finally, this study facilitates researchers' exploration of ML and DL in energy management, highlighting relevant limitations, strengths, and alternative approaches associated with sustainable energy management. It also indicates potential future research directions for further investigation.
- Research Article
- 10.37943/22sksg8575
- Jun 30, 2025
- Scientific Journal of Astana IT University
This article investigates the identification of hate speech on social media using machine learning and deep learning techniques. The research uses metrics such as F-measure, AUC-ROC, precision, accuracy, and recall to assess the effectiveness of various approaches. The findings indicate that deep learning models, particularly the bidirectional long short-term memory (BiLSTM) architecture, consistently outperform other methods in categorization tasks. The research emphasizes the importance of sophisticated neural network designs in identifying the intricacies of hostile and offensive content online. The study offers insights for promoting early identification and prevention of cyberbullying, improving secure and inclusive online environments. Future research may explore real-time detection systems, hybrid approaches, or the integration of complementary components to enhance innovative technology for tackling this significant social issue. A sample of tweets was annotated by specialists, who categorized each tweet as hate speech, offensive language, or neutral. The researchers applied shallow learning methodologies and integrated word embeddings such as Word2Vec and GloVe to enhance the efficacy of deep learning models. The results indicate that BiLSTM surpasses shallow learning methods in detecting hate speech on Twitter, highlighting the efficacy of deep learning approaches in recognizing and tracking hate speech on social media platforms. When comparing different deep learning and machine learning models on different datasets, the results reveal that deep learning techniques are usually more effective. A reasonably high level of accuracy is achieved by KNN and SVM among classical algorithms, whereas Naïve Bayes performs the poorest. While deep learning approaches provide better results, tree-based models such as Random Forest and Decision Trees offer more consistent accuracy.
Models based on neural networks, such as LSTM, CNN, and BiLSTM, perform well, with LSTM-based methods excelling in particular. The most successful strategy for the classification problems is the presented model, which obtains the greatest accuracy, precision, recall, and F1-score (95%). The research aids in the development of advanced tools and methodologies to mitigate hate speech on social media and foster positive online interactions. Future research may investigate alternative deep learning architectures, such as transformers, to enhance hate speech detection efficacy. The advancement of interpretable AI methodologies for identifying hate speech and delivering transparent predictions could enhance user confidence and facilitate better content moderation decisions.
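The evaluation metrics this study reports reduce to a few confusion-matrix counts. A minimal NumPy sketch for the binary case (hate vs. not) follows; the three-class setting in the study would apply the same arithmetic per class. This is an illustrative helper, not the authors' evaluation code.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = positive class)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = np.mean(y_true == y_pred)
    return precision, recall, f1, accuracy
```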
- Research Article
- 10.1111/2041-210x.14294
- Feb 21, 2024
- Methods in Ecology and Evolution
Machine learning-based behaviour classification using acceleration data is a powerful tool in bio-logging research. Deep learning architectures such as convolutional neural networks (CNN), long short-term memory (LSTM) and self-attention mechanisms, as well as related training techniques, have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration-based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in the data due to differences in individual behaviour and in where the loggers were attached, and complexity in the data due to complex animal-specific behaviours, all of which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre-training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state-of-the-art deep learning model architectures. Data augmentation improved overall model performance when one of several techniques (none, scaling, jittering, permutation, time-warping and rotation) was randomly applied to each sample during mini-batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre-training with unlabelled data did not improve model performance. The state-of-the-art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi-head attention layer, as well as its modified version with shortcut connections, showed better performance than the other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features.
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.
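The random augmentation scheme described above (pick one transformation per sample per mini-batch) can be sketched for a single tri-axial acceleration window. This is an illustrative subset: the study also uses time-warping and rotation, and the scaling range, jitter level and segment count below are assumptions, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    """Randomly apply one augmentation to a (T, 3) acceleration window."""
    choice = rng.integers(0, 4)
    if choice == 0:                                   # none: pass through unchanged
        return x
    if choice == 1:                                   # scaling: global amplitude change
        return x * rng.uniform(0.8, 1.2)
    if choice == 2:                                   # jittering: additive Gaussian noise
        return x + rng.normal(0.0, 0.05, size=x.shape)
    # permutation: split into segments and shuffle their order
    segs = np.array_split(x, 4)
    order = rng.permutation(len(segs))
    return np.concatenate([segs[i] for i in order])
```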
- Research Article
- 10.1111/1365-2478.13097
- Jun 6, 2021
- Geophysical Prospecting
ABSTRACT Significant advances have been made towards fault detection using deep learning. However, fault labelling of seismic data requires great human effort. The resulting small-sample problem makes it difficult for traditional deep learning methods to achieve the desired results. Existing research proposes training a deep learning model with labelled synthetic seismic data to obtain good fault detection results. However, due to the complexity of the actual geological situation, there are inevitable differences between synthetic and real seismic data in many aspects, such as seismic signal frequency, frequency of fault distribution and degree of noise disturbance, so a deep learning model trained on synthetic seismic data struggles to achieve good fault detection results in field data applications. We propose using transfer learning to reduce the impact of these data differences: part of the deep transfer learning model is used to learn fault-related features, and the other part is used to mine common features between the real and synthetic seismic data, which makes the model more suitable for real seismic data. Compared with the latest research progress, our method can greatly improve fault detection performance without real data labels, which can significantly save the cost of manual labelling.
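The transfer-learning idea above (reuse representations learned on synthetic data, adapt the rest to real data) can be caricatured with a frozen feature extractor and a small trainable head. Everything below is an assumed stand-in for illustration: the toy data, the tanh features and the logistic head are not the authors' network, only a minimal demonstration that fine-tuning the head alone still reduces the loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for layers pretrained on synthetic seismic data, then frozen.
W_feat = rng.normal(size=(8, 4))

def features(x):
    return np.tanh(x @ W_feat)                  # frozen: never updated below

# Fine-tune only a small task head on (scarce) labelled real samples.
X = rng.normal(size=(32, 8))                    # toy stand-in for real seismic patches
y = (X[:, 0] > 0).astype(float)                 # toy fault / no-fault labels
w_head = np.zeros(4)

def logloss(w):
    p = 1 / (1 + np.exp(-features(X) @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

loss_before = logloss(w_head)
for _ in range(200):
    p = 1 / (1 + np.exp(-features(X) @ w_head))
    w_head -= 0.5 * features(X).T @ (p - y) / len(y)   # gradient step on the head only
loss_after = logloss(w_head)
```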
- Research Article
- 10.1007/s00521-023-08957-4
- Sep 7, 2023
- Neural Computing and Applications
The current development in deep learning is witnessing an exponential transition into automation applications. This automation transition can provide a promising framework for higher performance and lower complexity. The transition is undergoing several rapid changes, and the resulting models can be time-consuming and costly to build. To address these challenges, several studies have investigated deep learning techniques; however, they mostly focused on specific learning approaches, such as supervised deep learning. In addition, these studies did not comprehensively investigate other deep learning techniques, such as deep unsupervised and deep reinforcement learning. Moreover, the majority of these studies neglect to discuss some main methodologies in deep learning, such as transfer learning, federated learning, and online learning. Therefore, motivated by the limitations of the existing studies, this study organizes deep learning techniques into supervised, unsupervised, reinforcement, and hybrid learning-based models, and provides a brief description of each category and its models. Some of the critical topics in deep learning, namely transfer, federated, and online learning models, are explored and discussed in detail. Finally, challenges and future directions are outlined to provide wider outlooks for future researchers.
- Research Article
- 10.1038/s41598-024-66481-4
- Jul 8, 2024
- Scientific Reports
The need for intubation in methanol-poisoned patients, if not predicted in time, can lead to irreparable complications and even death. Artificial intelligence (AI) techniques like machine learning (ML) and deep learning (DL) greatly aid in accurately predicting intubation needs for methanol-poisoned patients. So, our study aims to assess Explainable Artificial Intelligence (XAI) for predicting intubation necessity in methanol-poisoned patients, comparing deep learning and machine learning models. This study analyzed a dataset of 897 patient records from Loghman Hakim Hospital in Tehran, Iran, encompassing cases of methanol poisoning, including those requiring intubation (202 cases) and those not requiring it (695 cases). Eight established ML (SVM, XGB, DT, RF) and DL (DNN, FNN, LSTM, CNN) models were used. Techniques such as tenfold cross-validation and hyperparameter tuning were applied to prevent overfitting. The study also focused on interpretability through SHAP and LIME methods. Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed superior performance in accuracy (94.0%), sensitivity (99.0%), specificity (94.0%), and F1-score (97.0%). CNN led in ROC with 78.0%. For ML models, RF excelled in accuracy (97.0%) and specificity (100%), followed by XGB with sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%). Overall, RF and XGB outperformed other models, with accuracy (97.0%) and specificity (100%) for RF, and sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%) for XGB. ML models surpassed DL models across all metrics, with accuracies from 93.0% to 97.0% for DL and 93.0% to 99.0% for ML. Sensitivities ranged from 98.0% to 99.37% for DL and 93.0% to 99.0% for ML. DL models achieved specificities from 78.0% to 94.0%, while ML models ranged from 93.0% to 100%. F1-scores for DL were between 93.0% and 97.0%, and for ML between 96.0% and 98.27%. 
DL models scored ROC between 68.0% and 78.0%, while ML models ranged from 84.0% to 96.08%. Key features for predicting intubation necessity include GCS at admission, ICU admission, age, longer folic acid therapy duration, elevated BUN and AST levels, VBG_HCO3 at the initial record, and the presence of hemodialysis. This study showcases XAI's effectiveness in predicting intubation necessity in methanol-poisoned patients. ML models, particularly RF and XGB, outperform their DL counterparts, underscoring their potential for clinical decision-making.
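Tenfold cross-validation, one of the overfitting safeguards named above, amounts to partitioning the records into ten disjoint folds, training on nine and validating on the held-out one in rotation. A minimal index-level sketch (an illustration, not the authors' pipeline), using the study's sample size of 897:

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n sample indices into k disjoint, near-equal folds for cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)   # shuffle before splitting
    return np.array_split(idx, k)

folds = kfold_indices(897, k=10)
```

Each fold then serves once as the validation set while the remaining nine are concatenated for training.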
- Research Article
- 10.3389/fendo.2024.1296047
- Jun 4, 2024
- Frontiers in endocrinology
The main objective of this study is to assess the possibility of using radiomics, deep learning, and transfer learning methods for the analysis of chest CT scans. An additional aim is to combine these techniques with bone turnover markers to identify and screen for osteoporosis in patients. A total of 488 patients who had undergone chest CT and bone turnover marker testing, and had known bone mineral density, were included in this study. ITK-SNAP software was used to delineate regions of interest, while radiomics features were extracted using Python. Multiple 2D and 3D deep learning models were trained to identify these regions of interest. The effectiveness of these techniques in screening for osteoporosis in patients was compared. Clinical models based on gender, age, and β-cross achieved an accuracy of 0.698 and an AUC of 0.665. Radiomics models, which utilized 14 selected radiomics features, achieved a maximum accuracy of 0.750 and an AUC of 0.739. The test group yielded promising results: the 2D Deep Learning model achieved an accuracy of 0.812 and an AUC of 0.855, while the 3D Deep Learning model performed even better with an accuracy of 0.854 and an AUC of 0.906. Similarly, the 2D Transfer Learning model achieved an accuracy of 0.854 and an AUC of 0.880, whereas the 3D Transfer Learning model exhibited an accuracy of 0.740 and an AUC of 0.737. Overall, the application of 3D deep learning and 2D transfer learning techniques on chest CT scans showed excellent screening performance in the context of osteoporosis. Bone turnover markers may not be necessary for osteoporosis screening, as 3D deep learning and 2D transfer learning techniques utilizing chest CT scans proved to be equally effective alternatives.
- Research Article
- 10.1038/s41598-020-71914-x
- Sep 15, 2020
- Scientific Reports
Survivors following very premature birth (i.e., ≤ 32 weeks gestational age) remain at high risk for neurodevelopmental impairments. Recent advances in deep learning techniques have made it possible to aid the early diagnosis and prognosis of neurodevelopmental deficits. Deep learning models typically require training on large datasets, and unfortunately, large neuroimaging datasets with clinical outcome annotations are typically limited, especially in neonates. Transfer learning represents an important step to solve the fundamental problem of insufficient training data in deep learning. In this work, we developed a multi-task, multi-stage deep transfer learning framework using the fusion of brain connectome and clinical data for early joint prediction of multiple abnormal neurodevelopmental (cognitive, language and motor) outcomes at 2 years corrected age in very preterm infants. The proposed framework maximizes the value of both available annotated and non-annotated data in model training by performing both supervised and unsupervised learning. We first pre-trained a deep neural network prototype in a supervised fashion using 884 older children and adult subjects, and then re-trained this prototype using 291 neonatal subjects without supervision. Finally, we fine-tuned and validated the pre-trained model using 33 preterm infants. Our proposed model identified very preterm infants at high-risk for cognitive, language, and motor deficits at 2 years corrected age with an area under the receiver operating characteristic curve of 0.86, 0.66 and 0.84, respectively. Employing such a deep learning model, once externally validated, may facilitate risk stratification at term-equivalent age for early identification of long-term neurodevelopmental deficits and targeted early interventions to improve clinical outcomes in very preterm infants.
- Research Article
- 10.1148/radiol.2021203758
- Sep 7, 2021
- Radiology
Background The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screening-detected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BI-RADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. 
The P values comparing the DL, BI-RADS, and combined models' ability to detect screening-detected and interval cancer were .99, .002, and .03, respectively. Conclusion The deep learning model outperformed clinical risk factors, including breast density, in determining screening-detected cancer risk but underperformed for interval cancer risk. © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Bae and Kim in this issue.
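The concordance statistic reported above is, at its core, the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes the plain (unmatched) version; the study's matched variant pairs each case with specific controls, which this illustration omits.

```python
import numpy as np

def c_statistic(y_true, scores):
    """Concordance (AUC): P(random case scores higher than random control),
    counting ties as half-concordant."""
    pos = scores[y_true == 1]                          # cases
    neg = scores[y_true == 0]                          # controls
    greater = (pos[:, None] > neg[None, :]).sum()      # concordant pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```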
- Research Article
- 10.1016/j.solener.2023.111803
- Jun 21, 2023
- Solar Energy
Deep learning techniques for solar tracking systems: A systematic literature review, research challenges, and open research directions
- Conference Article
- 10.1109/iccke57176.2022.9960076
- Nov 17, 2022
Despite the continuous evolution and significant improvement of cybersecurity mechanisms, malware threats remain one of the most important concerns in cyberspace. Meanwhile, Android malware plays a big role in these ever-growing threats. In recent years, deep learning has become the dominant machine learning technique for malware detection and continues to make outstanding achievements. Deep graph representation learning is the task of embedding graph-structured data into a low-dimensional space using deep learning models. Recently, autoencoders have proven to be an effective way for deep representation learning. However, it is not straightforward to apply the idea of autoencoder to graph-structured data because of their irregular structure. In this paper, we present DroidMalGNN, a novel deep learning technique that combines autoencoders with graph neural networks (GNNs) to detect Android malware in an end-to-end manner. DroidMalGNN represents each Android application with an attributed function call graph (AFCG) that allows it to model complex relationships between data. For more efficiency, DroidMalGNN performs graph representation learning in a supervised manner where two autoencoders are trained with benign and malicious AFCGs separately. In this way, it generates two informative embedding vectors for each AFCG in a low-dimensional space and feeds them into a dense neural network to classify the AFCG as benign or malicious. Our experimental results show that DroidMalGNN can achieve good detection performance in terms of different evaluation measures.
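At its simplest, the autoencoder component described above learns an embedding by minimizing reconstruction error. The toy below is a linear autoencoder trained by gradient descent on random stand-in feature vectors; the actual DroidMalGNN encoders operate on attributed function call graphs through GNN layers, which this sketch deliberately does not model, and all sizes and learning rates here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for graph-level feature vectors (real inputs are AFCGs).
X = rng.normal(size=(64, 16))
W_enc = 0.1 * rng.normal(size=(16, 4))   # encoder: 16-d input -> 4-d embedding
W_dec = 0.1 * rng.normal(size=(4, 16))   # decoder: 4-d embedding -> reconstruction

def mse():
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

mse_before = mse()
for _ in range(500):
    Z = X @ W_enc                        # low-dimensional embeddings
    err = Z @ W_dec - X                  # reconstruction error
    g_dec = Z.T @ err / len(X)           # gradient w.r.t. decoder weights
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= 0.05 * g_dec
    W_enc -= 0.05 * g_enc
mse_after = mse()
```

After training, the rows of `X @ W_enc` play the role of the informative embedding vectors that DroidMalGNN feeds to its downstream classifier.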
- Research Article
- 10.1093/humrep/deab130.259
- Aug 6, 2021
- Human Reproduction
Study question Can heatmaps generated by occlusion explain the patterns learned by deep learning (DL) models classifying embryo viability in ART? Summary answer Occlusion experiments generate heatmaps that reveal which regions in frames of time-lapse video (TLV) are more discriminative for classification and prediction by the DL models. What is known already DL has been widely explored in ART for embryo selection. Depending upon the input (video or image), different DL models classifying embryo viability have been developed. However, whether a prediction is based on actual input features or random guessing is unknown. Embryo selection in ART is subjective. If the intention is to use DL models' predictions to transfer, freeze or discard an embryo, explanations of how the models interpret embryonic development features bring transparency and trust. In other areas, heatmaps are used to explain DL predictions, and they can be a tool to understand the patterns learned by DL models for embryo selection. Study design, size, duration We trained two separate DL models to predict the presence of a fetal heartbeat for the transferred embryos, and used occlusion-generated heatmaps to explain the predictions. Retrospective data was used for training. The input dataset consisted of 136 TLVs and corresponding patient data for 132 participants (128 single embryo transfers and 8 double embryo transfers) from both IVF and ICSI treatment. Each video was assessed by an embryologist. Participants/materials, setting, methods DL models (A: ResNet-18, B: VGG16) were trained to predict the presence of a fetal heartbeat on a single frame extracted from TLV on day three or later. Model A has a better recall (0.7) than B (0.5). Heatmaps explain the models' recall rates by visually representing the patterns they have learned. Using occlusion filters of size 30*30 with stride 14 and of size 50*50 with stride 25, we generated heatmaps for both models.
Main results and the role of chance The heatmaps generated using occlusion can visually represent the patterns discovered by the DL models when predicting the presence of a fetal heartbeat. Using occlusion filter size 30*30 with stride 14, we verified that Model B has lower recall because the heatmaps show that the model attends to redundant features outside the embryo region in many input frames. It could be interpreted that either the model has not learned relevant patterns or is more robust to noise. This representation of DL models equips us for better decision-making: whether to consider or discard the prediction, or rather to train the model further, preprocess the training data or change the network architecture. The heatmaps revealed that for frames where significant patterns learned by the models lie within the embryo region, more weight was given to specific features like the inner cell mass, the trophectoderm and some parts within the zona pellucida. Moreover, the heatmaps generated using occlusion are independent of the underlying model's architecture, as the same experimental settings were used for both models. For occlusion filter size 50*50 with stride 25, the expanse of input regions (inside or outside the embryo) considered relevant could be visualized for both models A and B. Limitations, reasons for caution Heatmaps generated by occluding input regions give a visual representation of features in individual frames, not directly on videos. Besides occlusion, other heatmap-based techniques for explaining DL models (e.g. Grad-CAM) exist but were not evaluated. Furthermore, there is no quantitative measure for evaluating whether heatmaps are a good explanation or not. Wider implications of the findings: The heatmaps make the patterns discovered by DL models visually recognizable and bring forth the prominent portions of embryo regions. This will further improve understanding of and trust in DL models' predictions.
Visual representation of DL models using heatmaps enables interpreting a prediction, performing model analysis and determining scope for improvement. Trial registration number Not applicable
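The occlusion procedure described above admits a compact sketch: slide an occluding patch over the image, re-run the model, and record how much the predicted score drops when each region is hidden. The default patch size and stride mirror the study's 30*30/stride-14 setting; the `predict` function and the fill value are placeholders, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def occlusion_heatmap(image, predict, patch=30, stride=14, fill=0.0):
    """Occlusion sensitivity map: heat[i, j] is the drop in the model's score
    when the patch at grid position (i, j) is replaced by `fill`."""
    base = predict(image)
    h, w = image.shape[:2]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict(occluded)   # large drop => important region
    return heat
```

Regions whose occlusion causes the largest score drop are the ones the model relies on, which is exactly what the embryo-region heatmaps above visualize.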