Attention-Enhanced Deep Learning Framework for Automated Concrete Crack Depth Prediction Using Infrared Thermography

Abstract

Traditional manual methods for measuring concrete crack depth are inefficient, time-consuming, and heavily reliant on operator experience, often resulting in inconsistent and subjective outcomes. Moreover, most existing studies on crack characterization primarily emphasize surface-level parameters such as crack length, width, and area. The crack depth, a key indicator of structural integrity and residual load-bearing capacity, remains insufficiently addressed. To bridge this gap, this study proposes an automated crack depth prediction framework that integrates infrared thermography (IRT) with an enhanced SE-ResNet-18 deep learning model. Concrete beam specimens with precisely calibrated crack depths were fabricated under controlled laboratory conditions, and corresponding thermal images were acquired to establish a robust training dataset. By embedding a squeeze-and-excitation (SE) attention mechanism into the conventional ResNet-18 architecture, the model’s capacity to capture and emphasize salient thermal features was significantly improved, resulting in more accurate and stable depth predictions. Experimental results demonstrate that the proposed SE-ResNet-18 achieves 93.77% accuracy within a ±1 mm tolerance, outperforming the baseline ResNet-18 network by a substantial margin. This solution is fully automated in its predictive analysis and noncontact in its sensing modality. It shows strong potential for practical implementation in real-world structural health monitoring and provides a foundation for future research on field-scale applications and model generalization under varying environmental conditions.
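
The squeeze-and-excitation step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It shows the generic SE recipe (global average pooling, a bottleneck layer with ReLU, an expansion layer with sigmoid, then channel-wise rescaling). The weights `w1` and `w2`, the reduction ratio, and the toy feature maps are assumptions made for illustration; the abstract does not state the authors' layer sizes or where the SE blocks sit inside ResNet-18.

```python
import math
import random


def se_block(feature_maps, w1, w2):
    """Apply squeeze-and-excitation to C feature maps (each an H x W nested list).

    w1: (C/r) x C weight matrix, w2: C x (C/r) weight matrix (biases omitted).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, part 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1
    ]
    # Excitation, part 2: expand back to C channels and squash with a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every pixel of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return scaled, gates


# Toy example (hypothetical data): 4 channels of 3 x 3 features, reduction ratio 2.
random.seed(0)
C, R, H, W = 4, 2, 3, 3
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
scaled, gates = se_block(x, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the block can only attenuate or preserve a channel relative to the others; in a trained network the weights would be learned so that informative thermal channels receive larger gates.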

Similar Papers
  • Conference Article
  • Cite Count Icon 4
  • 10.1117/12.2585024
Imaging real cracks: evaluation of the depth and width of narrow fatigue cracks in Al-alloys using laser-spot lock-in thermography
  • Apr 12, 2021
  • Mateu Colom + 3 more

Thermographic nondestructive techniques with focused laser excitation have proven to be very efficient tools for the detection of narrow cracks. Moreover, it has been shown that in the ideal case of infinite cracks, the width of the crack can be assessed quantitatively using laser spot thermography, both in lock-in and pulsed regimes. In this ideal case, the surface temperature of the cracked material can be obtained analytically. However, real cracks feature finite penetration and length and, in these conditions, the calculation of the surface temperature needs to be performed numerically. In this work, we combine laser-spot lock-in thermography with finite element modelling (FEM) to perform a full characterization of the local values of the width and depth of narrow cracks along the whole crack length in two Al-alloy plates after fatigue testing. First, in order to locate and image the crack, we combine the squares of the spatial derivatives of the amplitude thermograms along two perpendicular directions for different positions of the laser spot. Then, we place the laser close to the crack and we fit the numerical model to the amplitude data, so as to obtain the values of the width and depth of the crack at the current position of the laser. By displacing the laser spot to different positions along the crack length, we fully characterize the width and depth of the crack, whose resulting values are of the order of 1 µm and 0.5 mm, respectively.
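
The crack-imaging step described above (combining the squares of the spatial derivatives of the amplitude thermograms along two perpendicular directions, for several laser positions) might be sketched as follows. This is only an illustration under stated assumptions: central finite differences, a plain sum over thermograms, and a synthetic amplitude map are all choices made here, not details given in the abstract.

```python
def crack_edge_map(amplitude_maps):
    """Sum squared x- and y-derivatives over a list of amplitude thermograms.

    Each thermogram is a nested list (rows x cols). Derivatives use central
    differences on interior pixels; border pixels are left at zero.
    """
    rows, cols = len(amplitude_maps[0]), len(amplitude_maps[0][0])
    edge = [[0.0] * cols for _ in range(rows)]
    for amp in amplitude_maps:  # one thermogram per laser-spot position
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                d_x = (amp[r][c + 1] - amp[r][c - 1]) / 2.0
                d_y = (amp[r + 1][c] - amp[r - 1][c]) / 2.0
                edge[r][c] += d_x ** 2 + d_y ** 2
    return edge


# Synthetic example: the amplitude drops across a vertical "crack"
# lying between columns 2 and 3.
toy = [[1.0 if c < 3 else 0.0 for c in range(6)] for _ in range(5)]
edge = crack_edge_map([toy])
print(edge[2])
```

The map is large only next to the amplitude discontinuity, which is why this kind of derivative image highlights the crack line regardless of the sign of the temperature jump.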

  • Dissertation
  • 10.32657/10356/182221
Backdoor in deep learning: new threats and opportunities
  • Jan 1, 2025
  • Kangjie Chen

Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural language processing, and autonomous driving. However, deep learning models are susceptible to backdoor threats, where an attacker manipulates the training process or data to cause incorrect predictions on malicious samples containing specific triggers, while maintaining normal performance on benign samples. With the advancement of deep learning, including evolving training schemes and the need for large-scale training data, new threats in the backdoor domain continue to emerge. Conversely, backdoors can also be leveraged to protect deep learning models, such as through watermarking techniques. In this thesis, we conduct an in-depth investigation into backdoor techniques from three novel perspectives. In the first part of this thesis, we demonstrate that emerging deep learning training schemes can introduce new backdoor risks. Specifically, pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, significantly accelerating the development of language models. However, the pre-trained model becomes a single point of failure for these downstream models. We propose a novel task-agnostic backdoor attack against pre-trained NLP models, wherein the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Any downstream models transferred from this malicious model will inherit the backdoor, even after extensive transfer learning, revealing the severe vulnerability of pre-trained foundation models to backdoor attacks. In the second part of this thesis, we develop novel backdoor attack methods suited to new threat scenarios. 
The rapid expansion of deep learning models necessitates large-scale training data, much of which is unlabeled and outsourced to third parties for annotation. To ensure data security, most datasets are read-only for training samples, preventing the addition of input triggers. Consequently, attackers can only achieve data poisoning by uploading malicious annotations. In this practical scenario, all existing data poisoning methods that add triggers to the input are infeasible. Therefore, we propose new backdoor attack methods that involve poisoning only the labels without modifying any input samples. In the third part of this thesis, we utilize the backdoor technique to proactively protect our deep learning models, specifically for intellectual property protection. Considering the complexity of deep learning tasks, generating a well-trained deep learning model requires substantial computational resources, training data, and expertise. Therefore, it is essential to protect these assets and prevent copyright infringement. Inspired by backdoor attacks that can induce specific behaviors in target models through carefully designed samples, several watermarking methods have been proposed to protect the intellectual property of deep learning models. Model owners can train their models to produce unique outputs for certain crafted samples and use these samples for ownership verification. While various extraction techniques have been designed for supervised deep learning models, challenges arise when applying them to deep reinforcement learning models due to differences in model features and scenarios. Therefore, we propose a novel watermarking scheme to protect deep reinforcement learning models from unauthorized distribution. 
Instead of using spatial watermarks as in conventional deep learning models, we design temporal watermarks that minimize potential impact and damage to the protected deep reinforcement learning model while achieving high-fidelity ownership verification. In summary, this thesis investigates the evolving landscape of backdoor threats during the development of deep learning techniques and the use of backdoors for beneficial purposes in intellectual property protection.

  • Research Article
  • 10.1158/1538-7445.am2021-184
Abstract 184: The utility of deep metric learning for breast cancer identification on mammographic images
  • Jul 1, 2021
  • Cancer Research
  • Justin Du + 8 more

Purpose: Although deep learning (DL) models have shown increasing ability to accurately classify diagnostic images in oncology, significantly large amounts of well-curated data are often needed to match human level performance. Given the relative paucity of imaging datasets for less prevalent cancer types, there is an increasing need for methods which can improve the performance of deep learning models trained using limited diagnostic images. Deep metric learning (DML) is a potential method which can improve accuracy in deep learning models trained on limited datasets. Leveraging a triplet-loss function, DML exponentially increases training data compared to a traditional DL model. In this study, we investigated the utility of DML to improve the accuracy of DL models trained to classify cancerous lesions found on screening mammograms. Methods: Using a dataset of 2620 lesions found on routine screening mammogram, we trained both a traditional DL model and a DML model to classify suspicious lesions as cancerous or benign. The VGG16 architecture was used as the basis for the DL and DML models. Model performance was compared by calculating model accuracy, sensitivity, and specificity on a blinded test set of 378 lesions. In addition to individual model performance, we also measured agreement accuracy when both the DL and DML models were combined. Sub-analyses were conducted to identify phenotypes which were best suited for each model type. Both models underwent hyperparameter optimization to identify ideal batch size, learning rate, and regularization to prevent overfitting. Results: We found that the combination of the traditional DL model with the DML model resulted in the highest overall accuracy (78.7%), representing a 7.1% improvement compared to the traditional DL model (p < .001). Alone, the traditional DL model had an improved accuracy compared to the DML model (71.4% vs 66.4%). 
The traditional DL model had a higher sensitivity (94.8% vs 73.6%), but lower specificity (34.7% vs 55.1%) compared to the DML model. Sub-analyses suggested the traditional DL model was more accurate on higher density breasts, whereas the DML model was more accurate on lower density breasts. Additionally, the traditional DL model had the highest accuracy on oval shaped lesions, compared to the DML model which was most accurate on irregularly shaped breast lesions. Conclusion: Our study suggests that addition of DML models with traditional DL models can improve diagnostic image classification performance in cancer. Our results suggest DML models may provide increased specificity and help with classification of unique populations often misclassified by traditional DL models. Further studies investigating the utility of DML on other cancer imaging tasks are necessary to successfully build more robust DL models in cancer imaging. Citation Format: Justin Du, Sachin Umrao, Enoch Chang, Marina Joel, Aidan Gilson, Guneet Janda, Rachel Choi, Yongfeng Hui, Sanjay Aneja. The utility of deep metric learning for breast cancer identification on mammographic images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 184.
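
For readers unfamiliar with the triplet loss named in this abstract, a minimal sketch of its common margin-based form is given here. The Euclidean distance, the margin of 1.0, and the two-dimensional toy embeddings are assumptions for illustration; the abstract does not specify the distance, the margin, or how triplets were mined.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    return max(
        0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    )


# Hypothetical 2-D embeddings of lesion images.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy_negative = [3.0, 4.0]   # already far from the anchor
hard_negative = [1.0, 0.0]   # as close to the anchor as the positive

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Because each training example is an (anchor, positive, negative) combination rather than a single image, many triplets can be formed from a small labelled set, which is the sense in which such methods are said to stretch limited data. Some implementations use squared distances instead; that variant is not shown here.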

  • Research Article
  • 10.33317/ssurj.699
Impact of Soil Heterogeneity on Deep Learning Performance Metrics for Surface Soil Moisture Prediction
  • Dec 29, 2025
  • Sir Syed University Research Journal of Engineering & Technology
  • Pascal Yamakili + 2 more

Although academic interest in applying remote sensing, and particularly Deep Learning (DL) model development techniques, to exploring and computing surface soil moisture over large areas is growing, most state-of-the-art work in this field has centered on improving model accuracy. Few works have focused on revealing the influence of soil heterogeneity. This study focuses on evaluating the consequences of soil type heterogeneity on the performance of DL models. The study first developed DL models enhanced by Residual Learning (RL) and Squeeze and Excitation (SE) mechanisms. The models also used a data fusion mechanism in which Synthetic Aperture Radar (SAR), Normalized Difference Vegetation Index (NDVI), and Red, Green, Blue (RGB) imagery datasets were stacked together. Upon model completion, the study tested the effect of sand, clay, vegetated, and bare soils on the performance of the developed models. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were the metrics used to assess the effects of soil variation on the models' performance. The results indicate that predictive accuracy varies considerably across soil types, with vegetated soils achieving the lowest RMSE (0.0099 m³/m³) and bare soils the highest (0.014 m³/m³). Comparative analysis with standard models, including LightGBM, LSTM, and SMAP filters, showed that the model enhanced with SE and RL outperformed the majority of state-of-the-art models, achieving lower error rates and improved performance metrics. The results of this study stress the influence of soil heterogeneity on deep learning model design and performance assessment. They suggest that soil heterogeneity affects conclusions about, and the generalization of, the performance of most DL models. Model performance may vary depending on which soil type's datasets are considered under otherwise similar experiments. 
This outcome provides a milestone and an overview toward the development and implementation of soil moisture models, which are to be developed under different soil types.
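
The two error metrics used in this study, RMSE and MAE, follow their standard definitions and can be written in a few lines. The soil-moisture values below are hypothetical numbers chosen only to exercise the functions; they are not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical volumetric soil moisture (m^3/m^3): measured vs. predicted.
measured = [0.10, 0.20, 0.30]
predicted = [0.12, 0.18, 0.33]
print(round(rmse(measured, predicted), 4), round(mae(measured, predicted), 4))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both per soil type gives a fuller picture of how the error is distributed.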

  • Research Article
  • 10.17515/resm2024.367me0725rs
Analyzing and examining the impact of various fiber types on the mechanical and functional characteristics of UHPC
  • Jan 1, 2024
  • Research on Engineering Structures and Materials
  • Mohammad Sadegh Shahid Zadeh + 2 more

In this research, we compared the performance and mechanical properties of ultra-high-performance concrete (UHPC) produced with varying percentages of waste and recycled fibers, including polypropylene plastic (plastic sack fibers) (PP), polyethylene terephthalate (PET), date palm fibers (DP), Monterey pine tree fibers (MP), human hair (HH), and aluminum (from metal cans) (AF). This was done in relation to a control sample of UHPC (WC). The study investigated the effects of different percentages of these fibers (0, 0.5, 1, 1.5, and 2 percent by weight of cement) with a length of 3 cm in self-compacting concrete. We conducted tests on fresh concrete, including Slump Flow, J-Ring, V-Funnel, L-Box, and U-Box, as well as tests on hardened concrete, such as compressive strength, tensile strength, permeability, crack width control, thermal cracking, Schmidt hammer tests, and ultrasonic pulse velocity. The results indicated that increasing the percentage of fibers (PP, PET, DP, AF, MP, and HH) in UHPC enhances tensile strength, reduces permeability, and increases compressive strength. Additionally, under the influence of temperature, a decrease in both the depth and length of cracks was observed in the concrete slabs. Notably, the inclusion of 2% human hair fibers (HH) in UHPC resulted in superior tensile strength, reduced permeability, and minimized both the length and depth of cracks compared to other fiber types. Conversely, the addition of 2% aluminum fibers (AF) led to a reduction in tensile strength, an increase in permeability, and an increase in both the length and depth of cracks. This research demonstrated that, in terms of mechanical and functional properties, human hair fibers provided better results in UHPC across all tests conducted, significantly enhancing the longevity of the structure.

  • Research Article
  • Cite Count Icon 23
  • 10.1038/s41598-024-66481-4
Explainable artificial intelligence (XAI) for predicting the need for intubation in methanol-poisoned patients: a study comparing deep and machine learning models
  • Jul 8, 2024
  • Scientific Reports
  • Khadijeh Moulaei + 14 more

The need for intubation in methanol-poisoned patients, if not predicted in time, can lead to irreparable complications and even death. Artificial intelligence (AI) techniques like machine learning (ML) and deep learning (DL) greatly aid in accurately predicting intubation needs for methanol-poisoned patients. So, our study aims to assess Explainable Artificial Intelligence (XAI) for predicting intubation necessity in methanol-poisoned patients, comparing deep learning and machine learning models. This study analyzed a dataset of 897 patient records from Loghman Hakim Hospital in Tehran, Iran, encompassing cases of methanol poisoning, including those requiring intubation (202 cases) and those not requiring it (695 cases). Eight established ML (SVM, XGB, DT, RF) and DL (DNN, FNN, LSTM, CNN) models were used. Techniques such as tenfold cross-validation and hyperparameter tuning were applied to prevent overfitting. The study also focused on interpretability through SHAP and LIME methods. Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed superior performance in accuracy (94.0%), sensitivity (99.0%), specificity (94.0%), and F1-score (97.0%). CNN led in ROC with 78.0%. For ML models, RF excelled in accuracy (97.0%) and specificity (100%), followed by XGB with sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%). Overall, RF and XGB outperformed other models, with accuracy (97.0%) and specificity (100%) for RF, and sensitivity (99.37%), F1-score (98.27%), and ROC (96.08%) for XGB. ML models surpassed DL models across all metrics, with accuracies from 93.0% to 97.0% for DL and 93.0% to 99.0% for ML. Sensitivities ranged from 98.0% to 99.37% for DL and 93.0% to 99.0% for ML. DL models achieved specificities from 78.0% to 94.0%, while ML models ranged from 93.0% to 100%. F1-scores for DL were between 93.0% and 97.0%, and for ML between 96.0% and 98.27%. 
DL models scored ROC between 68.0% and 78.0%, while ML models ranged from 84.0% to 96.08%. Key features for predicting intubation necessity include GCS at admission, ICU admission, age, longer folic acid therapy duration, elevated BUN and AST levels, VBG_HCO3 at initial record, and hemodialysis presence. This study showcases XAI's effectiveness in predicting intubation necessity in methanol-poisoned patients. ML models, particularly RF and XGB, outperform DL counterparts, underscoring their potential for clinical decision-making.
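
The accuracy, sensitivity, specificity, and F1-score figures quoted in this abstract all derive from a binary confusion matrix. A minimal sketch of the standard formulas follows; the counts are hypothetical and are not taken from the study, and the function assumes every denominator is non-zero.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. intubation needed) predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=90, fp=20, tn=80, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

With imbalanced classes, as in the 202 versus 695 split reported above, accuracy alone can look high while sensitivity or specificity is poor, which is why the abstract reports the metrics side by side.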

  • Research Article
  • 10.1093/humrep/deab130.259
P–260 Towards better explainable deep learning models for embryo selection in ART
  • Aug 6, 2021
  • Human Reproduction
  • Ashu Sharma + 4 more

Study question: Can heatmaps generated by occlusion explain the patterns learned by deep learning (DL) models classifying embryo viability in ART? Summary answer: Occlusion experiments generate heatmaps that reveal which regions in frames of time-lapse video (TLV) are more discriminative for classification and prediction by the DL models. What is known already: DL has been widely explored in ART for embryo selection. Depending upon input (video or image), different DL models classifying embryo viability are developed. However, whether the prediction is based on actual input features or random guessing is unknown. The embryo selection in ART is subjective. If the intention is to use DL models’ predictions to transfer, freeze or discard the embryo, explanations of how they interpret embryonic development features bring transparency and trust. In other areas, heatmaps are used for explaining DL predictions. The heatmaps can be a tool to understand patterns learned by DL models for embryo selection. Study design, size, duration: We trained two separate DL models for predicting the presence of fetal heartbeat for the transferred embryos. We further used occlusion-generated heatmaps to explain the predictions. For training, retrospective data was used. The input dataset consisted of 136 TLVs and corresponding patient data for 132 participants (128: single embryo transfers and 8: double embryo transfer) from both IVF and ICSI treatment. Each video was assessed by an embryologist. Participants/materials, setting, methods: DL models (A as ResNet-18, B as VGG16) are trained for predicting the presence of fetal heartbeat on a single frame extracted from TLV after day three or later. Model A has a better recall (0.7) compared to B (0.5). Heatmaps explain the reason behind models’ recall rate by visually representing patterns learned by them. Using occlusion filter size 30×30 with stride 14 and size 50×50 with stride 25, we generate heatmaps for both models. 
Main results and the role of chance: The heatmaps generated using occlusion can represent visually the patterns discovered by the DL models when predicting the presence of a fetal heartbeat. Using occlusion filter size 30×30 with stride 14, we verified that Model B has lower recall because the heatmaps show that the model finds redundant features present outside the embryo region in many input frames. It could be interpreted that either the model has not learned relevant patterns or is more robust to noise. This representation of DL models equips us for better decision-making: whether to consider or discard the prediction, train the model further, preprocess training data or change network architecture. The heatmaps revealed that for frames where significant patterns learned by the models are within the embryo region, more weight was given to specific features like the inner cell mass, trophectoderm and some parts within the zona pellucida. Moreover, the heatmaps generated using occlusion are independent of the underlying model’s architecture as the same experiment settings were used for both models. For occlusion filter size 50×50 with stride 25, the expanse of input regions (in or outside the embryo) considered relevant could be visualized for both models A and B. Limitations, reasons for caution: Heatmaps generated by occluding input regions give a visual representation of features in individual frames, not directly on videos. Besides occlusion, other techniques for explaining DL models with heatmaps (e.g., Grad-CAM) exist but were not evaluated. Furthermore, there is no quantitative measure for evaluating whether heatmaps are a good explanation or not. Wider implications of the findings: The heatmaps make the patterns discovered by DL models visually recognizable and bring forth the prominent portions of embryo regions. This will again improve understanding and trust in DL models’ predictions. 
Visual representation of DL models using heatmaps enables interpreting a prediction, performing model analysis and determining scope for improvement. Trial registration number: Not applicable
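
The occlusion procedure described in this abstract (slide a filter of a given size over the frame with a given stride, mask that patch, and record how much the model's score changes) can be sketched in pure Python. The scoring function `center_score`, the fill value, the averaging of overlapping patches, and the tiny 8×8 frame are all stand-ins assumed for illustration; the study used trained ResNet-18 and VGG16 models with 30×30 (stride 14) and 50×50 (stride 25) filters.

```python
def occlusion_heatmap(image, score_fn, size, stride, fill=0.0):
    """Average drop in score_fn caused by masking each size x size patch.

    image: nested list (rows x cols). Pixels never covered by a patch get 0.
    """
    rows, cols = len(image), len(image[0])
    baseline = score_fn(image)
    total = [[0.0] * cols for _ in range(rows)]
    hits = [[0] * cols for _ in range(rows)]
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            occluded = [row[:] for row in image]       # copy, then mask a patch
            for r in range(top, top + size):
                for c in range(left, left + size):
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)       # importance of the patch
            for r in range(top, top + size):
                for c in range(left, left + size):
                    total[r][c] += drop
                    hits[r][c] += 1
    return [
        [total[r][c] / hits[r][c] if hits[r][c] else 0.0 for c in range(cols)]
        for r in range(rows)
    ]


def center_score(image):
    """Hypothetical 'model': responds only to the central 2 x 2 pixels."""
    return image[3][3] + image[3][4] + image[4][3] + image[4][4]


frame = [[1.0] * 8 for _ in range(8)]
heat = occlusion_heatmap(frame, center_score, size=2, stride=2)
print(heat[3][3], heat[0][0])
```

In this toy case the heatmap is non-zero only around the central pixels the stand-in model actually uses, which mirrors how the study read heat outside the embryo region as a sign that a model relied on irrelevant features.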

  • Research Article
  • 10.1186/s12885-025-14971-7
Deep multi-instance learning model based on gadoxetic acid-enhanced MRI for predicting microvascular invasion of hepatocellular carcinoma: a multicenter, retrospective study
  • Oct 22, 2025
  • BMC Cancer
  • Yi Luo + 7 more

Objective: Microvascular invasion (MVI) is of great significance for the individualized treatment of hepatocellular carcinoma (HCC) and preoperative noninvasive prediction of MVI is still an urgent clinical problem. To explore the effects of different regions of interest (ROI) and image input dimensions on the performance of deep learning (DL) models, and to select the best result to develop and validate a DL model for preoperative prediction of MVI. Materials and methods: A total of 206 patients with pathologically confirmed HCC from three hospitals were retrospectively enrolled and divided into training, internal validation and external test set. Based on hepatobiliary phase images (HBP) of gadoxetic acid-enhanced MRI, 2D DL, 3D DL and 2.5D deep multi-instance learning (MIL) models were established. The receiver operating characteristic curve (ROC) was used to evaluate the predictive efficacy of the above models. Based on the optimal performance model, the T1WI-FS and T2WI-FS images were preprocessed correspondingly, and a multimodal prediction model including three sequences was constructed. The ROC and decision curve were used to visualize the predictive ability of the model. Results: Compared with 2D DL and 3D DL models, the 2.5D DL model based on all axial images of ROI had the highest performance, with the AUC values of 0.802 (95% CI, 0.669–0.936) and 0.759 (95% CI, 0.643–0.875) in the validation and test sets. The AUCs of the multimodal MRI model were 0.954 (95% CI, 0.920–0.989) in the training set, 0.857 (95% CI, 0.736–0.978) in the validation set, and 0.788 (95% CI, 0.681–0.895) in the test set. Conclusion: The DL model that selects all axial slices of intratumor and peritumor as input shows robust capability in predicting MVI, which is expected to help clinical decision-making of individualized treatment for HCC. Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-025-14971-7.

  • Research Article
  • Cite Count Icon 34
  • 10.1016/j.foodchem.2024.141999
Evaluation and process monitoring of jujube hot air drying using hyperspectral imaging technology and deep learning for quality parameters
  • Mar 1, 2025
  • Food Chemistry
  • Quancheng Liu + 8 more

  • Research Article
  • Cite Count Icon 11
  • 10.1016/j.jclepro.2023.137564
Incorporation of feature engineering and attention mechanisms into deep learning models to develop an early warning system for harmful algal blooms
  • May 23, 2023
  • Journal of Cleaner Production
  • Taeho Kim + 2 more

  • Research Article
  • Cite Count Icon 41
  • 10.1016/j.eswa.2022.117268
A novel attention-based deep learning method for post-disaster building damage classification
  • Apr 20, 2022
  • Expert Systems with Applications
  • Chang Liu + 3 more

  • Research Article
  • 10.1080/17480272.2026.2620438
Crack localization and depth prediction in wood based on anisotropic velocity model of acoustic emission
  • Feb 3, 2026
  • Wood Material Science & Engineering
  • Yongyou Chen + 7 more

For identifying crack location and depth on wood surfaces, this study proposes a crack localization method based on the anisotropic velocity model of acoustic emission (AE) signals. First, AE signal propagation velocities in different directions were calculated based on time difference of arrival (TDOA), thereby establishing an anisotropic wave velocity model as a function of propagation angle. Second, eight cracks with varying depths were sequentially introduced at the same surface position with 5 mm increments, and the corresponding propagation path for each crack was reconstructed using the velocity model. A crack localization algorithm was developed based on the geometric relationship between the AE source and three sensors, enabling determination of the crack position range and subsequent prediction of crack location and depth. The results showed that the proposed method effectively identifies crack positions. When crack depth exceeds 10 mm, AE signal propagation paths change significantly, with relative errors between calculated and measured TDOA ranging from 5.0% to 17.6%. Based on this, the crack depth and location were calculated. The relative errors between the calculated and actual values range from −4.0% to 16.7% for depth and from 4.8% to 13.3% for location.

  • Research Article
  • Cite Count Icon 25
  • 10.1038/s41598-024-82931-5
Explainable artificial intelligence for stroke prediction through comparison of deep learning and machine learning models
  • Dec 28, 2024
  • Scientific Reports
  • Khadijeh Moulaei + 5 more

Failure to predict stroke promptly may lead to delayed treatment, causing severe consequences like permanent neurological damage or death. Early detection using deep learning (DL) and machine learning (ML) models can enhance patient outcomes and mitigate the long-term effects of strokes. The aim of this study is to compare these models, exploring their efficacy in predicting stroke. This study analyzed a dataset comprising 663 records from patients hospitalized at Hazrat Rasool Akram Hospital in Tehran, Iran, including 401 healthy individuals and 262 stroke patients. A total of eight established ML (SVM, XGB, KNN, RF) and DL (DNN, FNN, LSTM, CNN) models were utilized to predict stroke. Techniques such as 10-fold cross-validation and hyperparameter tuning were implemented to prevent overfitting. The study also focused on interpretability through Shapley Additive Explanations (SHAP). Model performance was evaluated based on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed superior sensitivity at 96.15%, while FNN exhibited better specificity (96.0%), accuracy (96.0%), F1-score (95.0%), and ROC (98.0%). For ML models, RF displayed higher sensitivity (99.9%), accuracy (99.0%), specificity (100%), F1-score (99.0%), and ROC (99.9%). Overall, RF outperformed all models, while DL models surpassed ML models in most metrics except for RF. DL models (CNN, LSTM, DNN, FNN) achieved sensitivities from 93.0 to 96.15%, specificities from 80.0 to 96.0%, accuracies from 92.0 to 96.0%, F1-scores from 87.34 to 95.0%, and ROC scores from 95.0 to 98.0%. In contrast, ML models (KNN, XGB, SVM) showed sensitivities between 29.0% and 94.0%, specificities between 89.47% and 96.0%, accuracies between 71.0% and 95.0%, F1-scores between 44.0% and 95.0%, and ROC scores between 64.0% and 95.0%. 
This study demonstrates the efficacy of DL and ML models in predicting stroke, with the RF models outperforming all others in key metrics. While DL models generally surpassed ML models, RF’s exceptional performance highlights the potential of combining these technologies for early stroke detection, significantly improving patient outcomes by preventing severe consequences like permanent neurological damage or death.

  • Research Article
  • Cite Count Icon 5
  • 10.7759/cureus.80872
Evaluating the Impact of Attention Mechanisms on a Fine-Tuned Neural Network for Magnetic Resonance Imaging Tumor Classification: A Comparative Analysis.
  • Mar 20, 2025
  • Cureus
  • Kian A Huang + 1 more

Background: Magnetic resonance imaging (MRI) is essential for brain tumor diagnosis. Deep learning models, such as Residual Network 50 Version 2 (ResNet50V2), have demonstrated strong performance in tumor classification. However, integrating attention mechanisms may further enhance diagnostic accuracy. This study evaluates the impact of different attention mechanisms on a ResNet50V2-based MRI tumor classification model for distinguishing between meningioma, glioma, pituitary tumors, and cases with no tumor. Methods: A ResNet50V2-based model was trained on 3,096 annotated MRI scans from a publicly available dataset on Kaggle. Five model configurations were evaluated: baseline ResNet50V2, Squeeze-and-Excitation (SE), Convolutional Block Attention Module (CBAM), Self-Attention (SA), and Attention Gated Network (AGNet). Performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), precision, and recall. Two-proportion Z-tests were conducted to compare classification accuracies among models. Results: The SE-enhanced model achieved the highest classification performance, with an accuracy of 98.4% and an AUC of 1.00, outperforming the base ResNet50V2 (92.6%) and other attention-based frameworks (CBAM: 93.5%, SA: 91.6%, AGNet: 94.2%). Compared to the baseline model, the SE model also demonstrated improved meningioma and pituitary tumor classification (Z = 2.485, p = 0.013 and Z = 2.423, p = 0.015, respectively). Additionally, the SE model demonstrated superior precision and recall across all tumor classes. Conclusion: Incorporating attention mechanisms significantly improves MRI-based tumor classification, with SE proving to be the most effective. These findings suggest that SE-enhanced models may improve diagnostic accuracy in both research and clinical applications. Future research should explore hybrid attention mechanisms, such as transformer-based models, and their broader applications in medical imaging.
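
The two-proportion Z-test used here to compare classification accuracies has a standard pooled form, sketched below with only the Python standard library. The per-class counts passed to `two_proportion_z_test` are hypothetical, since the abstract does not report the test-set sizes behind its Z values.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test for x1/n1 versus x2/n2.

    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_err
    return z, two_sided_p(z)


# Hypothetical counts: model A classifies 90 of 100 correctly, model B 80 of 100.
z, p = two_proportion_z_test(90, 100, 80, 100)
print(round(z, 3), round(p, 3))

# Sanity check against the abstract: Z = 2.485 corresponds to p of about 0.013.
print(round(two_sided_p(2.485), 3))
```

Applying `two_sided_p` to the reported Z = 2.485 and Z = 2.423 gives roughly 0.013 and 0.015, which agrees with the p-values quoted in the Results and suggests two-sided tests were used.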

  • Research Article
  • Cite Count Icon 12
  • 10.1038/s41598-020-79809-7
Development and validation of deep learning algorithms for automated eye laterality detection with anterior segment photography
  • Jan 12, 2021
  • Scientific Reports
  • Ce Zheng + 9 more

This paper aimed to develop and validate a deep learning (DL) model for automated detection of the laterality of the eye on anterior segment photographs. Anterior segment photographs for training a DL model were collected with the Scheimpflug anterior segment analyzer. We applied transfer learning and fine-tuning of pre-trained deep convolutional neural networks (InceptionV3, VGG16, MobileNetV2) to develop DL models for determining the eye laterality. Testing datasets, from Scheimpflug and slit-lamp digital camera photography, were employed to test the DL model, and the results were compared with a classification performed by human experts. The performance of the DL model was evaluated by accuracy, sensitivity, specificity, receiver operating characteristic curves, and corresponding area under the curve values. A total of 14,468 photographs were collected for the development of DL models. After training for 100 epochs, the InceptionV3-based DL model achieved an area under the receiver operating characteristic curve of 0.998 (with 95% CI 0.924–0.958) for detecting eye laterality. In the external testing dataset (76 primary gaze photographs taken by a digital camera), the DL model achieved an accuracy of 96.1% (95% CI 91.7%–100%), which is better than the accuracies of 72.3% (95% CI 62.2%–82.4%), 82.8% (95% CI 78.7%–86.9%) and 86.8% (95% CI 82.5%–91.1%) achieved by human graders. Our study demonstrated that this high-performing DL model can be used for automated labeling of the laterality of eyes. Our DL model is useful for managing a large volume of anterior segment images taken with a slit-lamp camera in the clinical setting.
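
The confidence interval quoted for the external test accuracy can be reproduced approximately with a normal-approximation (Wald) interval, sketched here. Two points are assumptions: the abstract does not say which interval method the authors used, and the count of 73 correct out of 76 is inferred from the reported 96.1% rather than stated in the text.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    The bounds are clipped to [0, 1]; z = 1.96 gives roughly 95% coverage.
    """
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed counts: 73 of 76 external test photographs labelled correctly.
low, high = wald_interval(73, 76)
print(round(73 / 76, 3), round(low, 3), round(high, 3))
```

Under these assumptions the interval comes out at about 91.7% to 100%, consistent with the reported figures. The Wald interval is known to behave poorly for proportions near 1 with small samples, so a Wilson or exact interval would be a reasonable alternative in practice.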
