Lightweight concrete crack detection for urban intelligent management and maintenance

Abstract

The continuous growth of global infrastructure stock has elevated the importance of smart urban maintenance, with concrete crack detection emerging as a critical component of intelligent infrastructure management. To enhance detection efficiency in this domain, a lightweight deep-learning model named GSGAA-Yolo is proposed for concrete crack detection. Firstly, the backbone and neck networks were reconstructed using ghost convolution modules to streamline the network architecture. Then, a novel feature extraction module (GSAA-C3k2) was designed based on the slim-neck architecture, incorporating agent attention mechanisms to optimise the accuracy–efficiency balance. Finally, the SPPELAN module was introduced to strengthen multi-scale feature extraction capabilities through spatial pyramid processing. Experimental validation on public datasets demonstrated that the proposed GSGAA-Yolo achieved 88.2% mean average precision, outperforming the baseline YoloV11 model by 1.1%. Compared with the baseline, the optimised architecture reduced the parameter count by 24% and the computational load by 19% while maintaining comparable inference speed. Cross-dataset evaluation confirmed the model's robust generalisation and transfer learning capabilities, indicating high practical value for infrastructure maintenance applications.
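The ghost-convolution rebuild described above is where most of the 24% parameter saving plausibly comes from. As a rough illustration (generic GhostNet-style arithmetic, not GSGAA-Yolo's actual layer shapes), a ghost module replaces a full convolution with a slimmer primary convolution plus cheap depthwise operations:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k=1, dw_k=3, ratio=2):
    """Parameters of a GhostNet-style ghost module: a primary conv produces
    c_out/ratio 'intrinsic' channels, then a cheap depthwise dw_k x dw_k
    conv generates the remaining 'ghost' channels."""
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k             # ordinary conv, fewer outputs
    cheap = intrinsic * dw_k * dw_k * (ratio - 1)  # depthwise: one filter per channel
    return primary + cheap

std = conv_params(256, 256, 3)       # 589,824 parameters
ghost = ghost_params(256, 256, k=3)  # 296,064 parameters, roughly half
print(std, ghost, ghost / std)
```

With ratio 2 the module needs roughly half the parameters of the convolution it replaces, which is consistent with sizeable whole-network savings once many layers are swapped.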

Similar Papers
  • Research Article
  • Cited by 2
  • 10.1109/access.2022.3202222
Clickable Object Detection Network for a Wide Range of Mobile Screen Resolutions
  • Jan 1, 2022
  • IEEE Access
  • Boseon Kang + 2 more

Recently, as the development cycle of applications has shortened, it has become important to develop rapid and accurate application-testing technology. Since application testing is costly, mobile component detection using deep learning is essential to avoid relying on expensive human resources. In this paper, we propose a Clickable Object Detection Network (CODNet) for mobile component detection across a wide range of mobile screen resolutions. CODNet consists of three modules, feature extraction, deconvolution and prediction, which together provide performance improvement and scalability. The feature extraction module uses squeeze-and-excitation blocks to efficiently extract features and changes the input image ratio to 1:2, the ratio closest to that of a mobile screen. The deconvolution module provides feature maps of various sizes by upsampling through a top-down pathway and lateral connections. The prediction module selects an anchor size suitable for the mobile environment using the Anchor Transfer block, from a set of anchor candidates obtained through analysis of a mobile dataset. Moreover, we improve object detection performance by building a new mobile screen dataset consisting of data collected from various resolutions and operating systems. We show that our model achieves competitive mean average precision on our dataset compared with other models.

  • Research Article
  • Cited by 2
  • 10.1021/acs.jcim.5c00167
Prediction of Chromatographic Retention Time of a Small Molecule from SMILES Representation Using a Hybrid Transformer-LSTM Model.
  • Mar 28, 2025
  • Journal of chemical information and modeling
  • Sargol Mazraedoost + 4 more

Accurate retention time (RT) prediction in liquid chromatography remains a significant consideration in molecular analysis. In this study, we explore the use of a transformer-based language model to predict RTs by treating simplified molecular input line entry system (SMILES) sequences as textual input, an approach that has not been previously utilized in this field. Our architecture combines a pretrained RoBERTa (robustly optimized BERT approach, a variant of BERT) with bidirectional long short-term memory (BiLSTM) networks to predict retention times in reversed-phase high-performance liquid chromatography (RP-HPLC). The METLIN small molecule retention time (SMRT) data set, comprising 77,980 small molecules after preprocessing, was encoded using SMILES notation and processed through a tokenizer to enable molecular representation as sequential data. The proposed transformer-LSTM architecture incorporates layer fusion from multiple transformer layers and bidirectional sequence processing, achieving superior performance compared to existing methods with a mean absolute error (MAE) of 26.23 s, a mean absolute percentage error (MAPE) of 3.25%, and an R-squared (R2) value of 0.91. The model's explainability was demonstrated through attention visualization, revealing its focus on key molecular features that can influence RT. Furthermore, we evaluated the model's transfer learning capabilities across ten data sets from the PredRet database, demonstrating robust performance across different chromatographic conditions with consistent improvement over previous approaches. Our results suggest that the hybrid model presents a valuable approach for predicting RT in liquid chromatography, with potential applications in metabolomics and small molecule analysis.
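The tokenization step this abstract mentions can be illustrated with a simple atom-level regex tokenizer. This is a generic sketch, not the paper's RoBERTa tokenizer; the pattern and function name are illustrative only:

```python
import re

# A common atom-level SMILES tokenization pattern (illustrative, not the
# paper's tokenizer): bracket atoms, two-letter elements, ring/bond symbols.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|Se|@@|[BCNOPSFIbcnops]|[=#$/\\+\-\(\)%0-9@])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens suitable for sequence models."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: the tokens must reassemble the original string.
    assert "".join(tokens) == smiles, "pattern failed to cover the string"
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Two-letter elements such as Cl are matched before single letters so that "CCl" splits into carbon plus chlorine rather than three characters.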

  • Research Article
  • 10.3233/jcm-247578
Crack detection method for concrete surface based on feature fusion
  • Aug 14, 2024
  • Journal of Computational Methods in Sciences and Engineering
  • Cheng Hong

In recent years, detection methods based on deep learning have received widespread attention in the field of concrete crack detection. In view of the shortcomings of traditional image detection methods, a concrete crack detection method based on feature fusion is proposed. The Fourier frequency-domain processed image is used as an input to the deep neural network. The original time-domain image and the frequency-domain image are input into two separate feature extraction modules to extract high-level features; the two features are then fused to fully capture the characteristics of both the time and frequency domains, and finally the fused features yield the concrete crack detection result. The performance of the proposed method is compared with VGG-16, AlexNet and DenseNet. Experiments show that the accuracy of the proposed method is higher than that of VGG-16, AlexNet and DenseNet, and that it performs well in concrete crack detection. To verify the generalization ability of the proposed model, the Concrete Crack Images for Classification dataset was input into the model for testing. The experimental results show that the proposed model has good generalization ability.
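The frequency-domain branch described above can be sketched with NumPy. This is a plausible minimal version (a centred, normalised log-magnitude spectrum), not necessarily the exact preprocessing the paper uses:

```python
import numpy as np

def frequency_branch(image: np.ndarray) -> np.ndarray:
    """Log-magnitude Fourier spectrum of a grayscale image, shifted so the
    DC component sits at the centre, normalised to [0, 1]."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    magnitude = np.log1p(np.abs(spectrum))  # log compresses the huge DC peak
    return magnitude / magnitude.max()

img = np.random.default_rng(0).random((64, 64))
freq = frequency_branch(img)
print(freq.shape, float(freq.min()), float(freq.max()))
```

The time-domain image and this spectrum would then feed two parallel feature extractors whose outputs are concatenated before the detection head.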

  • Research Article
  • Cited by 1
  • 10.3390/app15084432
FEPA-Net: A Building Extraction Network Based on Fusing the Feature Extraction and Position Attention Module
  • Apr 17, 2025
  • Applied Sciences
  • Yuexin Liu + 4 more

The extraction of buildings from remote sensing images is of crucial significance for urban management and planning, but it remains difficult to automatically extract buildings with precise boundaries. In this paper, we propose FEPA-Net, a network model that integrates a feature extraction module and a position attention module for building extraction from remote sensing images. The model uses U-Net as its base. Firstly, the number of convolutional operations is increased to extract more abstract features of objects on the ground; secondly, ordinary convolution is replaced with dilated convolution to broaden the receptive field, so that the output of each convolution layer incorporates a wider range of feature information. Additionally, a feature extraction module is added to mitigate the loss of detailed features. Finally, the position attention module is introduced to capture more context information. The model is validated and analysed on the Massachusetts dataset and the WHU dataset. The experimental results demonstrate that FEPA-Net outperforms the other compared methods in quantitative evaluation: relative to the U-Net model, the mean intersection over union on the two datasets improves by 1.41% and 1.43%, respectively. The comparison shows that FEPA-Net effectively improves the accuracy of building extraction, reduces false detections and omissions, and identifies building outlines more clearly.
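The receptive-field broadening attributed to dilated convolution is easy to quantify. A small back-of-the-envelope helper (generic arithmetic, not tied to FEPA-Net's actual layer configuration):

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Span covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) conv layers."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# Three 3x3 layers, stride 1: plain vs dilated (rates 1, 2, 4)
print(receptive_field([(3, 1, 1)] * 3))                    # plain: 7
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # dilated: 15
```

Swapping plain 3x3 convolutions for dilated ones more than doubles the receptive field here without adding a single parameter, which is exactly the trade the abstract describes.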

  • Research Article
  • Cited by 3
  • 10.3390/s23062907
ARTD-Net: Anchor-Free Based Recyclable Trash Detection Net Using Edgeless Module.
  • Mar 7, 2023
  • Sensors
  • Boseon Kang + 1 more

Due to the sharp increase in household waste, separate collection is essential, since it is difficult to recycle trash that has not been collected separately. However, because manual separation is costly and time-consuming, it is crucial to develop an automatic system for separate collection using deep learning and computer vision. In this paper, we propose two anchor-free Recyclable Trash Detection Networks (ARTD-Net) which can efficiently recognize overlapping wastes of different types by using edgeless modules: ARTD-Net1 and ARTD-Net2. The former is an anchor-free one-stage deep learning model consisting of three modules: centralized feature extraction, multiscale feature extraction and prediction. The centralized feature extraction module in the backbone architecture focuses on extracting features around the center of the input image to improve detection accuracy. The multiscale feature extraction module provides feature maps of different scales through bottom-up and top-down pathways. The prediction module improves classification accuracy for multiple objects by adjusting edge weights for each instance. The latter is an anchor-free multi-stage deep learning model which efficiently finds each waste region by additionally exploiting a region proposal network and RoIAlign; it performs classification and regression sequentially to improve accuracy. ARTD-Net2 is therefore more accurate than ARTD-Net1, while ARTD-Net1 is faster. We show that the proposed ARTD-Net1 and ARTD-Net2 achieve competitive mean average precision and F1 scores compared with other deep learning models. Existing datasets omit important waste classes commonly produced in the real world, do not consider complex arrangements of overlapping wastes of different types, and mostly contain an insufficient number of low-resolution images. We therefore present a new recyclables dataset composed of a large number of high-resolution waste images with additional essential classes, and show that detection performance improves when training includes varied images with complex arrangements of overlapping wastes of different types.

  • Research Article
  • Cited by 15
  • 10.1109/access.2022.3187185
Driver Behaviors Recognizer Based on Light-Weight Convolutional Neural Network Architecture and Attention Mechanism
  • Jan 1, 2022
  • IEEE Access
  • Duy-Linh Nguyen + 2 more

Driving is a set of behaviors that need high concentration. Sometimes these behaviors are dominated by other acts such as smoking, eating, drinking, talking, phone calls, adjusting the radio, or drowsiness. These are also the main causes of current traffic accidents. Therefore, developing applications to warn drivers in advance is essential. This research introduces a light-weight convolutional neural network architecture to recognize driver behaviors, helping the warning system to provide accurate information and to minimize traffic collisions. This network is a combination of feature extraction and classifier modules. The feature extraction module uses the advantages of the standard convolution layers, depthwise separable convolution layers, average pooling layers, and proposed adaptive connections to extract the feature maps. The benefit of the convolution block attention module is deployed in the feature extraction module that guides the network in learning the salient features. The classifier module is comprised of a global average pooling and softmax layer to calculate the probability of each class. The overall design optimizes the network parameters and maintains classification accuracy. The entire network is trained and evaluated on three benchmark datasets: the State Farm Distracted Driver Detection, the American University in Cairo version 1, and the American University in Cairo version 2. As a result, the accuracies on overall classes (ten classes) are 99.95%, 95.57%, and 99.61%, respectively. Also, several video tests with VGA (Video Graphics Array), HD (High Definition), and FHD (Full High Definition) resolution were conducted, and they can be seen at https://bit.ly/3GY2iJl.
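The classifier module described above, global average pooling followed by a softmax layer, can be sketched in a few lines of NumPy. The shapes and random weights here are illustrative, not the paper's:

```python
import numpy as np

def classifier_head(feature_maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Global average pooling over the spatial dims, then a linear layer
    and softmax. feature_maps: (channels, H, W); weights: (channels, n_classes)."""
    pooled = feature_maps.mean(axis=(1, 2))   # (channels,) - one value per map
    logits = pooled @ weights                 # (n_classes,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
probs = classifier_head(rng.random((32, 7, 7)), rng.standard_normal((32, 10)))
print(probs.shape, float(probs.sum()))
```

Replacing fully connected layers with global average pooling like this is a standard way to cut parameters in lightweight classifiers, which matches the abstract's emphasis on optimizing network parameters.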

  • Research Article
  • Cited by 68
  • 10.1016/j.engappai.2022.105808
A deeper generative adversarial network for grooved cement concrete pavement crack detection
  • Jan 4, 2023
  • Engineering Applications of Artificial Intelligence
  • Jingtao Zhong + 7 more

  • Research Article
  • Cited by 4
  • 10.1088/1361-6501/ada786
Lightweight multi-scale encoder–decoder network with locally enhanced attention mechanism for concrete crack segmentation
  • Jan 28, 2025
  • Measurement Science and Technology
  • Shuai Dong + 5 more

Concrete surface crack detection and maintenance are crucial for ensuring structural safety. Deep learning-based techniques for detecting concrete cracks have become popular due to the quick advancement of artificial intelligence. However, the actual uses of these methods are limited due to issues like large model sizes and significant dependence on powerful computing hardware. To address these issues, this paper presents a lightweight multi-scale encoder–decoder network (LMED-Net) for crack detection of concrete structures. LMED-Net employs MobileNetV2 as the encoder for the initial feature extraction. A multi-scale feature extraction (MFE) module is developed and serially attached after the encoder for refining feature extraction. Finally, to strengthen the network's perception of pixels surrounding the cracks, a novel enhanced attention mechanism (EAM) is deployed in the decoder. By improving the network's attention to information within the crack regions, this mechanism keeps contextual information from being lost. Comparative experimental results show that the proposed network achieves an F1 score (F1) of 60.32% and a mean intersection over union (mIoU) of 71.04% on the crack forest dataset. On the DeepCrack dataset, the F1 and mIoU increase to 79.09% and 81.85%, respectively. Notably, LMED-Net performs exceptionally well in crack segmentation since its model size and parameter count are much smaller than those of other image segmentation methods. Furthermore, ablation studies validate the effectiveness of the proposed MFE module and EAM.
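The F1 and mIoU figures reported here can be computed from pixel-level confusion counts. A minimal helper for binary crack segmentation (the counts below are made up, purely for illustration):

```python
def f1_and_miou(tp: int, fp: int, fn: int, tn: int):
    """F1 for the crack class and mean IoU over {crack, background}
    from pixel-level confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_crack = tp / (tp + fp + fn)   # IoU ignores true negatives
    iou_bg = tn / (tn + fp + fn)
    return f1, (iou_crack + iou_bg) / 2

# Illustrative counts for a 100x100 mask dominated by background pixels
f1, miou = f1_and_miou(tp=400, fp=100, fn=100, tn=9400)
print(round(f1, 3), round(miou, 3))
```

Because cracks occupy a tiny fraction of pixels, mIoU is usually higher than crack-class F1, as in the numbers this abstract reports.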

  • Research Article
  • Cited by 9
  • 10.1142/s0218126623502717
A Deep Learning and Morphological Method for Concrete Cracks Detection
  • May 13, 2023
  • Journal of Circuits, Systems and Computers
  • Qilin Jin + 4 more

Concrete crack detection is essential for infrastructure safety, and its detection efficiency and accuracy are the key issues. An improved YOLOV5 and three measurement algorithms are proposed in this paper, where the original prediction heads are replaced by Transformer Heads (TH) to expose the prediction potential with a self-attention model. Experiments show that the improved YOLOV5 effectively enhances the detection and classification of concrete cracks, and the Mean Average Precision (MAP) value over all classes increases to 99.5%. The first measurement method is more accurate for small cracks, whilst the average width obtained with the axial traverse correction method is more exact for large cracks. The crack width obtained from the concrete picture samples matches that obtained from manual detection, with a deviation rate of 0–5.5%. This research demonstrates the recognition and classification of concrete cracks by integrating deep learning and machine vision with high precision and high efficiency. It is helpful for the real-time measurement and analysis of concrete cracks with potential safety hazards in bridges, high-rise buildings, etc.
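Pixel-based width measurement of the kind described can be sketched on a binary crack mask. This is a deliberately simplified stand-in; the paper's axial traverse correction method is not reproduced here:

```python
import numpy as np

def row_widths(mask: np.ndarray) -> list[int]:
    """Per-row crack width in pixels for a binary mask: count the crack
    pixels in each row that intersects the crack. A naive baseline that
    real methods correct for crack orientation."""
    return [int(row.sum()) for row in mask if row.any()]

mask = np.zeros((5, 8), dtype=int)
mask[1:4, 2:5] = 1           # a synthetic vertical crack, 3 pixels wide
print(row_widths(mask))      # [3, 3, 3]
```

Row counting overestimates width whenever the crack runs diagonally, which is presumably why a traverse-direction correction is needed for accurate measurement.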

  • Research Article
  • 10.58845/jstt.utt.2024.en.4.3.11-23
Enhancing concrete structure maintenance through automated crack detection: A computer vision approach
  • Sep 4, 2024
  • Journal of Science and Transport Technology
  • Nha Huu Nguyen + 3 more

This paper presents the development of an Artificial Intelligence (AI) and Machine Learning (ML) model designed to detect cracks on concrete surfaces. The objective is to enhance the automation, precision, and performance of crack detection using the computer vision algorithm. Employing a ML approach and the YOLOv9 algorithm, this study developed a system to accurately identify concrete cracks from a diverse dataset. A total of 16,301 images of concrete surfaces, balanced between those with and without cracks, were utilized. The dataset was split into various sets with different ratios to ensure comprehensive model training. A transfer-learning methodology was employed to optimize the model's performance. The accuracy of the model was measured in each experiment to determine the optimal result. The most successful experiment resulted in a model with a mean Average Precision (mAP) of 94.6%, a Precision of 94.1%, and a Recall of 88.4%. These results demonstrate the effectiveness of AI and ML in concrete crack detection.

  • Research Article
  • Cited by 1
  • 10.1117/1.jmi.11.4.044501
Examining feature extraction and classification modules in machine learning for diagnosis of low-dose computed tomographic screening-detected in vivo lesions.
  • Jul 9, 2024
  • Journal of medical imaging (Bellingham, Wash.)
  • David D Liang + 5 more

Medical imaging-based machine learning (ML) for computer-aided diagnosis of in vivo lesions consists of two basic components or modules: (i) feature extraction from non-invasively acquired medical images and (ii) feature classification for prediction of the malignancy of lesions detected or localized in the medical images. This study investigates their individual performances for diagnosis of low-dose computed tomography (CT) screening-detected lesions of pulmonary nodules and colorectal polyps. Three feature extraction methods were investigated. One uses the mathematical descriptor of the gray-level co-occurrence image texture measure to extract the Haralick image texture features (HFs). One uses a convolutional neural network (CNN) architecture to extract deep learning (DL) image abstractive features (DFs). The third uses the interactions between lesion tissues and the X-ray energy of CT to extract tissue-energy specific characteristic features (TFs). All three categories of extracted features were classified by a random forest (RF) classifier, with comparison to the DL-CNN method, which reads the images, extracts the DFs, and classifies the DFs in an end-to-end manner. The ML diagnosis of lesions, or prediction of lesion malignancy, was measured by the area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used, with the lesions' tissue pathological reports as the learning labels. Experiments on the three datasets produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared to 0.694 to 0.964 for the DL-CNN. These outcomes indicate that the RF classifier performed comparably to the DL-CNN classification module and that extracting tissue-energy specific characteristic features dramatically improved the AUC. The feature extraction module is more important than the feature classification module, and extraction of tissue-energy specific characteristic features is more important than extraction of image abstractive and characteristic features.
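The gray-level co-occurrence texture measure behind the Haralick features (HFs) can be illustrated with a minimal GLCM for a single pixel offset; real implementations aggregate several offsets and compute many descriptors:

```python
import numpy as np

def glcm(image: np.ndarray, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Normalised gray-level co-occurrence matrix for one pixel offset:
    p[i, j] = probability of gray level i occurring (dx, dy) away from j."""
    counts = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            counts[image[y, x], image[y + dy, x + dx]] += 1
    return counts / counts.sum()

def haralick_contrast(p: np.ndarray) -> float:
    """One Haralick descriptor: contrast = sum_ij (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

img = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0]])
p = glcm(img, levels=2)
print(p, haralick_contrast(p))
```

Descriptors such as contrast, energy and homogeneity computed from this matrix form the hand-crafted feature vector that the study's RF classifier consumes.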

  • Research Article
  • 10.1371/journal.pone.0337318.r006
Aerial small target detection algorithm based on cross-scale separated attention
  • Nov 26, 2025
  • PLOS One
  • Ju Liang + 8 more

In UAV aerial photography scenarios, targets exhibit characteristics such as multi-scale distribution, a high proportion of small targets, complex occlusions, and strong background interference. These characteristics impose high demands on detection algorithms in terms of fine-grained feature extraction, cross-scale fusion capability, and occlusion resistance. The YOLOv11s model has significant limitations in practical applications: its feature extraction module has a single semantic representation, the traditional feature pyramid network has limited capability to detect multi-scale targets, and it lacks an effective feature compensation mechanism when targets are occluded. To address these issues, we propose a UAV aerial small target detection algorithm named UAS-YOLO (Universal Inverted Bottleneck with Adaptive BiFPN and Separated and Enhancement Attention module YOLO), which incorporates three key optimizations. First, an Adaptive Bidirectional Feature Pyramid Network (ABiFPN) is designed as the Neck structure. Through cross-scale connections and dynamic weighted fusion, ABiFPN adjusts weight allocation based on target scale characteristics, focusing on enhancing feature integration for scales related to small targets and improving multi-scale feature representation capability. Second, a Separated and Enhancement Attention Module (SEAM) is introduced to replace the original SPPF module. This module focuses on key target regions, enhances effective feature responses in unoccluded areas, and specifically compensates for information loss in occluded regions, thereby improving the detection stability of occluded small targets. Third, a Universal Inverted Bottleneck (UIB) structure is proposed, which is fused with the C3K2 module to form the C3K2_UIB module. By leveraging dynamic channel attention and spatial feature recalibration, C3K2_UIB suppresses background noise; although this increases parameters by 34%, it achieves improved detection accuracy through efficient feature selection, striking a balance between accuracy and complexity. Experimental results show that on the VisDrone2019 dataset and the TinyPerson dataset from Kaggle, the mean Average Precision (mAP) of the algorithm is increased by 4.9 and 2.1 percentage points, respectively. Moreover, it demonstrates greater advantages compared to existing advanced algorithms, effectively addressing the challenge of small target detection in complex UAV scenarios.
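The "dynamic weighted fusion" in ABiFPN is not specified in detail here; a common form it plausibly builds on is BiFPN's fast normalized fusion, sketched below with illustrative inputs:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: out = sum_i (w_i / (eps + sum_j w_j)) * F_i,
    with learnable weights clipped to be non-negative so the combination
    stays bounded and each scale's contribution is interpretable."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))

a = np.ones((4, 4))           # stand-in for one resized feature map
b = np.full((4, 4), 3.0)      # stand-in for another scale's map
fused = fast_normalized_fusion([a, b], weights=[1.0, 1.0])
print(float(fused[0, 0]))     # equal weights: roughly the average of the maps
```

During training the weights are learned per fusion node, which is what lets the network emphasise the scales where small targets live.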

  • Research Article
  • 10.3390/electronics13204017
Improved YOLOv7 Electric Work Safety Belt Hook Suspension State Recognition Algorithm Based on Decoupled Head
  • Oct 12, 2024
  • Electronics
  • Xiaona Xie + 4 more

Safety is the eternal theme of power systems. In view of problems such as the time-consuming nature and poor real-time performance of manually supervising operators' correct use of safety belt hooks during power operations, this paper proposes an improved YOLOv7 safety belt hook suspension state recognition algorithm. First, the feature extraction part of the YOLOv7 backbone network is improved: the M-Spatial Pyramid Pooling Concurrent Spatial Pyramid Convolution (M-SPPCSPC) feature extraction module is constructed to replace the Spatial Pyramid Pooling Concurrent Spatial Pyramid Convolution (SPPCSPC) module of the backbone network, which reduces the amount of computation and improves the detection speed of the backbone network while keeping its receptive field unchanged. Second, a decoupled head, which computes confidence and regression frames separately, is introduced to alleviate the negative impact of the conflict between the classification and regression tasks, consequently improving detection accuracy and accelerating network convergence. Finally, a dynamic non-monotonic focusing mechanism is introduced in the output layer, and the Wise Intersection over Union (WIoU) loss function is used to reduce the competitiveness of high-quality anchor frames while reducing the harmful gradients generated by low-quality anchor frames, which ultimately improves the overall performance of the detection network. The experimental results show that the mean Average Precision (mAP@0.5) of the improved network reaches 81.2%, which is 7.4% higher than that of the original YOLOv7, achieving better detection results for multiple-state recognition of hooks.
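The WIoU loss mentioned above builds on the plain intersection over union between predicted and ground-truth boxes. A minimal IoU helper (the WIoU focusing mechanism itself is not reproduced here):

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

IoU-based losses of the form 1 - IoU are then reweighted; WIoU's non-monotonic focusing scales each box's gradient by its quality so neither very good nor very bad anchors dominate training.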

  • Research Article
  • Cited by 5
  • 10.1016/j.heliyon.2024.e26145
MTD-YOLOv5: Enhancing marine target detection with multi-scale feature fusion in YOLOv5 model
  • Feb 1, 2024
  • Heliyon
  • W.E.I Lian-Suo + 2 more

  • Research Article
  • Cited by 4
  • 10.3390/robotics12050137
Keypoint Detection and Description through Deep Learning in Unstructured Environments
  • Sep 30, 2023
  • Robotics
  • Georgios Petrakis + 1 more

Feature extraction plays a crucial role in computer vision and autonomous navigation, offering valuable information for real-time localization and scene understanding. However, although multiple studies investigate keypoint detection and description algorithms in urban and indoor environments, far fewer concentrate on unstructured environments. In this study, a multi-task deep learning architecture is developed for keypoint detection and description, focused on feature-poor unstructured and planetary scenes with low or changing illumination. The proposed architecture was trained and evaluated using a training and benchmark dataset with earthy and planetary scenes. Moreover, the trained model was integrated into a visual SLAM (Simultaneous Localization and Mapping) system as a feature extraction module and tested in two feature-poor unstructured areas. Regarding the results, the proposed architecture achieves a mAP (mean Average Precision) of 0.95 for keypoint description, outperforming well-known handcrafted algorithms, while the proposed SLAM achieved an RMSE two times lower than ORB-SLAM2 in a feature-poor area with low illumination. To the best of the authors' knowledge, this is the first study that investigates the potential of keypoint detection and description through deep learning in unstructured and planetary environments.
