Sequential Hybrid Integration of U-Net and Fully Convolutional Networks with Mask R-CNN for Enhanced Building Boundary Segmentation from Satellite Imagery
In recent years, building boundary segmentation has advanced significantly through deep learning. However, existing algorithms such as Convolutional Neural Networks (CNNs) struggle to detect buildings in challenging urban areas, for example under occlusion. This study investigates the integration of U-Net and Fully Convolutional Networks (FCN) with Mask R-CNN to improve building boundary segmentation from high-resolution satellite imagery. A sequential hybrid approach was developed to combine semantic and instance segmentation: the U-Net is integrated with Mask R-CNN by feeding the U-Net's segmentation result as input to Mask R-CNN, and the same procedure is applied to integrate the FCN with Mask R-CNN. The integration of U-Net with Mask R-CNN improved recall by 9.9% and the F1-score by 4.3%, demonstrating its capability for precise boundary segmentation and fine-grained detail. Similarly, the FCN combined with Mask R-CNN improved recall by 9.9% and precision by 7.6%, confirming its capability to capture global context. A further comparison of the U-Net/Mask R-CNN integration against results from previous studies demonstrates that the proposed integration scheme outperforms existing approaches. Performance evaluation across RGB and panchromatic datasets highlights the flexibility of these integrations and their efficiency in different applications. Despite minor challenges in boundary alignment, the results bring out the potential of such hybrid models for applications in urban planning, cadastral mapping, and disaster management.
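The sequential hybrid described in this abstract amounts to a simple composition: the semantic probability map from the first stage (U-Net or FCN) is appended to the image as an extra channel before the instance-segmentation stage. A minimal, framework-agnostic sketch in NumPy — the two model callables below are hypothetical stand-ins, not the authors' trained networks:

```python
import numpy as np

def sequential_hybrid(image, semantic_model, instance_model):
    """Feed a semantic segmentation result into an instance model.

    image: (H, W, 3) float array. `semantic_model` and `instance_model`
    are placeholder callables standing in for U-Net/FCN and Mask R-CNN.
    """
    prob = semantic_model(image)                 # (H, W) building probability map
    fused = np.concatenate([image, prob[..., None]], axis=-1)  # (H, W, 4) input
    return instance_model(fused)                 # list of per-building masks

# Toy stand-ins so the sketch runs end to end.
semantic = lambda img: (img.mean(axis=-1) > 0.5).astype(np.float32)
instance = lambda x: [x[..., 3] > 0]             # one "instance" from the mask channel

img = np.zeros((4, 4, 3))
img[1:3, 1:3] = 1.0                              # a 2x2 "building"
masks = sequential_hybrid(img, semantic, instance)
```

The point of the fourth channel is that the instance stage receives the semantic prior explicitly rather than having to rediscover it.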
- Research Article
11
- 10.5194/isprs-archives-xlii-2-w13-155-2019
- Jun 4, 2019
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, give an opportunity to create an up-to-date 3D model. Building segmentation is often the first and foremost step. Convolutional neural networks (CNNs) have drawn much attention for interpreting VHR images, as they can learn very effective features for very complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurately delineated edges. Mask R-CNN starts from a feature pyramid network (FPN) to create semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate objects at various scales with the corresponding optimal scale of features. Features with high and low levels of information are further used for better classification of small objects and for mask prediction at edges. The method is tested on the ISPRS benchmark dataset by comparing results with fully convolutional networks (FCN), which merge high- and low-level features by a skip layer to create a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN produces much better results in edge regions than FCN. The results also show that choosing the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects at different scales. This paper provides insight into how a good anchor scale should be chosen for different datasets.
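The anchor-scale sensitivity noted above comes down to how anchor boxes are generated per FPN level: each pyramid level is assigned one base size, expanded over a set of aspect ratios at constant area. A hedged sketch of that generation step — the base sizes and ratios below are common defaults for illustration, not the paper's settings:

```python
import numpy as np

def make_anchors(base_sizes=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate (w, h) anchor shapes, one base size per FPN level.

    Objects smaller than the smallest base size are hard to propose,
    which is why the anchor-scale range must match the object sizes
    actually present in the dataset.
    """
    anchors = []
    for size in base_sizes:          # one scale per pyramid level
        area = float(size) ** 2
        for r in ratios:             # aspect ratio r = h / w, area preserved
            w = np.sqrt(area / r)
            anchors.append((w, w * r))
    return anchors

shapes = make_anchors()              # 5 levels x 3 ratios = 15 anchor shapes
```

Shrinking or widening `base_sizes` is the "choice of anchor scale range" the abstract identifies as critical for multi-scale buildings.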
- Research Article
5
- 10.25165/j.ijabe.20231601.7173
- Jan 1, 2023
- International Journal of Agricultural and Biological Engineering
Objects in agricultural soils can seriously affect the farming operations of agricultural machinery. At present, judging abnormal Ground-Penetrating Radar (GPR) signals still relies on human experience. It is difficult for traditional image processing technology to form a general positioning method given the randomness and diversity of GPR signals in soil, and although many scholars have researched a variety of image-processing techniques, most methods lack robustness. In this study, the deep learning algorithm Mask Region-based Convolutional Neural Network (Mask R-CNN) and a geometric model were combined to improve GPR positioning accuracy. First, a soil stratification experiment was set up to classify the physical parameters of the soil and study the attenuation law of electromagnetic waves. Secondly, a SOIL-GPR geometric model was proposed, which can be combined with the geometric size of the Mask R-CNN mask to predict object sizes; the results proved the effectiveness and accuracy of the model for position detection and evaluation of objects in soils. Thirdly, the improved Mask R-CNN method was used to compare feature extraction accuracy against U-Net and Fully Convolutional Networks (FCN). Finally, the operating speed of agricultural machinery was simulated and an A-B survey line experiment was designed. Detection accuracy was evaluated by several indicators, such as the survey line direction, soil depth false alarm rate, Mean Average Precision (mAP), and Intersection over Union (IoU). The results showed that pixel-level segmentation and positioning based on Mask R-CNN can effectively improve the accuracy of position detection of objects in agricultural soil, with an average depth prediction error of 2.87 cm.
The results showed that the detection technology proposed in this study integrates the advantages of soil environmental parameters, geometric models, and artificial intelligence algorithms to provide a high-precision technical solution for non-destructive GPR detection of soils.
- Research Article
- 10.4015/s1016237225300019
- Mar 26, 2025
- Biomedical Engineering: Applications, Basis and Communications
Early identification is essential to prevent blindness in glaucoma patients, since the disease is an optic neuropathy that, if left untreated, leads to irreversible vision loss. Retinal fundus imaging is essential for glaucoma diagnosis because it makes it possible to analyze important structural features such as the Optic Cup and Optic Disc (OC and OD), whose morphological alterations are markers of disease progression. Deep learning breakthroughs have led to improvements in segmentation approaches, which help early diagnosis and therapy by examining retinal structures with greater accuracy and efficiency. Numerous methods for segmenting retinal fundus images are reviewed in this paper, including those for Optic Nerve Head (ONH), retinal vasculature, OD, OC, and macula segmentation. In addition to cutting-edge deep learning designs like U-Net, Mask R-CNN, Fully Convolutional Networks (FCN), and DeepLab, conventional techniques like thresholding and region growing are investigated. By improving segmentation precision with machine and deep learning algorithms, less manual interpretation by human experts is required. A comparison of various methods is given, with consideration to datasets, performance measures, advantages, and disadvantages. While Graph Convolutional Networks (GCNs) and Convolutional Neural Networks (CNNs) perform well in tasks like glaucoma detection and classification, the more notable approaches include U-Net and Mask R-CNN, which have proven to have enormous potential in OD/OC segmentation. In addition, the paper discusses the obstacles that glaucoma detection currently faces, such as inconsistent images, limited datasets, and the requirement for more broadly applicable models. To advance automated glaucoma detection and enhance patient care, these challenges must be overcome.
- Research Article
107
- 10.1016/j.tust.2019.103156
- Oct 24, 2019
- Tunnelling and Underground Space Technology
Deep learning–based image instance segmentation for moisture marks of shield tunnel lining
- Research Article
12
- 10.21037/atm-21-5822
- Dec 1, 2021
- Annals of Translational Medicine
Background: Liver segmentation in computed tomography (CT) imaging has been widely investigated as a crucial step for analyzing liver characteristics and diagnosing liver diseases. However, obtaining satisfactory liver segmentation performance is highly challenging because of the poor contrast between the liver and its surrounding organs and tissues, the high levels of CT image noise, and the wide variability in liver shapes among patients. Methods: To overcome these challenges, we propose a novel method for liver segmentation in CT image sequences. This method uses an enhanced mask region-based convolutional neural network (Mask R-CNN) with graph-cut segmentation. Specifically, the k-nearest neighbor (k-NN) algorithm is employed to cluster the target liver pixels in order to obtain an appropriate aspect ratio. Then, anchors are adapted to the liver size using the ratio information. Thus, high-accuracy liver localization can be achieved using the anchors and rotation-invariant object recognition. Next, a fully convolutional network (FCN) is used to segment the foreground objects, and local fine-grained liver detection is realized by pixel prediction. Finally, a whole-liver mask is obtained by the Mask R-CNN proposed in this paper. Results: The proposed Mask R-CNN algorithm achieved superior performance compared with conventional Mask R-CNN algorithms in terms of the Dice similarity coefficient (DSC) and the Medical Image Computing and Computer-Assisted Intervention (MICCAI) metrics. Conclusions: Our experimental results demonstrate that the improved Mask R-CNN architecture has good performance, accuracy, and robustness for liver segmentation in CT image sequences.
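One way to read the anchor-adaptation step above is as clustering the aspect ratios of the target regions and using the cluster centres as anchor ratios. A minimal 1-D k-means sketch of that idea — the clustering routine and the sample ratios are illustrative, not the paper's exact k-NN procedure:

```python
import numpy as np

def cluster_ratios(ratios, k=2, iters=20, seed=0):
    """Cluster 1-D aspect ratios (h/w) and return sorted cluster centres.

    The centres can then be fed to the anchor generator so that region
    proposals match the shape statistics of the target organ.
    """
    rng = np.random.default_rng(seed)
    ratios = np.asarray(ratios, dtype=float)
    centres = rng.choice(ratios, size=k, replace=False)
    for _ in range(iters):
        # assign each ratio to its nearest centre, then recompute centres
        labels = np.abs(ratios[:, None] - centres[None, :]).argmin(axis=1)
        centres = np.array([ratios[labels == j].mean() if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return np.sort(centres)

# Hypothetical ratios measured from liver bounding boxes in a few slices.
centres = cluster_ratios([0.9, 1.0, 1.1, 1.9, 2.0, 2.1], k=2)
```

With well-separated ratio groups the two centres converge to the group means, which would serve as the adapted anchor ratios.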
- Research Article
119
- 10.1016/j.autcon.2021.103830
- Jul 28, 2021
- Automation in Construction
Automatic recognition of tunnel lining elements from GPR images using deep convolutional networks with data augmentation
- Research Article
- 10.55041/ijsrem49672
- Jun 9, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Abstract—This research presents an integrated framework for satellite image retrieval, segmentation, and visualization, utilizing Python-based algorithms and geospatial data processing methods. It incorporates Google Maps-based image acquisition, a Tkinter-powered interactive GUI, and the U-Net segmentation model to efficiently process satellite imagery. Satellite image segmentation, powered by U-Net, plays a vital role in geospatial analysis, enabling automated classification and extraction of key features from high-resolution imagery. The framework facilitates seamless high-resolution map generation, segmentation, and analytical visualization through interactive tools like bar graphs and pie charts. By leveraging U-Net’s robust architecture, this implementation enhances segmentation accuracy and supports applications in urban planning, environmental monitoring, and disaster management. Through the automation of retrieval, processing, and visualization of satellite imagery, this research advances geospatial intelligence and provides a scalable solution for efficient image segmentation workflows. Keywords—Satellite Image Segmentation, Land Cover Classification, Remote Sensing, Image Processing, Semantic Segmentation, Deep Learning Models, Convolutional Neural Networks (CNNs), U-Net Architecture, Geospatial Analysis, Land Use Analysis
- Research Article
- 10.3390/rs17050824
- Feb 26, 2025
- Remote Sensing
Change detection is an important technique that identifies areas of change by comparing images of the same location taken at different times, and it is widely used in urban expansion monitoring, resource exploration, land use detection, and post-disaster monitoring. However, existing change detection methods often struggle with balancing the extraction of fine-grained spatial details and effective semantic information integration, particularly for high-resolution remote sensing imagery. This paper proposes a high-resolution remote sensing image change detection model called FFLKCDNet (First Fusion Large-Kernel Change Detection Network) to solve this issue. FFLKCDNet features a Bi-temporal Feature Fusion Module (BFFM) to fuse remote sensing features from different temporal scales, and an improved ResNet network (RAResNet) that combines large-kernel convolution and multi-attention mechanisms to enhance feature extraction. The model also includes a Contextual Dual-Land-Cover Attention Fusion Module (CD-LKAFM) to integrate multi-scale information during the feature recovery stage, improving the resolution of details and the integration of semantic information. Experimental results showed that FFLKCDNet outperformed existing methods on datasets such as GVLM, SYSU, and LEVIR, achieving superior performance in metrics such as Kappa coefficient, mIoU, MPA, and F1 score. The model achieves high-precision change detection for remote sensing images through multi-scale feature fusion, noise suppression, and fine-grained information capture. These advancements pave the way for more precise and reliable applications in urban planning, environmental monitoring, and disaster management.
- Preprint Article
- 10.20944/preprints202507.1048.v1
- Jul 15, 2025
Urban roof segmentation plays a pivotal role in applications such as urban planning, infrastructure management, and renewable energy deployment. This study explores the evolution of deep learning techniques from traditional Convolutional Neural Networks (CNNs) to cutting-edge Transformer-based models in the context of roof segmentation from satellite imagery. We highlight the limitations of conventional methods when applied to urban environments, including resolution constraints and the complexity of roof structures. To address these challenges, we evaluate two advanced deep learning models: Mask R-CNN and MaskFormer, which have shown significant promise in accurately segmenting roofs, even in dense urban settings with diverse roof geometries. These models, especially the one based on transformers, offer improved segmentation accuracy by capturing both global and local image features, enhancing their performance in tasks where fine detail and contextual awareness are critical. A case study on Ben Guerir City in Morocco, an urban area experiencing rapid development, serves as the foundation for testing these models. Using high-resolution satellite imagery, the segmentation results offer a deeper understanding of the accuracy and effectiveness of these models, particularly in optimizing urban planning and renewable energy assessments. Quantitative metrics such as Intersection over Union (IoU), precision, recall, and F1-score are used to benchmark model performance. Mask R-CNN achieved a mean IoU of 74.6%, precision of 81.3%, recall of 78.9%, and F1-score of 80.1%. MaskFormer outperformed Mask R-CNN, reaching a mean IoU of 79.8%, precision of 85.6%, recall of 82.7%, and F1-score of 84.1%, highlighting the transformative potential of transformer-based architectures for scalable and precise urban imaging. The study also outlines future work in 3D modelling and height estimation, positioning these advancements as critical tools for sustainable urban development.
- Research Article
31
- 10.3389/fpubh.2022.981019
- Aug 25, 2022
- Frontiers in Public Health
One of the primary factors contributing to death across all age groups is cardiovascular disease. In the analysis of heart function, analyzing the left ventricle (LV) from 2D echocardiographic images is a common medical procedure for heart patients. Consistent and accurate segmentation of the LV has a significant impact on the understanding of the normal anatomy of the heart, as well as on the ability to distinguish aberrant or diseased heart structure. Therefore, LV segmentation is an important and critical task in medical practice, and automated LV segmentation is a pressing need. Deep learning models have been utilized in research for automatic LV segmentation. In this work, three cutting-edge convolutional neural network architectures (SegNet, Fully Convolutional Network, and Mask R-CNN) are designed and implemented to segment the LV. In addition, an echocardiography image dataset is generated, and the amount of training data is gradually increased to measure segmentation performance using evaluation metrics. Pixel accuracy, precision, recall, specificity, Jaccard index, and the Dice similarity coefficient are applied to evaluate the three models. The Mask R-CNN model outperformed the other two models on these evaluation metrics and is therefore used in this study to examine the effect of training data. For 4,000 images, the network achieved a 92.21% DSC value, 85.55% Jaccard index, 98.76% mean accuracy, 96.81% recall, 93.15% precision, and 96.58% specificity. Overall, Mask R-CNN outperformed the other architectures, and performance stabilizes when the model is trained with more than 4,000 training images.
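The overlap metrics reported above (Dice similarity coefficient and Jaccard index) are simple set comparisons between predicted and ground-truth binary masks; a minimal sketch:

```python
import numpy as np

def dice_jaccard(pred, gt):
    """Dice similarity coefficient and Jaccard index for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())   # 2|A∩B| / (|A|+|B|)
    jaccard = inter / union                         # |A∩B| / |A∪B|
    return dice, jaccard

pred = np.zeros((4, 4)); pred[0:2, 0:3] = 1   # 6 predicted pixels
gt = np.zeros((4, 4)); gt[0:2, 0:2] = 1       # 4 ground-truth pixels, 4 overlap
d, j = dice_jaccard(pred, gt)                 # dice = 8/10, jaccard = 4/6
```

Dice is always at least as large as Jaccard for the same pair of masks, which is why the reported DSC (92.21%) exceeds the Jaccard index (85.55%).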
- Research Article
278
- 10.1016/j.compag.2020.105380
- Mar 26, 2020
- Computers and Electronics in Agriculture
Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
- Research Article
183
- 10.1109/jstars.2018.2835377
- Aug 1, 2018
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Establishing up-to-date large scale building maps is essential to understand urban dynamics, such as estimating population, urban planning, and many other applications. Although many computer vision tasks have been successfully carried out with deep convolutional neural networks, there is a growing need to understand their large scale impact on building mapping with remote sensing imagery. Taking advantage of the scalability of convolutional neural networks (CNNs) and using only a few areas with an abundance of building footprints, for the first time we conduct a comparative analysis of four state-of-the-art CNNs for extracting building footprints across the entire continental United States. The four CNN architectures, namely Branch-out CNN, fully convolutional network (FCN), conditional random field as recurrent neural network (CRFasRNN), and SegNet, support semantic pixelwise labeling and focus on capturing textural information at multiple scales. We use 1-meter resolution aerial images from the National Agriculture Imagery Program as the test-bed, and compare the extraction results across the four methods. In addition, we propose to combine signed-distance labels with SegNet, the preferred CNN architecture identified by our extensive evaluations, to advance building extraction results to instance level. We further demonstrate the usefulness of fusing additional near-IR information into the building extraction framework. Large scale experimental evaluations are conducted and reported using metrics that include precision, recall rate, intersection over union, and the number of buildings extracted.
With the improved CNN model and no further post-processing required, we have generated building maps for the United States with an average processing time of less than one minute for an area of ~56 km². The quality of the extracted buildings and the processing time demonstrate that the proposed CNN-based framework fits the need for building extraction at scale.
- Research Article
96
- 10.3390/rs10071135
- Jul 18, 2018
- Remote Sensing
Building extraction from remotely sensed imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Several published contributions dedicated to the application of deep convolutional neural networks (DCNN) for building extraction using aerial/satellite imagery exist. However, in all these contributions, high accuracy is always obtained at the price of extremely complex and large network architectures. In this paper, we present an enhanced fully convolutional network (FCN) framework designed for building extraction from remotely sensed images by applying conditional random fields (CRFs). The main objective is to propose a methodology for selecting a framework that balances high accuracy with low network complexity. A modern activation function, namely the exponential linear unit (ELU), is applied to improve the performance of the fully convolutional network (FCN), thereby resulting in more accurate building prediction. To further reduce noise (falsely classified buildings) and to sharpen building boundaries, post-processing with conditional random fields (CRFs) is added at the end of the adopted convolutional neural network (CNN) framework. The experiments were conducted on Massachusetts building aerial imagery. The results show that our proposed framework outperformed the FCN, the existing baseline framework for semantic segmentation, in terms of performance measures such as the F1-score and IoU. Additionally, the proposed method outperformed a pre-existing classifier for building extraction on the same dataset in terms of both the performance measures and network complexity.
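The ELU activation mentioned above has a simple closed form: identity for positive inputs, and a saturating exponential for negative ones. A minimal NumPy sketch, using α = 1 as the common default (not necessarily the paper's setting):

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise.

    Unlike ReLU, negative inputs give small negative outputs instead of a
    hard zero, which keeps mean activations closer to zero during training.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * np.expm1(x))  # expm1 is exp(x) - 1

out = elu([-2.0, 0.0, 3.0])
```

The negative branch saturates at -α, so gradients do not vanish as abruptly as with ReLU's zero region — the property the paper leverages for more accurate building prediction.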
- Research Article
8
- 10.3390/rs17030550
- Feb 6, 2025
- Remote Sensing
The present survey examines the role of big data analytics in advancing remote sensing and geospatial analysis. The increasing volume and complexity of geospatial data are driving the adoption of machine learning (ML) and artificial intelligence (AI) techniques, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, to extract meaningful insights from large, diverse datasets. These AI methods enhance the accuracy and efficiency of spatial and temporal data analysis, benefiting applications in environmental monitoring, urban planning, and disaster management. Despite these advancements, challenges related to computational efficiency, data integration, and model transparency remain. This paper also discusses emerging trends and highlights the potential of hybrid approaches, cloud computing, and edge processing in overcoming these challenges. The integration of AI with geospatial data is poised to significantly improve our ability to monitor and manage Earth systems, supporting more informed and sustainable decision-making.
- Conference Article
1
- 10.1109/robio54168.2021.9739282
- Dec 27, 2021
This paper proposes a novel method for automatically generating image masks for the state-of-the-art Mask R-CNN deep learning method. Mask R-CNN achieves the best object-detection results to date; however, obtaining object masks for training is very time-consuming and laborious. The proposed method uses a two-stage design to generate image masks automatically: the first stage implements a fully convolutional network (FCN)-based segmentation network; the second stage, a Mask R-CNN-based object detection network, is trained on the object image masks from the FCN output, the original input image, and additional label information. Through experimentation, our proposed method obtains image masks automatically to train Mask R-CNN and achieves very high classification accuracy, with over 90% mean average precision (mAP) for segmentation.
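The hand-off between the two stages above can be sketched as converting the first-stage FCN probability map into a Mask R-CNN style training target (mask, box, label). The field names and threshold below are illustrative assumptions, not a specific framework's API:

```python
import numpy as np

def fcn_output_to_target(prob_map, label, thresh=0.5):
    """Turn an FCN probability map into a Mask R-CNN style training target.

    Stage 1 (FCN) yields a per-pixel probability map; stage 2 trains on
    the thresholded mask plus a bounding box derived from it, so no
    manual mask annotation is needed.
    """
    mask = (np.asarray(prob_map) >= thresh).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None                                   # nothing detected, no target
    box = [xs.min(), ys.min(), xs.max(), ys.max()]    # [x1, y1, x2, y2]
    return {"mask": mask, "box": box, "label": label}

prob = np.zeros((5, 5)); prob[1:4, 2:5] = 0.9         # one high-confidence blob
target = fcn_output_to_target(prob, label=1)
```

A real pipeline would additionally split the thresholded map into connected components so that each object yields its own instance target.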