GenGMM: Generalized Gaussian-Mixture-Based Domain Adaptation Model for Semantic Segmentation

Abstract

Domain adaptive semantic segmentation is the task of generating precise and dense predictions for an unlabeled target domain using a model trained on a labeled source domain. While significant efforts have been devoted to improving unsupervised domain adaptation for this task, it is crucial to note that many models rely on a strong assumption that the source data is entirely and accurately labeled, while the target data is unlabeled. In real-world scenarios, however, we often encounter partially labeled or noisily labeled data in the source and target domains, a setting referred to as Generalized Domain Adaptation (GDA). In such cases, we suggest leveraging weak or unlabeled data from both domains to narrow the gap between them, resulting in effective adaptation. We introduce the Generalized Gaussian-mixture-based (GenGMM) domain adaptation model, which harnesses the underlying data distribution in both domains to refine noisy weak and pseudo labels. The experiments demonstrate the effectiveness of our approach.
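The abstract does not specify how the Gaussian mixtures refine noisy labels; the sketch below only illustrates the general idea — fit a per-class Gaussian mixture to the features of pixels sharing a pseudo label, and discard labels whose features are implausible under that mixture. The function name, the single-component mixture, and the log-likelihood threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def refine_pseudo_labels(features, pseudo_labels, n_classes, threshold=-10.0):
    """Keep a pseudo label only if its feature vector is plausible under a
    Gaussian mixture fitted to that class; otherwise mark the pixel as
    ignored (-1). `threshold` is a log-likelihood cut-off (an assumption)."""
    refined = np.full_like(pseudo_labels, -1)
    for c in range(n_classes):
        mask = pseudo_labels == c
        if mask.sum() < 2:
            continue  # too few samples to fit a mixture for this class
        gmm = GaussianMixture(n_components=1, random_state=0).fit(features[mask])
        scores = gmm.score_samples(features[mask])  # per-pixel log-likelihood
        idx = np.where(mask)[0]
        refined[idx[scores > threshold]] = c
    return refined
```

In a full pipeline the refined labels (with -1 treated as an ignore index) would then supervise self-training on the target domain.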

Similar Papers
  • Research Article
  • 10.26565/2304-6201-2024-62-07
Perspectives of using deep learning models for semantic image segmentation on autonomous devices
  • Jun 21, 2024
  • Bulletin of V.N. Karazin Kharkiv National University, series «Mathematical modeling. Information technology. Automated control systems»
  • Mykhaylo Trusov + 1 more

Relevance. The implementation of deep learning models for semantic segmentation on autonomous devices is a promising direction for the development of intelligent systems capable of analyzing visual information without constant connection to external resources. This enables the creation of more autonomous and efficient systems that can operate in real-time and under resource constraints. Such an approach is highly significant for various industries, including robotics, autonomous vehicles, medical diagnostics, and other fields where high accuracy and speed of image processing are required. Goal. The goal of this work is to explore the possibilities and challenges of using deep learning models for semantic segmentation on autonomous devices. This includes analyzing the efficiency of the models, their adaptation to the limited resources of the devices, and developing methods to ensure the security of access to the trained models. Research methods. The research methods include theoretical analysis, systematization, and generalization of the use of deep learning models in autonomous devices. Special attention is given to the parameters affecting the memory footprint of the models and the specifics of implementing trained models into proprietary software products. Additionally, modern approaches to encrypting models to ensure their security have been considered. Results. A comparative analysis of traditional models and deep learning models for semantic segmentation of images has been conducted. Significant potential of deep learning technology for creating autonomous intelligent systems is identified. Various deep learning models currently used for semantic segmentation of images have been reviewed. The impact of key parameters on the efficiency of models on devices with limited resources has been determined, and the role of model size has been considered. 
Recommendations for implementing trained models into software products are presented, including optimizing models to reduce the size and increase the speed. Special attention has been paid to the analysis of encrypting trained models. It is shown that ensuring the security of access to trained deep learning models for semantic segmentation of images on autonomous devices requires a comprehensive approach that combines hardware and software solutions. Conclusions. Further developments in the field of deep learning for semantic segmentation on autonomous devices will contribute to the development of more efficient and autonomous systems for a wide range of applications, including computer vision, robotics, and more. Ensuring the security of access to trained deep learning models for semantic segmentation of images on autonomous devices requires a comprehensive approach that combines hardware and software solutions. This not only protects the intellectual property of developers but also ensures the integrity and confidentiality of the data processed by autonomous devices when performing semantic segmentation tasks.

  • Research Article
  • 10.20532/cit.2023.1005740
An Innovative Deep Learning Approach for Image Semantic and Instance Segmentation
  • Apr 4, 2024
  • Journal of Computing and Information Technology
  • Chuangchuang Chen + 3 more

In this study, we propose a segmentation model based on convolutional neural networks (CNNs) to address image segmentation challenges in computer vision. Prior to designing the model, the activation function and other modules of the convolutional neural network were optimized to meet specific requirements. The segmentation task was transformed into a binary classification problem to simplify network calculations and improve efficiency. Additionally, the model utilized a mask map obtained from the semantic segmentation model to aid in instance segmentation. Class activation mapping was introduced to extract feature maps, and the corresponding heat maps were obtained to achieve target instance segmentation. To further validate the effectiveness of the segmentation model, simulation experiments were conducted on semantic segmentation and instance segmentation, respectively. The results show that the accuracy of the basic semantic segmentation model reached 87.58%, while the average accuracy over all classes of the optimized instance segmentation model reached 97.9%. Therefore, the research and design of image segmentation models demonstrate high accuracy and good robustness.

  • Research Article
  • Citations: 15
  • 10.3389/fpls.2022.914829
Appearance quality classification method of Huangguan pear under complex background based on instance segmentation and semantic segmentation
  • Oct 19, 2022
  • Frontiers in Plant Science
  • Yuhang Zhang + 5 more

The ‘Huangguan’ pear disease-spot detection and grading is key to fruit-processing automation. Due to the variety of individual shapes and disease-spot types of ‘Huangguan’ pear, traditional computer vision and pattern recognition methods have limitations in detecting ‘Huangguan’ pear diseases. In recent years, the development of deep learning and convolutional neural networks has provided a new solution for the fast and accurate detection of ‘Huangguan’ pear diseases. To achieve automatic grading of ‘Huangguan’ pear appearance quality in a complex context, this study proposes an integrated framework combining instance segmentation, semantic segmentation, and grading models. In the first stage, Mask R-CNN and Mask R-CNN with the introduction of a preprocessing module are used to segment ‘Huangguan’ pears from complex backgrounds. In the second stage, DeepLabV3+, UNet, and PSPNet are used to segment the ‘Huangguan’ pear spots, and the ratio of the spot pixel area to the ‘Huangguan’ pear pixel area is calculated and classified into three grades. In the third stage, the grades of ‘Huangguan’ pear are obtained using ResNet50, VGG16, and MobileNetV3. The experimental results show that the proposed model can segment the ‘Huangguan’ pear and disease spots in complex backgrounds in stages and complete the grading of ‘Huangguan’ pear fruit disease severity. The Mask R-CNN with the CLAHE preprocessing module was the most accurate first-stage instance segmentation model, with a pixel accuracy (PA) of 97.38% and a Dice coefficient of 68.08%. DeepLabV3+ was the most accurate second-stage semantic segmentation model, with a pixel accuracy of 94.03% and a Dice coefficient of 67.25%. ResNet50 was the most accurate third-stage classification model, with an average precision (AP) of 97.41% and an F1 score (harmonic mean) of 95.43%. In short, this work not only provides a new framework for the detection and identification of ‘Huangguan’ pear fruit diseases in complex backgrounds, but also lays a theoretical foundation for the assessment and grading of ‘Huangguan’ pear diseases.
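The second-stage grading rule — the ratio of spot pixel area to pear pixel area, binned into three grades — can be sketched as follows. The two cut-off values are illustrative assumptions; the abstract does not give the actual grade boundaries.

```python
import numpy as np

def grade_pear(spot_mask, pear_mask, thresholds=(0.05, 0.15)):
    """Grade fruit by the ratio of disease-spot pixels to fruit pixels.
    Grade 1: ratio <= thresholds[0]; Grade 2: ratio <= thresholds[1];
    otherwise Grade 3. The cut-offs are illustrative, not from the paper."""
    ratio = spot_mask.sum() / max(pear_mask.sum(), 1)  # guard empty mask
    if ratio <= thresholds[0]:
        return 1, ratio
    if ratio <= thresholds[1]:
        return 2, ratio
    return 3, ratio
```

The two masks would come from the second-stage semantic segmentation model applied to the pear region cropped by the first-stage instance segmentation.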

  • Research Article
  • Citations: 20
  • 10.1016/j.knosys.2022.108881
Fused information of DeepLabv3+ and transfer learning model for semantic segmentation and rich features selection using equilibrium optimizer (EO) for classification of NPDR lesions
  • May 4, 2022
  • Knowledge-Based Systems
  • Javaria Amin + 2 more


  • Research Article
  • Citations: 29
  • 10.1016/j.isprsjprs.2024.03.025
Maize stem–leaf segmentation framework based on deformable point clouds
  • Apr 3, 2024
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Xin Yang + 8 more


  • Research Article
  • Citations: 40
  • 10.1007/s40477-022-00726-8
Objective assessment of segmentation models for thyroid ultrasound images.
  • Oct 4, 2022
  • Journal of ultrasound
  • Niranjan Yadav + 2 more

Ultrasound features related to thyroid lesion structure, shape, volume, and margins are considered to determine cancer risk. Automatic segmentation of the thyroid lesion would allow the sonographic features to be estimated. On the basis of clinical ultrasonography B-mode scans, a multi-output CNN-based semantic segmentation is used to separate thyroid nodules' cystic and solid components. Semantic segmentation is an automatic technique that labels the ultrasound (US) pixels with an appropriate class or pixel category, i.e., lesion or background. In the present study, encoder-decoder-based semantic segmentation models, i.e., SegNet (with a VGG16 backbone), UNet, and Hybrid-UNet, were implemented for segmentation of thyroid US images. For this work, 820 thyroid US images were collected from the DDTI and ultrasoundcases.info (USC) datasets. These segmentation models were trained using a transfer learning approach with original and despeckled thyroid US images. The performance of the segmentation models is evaluated by analyzing the overlap region between the true lesion contour marked by the radiologist and the lesion retrieved by the segmentation model. The mean intersection over union (mIoU), mean Dice coefficient (mDC), TPR, TNR, FPR, and FNR metrics are used to measure performance. Based on the exhaustive experiments and performance evaluation parameters, it is observed that the proposed Hybrid-UNet segmentation model segments thyroid nodules and cystic components effectively.
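The overlap metrics used above — IoU and Dice between the radiologist's contour and the predicted lesion — can be computed for a pair of binary masks as in this minimal sketch:

```python
import numpy as np

def iou_and_dice(pred, target):
    """Per-image IoU and Dice for binary masks (lesion vs. background).
    Empty-vs-empty pairs are scored as a perfect match (1.0)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / union if union else 1.0
    total = pred.sum() + target.sum()
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```

The mIoU and mDC reported in the study are simply these values averaged over the test images.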

  • Research Article
  • Citations: 3
  • 10.1002/mp.15827
Multiscale unsupervised domain adaptation for automatic pancreas segmentation in CT volumes using adversarial learning.
  • Jul 27, 2022
  • Medical Physics
  • Yan Zhu + 6 more

Computer-aided automatic pancreas segmentation is essential for the early diagnosis and treatment of pancreatic diseases. However, annotating pancreas images requires professional doctors and considerable expenditure. Due to imaging differences among institutions' populations, scanning devices, imaging protocols, and so on, significant degradation in model inference performance is prone to occur when models trained on domain-specific (usually institution-specific) datasets are directly applied to data from a new (other centers/institutions) domain. In this paper, we propose a novel unsupervised domain adaptation method based on adversarial learning to address pancreas segmentation challenges under the lack of annotations and domain shift interference. A 3D semantic segmentation model with attention and residual modules is designed as the backbone pancreas segmentation model. In both the segmentation model and the domain adaptation discriminator network, a multiscale progressively weighted structure is introduced to acquire different fields of view. Features of labeled and unlabeled data are fed in pairs into the proposed multiscale discriminator to learn domain-specific characteristics. Then the unlabeled data features with pseudo-domain labels are fed to the discriminator to acquire domain-ambiguous information. With this adversarial learning strategy, the performance of the segmentation network is enhanced to segment unseen unlabeled data. Experiments were conducted on two public annotated datasets as source datasets, respectively, and one private dataset as the target dataset, whose annotations were not used for training but only for evaluation. The 3D segmentation model achieves performance comparable to state-of-the-art pancreas segmentation methods on the source domain.
After implementing our domain adaptation architecture, the average Dice similarity coefficient (DSC) of the segmentation model trained on the NIH-TCIA source dataset increases from 58.79% to 72.73% on the local hospital dataset, while the performance of the target domain segmentation model transferred from the Medical Segmentation Decathlon (MSD) source dataset rises from 62.34% to 71.17%. Correlations of features across data domains are utilized to train the pancreas segmentation model on the unlabeled data domain, improving the generalization of the model. Our results demonstrate that the proposed method enables the segmentation model to produce meaningful segmentations for data unseen during training. In the future, the proposed method has the potential to apply segmentation models trained on public datasets to unannotated clinical CT images from local hospitals, effectively assisting radiologists in clinical practice.

  • Conference Article
  • Citations: 7
  • 10.1109/icce-taiwan55306.2022.9869121
Road Semantic Segmentation and Traffic Object Detection Model Based on Encoder-Decoder CNN Architecture
  • Jul 6, 2022
  • Yih-Chen Wang + 3 more

Nowadays, many deep learning models are in use; with limited computing resources, performing object detection and semantic segmentation at the same time may slow down considerably. To tackle this issue, we propose a multi-task learning model based on an encoder-decoder CNN architecture, which merges the object detection and semantic segmentation models into one. It can thus be trained on the semantic segmentation and object detection tasks at the same time, and is applied to road and traffic object recognition in Taiwan's unique driving environment. Compared to executing semantic segmentation and object detection models simultaneously, our proposed model has faster recognition speed and higher accuracy on the Cityscapes dataset. The results show that our proposed method achieves faster recognition speed while maintaining accuracy on an embedded Nvidia Jetson TX2 platform with fewer computing resources.

  • Conference Article
  • Citations: 3
  • 10.1109/ictc55196.2022.9952938
Real-time semantic segmentation on edge devices: A performance comparison of segmentation models
  • Oct 19, 2022
  • Myeongseok Lee + 2 more

Recent advances in convolutional neural networks have led to considerable accuracy in semantic segmentation tasks. However, previous works use a pre-trained backbone borrowed from image classification tasks with considerable computational complexity, resulting in large latencies. Many methods have been proposed to reduce the latency of the segmentation network without loss of accuracy. These methods have reported varying experimental results with different computing devices, acceleration techniques, and input image sizes. Although most studies claim that their results are state-of-the-art, it is necessary to reconsider whether they are being compared under the same conditions. We propose a performance evaluation method for real-time semantic segmentation models to compare performance fairly under the same conditions. In addition, we carry out an empirical study to evaluate the performance of recent real-time semantic segmentation networks and make a comparative analysis between them. We train the segmentation models using the same input data and data augmentation method. Then, the performance of the segmentation methods is analyzed regarding accuracy and speed. In contrast to most studies, which exclude the time required for the pre-processing and post-processing steps, we measured the actual processing time needed to perform semantic segmentation with a real dataset. Further, we measured the processing speed and power consumption of the segmentation models on embedded devices in which real-time segmentation is applied, unlike previous studies that measured performance on a PC. Experimental results showed that the real-time semantic segmentation methods could not run in real time on embedded devices when the pre-processing and post-processing steps are considered. By comprehensively considering the inference speed, energy consumption, and processing time of semantic segmentation models, the experimental results show that FasterSeg-S is best suited for embedded devices.
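Measuring end-to-end latency including pre- and post-processing, as this study argues for, can be sketched as follows. The function and its arguments are illustrative assumptions, not the authors' benchmark code.

```python
import time

def end_to_end_latency(preprocess, infer, postprocess, frames, warmup=3):
    """Mean per-frame latency covering the full pipeline (pre-processing,
    inference, post-processing), measured after a few warm-up iterations
    so that lazy initialization does not skew the timing."""
    for frame in frames[:warmup]:
        postprocess(infer(preprocess(frame)))
    start = time.perf_counter()
    for frame in frames:
        postprocess(infer(preprocess(frame)))
    return (time.perf_counter() - start) / len(frames)
```

Timing only the `infer` call, as many papers do, would omit exactly the steps this study found to break real-time operation on embedded devices.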

  • Conference Article
  • Citations: 3
  • 10.1109/itsc48978.2021.9564549
Situation-Aware Model Refinement for Semantic Image Segmentation
  • Sep 19, 2021
  • Lukas Habermayr + 3 more

The quality of semantic image segmentation models can be affected by external factors such as weather or daytime. Those factors can lead to safety-critical mistakes. In this work, we propose a systematic approach to detect and alleviate such weaknesses of semantic segmentation models. We systematically evaluate a semantic segmentation model under different external factors and analyze which factors have the largest impact on the performance. Then, we collect new training data under the most harmful external factors and fine-tune the model. We use the CARLA simulator to obtain driving data under various environment settings. We deploy a state-of-the-art semantic segmentation model in two distinct driving environments. Then, we use the proposed process to detect which external factors affect model performance the most. We collect new training data under those factors and fine-tune the model. The proposed approach outperforms collecting the same amount of random additional data by up to 10.6%. Our results show the benefit of using an iterative refinement approach as opposed to merely collecting larger data sets. Finally, we use the knowledge about which factors affect performance the most to train a simple decision tree classifier to predict the model's performance given the current external factors. Problematic environments can be detected at an average accuracy of 87.5%.
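The final step — a decision tree predicting whether the current external factors are problematic for the model — might look like the following sketch. The feature encoding and hyperparameters are assumptions, not the authors' setup.

```python
from sklearn.tree import DecisionTreeClassifier

def fit_environment_classifier(factors, is_problematic):
    """Fit a shallow decision tree that predicts, from numerically encoded
    external factors (e.g. rain intensity, sun angle, fog density -- a
    hypothetical encoding), whether the environment is problematic for
    the segmentation model."""
    return DecisionTreeClassifier(max_depth=3, random_state=0).fit(factors, is_problematic)
```

A shallow tree keeps the predictor interpretable, so the most harmful factors can be read directly off its top splits.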

  • Research Article
  • Citations: 44
  • 10.1109/tpami.2023.3248294
ADPL: Adaptive Dual Path Learning for Domain Adaptation of Semantic Segmentation.
  • Aug 1, 2023
  • IEEE transactions on pattern analysis and machine intelligence
  • Yiting Cheng + 4 more

To alleviate the need for large-scale pixel-wise annotations, domain adaptation for semantic segmentation trains segmentation models on synthetic data (source) with computer-generated annotations, which can then be generalized to segment realistic images (target). Recently, self-supervised learning (SSL) combined with image-to-image translation has shown great effectiveness in adaptive segmentation. The most common practice is to perform SSL along with image translation to align a single domain (source or target) well. However, in this single-domain paradigm, unavoidable visual inconsistency raised by image translation may affect subsequent learning. In addition, pseudo labels generated by a single segmentation model aligned in either the source or target domain may not be accurate enough for SSL. In this paper, based on the observation that domain adaptation frameworks performed in the source and target domains are almost complementary, we propose a novel adaptive dual path learning (ADPL) framework to alleviate visual inconsistency and promote pseudo-labeling by introducing two interactive single-domain adaptation paths aligned in the source and target domains, respectively. To fully explore the potential of this dual-path design, novel techniques such as dual path image translation (DPIT), dual path adaptive segmentation (DPAS), dual path pseudo label generation (DPPLG), and Adaptive ClassMix are proposed. The inference of ADPL is extremely simple: only one segmentation model in the target domain is employed. Our ADPL outperforms the state-of-the-art methods by large margins on the GTA5 → Cityscapes, SYNTHIA → Cityscapes, and GTA5 → BDD100K scenarios. Code and models are available at https://github.com/royee182/DPL.

  • Research Article
  • Citations: 1
  • 10.1038/s41598-025-00236-7
Automated diagnosis for extraction difficulty of maxillary and mandibular third molars and post-extraction complications using deep learning
  • May 30, 2025
  • Scientific Reports
  • Junseok Lee + 4 more

Optimal surgical methods require accurate prediction of extraction difficulty and complications. Although various automated methods related to third molar (M3) extraction have been proposed, none fully predict both extraction difficulty and post-extraction complications. This study proposes an automatic diagnosis method based on state-of-the-art semantic segmentation and classification models to predict the extraction difficulty of maxillary and mandibular M3s and possible complications (sinus perforation and inferior alveolar nerve (IAN) injury). A dataset of 4,903 orthopantomograms (OPGs), annotated by experts, was used. The proposed diagnosis method segments M3s (#18, #28, #38, #48), second molars (#17, #27, #37, #47), maxillary sinuses, and the inferior alveolar canal (IAC) in OPGs using a segmentation model and extracts the region of interest (RoI). Using the RoI as input, the classification model predicts extraction difficulty and complication possibilities. The model achieved 87.97% and 88.85% accuracy in predicting maxillary and mandibular M3 extraction difficulty, with area under the receiver operating characteristic curve (AUROC) of 96.25% and 97.3%, respectively. It also predicted the possibility of sinus perforation and IAN injury with 91.45% and 88.47% accuracy, and AUROC of 91.78% and 94.13%, respectively. Our results show that the proposed method effectively predicts the extraction difficulty and complications of maxillary and mandibular M3s using OPGs, and could serve as a decision support system for clinicians before surgery.

  • Research Article
  • Citations: 2
  • 10.33897/fujeas.v2i1.424
Comparison of Multiple Deep Models on Semantic Segmentation for Breast Tumor Detection
  • Sep 20, 2021
  • Foundation University Journal of Engineering and Applied Sciences (HEC Recognized Y Category, ISSN 2706-7351)
  • Sajid Ullah Khan + 4 more

The early diagnosis of breast tumors is the most significant research issue in mammography. Computer-aided diagnosis (CAD) is one of the most essential methods for preventing breast cancer. This research work explored the effectiveness of deep pixel-wise segmentation models for low-energy X-rays (mammographic imagery) to detect tumors in the breast region. For this purpose, various semantic segmentation models were incorporated into the experimental procedure. All the models were analyzed using a medical image dataset gathered and annotated from one of the largest teaching hospitals in the Khyber Pakhtunkhwa province, Lady Reading Hospital, in cooperation with local health specialists, radiologists, and technologists. A comparative analysis of the incorporated segmentation techniques' performance was conducted to select the most appropriate model for detecting tumor and normal breast regions. The experimental evaluation shows that the proposed models efficiently detect tumor and non-tumor areas in breast mammograms, using traditional evaluation metrics such as mean IoU and pixel accuracy. The performance of the semantic segmentation techniques was evaluated on two datasets (Cityscapes and mammogram). Dilation 10 (global) performed best among the four semantic segmentation models, achieving a higher pixel accuracy of 93.69%. This reflects the effectiveness of pixel-wise segmentation techniques, outperforming other state-of-the-art automatic image segmentation models.

  • Research Article
  • Citations: 16
  • 10.3390/su15086434
A Study on Identification of Urban Waterlogging Risk Factors Based on Satellite Image Semantic Segmentation and XGBoost
  • Apr 10, 2023
  • Sustainability
  • Jinping Tong + 6 more

As global warming exacerbates and urbanization accelerates, extreme climatic events occur frequently. Urban waterlogging is spreading seriously in China, leaving urban societies and economies highly vulnerable. Effectively identifying and analyzing the risk factors behind urban waterlogging has become urgent for regional sustainable development. A novel model incorporating satellite image semantic segmentation into extreme gradient boosting (XGBoost) is employed for identifying and forecasting urban waterlogging risk factors. Ground-object features of waterlogging points are extracted by satellite image semantic segmentation, and XGBoost is employed to predict waterlogging points and identify the primary factors affecting urban waterlogging. This paper selects the coastal cities of Haikou, Xiamen, Shanghai, and Qingdao as research areas and obtains data from social media. According to the comprehensive performance evaluation of the semantic segmentation and XGBoost models, the semantic segmentation model could effectively identify and extract water bodies, roads, and green spaces in satellite images, and the XGBoost model is more accurate and reliable than other common machine learning methods in prediction performance and precision. Among all waterlogging risk factors, elevation is the main factor affecting waterlogging in the research areas. For Shanghai and Qingdao, the secondary factor affecting waterlogging is roads. Water bodies are the secondary factor affecting urban waterlogging in Haikou. For Xiamen, the four indicators other than elevation are equally significant and could all be regarded as secondary factors affecting urban waterlogging.
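The boosted-tree factor ranking described above can be sketched as follows. scikit-learn's GradientBoostingClassifier stands in here for XGBoost, and the feature names and setup are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost

# Hypothetical per-point features derived from segmentation and terrain data
FEATURE_NAMES = ["elevation", "road_area", "water_area", "green_area"]

def rank_risk_factors(X, y):
    """Fit a boosted-tree classifier on per-point ground-object features
    (X) against waterlogging occurrence (y), then rank the factors by
    impurity-based feature importance, most important first."""
    model = GradientBoostingClassifier(random_state=0).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [FEATURE_NAMES[i] for i in order]
```

With real data, the first element of the ranking would correspond to the study's finding that elevation dominates the other factors.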

  • Research Article
  • Citations: 13
  • 10.1155/2022/6010912
Research Contribution and Comprehensive Review towards the Semantic Segmentation of Aerial Images Using Deep Learning Techniques
  • Mar 20, 2022
  • Security and Communication Networks
  • P Anilkumar + 1 more

Semantic segmentation has been a significant research topic for decades and has been employed in several applications. In recent years, semantic segmentation research has focused on different deep learning approaches in computer vision, aiming at superior efficiency when analyzing aerial and remote-sensing images. The main aim of this review is to provide a clear algorithmic categorization and analysis of the diverse contributions to semantic segmentation of aerial images, and to give comprehensive details of recent developments. In addition, the emerging deep learning methods have demonstrated much-improved performance on several public datasets, and incredible efforts have been dedicated to advancing pixel-level accuracy. Hence, the diverse datasets of each contribution are studied, and the best performance measures achieved by existing semantic segmentation models are evaluated. Thus, this survey can help researchers understand the development of semantic segmentation in a shorter time, simplify understanding of its latest advancements, research gaps, and challenges, and serve as a reference for developing new semantic image segmentation models in the future.
