Review of deep learning-based segmentation methods: Popular approaches, literature gaps, and opportunities
- Research Article
54
- 10.1016/j.compbiomed.2022.106194
- Oct 14, 2022
- Computers in biology and medicine
Cx22: A new publicly available dataset for deep learning-based segmentation of cervical cytology images
- Research Article
1
- 10.34133/hds.0166
- Jan 1, 2024
- Health data science
Background: MRI segmentation offers crucial insights for automatic analysis. Although deep learning-based segmentation methods have attained cutting-edge performance, their efficacy heavily relies on vast sets of meticulously annotated data. Methods: In this study, we propose a novel semi-supervised MRI segmentation model that is able to explore unlabeled data in multiple aspects based on various semi-supervised learning technologies. Results: We compared the performance of our proposed method with other deep learning-based methods on 2 public datasets, and the results demonstrated that we have achieved Dice scores of 90.3% and 89.4% on the LA and ACDC datasets, respectively. Conclusions: We explored the synergy of various semi-supervised learning technologies for MRI segmentation, and our investigation will inspire research that focuses on designing MRI segmentation models.
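The Dice score reported above is the standard overlap metric for segmentation quality. As a minimal illustrative sketch (not the authors' implementation), it can be computed from two binary masks represented as sets of voxel coordinates:

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks given as
    sets of voxel coordinates: 2*|A ∩ B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # both masks empty: perfect agreement by convention
    overlap = len(pred & truth)
    return 2.0 * overlap / (len(pred) + len(truth))

# Two toy 4-voxel masks sharing 3 voxels: Dice = 2*3 / (4+4) = 0.75
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 0), (0, 1), (1, 0), (2, 2)}
print(dice_score(a, b))  # 0.75
```

A Dice score of 90.3% as reported on the LA dataset thus means the predicted and annotated masks overlap in roughly 90% of their combined voxels.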
- Conference Article
40
- 10.1109/icmla.2019.00105
- Dec 1, 2019
With the Internet of Things (IoT) devices becoming an integral part of human life, the need for robust anomaly detection in streaming data has also been elevated. Dozens of distance-based, density-based, kernel-based, and cluster-based algorithms have been proposed in the area of anomaly detection. Recently, because of the robustness of deep neural networks (DNNs), different deep learning-based anomaly detection methods have also been proposed. Despite all these rapid developments, only a small number of comparative studies of anomaly detection methods exist. Even in those studies, the comparison is done only in typical anomaly detection settings, without taking streaming data into consideration. The presence of intrinsic time-series characteristics such as trend, seasonality, and change-points makes it important to study the behavior of commonly used anomaly detection methods on streaming data. Moreover, comparing traditional methods with deep learning-based methods also brings exciting insights about the data that are generally overlooked by traditional methods. In this study, we compare 13 anomaly detection methods on two commonly used streaming datasets. We used four different evaluation metrics to evaluate the methods from different perspectives. Our analysis reveals that the deep learning-based anomaly detection methods are superior to the traditional anomaly detection methods.
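The distance-based family of streaming detectors the abstract mentions can be illustrated with a simple rolling z-score baseline; this is a generic sketch, not one of the 13 methods compared in the paper, and all names are hypothetical:

```python
from collections import deque

def rolling_zscore_anomalies(stream, window=20, threshold=3.0):
    """Flag points whose z-score against a sliding window of recent
    values exceeds the threshold -- a simple distance-based baseline
    for anomaly detection on streaming data."""
    buf = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(buf) >= 2:
            mean = sum(buf) / len(buf)
            var = sum((v - mean) ** 2 for v in buf) / len(buf)
            std = var ** 0.5
            flags.append(std > 0 and abs(x - mean) / std > threshold)
        else:
            flags.append(False)  # not enough history yet
        buf.append(x)
    return flags

data = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 9.0, 1.0]
print(rolling_zscore_anomalies(data, window=5))  # only the 9.0 spike is flagged
```

Trend, seasonality, and change-points break exactly this kind of fixed-window baseline, which is why the paper's streaming-aware comparison matters.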
- Research Article
159
- 10.1039/d3sc04185a
- Jan 1, 2024
- Chemical Science
The last few years have seen the development of numerous deep learning-based protein-ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often produce physically implausible molecular structures. It is therefore not sufficient to evaluate these methods solely by RMSD to a native binding mode. It is vital, particularly for deep learning-based methods, that they are also evaluated on steric and energetic criteria. We present PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit. The PoseBusters test suite validates chemical and geometric consistency of a ligand including its stereochemistry, and the physical plausibility of intra- and intermolecular measurements such as the planarity of aromatic rings, standard bond lengths, and protein-ligand clashes. Only methods that both pass these checks and predict native-like binding modes should be classed as having "state-of-the-art" performance. We use PoseBusters to compare five deep learning-based docking methods (DeepDock, DiffDock, EquiBind, TankBind, and Uni-Mol) and two well-established standard docking methods (AutoDock Vina and CCDC Gold) with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. We show that both in terms of physical plausibility and the ability to generalise to examples that are distinct from the training data, no deep learning-based method yet outperforms classical docking tools. In addition, we find that molecular mechanics force fields contain docking-relevant physics missing from deep-learning methods. 
PoseBusters allows practitioners to assess docking and molecular generation methods and may inspire new inductive biases still required to improve deep learning-based methods, which will help drive the development of more accurate and more realistic predictions.
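PoseBusters itself is built on RDKit, but the flavor of its geometric checks can be sketched without that dependency. The following is a simplified stand-in for one check (standard bond lengths); the range table and function names are illustrative assumptions, not PoseBusters' actual values or API:

```python
import math

# Rough plausible ranges (angstroms) for a few covalent bonds;
# approximate textbook lengths with a tolerance, for illustration only.
PLAUSIBLE_BOND_RANGES = {
    ("C", "C"): (1.20, 1.70),
    ("C", "N"): (1.10, 1.60),
    ("C", "O"): (1.10, 1.55),
}

def bond_is_plausible(elem_a, xyz_a, elem_b, xyz_b):
    """One geometric criterion of the kind PoseBusters tests: is the
    distance between two bonded atoms within a plausible range?"""
    key = tuple(sorted((elem_a, elem_b)))
    lo, hi = PLAUSIBLE_BOND_RANGES[key]
    dist = math.dist(xyz_a, xyz_b)
    return lo <= dist <= hi

print(bond_is_plausible("C", (0.0, 0.0, 0.0), "C", (1.54, 0.0, 0.0)))  # True
print(bond_is_plausible("C", (0.0, 0.0, 0.0), "O", (2.50, 0.0, 0.0)))  # False
```

A docked pose with a low RMSD can still fail checks like this one, which is the paper's core argument for evaluating beyond RMSD.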
- Research Article
4
- 10.3390/rs14081854
- Apr 12, 2022
- Remote Sensing
Visual odometry is the task of estimating the trajectory of a moving agent from consecutive images. It is a hot research topic in both the robotics and computer vision communities and facilitates many applications, such as autonomous driving and virtual reality. Conventional odometry methods predict the trajectory by utilizing the multiple-view geometry between consecutive overlapping images. However, these methods need to be carefully designed and fine-tuned to work well in different environments. Deep learning has been explored to alleviate this challenge by directly predicting the relative pose from paired images. Deep learning-based methods usually focus only on consecutive images, which allows error to propagate over time. In this paper, graph loss and geodesic rotation loss are proposed to enhance deep learning-based visual odometry methods based on graph constraints and geodesic distance, respectively. The graph loss considers not only the relative pose loss of consecutive images, but also the relative pose of non-consecutive images. The relative pose of non-consecutive images is not directly predicted but computed from the relative poses of consecutive ones. The geodesic rotation loss is constructed from the geodesic distance, and the model regresses a Lie algebra so(3) element (a 3D vector). This allows robust and stable convergence. To increase efficiency, a random strategy is adopted to select the edges of the graph instead of using all of them. This strategy also provides additional regularization for training the networks. Extensive experiments are conducted on visual odometry benchmarks, and the obtained results demonstrate that the proposed method has comparable performance to other supervised learning-based methods, as well as monocular camera-based methods. The source code and the weights are made publicly available.
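The geodesic distance underlying the rotation loss has a standard closed form: for rotations R1 and R2, it is the angle of the relative rotation R1^T R2. A minimal sketch (not the paper's training code) using plain 3x3 list matrices:

```python
import math

def geodesic_rotation_distance(R1, R2):
    """Geodesic distance on SO(3): the rotation angle of the relative
    rotation R1^T R2, computed as arccos((trace(R1^T R2) - 1) / 2)."""
    # trace(R1^T R2) = sum over i, k of R1[k][i] * R2[k][i]
    trace = sum(R1[k][i] * R2[k][i] for i in range(3) for k in range(3))
    # clamp to [-1, 1] to guard against floating-point drift
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(c)

I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = math.pi / 2
Rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta),  math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
print(geodesic_rotation_distance(I, Rz))  # ~1.5708, i.e. pi/2
```

Regressing the so(3) vector (whose norm is this angle) rather than, say, Euler angles avoids discontinuities, which is the stability the paper attributes to the geodesic loss.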
- Book Chapter
- 10.1007/978-981-19-6068-0_34
- Nov 23, 2022
Medical imaging has been evolving at a steady pace, generating enormous amounts of health data, and the use of deep learning (DL) has helped a great deal in processing these detailed data. Deep learning-based methods are used in different medical imaging tasks to detect and diagnose diseases. For example, medical imaging is used to diagnose pulmonary embolism (PE), a commonly occurring cardiovascular disease with high mortality and prevalence and a low diagnosis rate. According to medical experts, PE has resulted in many deaths because of missed diagnoses. Another critical aspect of the disease is the possibility of permanent lung damage if left untreated. The use of deep learning methods in medical imaging is attributed to their ability to process enormous amounts of data with learning-based methods. However, there are some unique challenges in the detection of PE: it is not specific in its clinical presentation and is easily overlooked, making it difficult to diagnose. Compared with manual diagnosis, deep learning-based detection methods can more easily detect the disease in miniature sub-branches of the alveoli and in images with noisy artifacts.
- Research Article
34
- 10.3390/diagnostics11020158
- Jan 22, 2021
- Diagnostics
COVID-19 is a fast-growing disease all over the world, but facilities in the hospitals are restricted. Due to the unavailability of an appropriate vaccine or medicine, early identification of patients suspected to have COVID-19 plays an important role in limiting the extent of disease. Lung computed tomography (CT) imaging is an alternative to the RT-PCR test for diagnosing COVID-19. Manual segmentation of lung CT images is time-consuming and has several challenges, such as the high disparities in texture, size, and location of infections. Patchy ground-glass and consolidations, along with pathological changes, limit the accuracy of the existing deep learning-based CT slice segmentation methods. To cope with these issues, in this paper we propose a fully automated and efficient deep learning-based method, called LungINFseg, to segment the COVID-19 infections in lung CT images. Specifically, we propose the receptive-field-aware (RFA) module that can enlarge the receptive field of the segmentation models and increase the learning ability of the model without information loss. RFA includes convolution layers to extract COVID-19 features, dilated convolution consolidated with learnable parallel-group convolution to enlarge the receptive field, frequency domain features obtained by discrete wavelet transform, which also enlarges the receptive field, and an attention mechanism to promote COVID-19-related features. Large receptive fields could help deep learning models to learn contextual information and COVID-19 infection-related features that yield accurate segmentation results. In our experiments, we used a total of 1800+ annotated CT slices to build and test LungINFseg. We also compared LungINFseg with 13 state-of-the-art deep learning-based segmentation methods to demonstrate its effectiveness. LungINFseg achieved dice and intersection-over-union (IoU) scores higher than those of the other 13 segmentation methods.
Specifically, the dice and IoU scores of LungINFseg were better than those of the popular biomedical segmentation method U-Net.
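The IoU metric used alongside Dice above can be sketched the same way, on binary masks represented as sets of pixel coordinates (an illustrative sketch, not the paper's evaluation code):

```python
def iou_score(pred, truth):
    """Intersection-over-union between two binary masks given as
    sets of pixel coordinates: |A ∩ B| / |A ∪ B|."""
    union = pred | truth
    if not union:
        return 1.0  # both masks empty: perfect agreement by convention
    return len(pred & truth) / len(union)

# Masks sharing 3 of 5 distinct pixels: IoU = 3 / 5 = 0.6
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 0), (0, 1), (1, 0), (2, 2)}
print(iou_score(a, b))  # 0.6
```

IoU penalizes disagreement more heavily than Dice (0.6 versus 0.75 on the same pair of masks here), so the two scores are usually reported together.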
- Research Article
- 10.1364/osac.440246
- Nov 10, 2021
- OSA Continuum
Illumination estimation is a fundamental prerequisite for many computer vision applications. Various statistics and deep learning-based estimation methods have been proposed, and further studies are ongoing. In this study, we first perform a comparative analysis of representative statistics and deep learning-based methods and subsequently investigate combining them to improve the illumination estimation accuracy. We use hyperspectral images as the training data and support vector regression to combine the methods. Based on the results, we confirm that their combination enhances their accuracy.
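Combining the two families of estimators amounts to learning weights over their outputs against ground-truth illumination. The paper uses support vector regression; as a dependency-free stand-in, the sketch below fits a plain least-squares linear combiner for two estimators (all names and data are hypothetical):

```python
def fit_linear_combiner(est_a, est_b, truth):
    """Fit weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - t)^2)
    by solving the 2x2 normal equations -- a plain least-squares
    stand-in for the support vector regression combiner in the paper."""
    saa = sum(a * a for a in est_a)
    sbb = sum(b * b for b in est_b)
    sab = sum(a * b for a, b in zip(est_a, est_b))
    sat = sum(a * t for a, t in zip(est_a, truth))
    sbt = sum(b * t for b, t in zip(est_b, truth))
    det = saa * sbb - sab * sab
    wa = (sat * sbb - sbt * sab) / det
    wb = (sbt * saa - sat * sab) / det
    return wa, wb

# Toy data: the truth is exactly the average of the two estimators,
# so the fitted weights should come out as 0.5 each.
a = [1.0, 2.0, 3.0, 5.0]
b = [3.0, 2.0, 1.0, 1.0]
t = [(x + y) / 2 for x, y in zip(a, b)]
wa, wb = fit_linear_combiner(a, b, t)
print(round(wa, 6), round(wb, 6))  # 0.5 0.5
```

An SVR combiner generalizes this by allowing nonlinear kernels and an epsilon-insensitive loss, which is what the paper trains on hyperspectral data.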
- Conference Article
9
- 10.1109/icccs49078.2020.9118556
- May 1, 2020
With the rapid development of information technology, various software applications are flooding our daily lives. The development of these applications inevitably generates a lot of source code. Detecting and analyzing various defects in the source code, such as API/function call errors, array misuse, and expression syntax errors, a task known as source code defect analysis (SCDA), has attracted the attention of many researchers in the academic field. Since artificial intelligence (AI) technology has achieved excellent results in the fields of image processing and natural language processing, researchers have tried to use deep learning algorithms to automatically extract and analyze features of source code. Therefore, we review the recent deep learning-based source code defect analysis methods, including abstract syntax tree-based methods, program dependency graph-based methods, and other deep learning-based methods. Compared to traditional methods, deep learning-based code defect analysis methods can automatically extract source code defect features. This means that human experts no longer need to pre-define code features, which avoids human-introduced errors to a certain extent. The application of AI to source code defect analysis is an interesting and challenging research direction, and we believe it has broad development prospects.
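The abstract syntax tree representation those methods consume can be produced with standard tooling. A minimal sketch using Python's stdlib `ast` module (the feature choice, counting call names, is an illustrative assumption, not any surveyed method's actual feature set):

```python
import ast

def call_features(source):
    """Walk a Python abstract syntax tree and count function/method
    call names -- the kind of automatically extracted feature that
    AST-based defect analysis models consume."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else (
                func.attr if isinstance(func, ast.Attribute) else "<dynamic>")
            counts[name] = counts.get(name, 0) + 1
    return counts

snippet = "data = open('f').read()\nprint(len(data), len(data))\n"
print(sorted(call_features(snippet).items()))
# [('len', 2), ('open', 1), ('print', 1), ('read', 1)]
```

A model trained on such features, or on the tree structure itself, replaces the hand-written rules of traditional static analyzers.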
- Research Article
37
- 10.1109/tvt.2021.3065250
- Mar 10, 2021
- IEEE Transactions on Vehicular Technology
An intelligent transportation system (ITS) collects numerous data for analysis of the transportation system. The data can be used to provide services for travellers and traffic controllers in the ITS and to optimize the system, making transportation more efficient and safer. Due to the wide and flexible deployment of video cameras in visual surveillance systems (VSS), mature edge-cloud resource scheduling for data transmission and analysis, and the fast development of deep learning, computer vision (CV) methods have been employed successfully in visual-based ITS services. In this paper, we discuss the edge-cloud surveillance resource scheduling for the CV methods and review the deep learning-based CV methods in the VSS, including detection, classification, and tracking methods, for a better understanding of the relationship between the CV-based ITS services and these methods. We experimentally compare several state-of-the-art deep learning-based methods, which have been successfully applied in the CV fields under the ITS scenario, on their performance, inference speed, computational cost, and model size. Based on the comparisons, we propose four main challenges of the deep learning-based CV methods applied in these services, as a discussion of future research directions. Code is available at https://github.com/PRIS-CV/DL-CV-ITS .
- Research Article
141
- 10.1109/tpami.2023.3261282
- Aug 1, 2023
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Visible and infrared image fusion (VIF) has attracted a lot of interest in recent years due to its application in many tasks, such as object detection, object tracking, scene segmentation, and crowd counting. In addition to conventional VIF methods, an increasing number of deep learning-based VIF methods have been proposed in the last five years. Different types of methods, such as CNN-based, autoencoder-based, GAN-based, and transformer-based methods, have been proposed. Deep learning-based methods have undoubtedly become dominant methods for the VIF task. However, while much progress has been made, the field will benefit from a systematic review of these deep learning-based methods. In this paper we present a comprehensive review of deep learning-based VIF methods. We discuss motivation, taxonomy, recent development characteristics, datasets, and performance evaluation methods in detail. We also discuss future prospects of the VIF field. This paper can serve as a reference for VIF researchers and those interested in entering this fast-developing field.
- Research Article
18
- 10.1109/access.2020.3012893
- Jan 1, 2020
- IEEE Access
Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by the nuisance attributes. The proposed framework was compared with the conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 dataset. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.
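Speaker verification with embedding vectors typically reduces to a cosine-similarity score against an enrolled speaker's embedding, accepted when it clears a threshold. A minimal sketch (the vectors, threshold, and names are hypothetical, not from the paper):

```python
import math

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embedding vectors;
    a verification trial is accepted when the score exceeds a threshold."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(x * x for x in emb_b))
    return dot / (norm_a * norm_b)

enrolled = [0.6, 0.8, 0.0]
same_speaker = [0.6, 0.8, 0.1]     # near-duplicate embedding
different_speaker = [-0.8, 0.6, 0.0]  # orthogonal embedding
print(cosine_score(enrolled, same_speaker) > 0.7)       # True
print(cosine_score(enrolled, different_speaker) > 0.7)  # False
```

The degradation the paper addresses arises when nuisance attributes (device, emotion) shift embeddings of the same speaker apart; disentangling them keeps genuine trials above the threshold.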
- Research Article
2
- 10.21037/qims-22-304
- Jul 1, 2023
- Quantitative imaging in medicine and surgery
Human brown adipose tissue (BAT), mostly located in the cervical/supraclavicular region, is a promising target in obesity treatment. Magnetic resonance imaging (MRI) allows for mapping the fat content quantitatively. However, due to the complex heterogeneous distribution of BAT, it has been difficult to establish a standardized segmentation routine based on magnetic resonance (MR) images. Here, we suggest using a multi-modal deep neural network to detect the supraclavicular fat pocket. A total of 50 healthy subjects [median age/body mass index (BMI) = 36 years/24.3 kg/m2] underwent MRI scans of the neck region on a 3 T Ingenia scanner (Philips Healthcare, Best, Netherlands). Manual segmentations following fixed rules for anatomical borders were used as ground truth labels. A deep learning-based method (termed BAT-Net) was proposed for the segmentation of BAT on MRI scans. It jointly leveraged two-dimensional (2D) and three-dimensional (3D) convolutional neural network (CNN) architectures to efficiently encode the multi-modal and 3D context information from multi-modal MRI scans of the supraclavicular region. We compared the performance of BAT-Net to that of 2D U-Net and 3D U-Net. For 2D U-Net, we analyzed the performance difference of implementing 2D U-Net in three different planes, denoted as 2D U-Net (axial), 2D U-Net (coronal), and 2D U-Net (sagittal). The proposed model achieved an average dice similarity coefficient (DSC) of 0.878 with a standard deviation of 0.020. The volume segmented by the network was smaller than the ground truth labels by 9.20 mL on average, with a mean absolute increase in proton density fat fraction (PDFF) inside the segmented regions of 1.19 percentage points. BAT-Net outperformed all implemented 2D U-Nets and the 3D U-Net, with average DSC enhancement ranging from 0.016 to 0.023.
The current work integrates a deep neural network-based segmentation into the automated segmentation of supraclavicular fat depot for quantitative evaluation of BAT. Experiments show that the presented multi-modal method benefits from leveraging both 2D and 3D CNN architecture and outperforms the independent use of 2D or 3D networks. Deep learning-based segmentation methods show potential towards a fully automated segmentation of the supraclavicular fat depot.
- Research Article
1
- 10.1002/nbm.5169
- May 7, 2024
- NMR in biomedicine
In this study, our objective was to assess the performance of two deep learning-based hippocampal segmentation methods, SynthSeg and TigerBx, which are readily available to the public. We contrasted their performance with that of two established techniques, FreeSurfer-Aseg and FSL-FIRST, using three-dimensional T1-weighted MRI scans (n = 1447) procured from public databases. Our evaluation focused on the accuracy and reproducibility of these tools in estimating hippocampal volume. The findings suggest that both SynthSeg and TigerBx are on a par with Aseg and FIRST in terms of segmentation accuracy and reproducibility, but offer a significant advantage in processing speed, generating results in less than 1 min compared with several minutes to hours for the latter tools. In terms of Alzheimer's disease classification based on the hippocampal atrophy rate, SynthSeg and TigerBx exhibited superior performance. In conclusion, we evaluated the capabilities of two deep learning-based segmentation techniques. The results underscore their potential value in clinical and research environments, particularly when investigating neurological conditions associated with hippocampal structures.
- Research Article
3
- 10.3390/genes15010054
- Dec 29, 2023
- Genes
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, which compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsity originating from three cell lines, on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loop recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.
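The synthetic downsampling the abstract contrasts with experimental sparsity is usually simulated by keeping each read of a contact matrix independently with some probability. A minimal sketch of that procedure (not Hi-CY's code; names are illustrative):

```python
import random

def downsample_contacts(matrix, rate, seed=0):
    """Synthetically downsample a Hi-C contact matrix by keeping each
    read independently with probability `rate` -- the kind of simulated
    low-coverage input often used to evaluate upscaling methods."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [[sum(1 for _ in range(reads) if rng.random() < rate)
             for reads in row]
            for row in matrix]

dense = [[10, 4], [4, 20]]
sparse = downsample_contacts(dense, rate=0.25)
# Every downsampled count is bounded by the original count.
print(all(s <= d for srow, drow in zip(sparse, dense)
          for s, d in zip(srow, drow)))  # True
```

The paper's point is that such uniformly thinned matrices are an optimistic proxy: experimentally sparse Hi-C has structured, non-uniform noise that upscaling models trained on synthetic thinning fail to handle.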