New hierarchy-based segmentation layer: towards automatic marker proposal
Image segmentation is an ill-posed problem by definition, as it is not always possible to automatically select which object appearing in an image is the object of interest. To deal with this issue, prior knowledge in the form of human-given markers can be included in the segmentation pipeline. Even though user interaction can drastically improve segmentation results, it is an expensive resource, and finding ways to reduce human effort in an interactive segmentation loop is of great interest. In this work, we propose a new segmentation layer to be used with deep neural networks, which allows us to create a marker creation network and train it end to end. To train the network, we propose a loss function composed of a segmentation loss using the proposed differentiable segmentation layer and a set of regularization functions that enforce the desired characteristics on the produced markers. We show that by using the proposed layer and loss function, we can train the network to automatically generate markers that recover a good segmentation and have desirable shape characteristics. This behavior is observed on the training dataset, as well as on four unseen datasets.
- Research Article
188
- 10.1109/tmi.2020.3015224
- Nov 30, 2020
- IEEE Transactions on Medical Imaging
Deep convolutional neural networks have significantly boosted the performance of fundus image segmentation when test datasets have the same distribution as the training datasets. However, in clinical practice, medical images often exhibit variations in appearance for various reasons, e.g., different scanner vendors and image quality. These distribution discrepancies could lead the deep networks to over-fit on the training datasets and lack generalization ability on the unseen test datasets. To alleviate this issue, we present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains by exploring the knowledge from multiple source domains. Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains to make the semantic features more discriminative. Specifically, we introduce a Domain Knowledge Pool to learn and memorize the prior information extracted from multi-source domains. Then the original image features are augmented with domain-oriented aggregated features, which are induced from the knowledge pool based on the similarity between the input image and multi-source domain images. We further design a novel domain code prediction branch to infer this similarity and employ an attention-guided mechanism to dynamically combine the aggregated features with the semantic features. We comprehensively evaluate our DoFE framework on two fundus image segmentation tasks, including the optic cup and disc segmentation and vessel segmentation. Our DoFE framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.
- Research Article
3
- 10.3390/met14070761
- Jun 27, 2024
- Metals
This paper identifies and analyzes the microstructure of a carburized layer using a deep convolutional neural network. Different carburizing processes are selected to conduct surface treatment on 23CrNi3Mo steel, many metallographic images of the carburized layer are collected by laser confocal microscopy, and a microstructure dataset (MCLD) is built for training and testing. Five algorithms, namely a fully convolutional network (FCN), U-Net, DeepLabv3+, pyramid scene parsing network (PSPNet), and image cascade network (ICNet), are used to segment the self-built MCLD. By comparing the five deep learning algorithms, a neural network model suitable for the MCLD is identified and optimized. The research results achieve recognition, segmentation, and statistical verification of metallographic microstructure images through a deep convolutional neural network. This approach can replace the costly and complicated experimental testing of retained austenite and martensite. The new method identifies and calculates the content of retained austenite and martensite in the carburized layer of low-carbon steel, which lays a theoretical foundation for optimizing the carburizing process.
- Research Article
17
- 10.1167/tvst.5.2.14
- Apr 5, 2016
- Translational vision science & technology
Purpose: To automatically identify which spectral-domain optical coherence tomography (SD-OCT) scans will provide reliable automated layer segmentations for more accurate layer thickness analyses in population studies. Methods: Six hundred ninety macular SD-OCT image volumes (6.0 × 6.0 × 2.3 mm³) were obtained from one eye each of 690 subjects (74.6 ± 9.7 [mean ± SD] years, 37.8% male) randomly selected from the population-based Rotterdam Study. The dataset consisted of 420 OCT volumes with successful automated retinal nerve fiber layer (RNFL) segmentations obtained from our previously reported graph-based segmentation method and 270 volumes with failed segmentations. To evaluate the reliability of the layer segmentations, we developed a new metric, the segmentability index (SI), which is obtained from a random forest regressor based on 12 features using OCT voxel intensities, edge-based costs, and on-surface costs. The SI was compared with two well-known quality indices, the quality index (QI) and the maximum tissue contrast index (mTCI), using receiver operating characteristic (ROC) analysis. Results: The area under the curve (AUC) was 0.713 (95% confidence interval [CI], 0.621 to 0.805) for the QI, 0.756 (95% CI, 0.673 to 0.838) for the mTCI, and 0.852 (95% CI, 0.784 to 0.920) for the SI. The SI AUC is significantly larger than either the QI or mTCI AUC (P < 0.01). Conclusions: The segmentability index SI is well suited to identify SD-OCT scans for which successful automated intraretinal layer segmentations can be expected. Translational Relevance: Interpreting the quantification of SD-OCT images requires the underlying segmentation to be reliable, but standard SD-OCT quality metrics do not predict which segmentations are reliable and which are not. The segmentability index SI presented in this study does allow reliable segmentations to be identified, which is important for more accurate layer thickness analyses in research and population studies.
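The AUC values compared in this abstract can be read as the probability that a randomly chosen successful scan receives a higher index value than a randomly chosen failed scan. The following is a minimal sketch of that pairwise definition, not the study's analysis code; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical index values: higher should mean "segmentation likely to succeed".
successful = [0.9, 0.8, 0.4]
failed = [0.7, 0.3, 0.2]
print(roc_auc(successful, failed))
```

An index that ranks every successful scan above every failed one gives an AUC of 1.0, and an uninformative index gives about 0.5.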
- Conference Article
115
- 10.5220/0007347504380445
- Jan 1, 2019
In semantic segmentation tasks the Jaccard Index, or Intersection over Union (IoU), is often used as a measure of success. While this measure is more representative than per-pixel accuracy, state-of-the-art deep neural networks are still trained on accuracy by using Binary Cross Entropy loss. In this research, an alternative is used where deep neural networks are trained for a segmentation task of human faces by directly optimizing an approximation of IoU. With this approximation, IoU becomes differentiable and can be used as a loss function. The comparison between IoU loss and Binary Cross Entropy loss is made by testing two deep neural network models on multiple datasets and data splits. The results show that training directly on IoU significantly increases performance for both models compared to training on conventional Binary Cross Entropy loss.
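One common way to make IoU differentiable is to replace the set intersection and union with products and sums of predicted probabilities. The sketch below illustrates that idea in plain Python on flattened pixel lists; it is an assumption about the general technique, not necessarily the exact approximation used in the paper, and the name `soft_iou_loss` and the `eps` smoothing term are hypothetical.

```python
def soft_iou_loss(probs, targets, eps=1e-7):
    """Return 1 - soft IoU for flat sequences of probabilities and 0/1 labels.

    The soft intersection is sum(p * t); the soft union is sum(p + t - p * t).
    Both are smooth in the probabilities, so the loss can be minimized by
    gradient descent when written in an autodiff framework.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (intersection + eps) / (union + eps)


# A confident, correct prediction gives a loss near 0.
print(soft_iou_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))
# An uncertain prediction is penalized in proportion to the lost overlap.
print(soft_iou_loss([0.5, 0.5], [1, 0]))
```

With hard 0/1 predictions the expression reduces to one minus the ordinary Jaccard index.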
- Research Article
21
- 10.1016/j.cageo.2022.105251
- Oct 31, 2022
- Computers & Geosciences
Design of an optimized deep learning algorithm for automatic classification of high-resolution satellite dataset (LISS IV) for studying land-use patterns in a mining region
- Research Article
5
- 10.5194/isprs-archives-xlviii-4-w1-2022-51-2022
- Aug 5, 2022
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. This work aims to test the effectiveness of artificial intelligence for correcting water refraction in shallow inland water using very high-resolution images collected by Unmanned Aerial Systems (UAS) and processed through an entirely FOSS workflow. The tests focus on using synthetic information extracted from the visible component of the electromagnetic spectrum. An artificial neural network is created using data from three morphologically similar alpine rivers. The RGB information, the SfM depth and seven radiometric indices are calculated and stacked in an 11-band raster (input dataset). The depths are calculated as the difference between the Up component of the bathymetry cross-sections and the water surface quotas and constitute the dependent variable of the regression. The dataset is then scaled. The observations of one of the analyzed case studies are used as the unseen dataset to test the generalization capability of the model. The remaining observations are divided into test (20%) and training (80%) datasets. The generated NN is a 3-layer MLP model with one hidden layer and the Rectified Linear Unit (ReLU) and sigmoid activation functions. The weights are initialized to small Gaussian random values, and kernel regularizers, L1 and L2, are added to reduce overfitting. Weights are updated with the Adam search technique, and the mean squared error is the loss function. The importance and significance of the 11 variables are assessed. The model has a 0.70 r-squared score on the test dataset and 0.77 on the training dataset. The MAE is 0.06 and the RMSE 0.08, and similar results are obtained on the unseen dataset. Despite the good metrics, the model shows some difficulties generalizing to shallow depths.
- Research Article
70
- 10.1016/j.autcon.2021.104016
- Nov 5, 2021
- Automation in Construction
Deep learning for detecting building façade elements from images considering prior knowledge
- Research Article
24
- 10.1016/j.asr.2023.08.057
- Sep 4, 2023
- Advances in Space Research
A comparative evaluation of deep convolutional neural network and deep neural network-based land use/land cover classifications of mining regions using fused multi-sensor satellite data
- Research Article
164
- 10.1109/tip.2019.2941265
- Sep 27, 2019
- IEEE Transactions on Image Processing
Recent state-of-the-art image segmentation algorithms are mostly based on deep neural networks, thanks to their high performance and fast computation time. However, these methods are usually trained in a supervised manner, which requires a large number of high-quality ground-truth segmentation masks. On the other hand, classical image segmentation approaches such as level-set methods are formulated in a self-supervised manner by minimizing energy functions such as the Mumford-Shah functional, so they are still useful for generating segmentation masks without labels. Unfortunately, these algorithms are usually computationally expensive and are often limited in semantic segmentation. In this paper, we propose a novel loss function based on the Mumford-Shah functional that can be used in deep-learning based image segmentation with little or no labeled data. This loss function is based on the observation that the softmax layer of deep neural networks has a striking similarity to the characteristic function in the Mumford-Shah functional. We show that the new loss function enables semi-supervised and unsupervised segmentation. In addition, our loss function can also be used as a regularization function to enhance supervised semantic segmentation algorithms. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method.
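The abstract does not state the loss itself. As an illustration of the observation it describes, a piecewise-constant Mumford-Shah energy can be written with the softmax outputs taking the place of the region characteristic functions. The notation below is assumed for this sketch rather than taken from the paper: $x$ is the input image on the domain $\Omega$, $y_n$ is the softmax output for class $n$ of a network with parameters $\Theta$, $c_n$ is the mean intensity of class $n$, and $\lambda$ weights the total-variation term.

```latex
\mathcal{L}_{\mathrm{MS}}(\Theta; x)
  = \sum_{n=1}^{N} \int_{\Omega} \lvert x(r) - c_n \rvert^{2}\, y_n(r)\, dr
  \;+\; \lambda \sum_{n=1}^{N} \int_{\Omega} \lvert \nabla y_n(r) \rvert\, dr,
\qquad
c_n = \frac{\int_{\Omega} x(r)\, y_n(r)\, dr}{\int_{\Omega} y_n(r)\, dr}
```

The first term favors regions of homogeneous intensity and the second penalizes long or irregular boundaries. Neither term needs ground-truth labels, which is consistent with the unsupervised and semi-supervised use described above.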
- Research Article
1
- 10.1002/mp.17423
- Oct 1, 2024
- Medical Physics
Background: In medical image segmentation, a domain gap often exists between training and testing datasets due to different scanners or imaging protocols, which leads to performance degradation in deep learning‐based segmentation models. Given the high cost of manual labeling and the need for privacy protection, it is often challenging to annotate the testing (target) domain data for model fine‐tuning or to collect data from different domains to train domain generalization models. Therefore, using only unlabeled target domain data for test‐time adaptation (TTA) presents a more practical but challenging solution. Purpose: To improve the segmentation accuracy of deep learning‐based models on unseen datasets, and especially to enhance the efficiency and stability of TTA for individual samples from heterogeneous domains. Methods: In this study, we proposed to dynamically adapt a wavelet‐VNet (WaVNet) to unseen target domains with a hybrid objective function, based on each unlabeled test sample during the test time. We embedded multiscale wavelet coefficients into a V‐Net encoder and adaptively adjusted the spatial and spectral features according to the input, and the model parameters were optimized by three loss functions. We integrated a shape‐aware loss to focus on the foreground segmentations, a Refine loss to correct the incomplete and noisy segmentations caused by domain shifts, and an entropy loss to promote the global consistency of the segmentations. We evaluated the proposed method on multidomain liver and prostate segmentation datasets to assess its advantages over other TTA methods. For the source domain model training of the liver dataset, we used 15 3D MR image samples for training and 5 for validation. Correspondingly, for the prostate dataset, we used 22 3D MR image samples for training and 7 for validation. In the target domain, we used a single 3D MR image sample for adaptation and testing. 
The total number of testing samples is 60 in the liver dataset (for 3 different domains) and 116 in the prostate dataset (for 6 different domains). Results: The proposed method showed the highest segmentation accuracy among all methods, achieving a mean (± SD) Dice coefficient (DSC) of 78.10 ± 5.23% and a mean 95th percentile Hausdorff distance (HD95) of 15.52 ± 5.84 mm on the liver dataset; and a mean DSC of 80.02 ± 3.89% and a mean HD95 of 9.18 ± 3.47 mm on the prostate dataset. The DSC is 11.67% (in absolute terms) and 15.27% higher than that of the baseline (no adaptation) method, for the liver and the prostate datasets, respectively. Conclusions: The proposed adaptive WaVNet enhanced the image segmentation accuracy on unseen domains during the test time via unsupervised learning and multi‐objective optimization. It can benefit clinical applications where data are scarce or data distributions change, including online adaptive radiotherapy. The code will be released at: https://github.com/sanny1226/WaVNet.
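For readers unfamiliar with the Dice coefficient (DSC) reported above, here is a minimal sketch of how it is computed for two binary masks. It illustrates the standard definition only and is not taken from the WaVNet code; the function name and the convention for two empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice similarity between two flat binary masks of equal length.

    DSC = 2 * |A intersect B| / (|A| + |B|). Two empty masks are treated as a
    perfect match (an assumed convention; some toolkits return 0 or NaN).
    """
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# One of two predicted foreground voxels overlaps the reference mask.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A DSC of 78.10% therefore corresponds to a value of 0.7810 on this 0 to 1 scale.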
- Research Article
9
- 10.1007/s13721-024-00481-2
- Aug 23, 2024
- Network Modeling Analysis in Health Informatics and Bioinformatics
Early detection of abnormal heartbeats is of great importance to cardiologists for early diagnosis of cardiac diseases. It helps patients receive timely diagnosis and prevention. Conventionally, physicians provide cardiac diagnoses by visual examination of electrocardiograms (ECGs). However, this can be a very time-consuming and demanding task and, in some cases, may lead to overlooked or wrong diagnoses of life-threatening heart diseases. Therefore, an intelligent model can help to automatically analyze the huge number of ECGs captured by different devices in clinical practice. A deep transfer learning approach is used to exploit the capability of different trained deep neural networks and to test them on new unseen datasets without the need to fully re-train the model. Two deep neural networks, namely Visual Geometry Group (VGG) and Residual Network (ResNet), are utilized for classification of ECG heartbeats. The models are evaluated using two unseen ECG datasets (i.e., SVDB and INCARTDB) by only optimizing their last classification layers. The overall areas under the receiver operating characteristic curve (AUCROC) of the VGG and ResNet models are 0.961 and 0.966 on the SVDB dataset, respectively, and both models achieve 0.981 on the INCARTDB. This paper proposes an accurate and explainable model to classify ECG heartbeats into five categories recommended by the ANSI/AAMI standard. The proposed method paves the way to use pre-trained deep neural networks in real-time monitoring of heart patients using ECG data and to help clinicians understand the decision made by the models on each case using an explainable approach.
- Conference Article
21
- 10.1109/isbi45749.2020.9098404
- Apr 1, 2020
Skin melanoma represents a major health issue. Today, diagnosis and follow-up can rely on computer-aided diagnosis tools to help dermatologists segment and quantitatively describe the image content. In particular, deep convolutional neural networks (CNN) have lately become the state of the art in automated medical image segmentation. The loss function plays an important role in the backpropagation process of a CNN. In this work, we propose a metric-inspired loss function based on the Kappa index. Unlike the Dice loss, a standard loss used in image segmentation CNNs, the Kappa loss takes into account all the pixels in the image, including the true negatives; we believe this can improve the accuracy of the evaluation process between prediction and ground truth. We demonstrate the differentiability of the Kappa loss and present some results on six public datasets of skin lesions. Experiments have shown promising results in skin lesion segmentation.
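To make concrete how a Kappa-based loss can account for true negatives, the sketch below computes Cohen's kappa from "soft" confusion-matrix counts built from predicted probabilities. This is an illustrative assumption about how such a loss could be formed, and the paper's exact formulation may differ; the name `soft_kappa_loss` and the `eps` term are hypothetical.

```python
def soft_kappa_loss(probs, targets, eps=1e-7):
    """Return 1 - soft Cohen's kappa for flat probabilities and 0/1 labels."""
    n = float(len(probs))
    # Soft confusion-matrix entries; each is smooth in the probabilities.
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    tn = sum((1 - p) * (1 - t) for p, t in zip(probs, targets))
    # Observed agreement uses true negatives as well as true positives.
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected + eps)
    return 1.0 - kappa


print(soft_kappa_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))  # close to 0
print(soft_kappa_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # chance level
```

A perfect prediction gives a loss near 0, a chance-level prediction gives 1, and a fully inverted prediction gives a loss near 2.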
- Conference Article
33
- 10.1109/icme.2017.8019397
- Jul 1, 2017
Foreground segmentation in video sequences is a classic topic in computer vision. Due to the lack of semantic and prior knowledge, it is difficult for existing methods to deal with sophisticated scenes well. Therefore, in this paper, we propose an end-to-end two-stage deep convolutional neural network (CNN) framework for foreground segmentation in video sequences. In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes. In the second stage, the reconstructed background and current frame are input into a multi-channel fully-convolutional sub-network (MCFCN) for accurate foreground segmentation. In the two-stage CNN, the reconstruction loss and segmentation loss are jointly optimized. The background images and foreground objects are output simultaneously in an end-to-end way. Moreover, by incorporating the prior semantic knowledge of foreground and background in the pre-training process, our method could restrain the background noise and keep the integrity of foreground objects at the same time. Experiments on CDNet 2014 show that our method outperforms the state-of-the-art by 4.9%.
- Research Article
33
- 10.1186/s12880-021-00599-z
- Apr 13, 2021
- BMC Medical Imaging
Background: In oncology, the correct determination of nodal metastatic disease is essential for patient management, as patient treatment and prognosis are closely linked to the stage of the disease. The aim of the study was to develop a tool for automatic 3D detection and segmentation of lymph nodes (LNs) in computed tomography (CT) scans of the thorax using a fully convolutional neural network based on 3D foveal patches. Methods: The training dataset was collected from the Computed Tomography Lymph Nodes Collection of the Cancer Imaging Archive, containing 89 contrast-enhanced CT scans of the thorax. A total of 4275 LNs were segmented semi-automatically by a radiologist, assessing the entire 3D volume of the LNs. Using these data, a fully convolutional neural network based on 3D foveal patches was trained with fourfold cross-validation. Testing was performed on an unseen dataset containing 15 contrast-enhanced CT scans of patients who were referred upon suspicion or for staging of bronchial carcinoma. Results: The algorithm achieved a good overall performance with a total detection rate of 76.9% for enlarged LNs during fourfold cross-validation in the training dataset with 10.3 false-positives per volume and of 69.9% in the unseen testing dataset. In the training dataset a better detection rate was observed for enlarged LNs compared to smaller LNs, the detection rate for LNs with a short-axis diameter (SAD) ≥ 20 mm and SAD 5–10 mm being 91.6% and 62.2% (p < 0.001), respectively. Best detection rates were obtained for LNs located in Level 4R (83.6%) and Level 7 (80.4%). Conclusions: The proposed 3D deep learning approach achieves an overall good performance in the automatic detection and segmentation of thoracic LNs and shows reasonable generalizability, yielding the potential to facilitate detection during routine clinical work and to enable radiomics research without observer bias.
- Research Article
4
- 10.1002/cem.2969
- Nov 26, 2017
- Journal of Chemometrics
Fischer‐Tropsch synthesis (FTS) is an important chemical process that produces a wide range of hydrocarbons. The exact mechanism of FTS is not yet fully understood, so prediction of the FTS product distribution is not a trivial task. So far, artificial neural networks (ANNs) have been successfully applied for modeling a variety of chemical processes whenever sufficient and well‐distributed training patterns are available. However, for most chemical processes such as FTS, acquiring such an amount of data is very time‐consuming and expensive. In such cases, neural network ensembles (NNEs) have shown a significant generalization ability. An NNE is a set of diverse and accurate ANNs trained for the same task, and its output is a combination of the outputs of these ANNs. This paper proposes a new NNE approach called NNE‐NSGA‐II that tries to prune this set with a modified nondominated sorting genetic algorithm to achieve an optimum subset according to 2 conflicting objectives, which are minimizing the root‐mean‐square error on the training and unseen data sets. Finally, a comparative study is performed on a single best ANN, a regular NNE, NNE‐NSGA, and 3 popular ensembles of decision trees called random forest, stochastic gradient boosting, and AdaBoost.R2. The results show that on the training data set, stochastic gradient boosting and AdaBoost.R2 fit the samples better; however, for the predicted FTS products in the unseen data set, NNE methods, especially NNE‐NSGA‐II, considerably improve the generalization ability in comparison with the other competing approaches.
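The two ingredients named in this abstract, combining member outputs and selecting subsets that are not dominated on two error objectives, can be sketched as follows. This is a simplified illustration and not the modified NSGA‐II of the paper: the simple averaging rule, the function names, and the example error pairs are all assumptions.

```python
def ensemble_predict(member_outputs):
    """Combine member predictions by simple averaging (one assumed combination rule)."""
    return sum(member_outputs) / len(member_outputs)


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    Objectives are minimized, e.g. (training RMSE, unseen-data RMSE).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (training RMSE, unseen RMSE) pairs for four candidate subsets.
candidates = [(0.1, 0.5), (0.2, 0.3), (0.3, 0.4), (0.4, 0.1)]
print(pareto_front(candidates))  # [(0.1, 0.5), (0.2, 0.3), (0.4, 0.1)]
print(ensemble_predict([1.0, 2.0, 3.0]))  # 2.0
```

Here the subset with errors (0.3, 0.4) is discarded because (0.2, 0.3) is better on both objectives, while the remaining three represent different trade-offs between fitting the training data and generalizing.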