Abstract
In the era of data-driven machine learning, data is the new oil. Machine learning algorithms typically require large, heterogeneous, and, crucially, correctly labeled datasets. However, data collection and labeling are time-consuming and labor-intensive processes. The particular task we solve with machine learning is the segmentation of medical devices in echocardiographic images during minimally invasive surgery. The lack of data motivated us to develop an algorithm that generates synthetic samples from real datasets. The idea of this algorithm is to place a medical device (a catheter) in an empty cavity of an anatomical structure, for example a heart chamber, and then transform it. To create random transformations of the catheter, the algorithm uses a coordinate system that uniquely identifies each point regardless of the bend and shape of the object. We take a cylindrical coordinate system as a basis and modify it by replacing the Z-axis with a spline along which the h-coordinate is measured. Using the proposed algorithm, we generated new images with the catheter inserted into different heart cavities while varying its location and shape. We then compared deep neural networks trained on real data alone against networks trained on both real and synthetic data. The network trained on the combined dataset performed more accurate segmentation: a modified U-net trained on combined data achieved a Dice similarity coefficient of 92.6±2.2%, whereas the same model trained only on real samples reached 86.5±3.6%. Adding the synthetic dataset also reduced the accuracy spread and improved the generalization of the model.
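The Dice similarity coefficient used above to compare the models can be computed directly from binary segmentation masks. The following is a generic sketch, not the authors' evaluation code; the mask arrays and the smoothing term `eps` are illustrative:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty.
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy example: two 4x4 square masks overlapping in a 3x3 region.
a = np.zeros((8, 8), dtype=np.uint8); a[2:6, 2:6] = 1   # 16 pixels
b = np.zeros((8, 8), dtype=np.uint8); b[3:7, 3:7] = 1   # 16 pixels, 9 shared
print(round(dice_coefficient(a, b), 4))  # 2*9 / (16+16) = 0.5625
```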
It is worth noting that the proposed algorithm reduces labeling subjectivity, minimizes the labeling routine, increases the number of samples, and improves dataset heterogeneity.
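The coordinate system described in the abstract, a cylindrical system whose Z-axis is replaced by a spline carrying the h-coordinate, can be illustrated in 2-D, where the radial coordinate becomes a signed offset along the local normal of the bent axis. This is a minimal sketch under simplifying assumptions (a sinusoidal centerline stands in for a fitted spline, and the tangent is differentiated numerically); it is not the paper's implementation:

```python
import numpy as np

def centerline(h: float) -> np.ndarray:
    """Bent 'axis': a sinusoidal curve standing in for the fitted spline
    along which the h-coordinate is measured (illustrative only)."""
    return np.array([h, 10.0 * np.sin(0.05 * h)])

def unit_tangent(h: float, dh: float = 1e-4) -> np.ndarray:
    """Unit tangent of the centerline via central differences."""
    t = (centerline(h + dh) - centerline(h - dh)) / (2.0 * dh)
    return t / np.linalg.norm(t)

def to_cartesian(h: float, r: float) -> np.ndarray:
    """Map spline coordinates (h, r) to 2-D image coordinates.

    h selects a point on the bent axis; r is a signed offset along the
    unit normal there (the 2-D analogue of the radius in a cylindrical
    system). Bending the centerline bends every point of the catheter
    consistently, since (h, r) is independent of the axis shape."""
    tx, ty = unit_tangent(h)
    normal = np.array([-ty, tx])          # tangent rotated by 90 degrees
    return centerline(h) + r * normal

print(to_cartesian(20.0, 3.0))
```

Because (h, r) does not change when the centerline is re-bent, sampling a new random spline and re-mapping the catheter's points produces a plausibly deformed instance, which is the core of the synthesis idea.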
Highlights
Many machine learning algorithms are fairly sensitive to the datasets used for training
When solving the problem of localizing and segmenting the distal end of a catheter inside the heart, we encountered insufficient and weakly representative data. To address this, we propose a new algorithm for synthesizing echocardiography images with inserted medical devices
Once the real and synthetic datasets were obtained, the modified U-net was trained with different values of the Real Data Ratio (RDR)
Summary
Many machine learning algorithms are fairly sensitive to the datasets used for training, since training and test samples are assumed to come from the same statistical distribution. A paucity of flexible and sufficiently rich datasets limits what machine learning or statistical modeling techniques can achieve and leaves the algorithm's generalization capability shallow. Synthetic datasets, generated programmatically rather than collected through any real-life survey or experiment, can help immensely here: their main purpose is to be flexible and rich enough to support experiments with various classification, segmentation, and object detection algorithms
More From: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences