Abstract The lack of labeled intraoperative patient data poses a significant challenge for machine learning applications in medicine. This study examines how synthetically generated data can reduce the amount of clinical data needed for robust liver surface segmentation in laparoscopic images. We report the results of three experiments using 525 annotated clinical images from 5 patients alongside 20,000 photo-realistic synthetic images from 10 patient models. The effectiveness of synthetic data is compared with that of data augmentation, a traditional performance-enhancing technique. For training, a supervised approach based on the U-Net architecture was chosen. Segmentation performance improved progressively across the experiments. The base experiment on clinical data yielded an F1 score of 0.72. Applying data augmentation to this model increased the F1 score to 0.76. The model pre-trained on synthetic data and fine-tuned with augmented data achieved an F1 score of 0.80, an absolute gain of 0.04 (about 5% relative) over augmentation alone. Additionally, a model evaluation using k-fold cross-validation highlighted the dependency of the results on the choice of test set. These results demonstrate that leveraging synthetic data can improve segmentation performance while limiting the need for additional patient data.
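The F1 scores reported above compare a predicted liver mask against an annotated one. For binary masks, the pixel-wise F1 score is identical to the Dice coefficient, 2TP / (2TP + FP + FN). The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `f1_score`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration, and the abstract does not state whether scores were computed per image or over all pixels.

```python
def f1_score(pred, target):
    """Pixel-wise F1 score for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; assumes both masks have the same shape.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # liver pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as liver
            elif t and not p:
                fn += 1  # liver pixel missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = f1_score(pred, target)
```

In practice the masks would be thresholded U-Net outputs and annotated images; averaging per-image scores and pooling all pixels before computing F1 can give different numbers, so the aggregation choice should be stated alongside any reported value.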