Abstract

Data imbalance is a common problem in machine learning and image processing. A lack of training data for the rarest classes can impair learning and reduce segmentation quality. In this paper, we focus on the problem of data balancing for the task of image segmentation. We review major trends in handling unbalanced data and propose a new data balancing method based on the Distance Transform. The method is designed for use in segmentation convolutional neural networks (CNNs), but it is universal and can be used with any patch-based machine learning model for segmentation. The proposed method is evaluated on two datasets. The first is the medical dataset LiTS, which contains CT images of the liver with tumor abnormalities. The second is a geological dataset consisting of photographs of polished sections of different ores. The proposed algorithm improves the data balance between classes and the overall performance of the CNN model.

Highlights

  • Data imbalance is a common issue in image segmentation [1]

  • Data imbalance is very common in medical imaging problems, in particular in detecting liver tumors

  • In this paper we propose a data balancing method that focuses on modifying the class distribution in the dataset

Summary

Introduction

Data imbalance is a common issue in image segmentation [1]. If pixels of a particular “majority” class are far more numerous than pixels of one or more “minority” classes, the rarity of the “minority” classes in the training data makes training less effective and worsens the final results, because the learned model tends to classify most pixels as members of the “majority” classes. Data imbalance is very common in medical imaging problems, in particular in detecting liver tumors. One such problem is the segmentation of CT images, since the volume and area of different organs and abnormalities differ considerably. One common scheme assigns each class a cost equal to the inverse of its proportion in the dataset, so the model is penalized more heavily for errors on the rarest classes. The second category of methods comprises so-called data-based methods, which use sampling techniques to rebalance the class distribution during preprocessing. The proposed method is designed specifically for segmentation problems and has a wide range of applications.
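
The inverse-proportion cost scheme mentioned above can be illustrated with a minimal sketch. It is not the method proposed in this paper. It only shows how a per-class cost equal to the inverse of the class proportion might be computed from a label mask. The function name `inverse_frequency_weights` and the toy 4x4 mask are assumptions made for this illustration.

```python
from collections import Counter


def inverse_frequency_weights(mask, num_classes):
    """Return one weight per class, equal to 1 / (proportion of that class).

    `mask` is a 2D list of integer class labels (a hypothetical toy input).
    Classes absent from the mask get weight 0.0 to avoid division by zero.
    """
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            weights.append(0.0)
        else:
            # 1 / (counts[c] / total) simplifies to total / counts[c]
            weights.append(total / counts[c])
    return weights


# Toy mask: class 0 is the "majority" class, classes 1 and 2 are "minority" classes.
toy_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
]
weights = inverse_frequency_weights(toy_mask, num_classes=3)
```

In this toy mask the rarest class (one pixel out of sixteen) receives the largest weight, so errors on it would be penalized most heavily in a weighted loss.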

Proposed method
Class choice
Image choice
Patch choice
Used datasets
LiTS dataset
Polished sections of ores dataset
Experiments and results
Background
Findings
Conclusion