Abstract
Breast cancer is one of the most common causes of cancer death worldwide and has a very high mortality rate compared with other types of cancer. Early diagnosis can significantly increase the chances of appropriate treatment and survival, but the diagnostic process is tedious, time-consuming and dependent on the experience of the pathologist. The main task addressed in this work is the automatic classification of histopathology microscopic images into cancerous and non-cancerous tissue, which would be a valuable computer-aided diagnosis tool for clinicians. In this paper, we conduct experiments on two public datasets: one available at http://andrewjanowczyk.com/wp-static/IDC_regular_ps50idx5.zip/ and the other, BreakHis, available at https://web.inf.ufpr.br/vri/databases/. Both datasets are heavily imbalanced across classes. Existing work in the literature shows that Convolutional Neural Networks (CNNs) are today the state-of-the-art approach to several complex problems, including medical image analysis and, in particular, histopathology image classification. However, class-imbalanced data can degrade CNN performance during parameter learning. To overcome the class-imbalance problem, we apply different over-sampling and under-sampling techniques to the imbalanced training dataset so that the performance of CNN-based classifiers approaches that obtained on a class-balanced dataset. Extensive experiments demonstrate that synthetic over-sampling can be an effective way to counter the impact of class imbalance in the training data.
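The abstract does not name the specific re-sampling methods used. As a hedged illustration only, the sketch below assumes two common choices: random under-sampling, and SMOTE (Synthetic Minority Over-sampling Technique) as a representative form of synthetic over-sampling. It assumes each training image patch has already been flattened into a feature vector and that re-sampling is applied to the training split only. The helpers `random_undersample` and `smote_oversample` are hypothetical names, not the authors' code.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Hypothetical helper: random under-sampling. Keeps a random subset of
    each class so every class ends up as large as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def smote_oversample(X, y, minority, rng, k=5):
    """Hypothetical helper: SMOTE-style synthetic over-sampling (an assumed
    technique, not confirmed by the paper). Each synthetic sample lies on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours; samples are added until the minority class
    matches the largest class."""
    X_min = X[y == minority].astype(float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_needed = np.unique(y, return_counts=True)[1].max() - len(X_min)
    if n_needed <= 0:
        return X, y
    k = min(k, len(X_min) - 1)

    # Pairwise squared Euclidean distances within the minority class;
    # a sample is never treated as its own neighbour.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample, one of its k neighbours, and a gap in [0, 1).
    base = rng.integers(0, len(X_min), size=n_needed)
    nbr = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_new, y_new


# Toy usage: random features stand in for flattened image patches.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)  # 9:1 class imbalance

X_u, y_u = random_undersample(X, y, rng)
X_s, y_s = smote_oversample(X, y, minority=1, rng=rng)
print(np.bincount(y_u), np.bincount(y_s))  # → [10 10] [90 90]
```

Under-sampling discards most of the majority class, while synthetic over-sampling keeps all real samples and adds interpolated minority examples instead of exact duplicates. For real experiments, the imbalanced-learn library provides tested implementations such as `SMOTE` and `RandomUnderSampler`.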