A study on using deep autoencoders for imbalanced binary classification

Vlad-Ioan Tomescu, Gabriela Czibula, Ştefan Niţică

doi:10.1016/j.procs.2021.08.013

Open Access

Journal: Procedia Computer Science	Publication Date: Jan 1, 2021
Citations: 4	License type: cc-by-nc-nd

Affiliation: Babeș-Bolyai University

Abstract

Imbalanced classification is a challenge for supervised learning, because an unequal distribution of classes in the training data set generally leads to poor predictive performance on the minority class. From a practical perspective, however, the minority class is usually the most relevant one. Because of the imbalance in the training data, classification errors for the minority class are higher, as classifiers tend to be biased towards predicting the majority class. In this paper we investigate the use of autoencoders for improving predictive performance on imbalanced binary classification problems. As an application domain we consider breast cancer detection, an imbalanced classification problem of great interest in the medical domain. According to the World Health Organization, breast cancer is the primary cause of cancer mortality in women. There is increasing interest in applying conventional machine learning and, more recently, deep learning techniques to breast cancer detection, in order to help medical experts detect the disease early. The first goal of the paper is to investigate the ability of deep autoencoders to learn patterns within the classes of benign and malignant instances. Second, we propose and compare two autoencoder-based classification models for breast cancer detection. The performance of the proposed models was empirically assessed on data sets previously used in the breast cancer detection literature. The results show that our best model compares favourably with most of the classifiers used for comparison and that it handles the data imbalance well.

Full Text