Abstract

Big data is data of very large volume and dimensionality that grows continuously and exponentially over time. In recent years, big data in health care has been commonly used to predict diseases. Breast cancer is one of the most common diseases and the second leading cause of death among women. Early diagnosis of breast cancer can reduce the risk of death. Few studies have addressed breast cancer prediction on big data, and traditional prediction models are less efficient in terms of accuracy and error rate. The Optimized U‐Net Convolutional Neural Network (OU‐NetCNN) model is proposed in this paper to overcome these challenges. Hadoop is used as the storage system for the big data samples. Data samples from two datasets, namely BreakHis and Kaggle (Breast Histopathology Images), are kept in this storage system. From the stored samples, the BreakHis data are taken through further processing: pre‐processing, segmentation, feature extraction, feature selection, and classification. In pre‐processing, noise is removed from the histopathological breast images using the adaptive fast peer‐group filtering (AFPGF) approach. Morphological operations such as erosion and dilation are then used to remove unwanted portions, and image quality is enhanced using the improved balance contrast enhancement (IBCE) method. Next, in the segmentation process, edges of the breast images are detected using an edge detection approach based on the adaptive artificial ecosystem optimization (AAEO) algorithm. Features are extracted using the Spatial Gray Level Dependence Matrix (SGLDM), and the optimal features are selected by the modified selfish herd optimization (MSHO) algorithm. Finally, the selected features are fed into the proposed OU‐NetCNN model to classify the histopathology images as benign or malignant. This hybridization minimizes the error rate, computational complexity, and overfitting issues.
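To make the SGLDM feature-extraction step more concrete, the following is a minimal sketch of a gray-level co-occurrence computation on a toy patch. It is not the authors' implementation: the abstract does not state the pixel offsets, the number of gray levels, or which statistics are derived, so the function names (`sgldm`, `texture_features`), the horizontal offset, the symmetric counting, and the three statistics shown are all assumptions chosen for illustration.

```python
from collections import Counter


def sgldm(image, dx=1, dy=0, symmetric=True):
    """Return co-occurrence probabilities for pixel pairs at offset (dx, dy).

    `image` is a list of rows of integer gray levels. The offset and the
    symmetric counting are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(b, a)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Compute a few common co-occurrence statistics from probabilities `p`."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())
    homogeneity = sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Hypothetical 4x4 patch with four gray levels (not data from the paper).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
probabilities = sgldm(patch)
features = texture_features(probabilities)
print(features)
```

In a pipeline like the one described, a vector of such statistics per image (possibly over several offsets) would be the input to the feature-selection step.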
The simulation analysis is performed in Python on the two datasets, BreakHis and Kaggle (Breast Histopathology Images). Measures such as precision, sensitivity, accuracy, and F‐measure are used to evaluate the performance of the proposed model and to compare it with existing approaches.
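The evaluation measures named above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics`, the choice of "malignant" as the positive class, and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Precision, sensitivity (recall), accuracy and F-measure for two classes.

    Treating `positive` as the malignant class is an assumption; the abstract
    does not say which class is considered positive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical labels for eight images (not results from the paper).
y_true = ["malignant", "malignant", "malignant", "benign",
          "benign", "benign", "benign", "malignant"]
y_pred = ["malignant", "malignant", "benign", "benign",
          "benign", "malignant", "benign", "malignant"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity here is the same quantity as recall, and the F-measure is the harmonic mean of precision and sensitivity.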
