Abstract

Breast cancer is the most common cancer in women. Classifying patients as cancer or non-cancer from clinical records requires high sensitivity and specificity for the diagnostic test to be acceptable. The state-of-the-art classification model, the convolutional neural network (CNN), cannot, however, be applied directly to tabular clinical data represented in 1-D format. CNNs are designed to operate on 2-D matrices whose elements are correlated with their neighbors, as in image data. By contrast, data examples represented as 1-D vectors (other than time series data) cannot be used with a CNN and are instead handled by other classification models, such as Recurrent Neural Networks for tabular data or Random Forest. We propose three novel data-wrangling preprocessing methods that transform a 1-D data vector into a 2-D graphical image with appropriate correlations among the fields, so that it can be processed by a CNN. We tested our methods on the Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets. To our knowledge, this transformation of non-image, non-time-series tabular data into image data is novel. The transformed data, processed by a CNN using VGGnet-16, show competitive results on the WBC dataset and outperform other known methods on the WDBC dataset.
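The abstract does not specify the three proposed transformations. The sketch below is a purely illustrative example of the general idea, and it is not one of the paper's methods. It assumes a WBC-style record of 9 integer attributes in the range 1 to 10. Each attribute is min-max scaled, and pixel (i, j) is set to the product of scaled fields i and j, so every row and column relates one field to all the others. The resulting 9 × 9 grid is then enlarged by nearest-neighbour resizing and copied into three channels to match the 224 × 224 × 3 input that VGG-16 uses by default. The function names, the pairwise-product rule, and the nearest-neighbour resize are all hypothetical choices made for this illustration.

```python
from typing import List

Matrix = List[List[float]]


def min_max_scale(row: List[float], lo: float, hi: float) -> List[float]:
    # Rescale each field from [lo, hi] to [0, 1] (WBC attributes are integers 1..10).
    return [(v - lo) / (hi - lo) for v in row]


def pairwise_image(features: List[float]) -> Matrix:
    # Hypothetical rule: pixel (i, j) = feature_i * feature_j, giving a symmetric
    # n x n grid in which each row/column relates one field to all other fields.
    return [[a * b for b in features] for a in features]


def upsample_nearest(img: Matrix, size: int) -> Matrix:
    # Nearest-neighbour resize of an n x n grid to size x size (e.g. 224 for VGG-16).
    n = len(img)
    return [[img[r * n // size][c * n // size] for c in range(size)] for r in range(size)]


def to_three_channels(img: Matrix) -> List[Matrix]:
    # VGG-16 expects 3 channels, so replicate the single grey channel as R, G, B.
    return [[row[:] for row in img] for _ in range(3)]


def record_to_image(record: List[float], lo: float = 1, hi: float = 10, size: int = 224) -> List[Matrix]:
    # Full hypothetical pipeline: scale -> pairwise grid -> resize -> 3 channels.
    return to_three_channels(upsample_nearest(pairwise_image(min_max_scale(record, lo, hi)), size))


if __name__ == "__main__":
    record = [5, 1, 1, 1, 2, 1, 3, 1, 1]  # illustrative 9-attribute record
    image = record_to_image(record)
    print(len(image), len(image[0]), len(image[0][0]))  # channels, height, width
```

A CNN would then be trained on images like these. The pairwise layout is only one way to give neighbouring pixels a relationship that convolutions can exploit. Other orderings or similarity measures are equally plausible.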
