Abstract

Molecular biology studies on cancer, using gene expression datasets, have revealed that the datasets have a very small number of samples. Obtaining medical data is difficult and expensive due to privacy constraints. Accuracy of classifiers depends greatly on the quality and quantity of input data. The problem of small sample size or small data size has been addressed by augmentation. Owing to the sensitivity of synthetic data samples for the cancer data classification for gene expression data, this paper is motivated to investigate data augmentation using GAN. GAN is based on the principle of two blocks (generator and discriminator) working in a collaborative yet adversarial way. This paper proposes modified generator GAN (MG-GAN) where the generator is fed with original data and multivariate noise to generate data with Gaussian distribution. As the generated data lie within latent space, we reach saddle point faster. GAN has been widely used in data augmentation for image datasets. As per our understanding, this is the first attempt of using GAN for augmentation on gene expression dataset. The performance merit of proposed MG-GAN was compared with KNN and Basic GAN. As compared to KNN and GAN, MG-GAN improves classification accuracy by 18.8% and 11.9%, respectively. The loss value of the error function for MG-GAN is drastically reduced, from 0.6978 to 0.0082, ensuring sensitivity of the generated data. Improved classification accuracy and reduction in the loss value make our improved MG-GAN method better suited for critical applications with sensitive data.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.