Abstract

Classifying patients as cancer or normal by applying computational methods to their gene expression profiles is an important task. Recently, deep learning models, mainly multilayer perceptrons and convolutional neural networks, have gained popularity for this type of dataset. This paper analyzes the performance of deep learning models on different types of cancer gene expression datasets, as no such consolidated work is available. For this purpose, three deep learning models, two feature selection methods, and four cancer gene expression datasets have been considered, giving a total of 24 different combinations to analyze. Of the four datasets, two are imbalanced and two are balanced in terms of the number of normal and cancer samples. Experimental results show that the deep learning models performed well in terms of true positive rate, precision, F1-score, and accuracy.
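The four evaluation metrics named in the abstract can all be derived from confusion-matrix counts. The sketch below is illustrative only: the function name and the sample counts are made up for this example and are not taken from the paper's experiments. It treats "cancer" as the positive class, which is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute TPR, precision, F1-score, and accuracy from confusion counts.

    tp/fn: cancer samples predicted as cancer/normal.
    tn/fp: normal samples predicted as normal/cancer.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate (recall)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + tpr
    f1 = 2 * precision * tpr / denom if denom else 0.0  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"tpr": tpr, "precision": precision, "f1": f1, "accuracy": accuracy}


# Hypothetical imbalanced test split: 90 cancer samples and 10 normal samples.
metrics = classification_metrics(tp=85, fp=6, tn=4, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, accuracy alone can look good even when the minority class is poorly recognized, which is one plausible reason to report true positive rate, precision, and F1-score alongside it.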

Highlights

  • Gene expression is the process by which genetic information encoded in DNA is converted into functional products such as proteins

  • Understanding the genetics underlying different types of cancer is an important step toward understanding the disease itself. This has created demand for computational techniques that can be applied to gene expression data to accomplish the systems biology task of classifying cancer patients and normal patients by examining their gene expression profiles

  • Analysis of Variance (ANOVA) and Information Gain have been considered as feature selection methods
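As a rough illustration of ANOVA-based feature selection, the sketch below scores each gene with a one-way ANOVA F-statistic and keeps the highest-scoring genes. It is a minimal standard-library version under stated assumptions, not the paper's implementation: the function names, gene names, and expression values are all hypothetical, and Information Gain is not shown.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic for one gene's expression across classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Variation of class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_genes(expression, labels, k):
    """Rank genes by F-score (highest first) and return the k best names."""
    scores = {gene: anova_f_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration: three genes measured on six patients.
labels = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]
expression = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],  # strongly separates the classes
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # nearly identical in both classes
    "gene_c": [3.0, 3.5, 2.8, 2.0, 2.4, 1.9],  # moderate separation
}
selected = top_k_genes(expression, labels, k=2)
print(selected)
```

A gene with a large F-score has class means that differ by much more than the spread within each class, so it is more likely to be useful to a downstream classifier.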


Summary

Introduction

Gene expression is the process by which genetic information encoded in DNA is converted into functional products such as proteins. The advances in microarray technology and the recent Next Generation Sequencing (NGS) have made gene expression profiling of patients widely available [2, 3]. This has resulted in the collection of gene expression datasets corresponding to different diseases. One of the main reasons for undertaking this work is to see how DL models perform on gene expression datasets, even though there is a contradiction between the nature of gene expression datasets and the requirements of DL models. For this purpose, the following contributions have been made: Four datasets, namely colon cancer, pancreatic cancer, breast cancer, and lung cancer, have been selected, of which the first two are imbalanced and the last two are balanced.
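The count of 24 combinations follows from 3 models × 2 feature selection methods × 4 datasets. The sketch below enumerates that grid. The dataset and feature selection names come from the text. The model labels are partly assumed: the page names multilayer perceptrons and convolutional neural networks but does not name the third model, so a placeholder is used.

```python
from itertools import product

# "third_model" is a placeholder: the third model is not named on this page.
models = ["MLP", "CNN", "third_model"]
feature_selectors = ["ANOVA", "Information Gain"]
# Balance status as stated in the text: the first two imbalanced, the last two balanced.
datasets = {
    "colon": "imbalanced",
    "pancreatic": "imbalanced",
    "breast": "balanced",
    "lung": "balanced",
}

# Every (model, feature selector, dataset) triple is one experiment to run.
combinations = list(product(models, feature_selectors, datasets))
print(len(combinations))

for model, selector, dataset in combinations[:3]:
    print(f"{model} + {selector} on {dataset} ({datasets[dataset]})")
```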

Related Work
Datasets
Feature Selection
Model Training Models
Results and Analysis
Experimental Setup and Results
Conclusion
