Abstract

Most recent cancer classification methods use gene expression profiles as features because they provide important information about tumor characteristics. Motivated by its success in computer vision, deep learning has been applied to medical data because it can capture non-linear patterns in complex features and can leverage information from unlabeled data belonging to problems other than the one being handled. In this paper, we implement transfer learning, which refers to using a model trained on one task to perform classification on another task, to classify five cancer types that most commonly affect women. We used VGG16, Xception, DenseNet, and ResNet50 as base models and added a dense layer to reflect our five-class classification problem. To avoid over-fitting, which can produce very high training accuracy alongside low cross-validation accuracy, we used L2 regularization. We retrained (fine-tuned) these models using five-fold cross-validation on RNA-Seq gene expression data after transforming it into 2D image-like data. We used the softmax activation function in the dense prediction layer and Adam as the optimizer when fitting all four architectures. The highest performance was obtained by fine-tuning the Xception architecture, which achieved a classification accuracy of 98.6%, precision of 98.6%, recall of 97.8%, and F1-score of 98% under five-fold cross-validation.
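As an illustration of what transforming an expression profile into 2D image-like data can involve, the following is a minimal sketch and not the procedure used in this work. The function name `expression_to_image`, the min-max scaling, the zero-padding up to the next perfect square, and the replication into three channels are all assumptions made for the example.

```python
import math

import numpy as np


def expression_to_image(profile, channels=3):
    """Reshape a 1D expression profile into a square, image-like array.

    Assumptions (illustrative only, not taken from the paper):
    min-max scaling to [0, 1], zero-padding up to the next perfect
    square, and replication of the single channel `channels` times.
    """
    profile = np.asarray(profile, dtype=np.float32)
    n = profile.size
    if n == 0:
        raise ValueError("profile must contain at least one value")

    # Smallest side length whose square can hold all n values.
    side = math.isqrt(n - 1) + 1

    # Min-max scale; a constant profile maps to all zeros.
    lo, hi = profile.min(), profile.max()
    if hi > lo:
        scaled = (profile - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(profile)

    # Zero-pad the tail so the vector fills a side x side grid.
    padded = np.zeros(side * side, dtype=np.float32)
    padded[:n] = scaled
    image = padded.reshape(side, side)

    # Replicate the grid along a trailing channel axis.
    return np.repeat(image[..., np.newaxis], channels, axis=-1)


# Toy profile with 10 "genes"; real profiles have thousands of values.
img = expression_to_image(np.arange(10))
print(img.shape)
```

Architectures pretrained on natural images generally expect three-channel input, which is why the sketch replicates the single channel. Other layouts, such as ordering genes before reshaping, are equally plausible.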

Highlights

  • Every cell in a multicellular organism has the same genes, but not every gene is transcriptionally active in every cell, so the patterns of gene expression differ from one cell to another

  • Five project codes corresponding to our five cancer types, namely TCGA-BRCA, TCGA-OV, TCGA-COAD (colon adenocarcinoma), TCGA-LUAD (lung adenocarcinoma), and TCGA-THCA (thyroid cancer), were used as the project argument

  • To ensure that expression levels can be inferred correctly and without bias, we normalized the obtained gene expression profiles using the TCGAanalyze_Normalization function [25]–[28]


Summary

Introduction

Every cell in a multicellular organism has the same genes, but not every gene is transcriptionally active in every cell, so the patterns of gene expression differ from one cell to another. These variations may play a major role in the difference between disease and health [1]. The transcription of specific genes is measured by RNA-Seq, which converts long RNAs into a library of complementary DNA (cDNA) fragments from which the expression profile is generated. The high dimensionality of gene expression data, combined with a small number of samples, poses further challenges for computational techniques. These techniques include deep learning methods, which are widely used in computer vision problems [9], [10].

