Abstract

Deep learning has proven advantageous in solving cancer diagnostic and classification problems. However, it cannot explain the rationale behind its decisions in terms that human researchers can follow. Biological pathway databases provide well-studied relationships between genes and their pathways. Because pathways are knowledge frameworks widely used by human researchers, representing gene-to-pathway relationships in deep learning structures may make those structures easier to comprehend. Here, we propose a deep neural network (PathDeep) that implements gene-to-pathway relationships in its structure. We also provide an application framework that measures the contribution of pathways and genes to a deep neural network's output in a classification problem. We applied PathDeep to classify cancer and normal tissues using a large, publicly available gene expression dataset. PathDeep showed higher accuracy than fully connected neural networks in distinguishing cancer from normal tissues (accuracy = 0.994) in 32 tissue samples. We identified 42 pathways related to 32 cancer tissues and 57 associated genes contributing highly to the biological functions of cancer. The most significant pathway was G-protein-coupled receptor signaling, and the most enriched function was the G1/S transition of the mitotic cell cycle, suggesting that these biological functions were the most common cancer characteristics in the 32 tissues.
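
To make the idea of "gene-to-pathway relationships in its structure" concrete, the following is a minimal sketch, not PathDeep's actual implementation. It assumes the relationship can be expressed as a 0/1 mask over the weights between a gene layer and a pathway layer, so that each pathway node only receives input from its member genes. The gene and pathway names, the ReLU activation, and all function names are hypothetical placeholders.

```python
import random

# Hypothetical toy membership table; a real one would come from a pathway database.
PATHWAY_GENES = {
    "pathway_A": ["gene1", "gene2"],
    "pathway_B": ["gene2", "gene3", "gene4"],
}
GENES = ["gene1", "gene2", "gene3", "gene4"]


def build_mask(pathway_genes, genes):
    """Return a pathways x genes 0/1 matrix: 1 where the gene belongs to the pathway."""
    index = {g: i for i, g in enumerate(genes)}
    mask = [[0] * len(genes) for _ in pathway_genes]
    for row, members in enumerate(pathway_genes.values()):
        for g in members:
            mask[row][index[g]] = 1
    return mask


def init_weights(mask, seed=0):
    """Random weights in [-1, 1], zeroed wherever the mask forbids an edge."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) * m for m in row] for row in mask]


def pathway_layer(expression, weights, mask):
    """ReLU activation per pathway node, using only edges allowed by the mask."""
    out = []
    for w_row, m_row in zip(weights, mask):
        total = sum(w * m * x for w, m, x in zip(w_row, m_row, expression))
        out.append(max(0.0, total))
    return out


MASK = build_mask(PATHWAY_GENES, GENES)
WEIGHTS = init_weights(MASK)
```

In this sketch, changing the expression of `gene4` cannot affect `pathway_A`, because no edge connects them. That locality is what would let a pathway node's activation be read as a statement about its own member genes.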

Highlights

  • Cancer is one of the most aggressive diseases, accounting for nearly nine million deaths worldwide

  • We aimed to study the performance of PathDeep in distinguishing cancer from normal tissues based on gene expression data

  • We assumed a random gene-to-pathway relationship (PathDeep random linked), connecting gene nodes to pathway nodes at random while keeping the same number of edges that the pathway database provides
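
The random-linked variant in the last highlight can be sketched as follows. This is an illustration under one stated assumption: "the same number of edges" is read here as keeping each pathway's gene count unchanged while drawing its member genes at random. The source does not say whether the count is preserved per pathway or only in total. The toy names and the function `randomize_links` are hypothetical.

```python
import random

# Hypothetical toy membership table standing in for a pathway database.
PATHWAY_GENES = {
    "pathway_A": ["gene1", "gene2"],
    "pathway_B": ["gene2", "gene3", "gene4"],
}
GENES = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6"]


def randomize_links(pathway_genes, genes, seed=0):
    """Give each pathway the same number of genes as before, chosen at random.

    Assumption: the edge count is preserved per pathway, not only in total.
    """
    rng = random.Random(seed)
    return {
        pathway: sorted(rng.sample(genes, len(members)))
        for pathway, members in pathway_genes.items()
    }


RANDOM_LINKS = randomize_links(PATHWAY_GENES, GENES)
```

Comparing a network built from `RANDOM_LINKS` with one built from the real table would indicate whether the pathway structure itself, rather than sparsity alone, contributes to performance.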

Summary

Introduction

Cancer is one of the most aggressive diseases, accounting for nearly nine million deaths worldwide. Although deep neural network (DNN)-based research helps to resolve goal-oriented problems, its use in biological interpretation is limited [6]. To provide comprehensive biological interpretation, there have been several attempts to apply pathway data structures in the deep learning approach. Colorectal cancer (n = 626) RNA-seq data from The Cancer Genome Atlas (TCGA) were used as training datasets [9]. Microarray gene expression data from the Gene Expression Omnibus (GEO; colorectal cancer, 13 datasets, n = 2952) were used for validation [10]. DeepCC exhibited higher efficacy with GEO data than other machine learning methods, indicating that using GSEA as normalized input for the DNN may help reduce limitations in data usability that likely arise from the platform difference between RNA-seq and microarray data [11].
