Abstract

Cancer genome projects are characterizing the genome, epigenome and transcriptome of large numbers of samples using the latest high-throughput sequencing assays. The resulting data sets pose several challenges for traditional statistical and machine learning methods. In this work we address the task of identifying the most informative genes in a cancer gene expression data set. To that end, we built denoising autoencoders (DAEs) and stacked denoising autoencoders (SDAEs) and studied the influence of the input nodes on the final representation learned by the DAE. We also compared these deep learning approaches with other existing approaches. Our study is divided into two main tasks. First, we built and compared several feature extraction methods and data sampling methods, evaluating them with classifiers that distinguish samples of thyroid cancer patients from samples of healthy individuals. Second, we investigated the possibility of building comprehensible descriptions of gene expression data by using DAEs and SDAEs as feature extraction methods. After extracting information related to the description built by the network, namely the connection weights, we devised post-processing techniques to extract comprehensible and biologically meaningful descriptions from the constructed models. We were able to build high-accuracy models that discriminate thyroid cancer patients from healthy individuals, but the extraction of comprehensible models is still very limited.
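
To make the idea of reading input influence off the connection weights concrete, the following is a minimal sketch rather than the pipeline used in this work. It assumes a single-hidden-layer DAE with tied weights, masking noise, a squared-error loss, and a simple score per input (the sum of absolute weights leaving that input). The names `TinyDAE`, `corrupt` and `rank_inputs`, the hyperparameters, and the synthetic six-"gene" data are all hypothetical and chosen only for illustration.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each input to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyDAE:
    """Illustrative denoising autoencoder: sigmoid encoder, linear decoder, tied weights."""

    def __init__(self, n_in, n_hidden, rng):
        # W[j][i] connects input i to hidden unit j (and back, since weights are tied).
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_o = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sum(self.W[j][i] * h[j] for j in range(len(h))) + self.b_o[i]
                for i in range(len(self.b_o))]

    def train_step(self, x_clean, x_noisy, lr):
        """One SGD step: reconstruct the clean input from its corrupted version."""
        h = self.encode(x_noisy)
        y = self.decode(h)
        err = [yi - xi for yi, xi in zip(y, x_clean)]  # gradient of 0.5 * ||y - x||^2
        # Gradient with respect to each hidden pre-activation.
        d_h = [sum(self.W[j][i] * err[i] for i in range(len(err))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(err)):
                # Tied weights receive a decoder term and an encoder term.
                self.W[j][i] -= lr * (err[i] * h[j] + d_h[j] * x_noisy[i])
            self.b_h[j] -= lr * d_h[j]
        for i in range(len(err)):
            self.b_o[i] -= lr * err[i]
        return 0.5 * sum(e * e for e in err)


def rank_inputs(dae):
    """Score each input by its total absolute weight; return indices, highest score first."""
    n_in = len(dae.b_o)
    scores = [sum(abs(row[i]) for row in dae.W) for i in range(n_in)]
    order = sorted(range(n_in), key=lambda i: scores[i], reverse=True)
    return order, scores


# Synthetic data (an assumption): inputs 0 and 1 share a latent factor, the rest are near-constant.
rng = random.Random(0)
samples = []
for _ in range(40):
    latent = rng.random()
    samples.append([latent, 1.0 - latent] + [rng.gauss(0.5, 0.01) for _ in range(4)])

dae = TinyDAE(n_in=6, n_hidden=3, rng=rng)
for epoch in range(50):
    for x in samples:
        dae.train_step(x, corrupt(x, 0.2, rng), lr=0.05)

order, scores = rank_inputs(dae)
print("inputs ranked by total absolute weight:", order)
```

A weight-magnitude score of this kind is only one possible post-processing choice. It ignores interactions between hidden units and, in a stacked model, the deeper layers, which is one reason descriptions obtained this way can remain hard to interpret.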

