Abstract

Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained neural network, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^−216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.

Highlights

  • Bridging the gap between genome sequences and phenotypes is a core challenge in genomics and personalized medicine

  • We ask whether training a Deep Neural Network (DNN) on gene expression data could learn a predictive transcription factor (TF) network

  • We constrain the training such that the resulting predictive model is interpretable. We refer to this as a methodology for advancing beyond black-box machine learning models, which in turn is a first step towards what could be referred to as white-box, fully interpretable models. We find that such models can predict the expression of genes based on TFs and that the predicted relationships between TFs and their target genes largely overlap with known TF bindings
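
The "variance explained" figures quoted in the abstract can be read as a coefficient of determination between observed and predicted expression. The sketch below shows that calculation for a single gene across samples. It assumes the standard definition 1 - SS_res / SS_tot; the source text does not state the exact metric used, and the function name and toy values are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def explained_variance(observed, predicted):
    """Fraction of variance explained for one gene: 1 - SS_res / SS_tot.

    Assumption: the standard coefficient of determination. The source
    does not specify which variance-explained metric it reports.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = fmean(observed)
    # Total variation of the observed expression around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy expression values for one gene in four samples (made up).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = explained_variance(observed, predicted)
print(f"variance explained: {r2:.2f}")
```

In a setting like the one described, a value such as this would be computed from the model's predictions for each target gene and then summarized across genes.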


Introduction

Bridging the gap between genome sequences and phenotypes is a core challenge in genomics and personalized medicine. To this end, it is essential to characterize intermediate levels, such as cells, tissues, and organs, using a suite of molecular technologies. Genetic variants associated with diseases exert their effects through the modulation of these intermediate levels. mRNA expression is one of the most accessible and important windows into the cell's regulatory machinery and into changes in tissues and organs, and its analysis is crucial for the study of diseases[1]. Elucidating gene regulatory mechanisms is central because gene regulatory networks maintain cellular identity and mediate interactions between genetic variants and the human environment.
