Abstract
Dozens of studies have dealt with the relations between transcript features and gene expression. Indeed, understanding how gene expression is encoded in transcripts should contribute not only to disciplines such as functional genomics and molecular evolution, but also to biotechnology and human health. These studies mainly aimed at predicting protein levels of genes based on their transcript features, and most of the models employed in this context assume that the effect of each transcript feature on gene expression is monotonic. In the current study we aim to determine, for the first time, whether the relations between transcript features (i.e., the UTRs and ORF) and measurements related to the different stages of gene expression are indeed monotonic. To this end, we analyze 5432 transcript features and perform gene expression measurements (mRNA levels, ribosomal densities, protein levels, etc.) of 4367 S. cerevisiae genes. We use the Maximal Information Coefficient (MIC) to identify potential relations that are not necessarily linear or monotonic. Our analyses demonstrate that the relation between most transcript features and the examined gene expression measurements is monotonic (at most 1-5% of the variables, at a significance level of 0.001, are non-monotonic); in addition, where a relation deviates from monotonicity, the deviation is very weak. These results should help guide the development of computational gene expression modeling and engineering, and improve the understanding of this process. Furthermore, the relatively simple relations between a transcript's nucleotide composition and its expression should contribute towards a better understanding of transcript evolution at the molecular level.
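To illustrate why a general dependence measure is needed alongside a rank correlation when testing for non-monotonic relations, the following minimal sketch contrasts the two on synthetic data. It is not the analysis performed in the study: the dependence score here is a simple normalized mutual information on equal-frequency bins, used as a crude stand-in for MIC (which searches over many grid resolutions), and the function names, the bin count of 8, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def binned_dependence(x, y, bins=8):
    """Mutual information on equal-frequency bins, divided by log(bins).

    Simplified stand-in for MIC (assumption): a single fixed grid is used
    instead of maximizing over grid resolutions.
    """
    n = len(x)

    def to_bins(values):
        return [min(bins - 1, int((r - 1) * bins / n)) for r in average_ranks(values)]

    bx, by = to_bins(x), to_bins(y)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )
    return mi / math.log(bins)


# Synthetic "feature" values on a symmetric grid (toy data, not from the study).
x = [(j - 100) / 100 for j in range(201)]
y_monotonic = [math.exp(v) for v in x]  # strictly increasing in x
y_symmetric = [v * v for v in x]        # dependent on x, but not monotonic

for name, y in [("monotonic", y_monotonic), ("symmetric", y_symmetric)]:
    print(name, round(spearman(x, y), 3), round(binned_dependence(x, y), 3))
```

On the monotonic toy relation both scores are close to 1, whereas on the symmetric one the rank correlation vanishes while the binned dependence stays clearly above zero. A gap of this kind between a general dependence score and a monotonic one is what flags a candidate non-monotonic relation.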