Abstract

Background: The regulation of all gene expression steps (e.g., transcription, RNA processing, translation, and mRNA degradation) is known to be primarily encoded in different parts of genes and in genomic regions in proximity to genes (e.g., promoters, untranslated regions, coding regions, and introns). However, the full set of gene expression codes and the genomic regions where they are encoded are still unknown.

Results: Here, we employ an unsupervised approach to estimate the concentration of gene expression codes in different non-coding parts of genes and transcripts, such as introns and untranslated regions, focusing on three model organisms (Escherichia coli, Saccharomyces cerevisiae, and Schizosaccharomyces pombe). Our analyses support the conjecture that regions adjacent to the beginning and end of ORFs and the beginning and end of introns tend to include a higher concentration of gene expression information relative to regions further away. In addition, we report the exact regions with elevated concentration of gene expression codes. Furthermore, we demonstrate that the concentration of these codes in different genetic regions is correlated with the expression levels of the corresponding genes, and with splicing efficiency measurements and meiotic stage gene expression measurements in S. cerevisiae.

Conclusion: We suggest that these discoveries improve our understanding of gene expression regulation and evolution; they can also be used for developing improved models of genome/gene evolution and for engineering gene expression in various biotechnological and synthetic biology applications.

Highlights

  • The regulation of all gene expression steps (e.g., transcription, RNA processing, translation, and mRNA degradation) is known to be primarily encoded in different parts of genes and in genomic regions in proximity to genes

  • Evidence that high-dimensional gene expression codes appear in various transcript regions. First, we analyzed the pre-mRNA transcript, dividing it into separate regions: the 5’ untranslated region (UTR), the open reading frame (ORF), introns, the 3’UTR, and the 250 nt flanking sequences upstream of the 5’UTR start and downstream of the 3’UTR end, respectively

  • For each genetic region, we calculated its Average Repetitive Substring Index (ARSI) score, which is the mean, over its nucleotide positions, of the maximum length of a substring at that position that can also be found in the other genetic regions
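
The ARSI score described in the last highlight can be written compactly as follows. This is a sketch of the verbal definition, not a formula quoted from the paper: the symbols S (the analyzed region), R (the set of other regions used as reference), and ℓ_i are notation introduced here, and it is assumed that the substring counted for position i starts at i.

```latex
% Assumed notation: S is the analyzed region, R the reference set of other regions.
% \ell_i(S): length of the longest substring of S starting at position i
% that occurs in at least one reference sequence r in R.
\ell_i(S) = \max\{\, k \ge 0 : S[i \,..\, i+k-1] \text{ is a substring of some } r \in R \,\}

% ARSI is the mean of these lengths over all positions of S.
\mathrm{ARSI}(S) = \frac{1}{|S|} \sum_{i=1}^{|S|} \ell_i(S)
```

Under these assumptions, a region that shares many long substrings with the reference set receives a high score, while a region with no nucleotide in common with the reference set scores zero.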


Summary

Introduction

The regulation of all gene expression steps (e.g., transcription, RNA processing, translation, and mRNA degradation) is known to be primarily encoded in different parts of genes and in genomic regions in proximity to genes (e.g., promoters, untranslated regions, coding regions, and introns). Gene expression codes are known to be partially encoded in various genomic regions [1,2,3,4,5,6] and are related to all gene expression steps (e.g., transcription, RNA processing, translation, post-translational modifications, and degradation). These codes are encoded in different parts of the genome, such as promoters, untranslated regions (UTRs), coding sequence (CDS) regions, and introns. The Average Repetitive Substring Index (ARSI) is an unsupervised approach for exploiting previously unexplored high-dimensional information and codes related to the way gene expression is encoded in the ORF. This method, based solely on the genomic sequence of the analyzed organism, computes the tendency of each coding region (or any other genetic element) to include long substrings that appear in other CDSs of the organism [25]. The reference set can include (or be related to) the highly expressed genes or all the genes in a given organism; see Methods and Fig. 1b.
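
To make the idea of scoring a sequence by its long shared substrings concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `longest_match_length` and `arsi` are hypothetical, the match for each position is assumed to start at that position, no normalization beyond the plain mean is applied, and the brute-force search stands in for the indexed data structures (such as suffix arrays) that a genome-scale analysis would need.

```python
from typing import Sequence


def longest_match_length(seq: str, start: int, references: Sequence[str]) -> int:
    """Length of the longest substring of `seq` beginning at `start`
    that occurs in at least one reference sequence (0 if none)."""
    best = 0
    length = 1
    # If a substring of length k occurs in a reference, so does every shorter
    # prefix of it, so we can stop at the first length that fails to match.
    while start + length <= len(seq):
        candidate = seq[start:start + length]
        if any(candidate in ref for ref in references):
            best = length
            length += 1
        else:
            break
    return best


def arsi(seq: str, references: Sequence[str]) -> float:
    """Mean, over all positions of `seq`, of the longest match length.

    Assumption: `references` must not contain `seq` itself; otherwise every
    position trivially matches through to the end of the sequence.
    """
    if not seq:
        return 0.0
    total = sum(longest_match_length(seq, i, references) for i in range(len(seq)))
    return total / len(seq)


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real genes.
    target = "ATGGCGTTAGC"
    reference_set = ["TTGGCGTAA", "CCGTTAGCA"]
    print(f"ARSI = {arsi(target, reference_set):.3f}")
```

Choosing the reference set is the main design decision: passing only highly expressed genes or all genes of the organism changes what a high score means, as the paragraph above notes.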
