Abstract

Non-coding RNAs (ncRNAs) play important and varied roles in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but reliable prediction has remained elusive because ncRNA genes lack apparent genomic features. In this work, the thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNAs in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window size and step size on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates common to both S. cerevisiae and S. bayanus were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript has supporting data that are not entirely conclusive. Six candidates appear to be structural elements in the 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNAs with significant structural elements.
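
As a rough illustration of the windowed Z-score scan mentioned in the abstract, the Python sketch below slides a window along a sequence and compares each window's energy with that of shuffled copies. It is not the authors' pipeline. The Z-score formula used here (native energy minus the mean of the shuffled energies, divided by their standard deviation) is a commonly used definition that is assumed rather than taken from the text. The names `sliding_windows`, `toy_energy`, `z_score`, and `scan` are hypothetical, and `toy_energy` is only a placeholder for a real folding-energy calculation.

```python
import random
import statistics
from typing import Callable, Iterator, List, Optional, Tuple

# Base pairs counted by the toy score (DNA alphabet, including G-T wobble).
PAIRS = {"AT", "TA", "GC", "CG", "GT", "TG"}


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def toy_energy(seq: str) -> float:
    """Hypothetical stand-in for a folding energy; NOT a real free energy.

    Counts positions that can pair with their mirror position, as in a
    single hairpin, and returns the negated count so that more pairing
    gives a lower (more "stable") value.
    """
    n = len(seq)
    paired = sum(1 for i in range(n // 2) if seq[i] + seq[n - 1 - i] in PAIRS)
    return -float(paired)


def z_score(seq: str, energy: Callable[[str], float],
            n_shuffles: int = 100,
            rng: Optional[random.Random] = None) -> float:
    """Assumed definition: (native - mean(shuffled)) / stdev(shuffled)."""
    rng = rng or random.Random(0)
    native = energy(seq)
    shuffled = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)  # preserves base composition, destroys order
        shuffled.append(energy("".join(bases)))
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        return 0.0  # no variation among shuffles, so no signal
    return (native - statistics.mean(shuffled)) / sd


def scan(seq: str, window: int, step: int,
         energy: Callable[[str], float] = toy_energy,
         n_shuffles: int = 100, seed: int = 0) -> List[Tuple[int, float]]:
    """Return (window start, Z-score) for each window along the sequence."""
    rng = random.Random(seed)
    return [(start, z_score(sub, energy, n_shuffles, rng))
            for start, sub in sliding_windows(seq, window, step)]
```

Strongly negative Z-scores mark windows that are more stable than composition-matched shuffles. In a real analysis `toy_energy` would be replaced by a minimum free energy calculation from an RNA folding package; the number of energy evaluations grows with the number of windows, so a smaller step size gives denser coverage at proportionally higher cost.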

Highlights

  • Non-coding RNAs are functional RNA transcripts that are not translated into protein

  • Once a genome is sequenced, it becomes necessary to identify the set of genes and other functional elements within the genome

  • Experimental methods have been developed for this purpose but they are time-consuming, expensive, and often provide an incomplete picture



Introduction

Non-coding RNAs (ncRNAs) are functional RNA transcripts that are not translated into protein (i.e., they are not messenger RNAs). Large-scale cDNA libraries and serial analysis of gene expression (SAGE) experiments have shown transcription from many locations in the genome that appear to be unannotated ncRNA genes [11,12,13,14]. This, along with the recent identification of new protein-coding genes such as YPR010C-A in 2006, shows that even in this best-studied eukaryote, we still do not know the complete gene set [13].

