Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae

Laura A Kavanaugh, Fred S Dietrich, Yoshihide Hayashizaki

doi:10.1371/journal.pgen.1000321

Open Access

https://doi.org/10.1371/journal.pgen.1000321

Journal: PLoS Genetics
Publication Date: Jan 2, 2009
Citations: 56
License type: CC BY 4.0

Affiliation: Duke University Hospital, Duke Medical Center

Abstract

Non-coding RNAs (ncRNAs) play important and varied roles in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but reliable prediction has remained elusive because ncRNA genes lack apparent genomic features. In this work, the thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window size and step size on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates common to both S. cerevisiae and S. bayanus were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), and one additional putative ncRNA transcript is supported by data that are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.
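The windowed Z-score scan described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: this excerpt does not name the folding software, the shuffling procedure, or the window and step sizes used. The sketch assumes the common definition of a stability Z-score (energy of the native window compared with the mean and standard deviation of energies of shuffled copies), uses a simple mononucleotide shuffle, and swaps in a toy energy (the negative of the maximum number of nested base pairs, found with the Nussinov algorithm) in place of a real minimum free energy calculation. The names toy_energy, z_score, and scan, and the defaults of 50 and 10, are hypothetical.

```python
import random
import statistics

# Allowed base pairs, including G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq, min_loop=3):
    """Toy stand-in for folding energy (ASSUMPTION, not a free-energy model):
    minus the maximum number of nested base pairs (Nussinov algorithm)."""
    n = len(seq)
    if n == 0:
        return 0.0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return -float(best[0][n - 1])


def z_score(seq, n_shuffles=100, rng=None):
    """Z = (E_native - mean(E_shuffled)) / sd(E_shuffled).
    Negative values mean the native sequence is more stable than shuffles."""
    rng = rng or random.Random(0)
    native = toy_energy(seq)
    letters = list(seq)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # mononucleotide shuffle (ASSUMPTION)
        shuffled.append(toy_energy("".join(letters)))
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        return 0.0  # no variation among shuffles: report a neutral score
    return (native - statistics.mean(shuffled)) / sd


def scan(seq, window=50, step=10, n_shuffles=100, seed=0):
    """Slide a window along seq; return (start, Z-score) for each window.
    The window and step defaults are hypothetical."""
    rng = random.Random(seed)
    last_start = max(len(seq) - window, 0)
    return [
        (start, z_score(seq[start:start + window], n_shuffles, rng))
        for start in range(0, last_start + 1, step)
    ]
```

In a scan of this kind, a smaller step gives finer positional resolution at the cost of more folding calculations, and the window size needs to be large enough to contain the structural element of interest. A real analysis would replace toy_energy with a thermodynamic folding program.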

Highlights

Non-coding RNA are functional RNA transcripts that are not translated into protein
Once a genome is sequenced, it becomes necessary to identify the set of genes and other functional elements within the genome
Experimental methods have been developed for this purpose but they are time-consuming, expensive, and often provide an incomplete picture

Introduction

Non-coding RNAs (ncRNAs) are functional RNA transcripts that are not translated into protein (i.e., they are not messenger RNAs). Large-scale cDNA libraries and serial analysis of gene expression (SAGE) experiments have shown transcription from many locations in the genome that appear to be unannotated ncRNA genes [11,12,13,14]. This, along with the recent identification of new protein-coding genes such as YPR010C-A in 2006, shows that even in this best-studied eukaryote we still do not know the complete gene set [13].
