Evolutionary modeling and prediction of non-coding RNAs in Drosophila.

Robert K Bradley, Mitchell E Skinner, Ian Holmes, Yuri R Bendaña, Lars Barquist, Andrew V Uzilov

doi:10.1371/journal.pone.0006478

Journal: PLoS ONE
Publication Date: Aug 11, 2009
Citations: 14
License type: CC BY 4.0

Affiliation: University of California, Berkeley

Abstract

We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and between subfamilies of ncRNA genes. Our estimates of false positive rates are based on simulations that preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously published predictions and experimental data. Many of our predictions show positional correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and, furthermore, depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA.
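The false positive estimate described above rests on comparing how often predictions arise in simulated alignments versus real ones. The Python sketch below shows one common way to turn such a comparison into an estimated false discovery rate at a score threshold, by rescaling the number of simulated (null) hits to the amount of real sequence scanned. It assumes the simulated alignments have already been generated and scored; the function names, the linear length-scaling rule, and the example scores are illustrative assumptions, not the procedure used in the paper.

```python
from typing import Optional, Sequence


def hits_at(scores: Sequence[float], threshold: float) -> int:
    """Count predictions scoring at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


def estimated_fdr(real_scores: Sequence[float], null_scores: Sequence[float],
                  threshold: float, real_bp: int, null_bp: int) -> float:
    """Expected false positives divided by observed predictions at a threshold.

    Assumption: null hits scale linearly with the number of bases scanned,
    so the null hit count is rescaled by real_bp / null_bp.
    """
    observed = hits_at(real_scores, threshold)
    if observed == 0:
        return 0.0
    expected_false = hits_at(null_scores, threshold) * (real_bp / null_bp)
    return min(1.0, expected_false / observed)


def threshold_for_fdr(real_scores: Sequence[float], null_scores: Sequence[float],
                      real_bp: int, null_bp: int, target: float) -> Optional[float]:
    """Lowest observed score whose estimated FDR is at most target, else None."""
    for t in sorted(set(real_scores)):
        if estimated_fdr(real_scores, null_scores, t, real_bp, null_bp) <= target:
            return t
    return None


# Hypothetical scores, not data from the paper.
real = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
null = [1, 2, 3, 6]
print(threshold_for_fdr(real, null, real_bp=1_000, null_bp=1_000, target=0.1))  # prints 7
```

The choice of null model matters here: if the simulated alignments retain locally conserved regions, more of them may score above a given threshold, which raises the estimated false positive rate relative to a null that lacks such regions.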

Highlights

The number of non-coding RNAs in eukaryotic genomes is one of the pressing open questions of genomics.
Our predictions may be associated with introns of unannotated protein-coding genes. Nineteen of our predictions scoring as small nucleolar RNAs (snoRNAs) correspond to the single RFAM family snoR28, and 17 of these appear in a tandem array on the X chromosome.
As a first step towards functional characterization of protein-coding genes with predicted structurally conserved elements in their 3′ and 5′ untranslated regions (UTRs) and introns, we identified enriched Gene Ontology (GO) terms with GO::TermFinder [42].
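GO::TermFinder assesses whether a GO term is over-represented in a gene list using the hypergeometric distribution, optionally correcting for the many terms tested. The Python sketch below computes that upper-tail probability with `math.comb` and applies a simple Bonferroni correction as one example of such a correction. It illustrates the statistic only; it is not GO::TermFinder's (Perl) interface, and the function names and counts in the example are hypothetical.

```python
from math import comb


def hypergeometric_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background; K: background genes annotated with the term;
    n: genes in the list of interest; k: annotated genes observed in that list.
    """
    upper = min(n, K)
    # Ways to draw exactly i annotated and n - i unannotated genes, for i >= k.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def bonferroni(p: float, num_terms_tested: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * num_terms_tested)


# Hypothetical counts: 8 of 40 listed genes carry a term held by 200 of 10,000 genes.
p = hypergeometric_tail(k=8, n=40, K=200, N=10_000)
print(p, bonferroni(p, num_terms_tested=500))
```

Using exact integer binomial coefficients avoids the underflow that a naive product of small probabilities can suffer for large backgrounds; Python's integer true division then rounds the final ratio once.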

Summary

Introduction

The number of non-coding RNAs (ncRNAs) in eukaryotic genomes is one of the pressing open questions of genomics. We use the program xrate, which allows the grammar structure to be specified in a configuration file; the parameters can be estimated automatically from training data, and the parameterized phylo-grammar can then be used to annotate new alignments. xrate implements a wide variety of models and can be used to measure evolutionary rates or to predict RNA (or protein) secondary structure.

Using one of the grammars, we scan a multiple alignment of twelve Drosophila genomes for novel ncRNAs. As well as reproducing many of the predictions of earlier bioinformatics screens in Drosophila [11,13,28], our screen predicts numerous novel structured RNAs, lending support to the hypothesis that eukaryotic genomes are dense with ncRNAs. The simulation procedure that we use (which includes locally conserved regions that are not ncRNAs) suggests that false positive rates for ncRNA prediction are higher than previously reported. Our methods point the way to further evidence-based evaluations of whole-genome bioinformatics screens.
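A phylo-grammar attaches phylogenetic substitution models to the emissions of a stochastic grammar, so each alignment column is scored by its likelihood on the species tree. The Python sketch below computes that likelihood for a single unpaired column with Felsenstein's pruning algorithm under the Jukes-Cantor model. The tree shape, branch lengths, species labels, and function names are hypothetical; gaps are treated as missing data; and the model is far simpler than the trained rate matrices a grammar in xrate would use (base-paired columns, for instance, need a model over the 16 nucleotide pairs).

```python
import math
from typing import Dict, List, Tuple, Union

NUCS = "ACGU"
# A tree is either a leaf name or a list of (subtree, branch_length) children.
Tree = Union[str, List[Tuple["Tree", float]]]


def jc69(x: str, y: str, t: float) -> float:
    """Jukes-Cantor probability of x -> y along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def partial(node: Tree, column: Dict[str, str]) -> List[float]:
    """L[i] = P(observed leaves below node | node has state NUCS[i])."""
    if isinstance(node, str):
        base = column[node]
        if base not in NUCS:  # gap or ambiguity code: treat as missing data
            return [1.0] * 4
        return [1.0 if x == base else 0.0 for x in NUCS]
    result = [1.0] * 4
    for child, t in node:
        child_lik = partial(child, column)
        for i, x in enumerate(NUCS):
            # Sum over the child's unobserved state y, weighted by P(x -> y).
            result[i] *= sum(jc69(x, y, t) * child_lik[j] for j, y in enumerate(NUCS))
    return result


def column_likelihood(tree: Tree, column: Dict[str, str]) -> float:
    """Sum over root states, weighted by the uniform JC69 equilibrium."""
    return sum(0.25 * lik for lik in partial(tree, column))


# Hypothetical three-species tree; branch lengths are made up.
tree = [("dmel", 0.1), ([("dsim", 0.05), ("dsec", 0.05)], 0.1)]
print(column_likelihood(tree, {"dmel": "A", "dsim": "A", "dsec": "G"}))
```

In a full phylo-grammar, column likelihoods of this kind play the role of emission probabilities inside the grammar's dynamic programming over the alignment.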
