Abstract

Background: Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identifying them is an important task in biological research. Infernal, a popular software package, is the most successful prediction tool and exhibits high sensitivity. Its application has mainly focused on small suspected regions. We applied Infernal at the chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome-level or genome-wide studies is desirable.

Methodology: Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study of human ncRNAs and find that adjacent nucleotide dependence in human ncRNAs should be useful for identifying ncRNAs. We then incorporate this dependence into the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs.

Conclusions: In our experiments on human chromosomes, the proposed model eliminates more than 50% of the false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of the programs are freely available at http://i.cs.hku.hk/~kfwong/order1scfg.

Highlights

  • A non-coding RNA is an RNA molecule that is not translated into a protein

  • The structural model is the core of the computational method as it should capture the characteristics of a given ncRNA family and should be powerful enough to distinguish members in the family from other sequences

  • Due to the large number of false positives, it is time-consuming and expensive to verify each predicted candidate in order to identify the true positives. This suggests that the stochastic context-free grammar (SCFG) model may not be powerful enough to differentiate false positives from real ncRNA members. After studying this issue in detail, we found that the SCFG model used in all these software tools does not consider the dependence between the nucleotides in the ncRNA sequence

Summary

Introduction

A non-coding RNA (ncRNA) is an RNA molecule that is not translated into a protein. Identifying ncRNAs is an important problem in biological research. The structure of an ncRNA molecule, both primary and secondary, is known to play an important role in its biological function. A region that yields a high score is regarded as a potential member of the family. The structural model is the core of the computational method: it should capture the characteristics of a given ncRNA family and be powerful enough to distinguish members of the family from other sequences.
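
To make the ideas of adjacent nucleotide dependence and region scoring concrete, here is a minimal Python sketch. It is not the authors' order-1 SCFG and not Infernal's algorithm: it uses a plain first-order Markov chain over the primary sequence only, ignoring secondary structure, purely to illustrate how conditioning each nucleotide on its predecessor changes a window's score. All function names, the toy family, the uniform background, and the threshold are assumptions made for illustration.

```python
import math

ALPHABET = "ACGU"

def train_order1(seqs, pseudocount=1.0):
    """Estimate P(first nucleotide) and P(current | previous) from example
    sequences, with additive smoothing so no probability is zero."""
    start = {a: pseudocount for a in ALPHABET}
    trans = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        start[s[0]] += 1
        for prev, cur in zip(s, s[1:]):
            trans[prev][cur] += 1
    total = sum(start.values())
    start = {a: c / total for a, c in start.items()}
    for a in ALPHABET:
        row_total = sum(trans[a].values())
        trans[a] = {b: c / row_total for b, c in trans[a].items()}
    return start, trans

def log_odds(window, start, trans, background=0.25):
    """Log-odds of a window under the order-1 model versus a uniform,
    independent background (an assumption of this sketch)."""
    score = math.log(start[window[0]] / background)
    for prev, cur in zip(window, window[1:]):
        score += math.log(trans[prev][cur] / background)
    return score

def scan(sequence, width, start, trans, threshold):
    """Slide a fixed-width window over the sequence and keep the windows
    whose score exceeds the threshold, as (start_index, score) pairs."""
    hits = []
    for i in range(len(sequence) - width + 1):
        s = log_odds(sequence[i:i + width], start, trans)
        if s > threshold:
            hits.append((i, s))
    return hits

# Hypothetical toy data: three "family" members and a short target sequence.
family = ["GCGCGC", "GCGCGA", "GCGCGC"]
genome = "AUAUAUGCGCGCAUAU"

start, trans = train_order1(family)
hits = scan(genome, width=6, start=start, trans=trans, threshold=5.0)
print(hits)
```

On this toy input the only window above the threshold starts at index 6, the embedded GCGCGC. Lowering the threshold admits the overlapping neighbouring windows, which mirrors the trade-off between sensitivity and false positives described above.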

