Abstract

Background

Noncoding RNAs (ncRNAs) play important roles in many biological processes. Existing genome-scale ncRNA search tools identify ncRNAs in local sequence alignments generated by conventional sequence comparison methods. However, some types of ncRNA lack strong sequence conservation and tend to be missed or misaligned by conventional sequence comparison.

Results

In this paper, we propose an ncRNA identification framework that is complementary to existing sequence comparison tools. By integrating a filtration step based on Hamming distance with ncRNA alignment programs such as FOLDALIGN or PLAST-ncRNA, the proposed ncRNA search framework can identify ncRNAs that lack strong sequence conservation. In addition, as the ratio of transition to transversion mutations is often used as a discriminative feature for functional ncRNA identification, we incorporate this feature into the filtration step using a coding strategy. We apply Hamming distance seeds to ncRNA search in the intergenic regions of the human and mouse genomes and between the Burkholderia cenocepacia J2315 genome and the Ralstonia solanacearum genome. The experimental results demonstrate that a carefully designed Hamming distance seed can achieve better sensitivity in searching for poorly conserved ncRNAs than conventional sequence comparison tools.

Conclusions

Hamming distance seeds provide better sensitivity as a filtration strategy for genome-wide ncRNA homology search than the existing seeding strategies used in BLAST-like tools. By combining Hamming distance seed matching and ncRNA alignment, we are able to find ncRNAs with sequence similarities below 60%.
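To make the filtration idea concrete, the following Python sketch shows one way a Hamming distance seed could be matched between two sequences, with an optional distance that penalizes transversions more than transitions. It is only an illustration under stated assumptions: the function names, the seed length `k`, the threshold `max_dist`, and the costs `ts_cost` and `tv_cost` are hypothetical, the quadratic scan is far too slow for genome-scale use, and the weighted distance is a stand-in, not the coding strategy used in the paper (which this abstract does not describe).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if x == y:
        return False
    return (x in PURINES and y in PURINES) or (
        x in PYRIMIDINES and y in PYRIMIDINES
    )


def weighted_distance(a, b, ts_cost=1, tv_cost=2):
    """Mismatch count where transversions cost more than transitions.

    The costs are hypothetical illustration values, not from the paper.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x != y:
            total += ts_cost if is_transition(x, y) else tv_cost
    return total


def hd_seed_hits(query, target, k, max_dist, dist=hamming):
    """Naive scan: report (i, j) where the k-mers starting at query[i]
    and target[j] are within max_dist of each other under dist."""
    hits = []
    for i in range(len(query) - k + 1):
        q = query[i:i + k]
        for j in range(len(target) - k + 1):
            if dist(q, target[j:j + k]) <= max_dist:
                hits.append((i, j))
    return hits
```

In a pipeline of the kind the abstract describes, the regions around each reported `(i, j)` pair would then be handed to an ncRNA alignment program for closer inspection; passing `dist=weighted_distance` is one simple way to let the filter tolerate transitions more readily than transversions.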

Highlights

  • Identifying ncRNAs, which function directly as RNAs rather than being translated into proteins, has drawn tremendous attention recently for two main reasons

  • As we are only interested in ncRNA homologs with low sequence similarity, we examine the PLAST-ncRNA probabilities for tRNA and SECIS homologs between human and mouse because these two families have low sequence conservation

  • Our experimental results show that HD seed matching provides an effective and efficient filtration step for genome-scale ncRNA search

Summary

Introduction

Identifying ncRNAs (non-coding RNAs), which function directly as RNAs rather than being translated into proteins, has drawn tremendous attention recently. Existing genome-scale ncRNA identification methods [2,3,4] first employ conventional sequence comparison tools such as BLAST [5] to generate an initial set of alignments, which are then screened using features such as secondary structure conservation, minimum free energy (MFE), sequence conservation, GC content, and base or base-pair substitution patterns. Although BLAST-like sequence comparison tools have been successfully used for finding protein-coding genes, segment duplications, and other genomic features, they are not well suited for comprehensive ncRNA search: some types of ncRNA lack strong sequence conservation and tend to be missed or misaligned by conventional sequence comparison.
