Abstract

Predicting non-long terminal repeat (non-LTR) retrotransposons, such as long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), from DNA sequence remains an open problem in bioinformatics. To improve the quality of LINE and SINE annotation, an automated tool, "RetroPred", was developed. The pipeline allows rapid and thorough annotation of non-LTR retrotransposons. Candidate non-LTR retrotransposable elements are first predicted by Pairwise Aligner for Long Sequences (PALS) and Parsimonious Inference of a Library of Elementary Repeats (PILER). The predicted elements are then automatically classified into LINEs and SINEs by an artificial neural network (ANN) based on the position-specific probability matrix (PSPM) generated by Multiple EM for Motif Elicitation (MEME). Under four-fold cross-validation, the ANN model achieved accuracy = 78.79 ± 6.86 %, Q(pred) = 74.734 ± 17.08 %, sensitivity = 84.48 ± 6.73 %, and specificity = 77.13 ± 13.39 %. As a proof of principle, we used the tool to annotate the locations of LINEs and SINEs in the rice and Arabidopsis genomes, where it proved useful and gave good accuracy. Our tool is accessible at http://www.juit.ac.in/RepeatPred/home.html.
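The four reported measures can be illustrated with a minimal sketch that derives them from per-fold confusion counts and summarizes them as mean and standard deviation over four folds. Several things here are assumptions rather than details from the abstract: Q(pred) is taken to be the fraction of positive predictions that are correct, TP / (TP + FP); the spread is computed as a sample standard deviation; and the fold counts in the usage lines are hypothetical, not the study's data.

```python
from statistics import mean, stdev


def fold_metrics(tp, fp, tn, fn):
    """Return percentage metrics for one cross-validation fold."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100.0 * tp / (tp + fn),   # true positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true negative rate
        "q_pred": 100.0 * tp / (tp + fp),        # assumed definition of Q(pred)
    }


def summarize(folds):
    """Map each metric name to (mean, sample standard deviation) over folds."""
    per_fold = [fold_metrics(*counts) for counts in folds]
    return {
        name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
        for name in per_fold[0]
    }


# Hypothetical (tp, fp, tn, fn) counts for four folds, for illustration only.
example_folds = [(8, 2, 7, 3), (9, 3, 6, 2), (7, 1, 8, 4), (8, 4, 5, 3)]
for name, (avg, sd) in summarize(example_folds).items():
    print(f"{name}: {avg:.2f} +/- {sd:.2f} %")
```

Reporting the spread alongside the mean matters here because, with only four folds, a single fold can shift a metric noticeably, which is consistent with the wide intervals quoted for Q(pred) and specificity.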

Highlights

  • Long interspersed elements (LINEs) and short interspersed elements (SINEs) are non-long terminal repeat (non-LTR) retrotransposons that reside within cells of a host organism, copying and inserting themselves into the host genome

  • Repetitive sequences are an important feature of eukaryotic genomes and account for a large proportion of the genome: at least 50% of the human genome [1] and about 80% of some plant genomes [2] appear to be composed of repetitive elements

  • The ANN model developed in this study (200-7-2) is trained on the position-specific probability matrix (PSPM) calculated using Multiple EM for Motif Elicitation (MEME)
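As a rough illustration of the 200-7-2 architecture mentioned above, the following sketch runs a forward pass through a fully connected network, reading "200-7-2" as 200 input units, 7 hidden units, and 2 output units. Everything beyond those three numbers is an assumption: the sigmoid activation, the random untrained weights, the ordering of the output units (LINE first, SINE second), and the way a feature vector of length 200 is obtained from the PSPM are not specified in the source.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 200, 7, 2  # the 200-7-2 layout


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each unit, passed through a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(features, hidden, output):
    """Return (label, output activations) for one 200-value feature vector."""
    if len(features) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} features, got {len(features)}")
    hidden_act = layer_forward(features, *hidden)
    out_act = layer_forward(hidden_act, *output)
    # Assumed output ordering: unit 0 = LINE, unit 1 = SINE.
    label = "LINE" if out_act[0] >= out_act[1] else "SINE"
    return label, out_act


rng = random.Random(0)
hidden_layer = init_layer(N_INPUT, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUTPUT, rng)

# Hypothetical PSPM-derived feature vector (random probabilities as a stand-in).
features = [rng.random() for _ in range(N_INPUT)]
label, activations = predict(features, hidden_layer, output_layer)
print(label, activations)
```

With untrained weights the predicted label is arbitrary; the sketch only shows how a fixed-length PSPM-derived vector would flow through a network of this shape.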

Summary

Introduction

Long interspersed elements (LINEs) and short interspersed elements (SINEs) are non-LTR retrotransposons that reside within cells of a host organism, copying and inserting themselves into the host genome. The annotation of genomic repeats typically relies on the results of a single computational program, RepeatMasker.

