Abstract

Inverted repeats are abundant in both prokaryotic and eukaryotic genomes and can form DNA secondary structures, namely hairpins and cruciforms, that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often inaccurate and time-consuming, and some cannot handle genome-scale input data. Here, we present detectIR, a MATLAB-based program for detecting perfect and imperfect inverted repeats that uses complex numbers and vector calculation and accepts genome-scale input. detectIR adopts a novel algorithm that converts the conventional string comparison used in inverted repeat detection into vector calculation on complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gap) in the middle of an inverted repeat. Compared with existing popular tools, detectIR achieves significantly higher accuracy and efficiency. In comparisons using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays, detectIR finds many inverted repeats missed by existing tools, whose outputs often contain many invalid cases. detectIR is open source, and its source code is freely available at: https://sourceforge.net/projects/detectir.
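The abstract does not spell out detectIR's actual mapping, so the following is only a minimal sketch of why complex numbers suit this problem. It assumes a hypothetical encoding in which complementary bases are additive inverses (A = 1, T = -1, C = 1j, G = -1j). Under that assumption, checking a base pair reduces to testing whether two complex numbers sum to zero, mismatches in the stem are simply the non-zero sums, and the spacer is skipped without any palindrome requirement. The names `ENCODE` and `stem_mismatches` are illustrative and do not come from detectIR.

```python
# Hypothetical complex-number encoding (an assumption, not detectIR's documented
# mapping): complementary bases are additive inverses, so a pair is complementary
# exactly when the two encoded values sum to zero.
ENCODE = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def encode(seq):
    """Map a DNA string to a list of complex numbers (raises KeyError on non-ACGT)."""
    return [ENCODE[b] for b in seq.upper()]


def stem_mismatches(seq, start, stem_len, spacer_len):
    """Count non-complementary pairs in one candidate inverted repeat.

    The candidate is the left arm seq[start:start+stem_len], a spacer of
    spacer_len bases (not checked), then a right arm of stem_len bases.
    The left arm is paired with the right arm read backwards.
    """
    z = encode(seq)
    left = z[start:start + stem_len]
    right_start = start + stem_len + spacer_len
    right = z[right_start:right_start + stem_len]
    if len(right) < stem_len:
        raise ValueError("candidate runs past the end of the sequence")
    # Pair left[i] with right[stem_len - 1 - i]; a zero sum means a complementary pair.
    sums = [a + b for a, b in zip(left, reversed(right))]
    return sum(1 for s in sums if s != 0)
```

For example, `stem_mismatches("AAGCTTTGCTT", 0, 4, 3)` returns 0 (arms `AAGC` and `GCTT` around a `TTT` spacer), while changing the right arm to `GATT` gives 1, because the G/A pair no longer sums to zero.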

Highlights

  • An inverted repeat is a nucleotide sequence fragment that can form self-complementary pairing between its two halves

  • Using atomic force microscopy images, the DNA-binding protein PARP-1 was shown to bind the cruciform structure generated by a 106-nt inverted repeat within an E. coli plasmid [5]

  • PARP-1 was found to participate in chromatin structure coordination and gene expression regulation [11], and it showed a binding preference for cruciform structures over loops or linear DNA [9]

Introduction

An inverted repeat is a nucleotide sequence fragment that can form self-complementary pairing between its two halves. Compared with findIR, detectIR is novel in three respects: a new mapping scheme that uses complex numbers; a distinctive and effective search-and-validation strategy for evaluating candidates of both perfect and imperfect inverted repeats; and the use of MATLAB's built-in vector calculation, which enables simultaneous detection of inverted repeats of the same length and significantly improves the program's efficiency.
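To illustrate how vectorized arithmetic can evaluate every candidate of the same length at once, the sketch below uses NumPy in place of MATLAB. It assumes a hypothetical encoding (A = 1, T = -1, C = 1j, G = -1j) in which complementary bases sum to zero; `scan_fixed_length` is a hypothetical name, and the sketch shows only the batching idea, not detectIR's search-and-validation strategy.

```python
import numpy as np

# Hypothetical encoding (an assumption, not detectIR's documented mapping):
# complementary bases are additive inverses, so a complementary pair sums to 0.
_LUT = np.zeros(256, dtype=complex)
_LUT[ord("A")], _LUT[ord("T")] = 1, -1
_LUT[ord("C")], _LUT[ord("G")] = 1j, -1j


def scan_fixed_length(seq, stem_len, spacer_len, max_mismatches=0):
    """Score every candidate with the given stem and spacer length in one pass.

    Returns the start positions whose stem has at most max_mismatches
    non-complementary pairs.
    """
    seq = seq.upper()
    if not set(seq) <= set("ACGT"):
        raise ValueError("only A, C, G, T are supported in this sketch")
    z = _LUT[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    total = 2 * stem_len + spacer_len            # full length of one candidate
    n_cand = len(z) - total + 1
    if n_cand <= 0:
        return np.array([], dtype=int)
    starts = np.arange(n_cand)[:, None]          # one row per candidate start
    offsets = np.arange(stem_len)[None, :]       # one column per stem position
    left = z[starts + offsets]                   # left arms, shape (n_cand, stem_len)
    right_rev = z[starts + total - 1 - offsets]  # right arms read backwards
    # Non-zero sums are mismatches; count them per candidate (per row).
    mismatches = np.count_nonzero(left + right_rev, axis=1)
    return np.flatnonzero(mismatches <= max_mismatches)
```

On `"CCAAGCTTTGCTTCC"` with a stem of 4 and a spacer of 3, the perfect inverted repeat starting at position 2 is the only hit; allowing one mismatch also admits position 0. Because each row is one candidate, a single `count_nonzero` call scores all start positions; for genome-scale input one would likely process the sequence in chunks to bound memory.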