Abstract

We address the problem of discovering pairs of symmetric genomic words (i.e., words and their corresponding reversed complements) that occur at overrepresented distances. For this purpose, we developed new procedures to identify symmetric word pairs with an uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome and a version with known repetitive sequences masked out. We report several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.

Highlights

  • The similarity between the frequency of complementary nucleotides in a single strand of DNA is known as Chargaff's second parity rule [1]

  • We suggest that patterns of overrepresentation of short distances between reversed complements may be related to the occurrence of cruciform structures, and we evaluate this hypothesis in the human genome

  • The study presented in this paper shows yet another use of inter-word distances and distance distributions, which may lead to a deeper understanding of intra-strand symmetry and its connection with secondary DNA structures


Summary

Introduction

The similarity between the frequency of complementary nucleotides in a single strand of DNA is known as Chargaff's second parity rule [1]. An extension to this parity rule suggests that, for each DNA strand, the proportion of an oligonucleotide (a sequence of adjacent nucleotides, referred to as a genomic word) should be similar to that of its reversed complement, a property that has been studied both for prokaryotes and eukaryotes [2, 3]. Cruciforms are structures with four arms that can be formed at sites containing reversed complementary words. They are relevant in biological processes, including those of replication and transcription, recombination and translocation [5]. The study presented in this paper shows yet another use of inter-word distances and distance distributions, which may lead to a deeper understanding of intra-strand symmetry and its connection with secondary DNA structures.
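To make the notions of a reversed complement and an inter-word distance concrete, the following minimal Python sketch may help. It is an illustration only and not the procedure used in the paper: the function names are hypothetical, and the distance definition (from each occurrence of a word to the nearest following occurrence of its reversed complement, measured between start positions) is an assumption chosen for simplicity.

```python
import bisect
from collections import Counter

# Translation table for the four DNA nucleotides (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a genomic word."""
    return word.translate(_COMPLEMENT)[::-1]


def find_occurrences(sequence: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_pair_distances(sequence: str, word: str) -> Counter:
    """Empirical distribution of distances between a word and its reversed complement.

    Assumption (not taken from the paper): for each occurrence of `word`,
    the distance is counted to the nearest occurrence of the reversed
    complement that starts strictly after it.
    """
    rc_positions = find_occurrences(sequence, reverse_complement(word))
    counts = Counter()
    for pos in find_occurrences(sequence, word):
        # Index of the first reversed-complement start strictly after pos.
        i = bisect.bisect_right(rc_positions, pos)
        if i < len(rc_positions):
            counts[rc_positions[i] - pos] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "ACGGTTTTCCGTAAACGGCCGT"  # toy sequence, not real genomic data
    print(reverse_complement("ACGG"))
    print(symmetric_pair_distances(toy_sequence, "ACGG"))
```

On a real chromosome, the resulting counts would form the empirical distance distribution for one symmetric word pair; deciding which distances are overrepresented would additionally require a reference distribution, which this sketch does not attempt to model.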

