Abstract

Background

The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse (reverse-complement) k-mer. This Inversion Symmetry (IS) holds for many species, both eukaryotes and prokaryotes, for ranges of k which may vary from 7 to 10 as chromosomal lengths vary from 2 Mbp to 200 Mbp. The existence of IS has been demonstrated in the literature, and other pair-wise candidate symmetries (e.g. reverse or complement) have been ruled out.

Results

Studying IS in the human genome, we find that IS holds up to k = 10. It holds for complete chromosomes, also after applying the low complexity mask. We introduce a numerical IS criterion, and define the k-limit, KL, as the highest k for which this criterion is valid. We demonstrate that chromosomes of different species, as well as different human chromosomal sections, follow a universal logarithmic dependence of KL ~ 0.7 ln(L), where L is the length of the chromosome.

We introduce a statistical IS-Poisson model that allows us to apply confidence measures to our numerical findings. We find good agreement for large k, where the variance of the Poisson distribution determines the outcome of the analysis. This model predicts the observed logarithmic increase of KL with length. The model allows us to conclude that for low k, e.g. k = 1 where IS becomes the 2nd Chargaff rule, IS violation, although extremely small, is significant. Studying this violation we come up with an unexpected observation for human chromosomes, finding a meaningful correlation with the excess of genes on particular strands.

Conclusions

Our IS-Poisson model agrees well with genomic data, and accounts for the universal behavior of k-limits. For low k we point out minute, yet significant, deviations from the model, including excess of counts of nucleotides T vs A and G vs C on positive strands of human chromosomes. Interestingly, this correlates with a significant (but small) excess of genes on the same positive strands.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-3012-8) contains supplementary material, which is available to authorized users.
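One way to see why a logarithmic growth of KL with L is plausible can be sketched numerically. The sketch below is not the authors' IS-Poisson model as specified in the paper; it rests on assumptions made here for illustration: the counts of a k-mer and of its inverse are treated as independent Poisson variables with a common mean L / 4^k (uniform k-mer usage), the expected normalized difference is approximated by its large-mean value 1 / sqrt(pi * mean), and the threshold of 0.1 and the function names `expected_x` and `k_limit` are hypothetical.

```python
import math


def expected_x(length, k):
    """Approximate mean normalized difference |N1 - N2| / (N1 + N2).

    Assumption (not from the source): N1 and N2 are independent Poisson
    counts with common mean lam = length / 4**k. For large lam the
    difference is roughly normal with variance 2 * lam, which gives
    E|N1 - N2| / (2 * lam) ~ 1 / sqrt(pi * lam).
    """
    lam = length / 4 ** k
    return 1.0 / math.sqrt(math.pi * lam)


def k_limit(length, threshold=0.1):
    """Largest k whose approximate mean difference stays below threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    k = 0
    while expected_x(length, k + 1) < threshold:
        k += 1
    return k


if __name__ == "__main__":
    for length in (2e6, 2e7, 2e8):
        print(f"L = {length:.0e}  k-limit under these assumptions: {k_limit(length)}")
    # Under these assumptions the k-limit grows like ln(L) / ln(4).
    print("slope 1 / ln(4) =", 1.0 / math.log(4))
```

Under these assumptions the fluctuation term depends on L and k only through L / 4^k, so the largest admissible k grows as ln(L) / ln(4), a slope of about 0.72 per unit of ln(L), which is close to the coefficient 0.7 quoted above. The absolute k values from this idealized sketch need not match those reported for real chromosomes, whose k-mer usage is far from uniform.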

Highlights

  • The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse k-mer

  • We denote by X(S,S*) the normalized difference X(S,S*) = |N(S)-N(S*)|/(N(S) + N(S*)), where S* is the inverse of S and S is one of the Mk different k-mers over the 4 nucleotides that are counted on the chromosome at least once, i.e. N(S) > 0 and/or N(S*) > 0

  • For low and moderate k, we find that Mk = 4^k, but for large k-values, such as k > 10 in the human genome, many of the k-mers may not be realized empirically, leading to lower Mk
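The quantities in the highlights above can be illustrated with a short sketch. It counts overlapping k-mers on a single strand, pairs each k-mer S with its reverse complement S*, and evaluates X(S,S*) for every k-mer with N(S) > 0 and/or N(S*) > 0; the number of such k-mers plays the role of Mk. The function names, the skipping of windows containing characters other than A, C, G, T, and the toy sequences are assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter

# Translation table for complementing nucleotides.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the inverse (reverse-complement) of a k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand.

    Windows containing symbols other than A, C, G, T (e.g. N) are
    skipped; this is an assumption of the sketch.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def inversion_asymmetry(seq, k):
    """Return ({S: X(S,S*)}, Mk) for all k-mers with N(S) > 0 or N(S*) > 0."""
    counts = kmer_counts(seq, k)
    observed = set(counts) | {reverse_complement(s) for s in counts}
    x = {}
    for s in observed:
        n_s = counts[s]                        # Counter gives 0 if absent
        n_inv = counts[reverse_complement(s)]
        x[s] = abs(n_s - n_inv) / (n_s + n_inv)
    return x, len(observed)


if __name__ == "__main__":
    x, m_k = inversion_asymmetry("AACGTT", 2)
    print(m_k, x)
```

In this toy input every 2-mer occurs as often as its inverse, so all X values are zero; a sequence such as "AAAA" gives X = 1 for both AA and TT because TT never occurs. A k-mer that is its own inverse (e.g. CG) always has X = 0 by construction.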


Summary

Introduction

The generalization of the second Chargaff rule states that counts of any string of nucleotides of length k on a single chromosomal strand equal the counts of its inverse (reverse-complement) k-mer. The second Chargaff rule [4] states that the same sets of identities of nucleotide pairs hold for each long enough single DNA strand. This rule has been tested [5]. Not being derived from a compelling principle, such as the one underlying the first rule, it remains a mystery. This is even more so when one studies extended versions of Chargaff's second rule. Albrecht-Buehler [6] observed that for triplet oligonucleotides, or 3-mers, it remains true that their chromosome-wide frequencies are almost equal to those of their
