Abstract
Because the leading and lagging strands are replicated by different mechanisms, nucleotide composition asymmetries are widespread in bacterial genomes. In general, the leading strand is enriched in Guanine (G) and Thymine (T), whereas the lagging strand is enriched in Adenine (A) and Cytosine (C). However, some bacteria, such as Bacillus subtilis, have been found to contain more A than T in the leading strand. To investigate this difference, we analyze nucleotide asymmetry in terms of the correlation between AT and GC biases. In this study, we propose a windowless method based on the Z-curve method, the Z-curve Correlation Coefficient (ZCC) index, and use it to analyze more than 2000 bacterial genomes. We find that the majority of bacteria show negative correlations between AT and GC biases, while most genomes in Firmicutes and Tenericutes have positive ZCC indexes. The presence of PolC, purine asymmetry and a stronger preference for genes on the leading strand are not confined to Firmicutes, but are also likely to occur in other phyla dominated by positive ZCC indexes. The method also provides new insight into other relevant features such as aerobism, and can be applied to analyze other correlations, for example between RY (purine and pyrimidine) and MK (amino and keto) biases.
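The idea of correlating AT and GC biases without sliding windows can be sketched as follows. This is only an illustration under stated assumptions, not the paper's definition: the abstract does not give the exact formula for the ZCC index, so the sketch assumes it behaves like a Pearson correlation between the cumulative AT disparity (A minus T) and GC disparity (G minus C) curves, which in the standard Z-curve notation equal (x_n + y_n)/2 and (x_n - y_n)/2. The function names `disparity_curves`, `pearson` and `zcc_sketch` are hypothetical.

```python
from math import sqrt


def disparity_curves(seq):
    """Return cumulative AT (A - T) and GC (G - C) disparity curves.

    The curves are built base by base, so no window size is needed.
    Characters other than A, C, G, T leave both curves unchanged.
    """
    at, gc = 0, 0
    at_curve, gc_curve = [], []
    for base in seq.upper():
        if base == "A":
            at += 1
        elif base == "T":
            at -= 1
        elif base == "G":
            gc += 1
        elif base == "C":
            gc -= 1
        at_curve.append(at)
        gc_curve.append(gc)
    return at_curve, gc_curve


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / sqrt(var_x * var_y)


def zcc_sketch(seq):
    """Assumed stand-in for the ZCC index: correlation of the two curves."""
    at_curve, gc_curve = disparity_curves(seq)
    return pearson(at_curve, gc_curve)


if __name__ == "__main__":
    # Toy sequences, not real genomes: A and G accumulate together in the
    # first, while A accumulates as C accumulates in the second.
    print(zcc_sketch("AG" * 50) > 0)  # AT and GC disparities rise together
    print(zcc_sketch("AC" * 50) < 0)  # AT disparity rises, GC disparity falls
```

Under this assumption, a positive value means the AT and GC disparity curves move in the same direction along the genome, and a negative value means they move in opposite directions. A real analysis would read a complete genome sequence and may need additional steps that are not described in this abstract.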
Highlights
According to Chargaff’s second parity rule, bases tend to occur in equal proportions across a whole DNA strand, i.e., Adenine (A) = Thymine (T) and Guanine (G) = Cytosine (C), but only under ideal conditions without mutation or selection [1].
We find that the majority of bacteria show negative correlations between AT and GC biases, while most genomes in Firmicutes and Tenericutes have positive Z-curve Correlation Coefficient (ZCC) indexes.
The presence of PolC, purine asymmetry and a stronger preference for genes on the leading strand are not confined to Firmicutes, and are likely to occur in other phyla dominated by positive ZCC indexes.
Summary
According to Chargaff’s second parity rule, bases tend to occur in equal proportions across a whole DNA strand, i.e., Adenine (A) = Thymine (T) and Guanine (G) = Cytosine (C), but only under ideal conditions without mutation or selection [1]. With respect to selective pressure, a preference at the third codon position for G over C and T over A, and an unequal distribution of coding regions, have been reported between the leading and lagging strands [4, 6].
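As a small illustration of the parity rule, the deviation from A = T and G = C on a single strand can be measured with normalized skews. This is a minimal sketch; the function name `parity_skews` and the choice of the common (A - T)/(A + T) and (G - C)/(G + C) normalization are assumptions made for illustration, not something taken from the text above.

```python
from collections import Counter


def parity_skews(seq):
    """Return (AT skew, GC skew) for one strand.

    Both values are 0.0 when the strand satisfies A = T and G = C.
    A skew is reported as 0.0 if neither of its two bases is present.
    """
    counts = Counter(seq.upper())
    a, t = counts["A"], counts["T"]
    g, c = counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


if __name__ == "__main__":
    print(parity_skews("AATTGGCC"))  # balanced toy strand
    print(parity_skews("AAATGGGC"))  # toy strand with excess A and G
```

Values near zero indicate that the strand is close to parity, while a consistent sign over a replication arm indicates a strand bias of the kind discussed above.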