Abstract

DNA sequences viewed in the usual character-based representation appear to be a formidable mixture of the four nucleotides with no apparent order. Nucleotide frequencies and distributions in such sequences have been studied extensively ever since Chargaff, almost a century ago, gave the simple rule that equates the total number of purines with the total number of pyrimidines in a duplex DNA sequence. While it is difficult to trace any relationship between the bases from the character representation of a DNA sequence, graphical representations may provide a clue. These novel representations have been useful in giving an overview of the base distribution and composition of sequences and in providing insights into many hidden structures. We report here our observation, based on a graphical representation, that the intra-purine and intra-pyrimidine differences in sequences of conserved genes generally follow a quadratic distribution relationship, and we show that this may have arisen from mutations in the sequences over evolutionary time scales. From this hitherto undescribed relationship for the gene sequences considered in this report, we hypothesize that such relationships may be characteristic of these sequences and could therefore act as a barrier to large-scale sequence alterations that would override such characteristics, perhaps through some monitoring process inbuilt in the DNA sequences. Such a relationship also raises the possibility that intron sequences play an important role in maintaining these characteristics, and it could be indicative of possible intron-late phenomena.
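The quantities named in the abstract can be illustrated with a minimal counting sketch. It assumes that "intra-purine difference" means the count of A minus the count of G and that "intra-pyrimidine difference" means the count of C minus the count of T on a single strand; the sign convention, the function names and the example sequence are illustrative assumptions, and the sketch does not reproduce the graphical representation used in the report.

```python
from collections import Counter

# Translation table for Watson-Crick complements (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_counts(seq):
    """Count A, C, G and T in a sequence, ignoring any other symbols."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def duplex_purine_pyrimidine(seq):
    """Return (purines, pyrimidines) counted over both strands of the duplex."""
    strand = seq.upper()
    partner = strand.translate(COMPLEMENT)[::-1]  # reverse complement
    c = base_counts(strand + partner)
    return c["A"] + c["G"], c["C"] + c["T"]


def intra_differences(seq):
    """Return (A - G, C - T) for one strand.

    Assumption: this sign convention is chosen for illustration only.
    """
    c = base_counts(seq)
    return c["A"] - c["G"], c["C"] - c["T"]


if __name__ == "__main__":
    example = "ATGGCGTACGA"  # hypothetical sequence, not taken from the report
    print(duplex_purine_pyrimidine(example))
    print(intra_differences(example))
```

In the duplex the purine and pyrimidine totals are equal by construction, because every base is paired with its complement; the single-strand differences carry no such constraint, which is why a relationship between them would be a non-trivial observation.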

Highlights

  • The apparent lack of pattern in the composition and distribution of bases in DNA sequences has been one of the enduring problems of molecular biology

  • No clear relationship has yet been found between the occurrences of the four bases in an individual strand of a gene sequence, although much work has been done on understanding nucleotide frequencies and base distributions in DNA sequences

  • A 61 × 61 matrix of codon substitution rates is used, assuming that mutations occur at the three codon positions independently and only single-nucleotide substitutions are permitted to occur at any instant


Summary

Introduction

The apparent lack of pattern in the composition and distribution of bases in DNA sequences has been one of the enduring problems of molecular biology. No clear relationship has yet been found between the occurrences of the four bases in an individual strand of a gene sequence, although much work has been done on understanding nucleotide frequencies and base distributions in DNA sequences. Goldman and Yang [2] proposed a codon-based model for the evolution of protein-coding DNA sequences, using a Markov process to describe substitutions between codons. They used codon-level information to model synonymous and nonsynonymous nucleotide substitutions, applicable to homologous sequences with no insertion/deletion gaps or with gaps removed. While the model is useful for pairwise distance measures and for phylogenies, it does not clearly yield a relationship defining base composition in a DNA sequence.
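The structure of a codon substitution matrix of this kind can be sketched as follows. The example assumes the standard genetic code, whose 61 sense codons index the rows and columns, and marks only which entries are permitted (codon pairs differing at exactly one position) and whether each permitted change is synonymous or nonsynonymous. It does not assign rate values, which depend on model parameters not given here; the names `substitution_mask`, `"syn"` and `"nonsyn"` are illustrative and not taken from the cited model.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG,
# TCT, ... (first base varies slowest, third fastest); "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def substitution_mask():
    """Build a 61 x 61 table of permitted instantaneous substitutions.

    Entry [i][j] is "syn" or "nonsyn" when codons i and j differ at exactly
    one position, and None otherwise (including the diagonal).
    """
    n = len(SENSE_CODONS)
    mask = [[None] * n for _ in range(n)]
    for i, ci in enumerate(SENSE_CODONS):
        for j, cj in enumerate(SENSE_CODONS):
            if hamming(ci, cj) == 1:
                same_aa = CODON_TABLE[ci] == CODON_TABLE[cj]
                mask[i][j] = "syn" if same_aa else "nonsyn"
    return mask


if __name__ == "__main__":
    mask = substitution_mask()
    permitted = sum(cell is not None for row in mask for cell in row)
    print(len(SENSE_CODONS), "sense codons,", permitted, "permitted entries")
```

Most of the 61 × 61 = 3721 entries are empty under the single-nucleotide restriction, so in a rate matrix built on this pattern the corresponding instantaneous rates would be zero.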

