Consistent Clustering Pattern of Prokaryotic Genes Based on Base Frequency at the Second Codon Position and its Association with Functional Category Preference

Yan-Ting Jin, Kai-Yue Zhang, Feng-Biao Guo, Ju Wang, Xin Wang, Shu-Xuan Wang, Zixin Deng, Wen-Xin Zheng, Cong Ma

doi:10.1007/s12539-021-00493-w

Abstract

In 2002, our research group observed a gene clustering pattern based on the base frequency of A versus T at the second codon position in the genome of Vibrio cholerae and found that the functional category distribution of genes differed between the two clusters. With the availability of a large number of sequenced genomes, we performed a systematic investigation of the A2–T2 distribution and found that 2694 out of 2764 prokaryotic genomes have an optimal cluster number of two, indicating a consistent pattern. Analysis of the functional categories of the coding genes in each cluster in 1483 prokaryotic genomes indicated that 99.33% of the genomes exhibited a significant difference (p < 0.01) in functional distribution between the two clusters. Specifically, functional category P (inorganic ion transport and metabolism) was overrepresented in the smaller cluster in 98.65% of genomes, whereas categories J (translation, ribosomal structure and biogenesis), K (transcription), and L (replication, recombination and repair) were overrepresented in the larger cluster in over 98.52% of genomes. Lineage analysis showed that these preferences appear consistently across all phyla. Overall, our work revealed an almost universal clustering pattern based on the relative frequency of A2 versus T2 and its association with functional category preference. These findings will help clarify the rationale for theoretically predicting the functional classes of genes from their nucleotide sequences, and how protein function is determined by DNA sequence.

Highlights

The genetic code is a set of rules that defines how the four-letter code of DNA is translated into the 20-letter code of protein [1, 2]
In Vibrio cholerae, all coding genes could be divided into two unequal clusters according to the relative base frequencies of A and T at the second codon position, and the genes in the two clusters exhibited a significant difference in protein function [29]
Comparing the proteins encoded in many complete genomes from major phylogenetic lineages and identifying consistent patterns of sequence similarity allows many clusters of orthologous groups (COGs) to be delineated [31]


Introduction

The genetic code is a set of rules that defines how the four-letter code of DNA is translated into the 20-letter code of protein [1, 2]. Codon degeneracy mainly manifests at the third codon position, and among synonymous codons, the one matching the most abundant tRNA usually has the highest frequency [13]. This coupling pattern has been thought to benefit translation efficiency [14]. In our earlier study of Vibrio cholerae, all coding genes could be divided into two unequal clusters according to the relative base frequencies of A and T at the second codon position, and the genes in the two clusters exhibited a significant difference in protein function [29]. We hypothesized that this pattern might be widespread in the prokaryotic domain and that it could be connected with gene function. In this work, we reveal a consistent A2–T2-associated clustering pattern and a consistent functional influence across prokaryotes.
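The per-gene quantities behind this clustering are simple base frequencies. The sketch below is a minimal illustration, assuming each coding sequence is an in-frame DNA string that starts at the first base of its start codon; the function name `second_position_freqs` is hypothetical, and how the study treated stop codons, ambiguous bases, or partial codons is not specified here.

```python
from typing import Tuple


def second_position_freqs(cds: str) -> Tuple[float, float]:
    """Return (A2, T2): fractions of complete codons whose second base is A or T.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    # The second base of codon i sits at index 3*i + 1.
    second_bases = [seq[3 * i + 1] for i in range(n_codons)]
    a2 = second_bases.count("A") / n_codons
    t2 = second_bases.count("T") / n_codons
    return a2, t2


if __name__ == "__main__":
    # Codons ATG AAA TTT GCA TAA have second bases T, A, T, C, A.
    print(second_position_freqs("ATGAAATTTGCATAA"))  # -> (0.4, 0.4)
```

Each gene then becomes one (A2, T2) point, and those points are what a clustering method such as k-means would group.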

Genome Data Collection

K‐Means Algorithm
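Read as a clustering step, grouping genes by their A2 and T2 frequencies can be sketched as Lloyd's k-means on per-gene (A2, T2) points. This is a minimal sketch, assuming Euclidean distance and deterministic farthest-point seeding; the function name `kmeans`, the seeding rule, and the iteration cap are illustrative assumptions rather than settings reported for the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means on 2-D points; returns (labels, centroids).

    Seeds are chosen deterministically by farthest-point (maximin) selection.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (A2, T2) pairs forming two obvious groups.
    genes = [(0.30, 0.20), (0.32, 0.22), (0.31, 0.19), (0.20, 0.35), (0.22, 0.36)]
    labels, centres = kmeans(genes, k=2)
    print(labels)  # -> [0, 0, 0, 1, 1]
```

Farthest-point seeding is used here only to keep the example deterministic; library implementations commonly use randomized k-means++ seeding with several restarts instead.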

Silhouette Coefficient Analysis
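The silhouette coefficient compares, for each point, its mean distance a to the rest of its own cluster with its mean distance b to the nearest other cluster, s = (b - a) / max(a, b); the candidate cluster number with the highest mean silhouette is taken as optimal. A minimal sketch, assuming Euclidean distance on (A2, T2) points and labelings produced by any clustering step; the names `mean_silhouette` and `best_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def mean_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points (Euclidean distance).

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters contribute s(i) = 0 by convention.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores: List[float] = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points: Sequence[Point], labelings: Dict[int, Sequence[int]]) -> int:
    """Return the cluster number whose labeling has the highest mean silhouette."""
    return max(labelings, key=lambda k: mean_silhouette(points, labelings[k]))


if __name__ == "__main__":
    # Hypothetical (A2, T2) points and two candidate labelings (k = 2 and k = 3).
    pts = [(0.30, 0.20), (0.32, 0.22), (0.31, 0.19), (0.20, 0.35), (0.22, 0.36)]
    candidates = {2: [0, 0, 0, 1, 1], 3: [0, 0, 1, 2, 2]}
    print(best_k(pts, candidates))  # -> 2
```

Splitting a tight group, as the three-cluster labeling does, shrinks b relative to a for its members and lowers their silhouettes, so the two-cluster labeling scores higher.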

Chi‐Squared Test
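Whether the functional category distribution differs between the two clusters can be framed as a test of independence on a clusters × categories contingency table of gene counts. A minimal sketch of Pearson's chi-squared test (without Yates' continuity correction), assuming that table layout; the p-value uses the identity that the chi-squared survival function with k degrees of freedom equals the upper regularized gamma function Q(k/2, x/2), evaluated with the standard series and continued-fraction expansions. The function names are hypothetical, and the study's exact test settings are not given here.

```python
import math
from typing import Sequence, Tuple


def upper_regularized_gamma(a: float, x: float) -> float:
    """Q(a, x) = Gamma(a, x) / Gamma(a) for a > 0, x >= 0."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        while abs(term) > abs(total) * 1e-15:
            denom += 1.0
            term *= x / denom
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence (no continuity correction).

    `table` rows are clusters, columns are functional categories (gene counts).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    if len(row_sums) < 2 or len(col_sums) < 2:
        raise ValueError("need at least a 2 x 2 table")
    if any(s == 0 for s in row_sums + col_sums):
        raise ValueError("drop empty rows/columns before testing")
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    # The chi-squared survival function with `dof` degrees of freedom is Q(dof/2, stat/2).
    p_value = upper_regularized_gamma(dof / 2.0, stat / 2.0)
    return stat, dof, p_value


if __name__ == "__main__":
    # Hypothetical gene counts for COG categories J, K, L, P (columns) in the
    # larger (first row) and smaller (second row) cluster of one genome.
    counts = [[150, 120, 110, 40], [20, 25, 15, 60]]
    stat, dof, p = chi2_independence(counts)
    print(f"chi2 = {stat:.1f}, dof = {dof}, p = {p:.2e}")
```

Under the abstract's threshold, a genome would be counted as showing a significant difference when p < 0.01. Where SciPy is available, `scipy.stats.chi2_contingency` performs the same test; note that it applies Yates' correction by default when there is one degree of freedom.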

$F_{P}^{\mathrm{large}}$

Bacteria Taxonomy

Grouping Protein Coding Genes by Base Frequency at the Second Codon Position

Silhouette Coefficient to Measure the Optimal Cluster Number

Biased Functional Distribution of Genes in the Two Clusters
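One simple way to express whether a category is over- or underrepresented in a cluster is an observed/expected ratio: the category's share among the cluster's genes divided by its share among all genes in the genome. A minimal sketch under that assumed definition (the study's own measure may differ); the function name `enrichment` and the counts in the example are hypothetical.

```python
from typing import Dict


def enrichment(cluster_counts: Dict[str, int], genome_counts: Dict[str, int]) -> Dict[str, float]:
    """Observed/expected ratio of each functional category within one cluster.

    ratio = (category share among the cluster's genes) / (category share among all genes).
    Values above 1 suggest overrepresentation in the cluster; below 1, underrepresentation.
    """
    cluster_total = sum(cluster_counts.values())
    genome_total = sum(genome_counts.values())
    if cluster_total == 0 or genome_total == 0:
        raise ValueError("counts must not be empty")
    return {
        cat: (cluster_counts.get(cat, 0) / cluster_total) / (n / genome_total)
        for cat, n in genome_counts.items()
        if n > 0
    }


if __name__ == "__main__":
    # Hypothetical counts for COG categories J, K, L, P in one genome.
    small = {"J": 20, "K": 25, "L": 15, "P": 60}
    large = {"J": 150, "K": 120, "L": 110, "P": 40}
    genome = {cat: small[cat] + large[cat] for cat in small}
    for cat, ratio in enrichment(small, genome).items():
        print(cat, round(ratio, 2))
```

With these made-up counts, P scores about 2.7 in the smaller cluster while J, K, and L fall below 1, which is the direction the abstract reports for most genomes.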

Discussions

Conclusion


Journal: Interdisciplinary Sciences: Computational Life Sciences
Publication Date: Nov 24, 2021
License type: open-access
