This article proposes a methodology for relating two genomic characteristics, the rate of change of a given gene (relative to a given taxon) and the amino acid composition of the protein it encodes, to the traits of the species that carry the gene. The methodology is illustrated with the mammalian genes that regulate the circadian rhythms underlying a number of human disorders, particularly those associated with aging. The methods are statistical and bioinformatic. A systematic search for orthologues, pseudogenes, and gene losses was performed using our previously developed methods. In the Euarchontoglires superorder, the least conserved gene, Fbxl21, shows a statistically significant connection between its genomic characteristics (the median dN/dS of the gene relative to all the other orthologous genes of the taxon, and the preference for or avoidance of certain amino acids in its protein) and species-specific lifespan and body weight. In contrast, no such connection is observed for Fbxl21 in the Laurasiatheria superorder. The study goes beyond protein-coding genes, since the accumulation of amino acid substitutions in the course of evolution leads to pseudogenization and even gene loss, yet the relationship between genomic characteristics and species traits is preserved. In placental mammals, for example, longevity is connected with the rate of Fbxl21 change, with its pseudogenization or loss, and with specific amino acid substitutions (e.g., asparagine at the 19th position of the CRY-binding domain) in the encoded protein.
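As a rough illustration of the kind of analysis summarized above, the following minimal sketch computes a per-species median dN/dS and rank-correlates it with a species trait. It is not the authors' pipeline: the interpretation of the "median of dN/dS" as the median over pairwise comparisons involving a species, the choice of Spearman rank correlation, the function names, and all numeric values are assumptions made for illustration only. The species labels and numbers are placeholders, not data from the study, and no significance test is included.

```python
from statistics import median


def median_dnds(pairwise, species):
    """Median dN/dS over all pairwise comparisons that involve `species`.

    `pairwise` maps (species_a, species_b) -> dN/dS (hypothetical input format).
    """
    return median(v for (a, b), v in pairwise.items() if species in (a, b))


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder values only (NOT results from the study).
pairwise_dnds = {
    ("sp_A", "sp_B"): 0.10, ("sp_A", "sp_C"): 0.20, ("sp_A", "sp_D"): 0.30,
    ("sp_B", "sp_C"): 0.40, ("sp_B", "sp_D"): 0.50, ("sp_C", "sp_D"): 0.60,
}
lifespan_years = {"sp_A": 4.0, "sp_B": 12.0, "sp_C": 30.0, "sp_D": 80.0}

species = sorted(lifespan_years)
gene_rate = [median_dnds(pairwise_dnds, s) for s in species]
trait = [lifespan_years[s] for s in species]
rho = spearman(gene_rate, trait)
print(f"Spearman rho between median dN/dS and lifespan: {rho:.2f}")
```

A rank-based coefficient is used here because traits such as lifespan and body weight span orders of magnitude across mammals; whether the study itself uses this statistic is not stated in the abstract.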