New algorithm for the analysis of nucleotide and amino acid evolutionary relationships based on Klein four-group

Nikola Štambuk, Paško Konjevoda, Krunoslav Brčić-Kostić, Josip Baković, Albert Štambuk

doi:10.1016/j.biosystems.2023.105030

Abstract

Phylogenetics is the study of ancestral relationships among biological species. The results of such analyses are often represented as phylogenetic trees, in which the branching pattern and topology reflect the evolutionary relatedness of the analyzed sequences. We present a Klein four-group algorithm (K4A) for the evolutionary analysis of nucleotide and amino acid sequences. The Klein four-group set of operators consists of the identity e (U) and three elements: a = transition (C), b = transversion (G), and c = transition-transversion or complementarity (A). We generated Klein four-group based distance matrices of: 1. Cayley table (CK4), 2. Table rows (K4R), 3. Table columns (K4C), and 4. Euclidean 2D distance (K4E). The performance of the matrices was tested on a dataset of RecA proteins in bacteria, eukaryotes (Rad51 homolog) and archaea (RadA homolog). RecA and its functional homologs are found in all species and are essential for the repair and maintenance of DNA. Consequently, they are a good model for studying the evolutionary relationships of protein and nucleotide sequences. With respect to general topology, all K4A matrices correctly classified the ancestral relationships between the sequences. All distance matrices exhibited small variations among species, and the overall tree classification results agreed with the general patterns obtained with the standard BLOSUM and PAM substitution matrices. During the evolution of a code there is a phase in which the system rules are optimized, the ambiguity of the code is eliminated, and the system starts producing specific components. The Klein four-group algorithm is consistent with this concept of ambiguity reduction. It also enables the use of different genetic code table variants optimized for particular transitions in evolution based on biological specificity.
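The operator assignment in the abstract can be illustrated with a minimal sketch. It assumes a 2-bit encoding of the nucleotides (U = 00, C = 01, G = 10, A = 11) under which bitwise XOR reproduces the Klein four-group operation: composing with C acts as a transition, with G as a transversion, and with A as complementarity. The encoding, the function names (`k4_product`, `cayley_table`, `euclid_2d`) and the simple 2D distance are illustrative assumptions; they are not the paper's definitions of the CK4, K4R, K4C or K4E matrices.

```python
from itertools import product
from math import dist

# Assumed 2-bit encoding (hypothetical, not taken from the paper).
# XOR of two codes realizes the Klein four-group operation.
CODE = {"U": (0, 0), "C": (0, 1), "G": (1, 0), "A": (1, 1)}
BASE = {bits: base for base, bits in CODE.items()}


def k4_product(x, y):
    """Compose two group elements, each named by its nucleotide label."""
    (x1, x2), (y1, y2) = CODE[x], CODE[y]
    return BASE[(x1 ^ y1, x2 ^ y2)]


def cayley_table(bases="UCGA"):
    """Return the full Cayley table as a dict keyed by (row, column)."""
    return {(x, y): k4_product(x, y) for x, y in product(bases, repeat=2)}


def euclid_2d(x, y):
    """Euclidean distance between the assumed 2D coordinates of two bases."""
    return dist(CODE[x], CODE[y])


if __name__ == "__main__":
    table = cayley_table()
    for row in "UCGA":
        print(row, " ".join(table[(row, col)] for col in "UCGA"))
```

In this sketch U is the identity, every element is its own inverse, and the product of any two distinct non-identity elements is the third, which are the defining properties of the Klein four-group.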

Full Text