Amino Acid Substitution Matrices Research Articles

The family Halomonadaceae is the largest family of halophilic bacteria, with more than 160 species with validly published names as of July 2023. Several classifications circumscribing this family are available in major resources, such as the List of Prokaryotic names with Standing in Nomenclature (LPSN), NCBI Taxonomy, the Genome Taxonomy Database (GTDB), and Bergey's Manual of Systematics of Archaea and Bacteria (BMSAB), with some degree of disagreement between them. Moreover, regardless of the classification adopted, the genus Halomonas is not phylogenetically consistent, likely because it has been used as a catch-all for newly described species within the family that could not be clearly accommodated in other Halomonadaceae genera. In the past decade, some taxonomic rearrangements of the Halomonadaceae have been made based on analysis of ribosomal and alternative single-copy housekeeping gene sequences. High-throughput technologies have enabled access to the genome sequences of many type strains belonging to the family; however, genome-based studies specifically addressing its taxonomic status have not been performed to date. In this study, we sequenced the genomes of 17 missing type strains of Halomonadaceae species that, together with other publicly available genome sequences, allowed us to re-evaluate the genetic relationships, phylogeny, and taxonomy of the species and genera within this family. The approach included the estimation of Overall Genome Relatedness Indexes (OGRIs) such as the average amino acid identity (AAI), phylogenomic reconstructions using amino acid substitution matrices customized for the family Halomonadaceae, and the analysis of clade-specific signature genes. Based on our results, we conclude that the genus Halovibrio is clearly out of place within the family Halomonadaceae, and we propose a division of the genus Halomonas into seven separate genera and the transfer of seven species from Halomonas to the genus Modicisalibacter, together with the emendation of the latter. Additionally, data from this study demonstrate the existence of various synonymous species names in this family.
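As a rough illustration of the AAI index mentioned in the abstract: AAI is commonly described as the mean percent identity over aligned protein pairs shared by two genomes. The sketch below assumes the protein pairs have already been identified and aligned (with `-` as the gap character). The function names and toy sequences are hypothetical and are not taken from the study, whose exact AAI procedure is not stated here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_amino_acid_identity(aligned_pairs) -> float:
    """Mean percent identity across pre-aligned protein pairs (assumed orthologs)."""
    identities = [percent_identity(a, b) for a, b in aligned_pairs]
    if not identities:
        raise ValueError("need at least one aligned protein pair")
    return sum(identities) / len(identities)


# Toy, made-up aligned pairs purely for illustration.
toy_pairs = [("MKTAYIAK", "MKSAYIAK"), ("GAV-LIPF", "GAVCLIPW")]
aai = average_amino_acid_identity(toy_pairs)
print(f"AAI = {aai:.2f}%")
```

Real pipelines typically select the protein pairs by reciprocal best hits and apply identity and coverage cut-offs before averaging; those steps are omitted here.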


Background

Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins, and they treat substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching, but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also a topology-specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information.

Results

We obtain statistically significant improved sequence matching scores for 73% of the alpha helical test cases. On average, 61% of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average, z-scores for homology detection are improved by more than 54% for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology-specific similarity matrices also outperform other traditional similarity matrices and single-matrix-based structure methods. When the default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology.

Conclusions

We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology-specific similarity matrices outperform other traditional similarity matrices and single-matrix-based structure methods, and also show improvement over conventional Psi-blast and HMM-profile-based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1198-z) contains supplementary material, which is available to authorized users.
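The abstract reports z-scores for homology detection without stating how they were computed. One common approach, sketched below as an assumption, compares the score of a sequence pair against the scores obtained after shuffling one of the sequences. The residue groups and the 2 / 1 / -1 scoring are a hypothetical toy scheme, not Blosum62 values and not the topology-specific matrices described above. The alignment is ungapped for brevity.

```python
import random
import statistics

# Hypothetical residue groups used only to build a toy scoring function.
GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "GP"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def pair_score(x: str, y: str) -> int:
    """Toy substitution score: 2 identical, 1 same group, -1 otherwise."""
    if x == y:
        return 2
    gx, gy = GROUP_OF.get(x), GROUP_OF.get(y)
    if gx is not None and gx == gy:
        return 1
    return -1


def ungapped_score(a, b) -> int:
    """Sum of per-position scores for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


def shuffle_z_score(a: str, b: str, n_shuffles: int = 500, seed: int = 0) -> float:
    """Z-score of the observed score against scores of shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = ungapped_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # shuffling preserves b's composition
        background.append(ungapped_score(a, residues))
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    return (observed - mean) / sd if sd > 0 else 0.0


# Toy, made-up sequences purely for illustration.
z = shuffle_z_score("MKTAYIAKQRDE", "MKSAYIAKQRDE")
print(f"z-score = {z:.2f}")
```

Swapping `pair_score` for a lookup into a different matrix is the only change needed to compare matrices on the same pair, which is the kind of comparison the abstract describes.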


Articles published on Amino Acid Substitution Matrices

LearnMSA2: deep protein multiple alignments with large language and hidden Markov models.

A long-awaited taxogenomic investigation of the family Halomonadaceae.

Idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R.

Protsubs: a series of substitution matrices reflecting relationships between protein evolution and structure

A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices.

Evolutionary models of amino acid substitutions based on the tertiary structure of their neighborhoods.

Evolutionary and functional lessons from human-specific amino acid substitution matrices.

Substitution scoring matrices for proteins - An overview.

Antigenicity prediction and vaccine recommendation of human influenza virus A (H3N2) using convolutional neural networks

Construction and Analysis of Amino Acid Substitution Matrices for Optimal Alignment of Microbial Rhodopsin Sequences

3D deep convolutional neural networks for amino acid environment similarity analysis

Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix.

Fold-specific sequence scoring improves protein sequence matching.

The tangled bank of amino acids.

Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS.

PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

A novel amino acid substitution matrix suited to searching for distantly related proteins

An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

3D representations of amino acids—applications to protein sequence comparison and classification

Revisiting amino acid substitution matrices for identifying distantly related proteins
