Abstract

The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High-quality alignments are difficult to build, and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation can identify alignment errors, because misalignment reduces positional independence in the affected region. We highlight three alignments from the benchmark database BAliBASE 3 that contain regions of high local covariation, and investigate the cause in each case. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported by structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/

Highlights

  • Multiple sequence alignments are critical for generating and testing hypotheses based on protein structure, function, and phylogeny

  • Positions 1 and 7 were not shifted and so were always randomly assorting relative to the other positions. This alignment was loaded into the LoCo alignment viewer, which uses the existing Jalview codebase but replaces the Quality score with Local Covariation (Materials and Methods)

  • We identified a contiguous segment of high local covariation in the BB40047 alignment of BAliBASE 3

Introduction

Multiple sequence alignments are critical for generating and testing hypotheses based on protein structure, function, and phylogeny. Protein alignments are built on the assumption that each position (column) in the alignment is homologous [1]. Homology is typically validated by demonstrating that two residues occupy the same location in 3D space, since structural homology implies sequential homology [2]. If only sequence information is available, positions are assigned based on the conservation of residue identity or properties, which is inherently less reliable than structural inference. The logic of interpreting sequence alignments is therefore circular: alignments are built, validated, and used based on a single criterion, conservation. A conservation-independent property of sequence alignments is thus a valuable adjunct for validating a sequence alignment.
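
To make the distinction between conservation and covariation concrete, the following is a minimal sketch, not the scoring used by LoCo. It assumes that per-column Shannon entropy stands in for conservation and that mutual information between a column and its near neighbours stands in for local covariation. The function names, the `window` parameter, and the toy alignment are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i of every sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy in bits; 0 for a perfectly conserved column."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """MI(a, b) = H(a) + H(b) - H(a, b), in bits."""
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)


def local_covariation(alignment, window=3):
    """Mean MI between each column and its neighbours within `window`.

    Assumption: this simple average is an illustrative stand-in for a
    local covariation score. Gap characters are treated as ordinary
    symbols, and no correction is applied for sample size or phylogeny.
    """
    length = len(alignment[0])
    scores = []
    for i in range(length):
        lo, hi = max(0, i - window), min(length, i + window + 1)
        mi = [
            mutual_information(column(alignment, i), column(alignment, j))
            for j in range(lo, hi)
            if j != i
        ]
        scores.append(sum(mi) / len(mi) if mi else 0.0)
    return scores


# Hypothetical toy alignment: column 0 is fully conserved, while
# columns 1 and 2 vary together (K with L, R with V).
toy = ["AKL", "AKL", "ARV", "ARV"]
conservation = [entropy(column(toy, i)) for i in range(3)]
covariation = local_covariation(toy, window=1)
```

In this toy case the conserved column carries no covariation signal, whereas the two variable columns covary completely, so the two measures report different aspects of the alignment. Raw mutual information is sensitive to the number of sequences and to shared ancestry, so a practical score would need corrections that this sketch omits.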
