DivA: detection of non-homologous and very divergent regions in protein sequence alignments.

Marie Lisandra Zepeda Mendoza, Rute R Da Fonseca, Sanne Nygaard

doi:10.1186/1756-0500-7-806


Open Access

https://doi.org/10.1186/1756-0500-7-806


Journal: BMC Research Notes
Publication Date: Nov 18, 2014
Citations: 24
License type: cc-by

Affiliation: University of Copenhagen

Abstract

Background

Sequence alignments are used to find evidence of homology, but they sometimes contain regions that are difficult to align, which can degrade the quality of subsequent analyses. Although problematic regions can be removed manually, this is impractical in large genome-scale studies, and the results suffer from irreproducibility arising from subjectivity. Some automated alignment-trimming methods have been developed to remove problematic regions, but most of them act by removing complete columns or complete sequences from the multiple sequence alignment (MSA), discarding many informative sites.

Findings

Here we present DivA, a tool that identifies Divergent windows in protein sequence Alignments. DivA makes no assumptions about evolutionary models, and it is ideal for detecting incorrectly annotated segments within individual gene sequences. DivA uses a sliding-window approach to estimate four divergence-based parameters and their outlier values. It then classifies a window of a sequence in an alignment as very divergent (potentially non-homologous) if it presents a combination of outlier values for the four parameters. The windows classified as very divergent can optionally be masked in the alignment.

Conclusions

DivA automatically identifies very divergent and incorrectly annotated genic regions in MSAs, avoiding the subjective and time-consuming process of manual annotation. Its output is easy to interpret and allows the user to make more informed decisions, reducing the amount of sequence discarded while still finding potentially erroneous and non-homologous regions.

Electronic supplementary material

The online version of this article (doi:10.1186/1756-0500-7-806) contains supplementary material, which is available to authorized users.
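The abstract does not name the four parameters or the outlier rule, so the following is only a minimal sketch of the general scheme it describes (score sliding windows per sequence, find outliers per parameter, flag a window only when it is an outlier for every parameter, then optionally mask it). The two parameters (`mismatch_to_consensus`, `mean_rarity`), the Tukey-fence outlier rule, the `X` mask character, and all function names are hypothetical stand-ins, not DivA's actual parameters, thresholds, or interface.

```python
from collections import Counter
from statistics import quantiles

GAP = "-"

def column_profiles(alignment):
    """Per-column residue counts, ignoring gaps."""
    return [Counter(c for c in column if c != GAP) for column in zip(*alignment)]

def mismatch_to_consensus(seq, profiles, start, width):
    """Hypothetical parameter: fraction of non-gap residues that differ from the column consensus."""
    hits = total = 0
    for pos in range(start, start + width):
        if seq[pos] == GAP:
            continue
        consensus = profiles[pos].most_common(1)[0][0]
        total += 1
        hits += seq[pos] != consensus
    return hits / total if total else 0.0

def mean_rarity(seq, profiles, start, width):
    """Hypothetical parameter: mean of 1 - (frequency of the residue in its column)."""
    scores = [1 - profiles[pos][seq[pos]] / sum(profiles[pos].values())
              for pos in range(start, start + width) if seq[pos] != GAP]
    return sum(scores) / len(scores) if scores else 0.0

def upper_fence(values):
    """Assumed outlier rule: Tukey upper fence Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

def flag_divergent_windows(alignment, width, step,
                           params=(mismatch_to_consensus, mean_rarity)):
    """Return (sequence index, window start) pairs that are outliers for every parameter."""
    profiles = column_profiles(alignment)
    starts = range(0, len(alignment[0]) - width + 1, step)
    keys = [(i, s) for i in range(len(alignment)) for s in starts]
    # Score every (sequence, window) pair on every parameter.
    scores = {p: [p(alignment[i], profiles, s, width) for i, s in keys] for p in params}
    fences = {p: upper_fence(v) for p, v in scores.items()}
    # Flag a window only when it is an outlier for all parameters at once.
    return {key for n, key in enumerate(keys)
            if all(scores[p][n] > fences[p] for p in params)}

def mask(alignment, flagged, width, mask_char="X"):
    """Replace residues (not gaps) in flagged windows with mask_char."""
    masked = [list(seq) for seq in alignment]
    for i, s in flagged:
        for pos in range(s, s + width):
            if masked[i][pos] != GAP:
                masked[i][pos] = mask_char
    return ["".join(seq) for seq in masked]

# Toy alignment: sequence 5 carries a divergent segment at columns 10-14.
base = "MKVLAAGIVGLSTEEKRRLA"
alignment = [base, GAP + base[1:], base, base, base, base[:10] + "WHPYC" + base[15:]]
flagged = flag_divergent_windows(alignment, width=5, step=5)
print(sorted(flagged))                       # [(5, 10)]
print(mask(alignment, flagged, width=5)[5])  # MKVLAAGIVGXXXXXKRRLA
```

Requiring agreement across all parameters, rather than any single one, is what keeps isolated substitutions from being flagged in this sketch; with overlapping windows (`step < width`) the masked region is simply the union of the flagged windows.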

Highlights

Sequence alignments are used to find evidence of homology, but they sometimes contain regions that are difficult to align, which can degrade the quality of subsequent analyses
To test the impact of dataset size, Divergent windows in protein sequence Alignments (DivA) was applied to datasets of 50, 100, and 200 bird-only alignments
Efficiency tests were applied to the classified windows, and the results show that the model performs best with large datasets (Table 2), such as those expected in phylogenomic analyses, where up to thousands of alignments are concatenated (Additional file 2: Figure S2)

Summary

Introduction

Sequence alignments are used to find evidence of homology, but they sometimes contain regions that are difficult to align, which can degrade the quality of subsequent analyses. Although problematic regions can be removed manually, this is impractical in large genome-scale studies, and the results suffer from irreproducibility arising from subjectivity. Some automated alignment-trimming methods have been developed to remove problematic regions, but most of them act by removing complete columns or complete sequences from the MSA, discarding many informative sites.
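A toy count makes the cost of column-based trimming concrete. This sketch assumes a hypothetical six-sequence alignment in which only one sequence carries a divergent five-residue segment; the alignment, the flagged cells, and the helper names are illustrative only and do not model any particular trimming tool.

```python
GAP = "-"

def residues_lost_by_column_removal(alignment, flagged_cells):
    """Count residues discarded when every column containing a flagged cell is deleted."""
    bad_columns = {pos for _, pos in flagged_cells}
    return sum(1 for seq in alignment for pos in bad_columns if seq[pos] != GAP)

def residues_lost_by_masking(alignment, flagged_cells):
    """Count residues discarded when only the flagged cells themselves are masked."""
    return sum(1 for i, pos in flagged_cells if alignment[i][pos] != GAP)

# Hypothetical alignment: six sequences, sequence 5 diverges at columns 10-14.
base = "MKVLAAGIVGLSTEEKRRLA"
alignment = [base] * 5 + [base[:10] + "WHPYC" + base[15:]]
flagged = {(5, pos) for pos in range(10, 15)}  # (sequence index, column) cells judged divergent

print(residues_lost_by_column_removal(alignment, flagged))  # 30
print(residues_lost_by_masking(alignment, flagged))         # 5
```

Removing the five columns also discards the 25 well-aligned residues contributed by the other five sequences, whereas masking within a single sequence removes only the five divergent ones.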


Similar Papers

Selection of Phylogenetic Models of Molecular Evolution
David Posada
15 Jun 2012

Modeling residue usage in aligned protein sequences via maximum likelihood.
W J Bruno
Molecular Biology and Evolution | VOL. 13
01 Dec 1996

Site-Specific Amino Acid Distributions Follow a Universal Shape.
Mackenzie M Johnson ... Claus O Wilke
Journal of Molecular Evolution | VOL. 88
24 Nov 2020

The twilight zone of cis element alignments
Alvaro Sebastian ... Bruno Contreras-Moreira
Nucleic Acids Research | VOL. 41
24 Dec 2012
