LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

Renaud Vanhoutreve, Baptiste Legrand, Olivier Poch, Hélène Gass, Arnaud Kress, Julie D Thompson

doi:10.1186/s12859-016-1146-y


Open Access


Abstract

Background: A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences.

Results: Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into sub-families and relations are predicted at different levels, including ‘core blocks’, ‘regions’ and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well annotated alignment databases, where the homologous sequence segments are detected with very high sensitivity and specificity.

Conclusions: LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotations, 2D/3D structure predictions, protein-protein interaction predictions etc.

Highlights

A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference.
The alignments have a high proportion of sequences with ‘discrepancies’ that may correspond to naturally occurring variants or may be the result of artifacts, including proteins translated from partially sequenced genomes or ESTs, or badly predicted protein sequences.
Sequence-level homology analysis: To evaluate the accuracy of LEON-BIS for the detection of related and unrelated sequences, we constructed a large scale test set, based on the latest multiple alignments in the BAliBASE benchmark suite [23].

Summary

Introduction

A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences. Multiple alignments of protein sequences are a fundamental tool in many areas of molecular biology, including phylogenetic studies, prediction of 2D/3D structure, and propagation of structural/functional information from annotated to non-annotated sequences. All of these applications rely on the identification of the conserved regions in the alignments, which suggest potential homologous relations between the sequences. Numerous column scores have been defined that attempt to distinguish the positions that are conserved in all the sequences from the unreliable positions, e.g. [4, 5].
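As a rough illustration of what a column score does, the sketch below computes a simple Shannon-entropy conservation value for each column of a toy protein alignment. It is not the LEON-BIS method, nor necessarily any of the scores cited in [4, 5]; the function names `column_conservation` and `score_alignment`, the gap characters, the gap weighting and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: 20 standard amino acids set the maximum possible entropy.
MAX_ENTROPY = math.log2(20)


def column_conservation(column, gap_chars="-."):
    """Score one alignment column in [0, 1]; 1.0 means fully conserved.

    Hypothetical helper: entropy-based conservation, scaled by the
    fraction of non-gap characters so that gappy columns score lower.
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return 0.0  # a column made only of gaps carries no signal
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    conservation = max(0.0, 1.0 - entropy / MAX_ENTROPY)
    return conservation * (total / len(column))


def score_alignment(sequences):
    """Return one conservation score per column of an aligned sequence set."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_conservation(col) for col in zip(*sequences)]


if __name__ == "__main__":
    # Toy alignment, invented for this example only.
    toy = ["MKVLA", "MKILA", "MRVL-"]
    for position, score in enumerate(score_alignment(toy), start=1):
        print(f"column {position}: {score:.3f}")
```

A score of this kind treats every column independently and only asks whether a position is conserved across all sequences, which is the limitation that motivates evaluating homology at the level of sub-families and sequence segments instead.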



Journal: BMC Bioinformatics	Publication Date: Jul 7, 2016
Citations: 38	License type: CC BY 4.0


