PaPI: pseudo amino acid composition to score human protein-coding variants.

Ivan Limongelli, Riccardo Bellazzi, Simone Marini

doi:10.1186/s12859-015-0554-8


Open Access


Journal: BMC Bioinformatics
Publication Date: Apr 19, 2015
Citations: 91
License type: CC BY 4.0

Affiliation: University of Pavia

Abstract

Background: High-throughput sequencing technologies can identify the whole genomic variation of an individual. Gene-targeted and whole-exome experiments focus mainly on coding-sequence variants affecting one or more nucleotides. Analyzing the biological significance of this multitude of genomic variants is challenging and computationally demanding.

Results: We present PaPI, a new machine-learning approach that classifies and scores human coding variants by estimating the probability that they damage protein-related function. The novelty of the approach lies in using pseudo amino acid composition, through which wild-type and mutated protein sequences are represented in a discrete model. A machine-learning classifier was trained on a set of known deleterious and benign coding variants, with the aim of scoring unobserved variants by taking into account hidden sequence patterns in the human genome that may lead to disease. We show that combining amphiphilic pseudo amino acid composition with evolutionary-conservation and homologous-protein-based methods outperforms several prediction algorithms, and that this combination can also score complex variants such as deletions, insertions and indels.

Conclusions: This paper describes a machine-learning approach to predicting the deleteriousness of human coding variants. A freely available web application implementing the method (http://papi.unipv.it) has been developed; it can score up to thousands of variants in a single run.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0554-8) contains supplementary material, which is available to authorized users.
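The abstract's core idea, representing wild-type and mutated protein sequences through amphiphilic pseudo amino acid composition (PseAAC), can be illustrated with a short Python sketch of Chou's amphiphilic descriptor: 20 normalized residue frequencies followed by 2λ sequence-order correlation factors computed from standardized hydrophobicity and hydrophilicity values. This is not PaPI's code. The property tables are illustrative values recalled from the scales commonly paired with this descriptor and should be verified before real use; the defaults λ = 3 and weight w = 0.5, and the hypothetical helper `variant_features` (a plain mutant-minus-wild-type difference), are assumptions, since the abstract does not say how PaPI combines the two representations.

```python
from statistics import fmean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: illustrative scales recalled from common PseAAC usage; verify before use.
HYDROPHOBICITY = {
    "A": 0.62, "C": 0.29, "D": -0.90, "E": -0.74, "F": 1.19, "G": 0.48, "H": -0.40,
    "I": 1.38, "K": -1.50, "L": 1.06, "M": 0.64, "N": -0.78, "P": 0.12, "Q": -0.85,
    "R": -2.53, "S": -0.18, "T": -0.05, "V": 1.08, "W": 0.81, "Y": 0.26,
}
HYDROPHILICITY = {
    "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5, "G": 0.0, "H": -0.5,
    "I": -1.8, "K": 3.0, "L": -1.8, "M": -1.3, "N": 0.2, "P": 0.0, "Q": 0.2,
    "R": 3.0, "S": 0.3, "T": -0.4, "V": -1.5, "W": -3.4, "Y": -2.3,
}


def standardize(scale):
    """Rescale a property to zero mean and unit (population) SD over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sd = fmean(values), pstdev(values)
    return {a: (scale[a] - mu) / sd for a in AMINO_ACIDS}


H1 = standardize(HYDROPHOBICITY)
H2 = standardize(HYDROPHILICITY)


def amphiphilic_pseaac(seq, lam=3, w=0.5):
    """Return a 20 + 2*lam feature vector (amphiphilic PseAAC) for one sequence."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains a non-standard residue")
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    taus = []
    for j in range(1, lam + 1):
        # Lag-j correlations: hydrophobicity first, then hydrophilicity.
        taus.append(sum(H1[seq[i]] * H1[seq[i + j]] for i in range(n - j)) / (n - j))
        taus.append(sum(H2[seq[i]] * H2[seq[i + j]] for i in range(n - j)) / (n - j))
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


def variant_features(wild, mutant, lam=3, w=0.5):
    """HYPOTHETICAL combination: element-wise mutant minus wild-type descriptor."""
    wt = amphiphilic_pseaac(wild, lam, w)
    mt = amphiphilic_pseaac(mutant, lam, w)
    return [m - x for m, x in zip(mt, wt)]


wild = "MKTAYIAKQRQISFVKSHFSRQ"
deletion = "MKTAYIAKQRQSFVKSHFSRQ"  # hypothetical one-residue in-frame deletion
print(len(variant_features(wild, deletion)))  # 26 = 20 + 2 * 3
```

Because the descriptor always has 20 + 2λ components whatever the sequence length (as long as λ is shorter than both sequences), a wild-type sequence and a mutant carrying a deletion, insertion or indel map to vectors of the same size, which is one plausible reason such a representation can score complex variants alongside single substitutions.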

Highlights

High-throughput sequencing technologies can identify the whole genomic variation of an individual
Although Random Forest (RF) classifiers have already been used in genomics, from GWAS to RNA-protein binding prediction [38], to our knowledge this is the first time that pseudo amino acid composition (PseAAC) has been applied to protein variant prediction
Area under the curve (AUC), accuracy with 95% confidence interval, sensitivity (Sens), specificity (Spec), Positive Predictive Value (PPV), Negative Predictive Value (NPV), F-measure (F-m) and Matthews correlation coefficient (MCC) are reported for each method
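The evaluation measures listed above follow standard confusion-matrix definitions. The minimal standard-library Python sketch below assumes deleterious variants are the positive class (label 1) and applies a hypothetical 0.5 threshold to the predicted probability; the function names are illustrative. AUC is computed with the pairwise Mann-Whitney formulation (ties count one half), and the 95% confidence interval for accuracy is not reproduced here.

```python
from math import sqrt


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, scores, threshold=0.5):
    """Confusion-matrix metrics; label 1 = deleterious (assumed positive class)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return {
        "Acc": _ratio(tp + tn, len(pairs)),
        "Sens": _ratio(tp, tp + fn),
        "Spec": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "F-m": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and Sens
        "MCC": _ratio(tp * tn - fp * fn,
                      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(classification_metrics(y, p)["MCC"])  # ≈ 0.333
print(auc(y, p))  # ≈ 0.889
```

MCC and F-measure are included alongside accuracy because they stay informative when deleterious and benign sets are unbalanced; the pairwise AUC is quadratic in the number of samples and is meant only for illustration.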

Similar Papers

Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes.
Kuo-Chen Chou
Bioinformatics | VOL. 21
12 Aug 2004

Prediction of Protein Subcellular Multi-Localization Based on the General form of Chou’s Pseudo Amino Acid Composition
Li-Qi Li ... Yuan Zhang
Protein & Peptide Letters | VOL. 19
01 Apr 2012

Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology
Kuo-Chen Chou
Current Proteomics | VOL. 6
01 Dec 2009

CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition
Asifullah Khan ... Maqsood Hayat
Computational Biology and Chemistry | VOL. 35
26 May 2011
