Abstract

Background: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.

Results: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees, implemented as a random forest, weigh a battery of features that characterize the variants: a) at the gene level, including membership in a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.

Conclusions: KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, Polyphen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most elucidatory in terms of information gain and are likely responsible for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users.
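The performance figures quoted in the abstract (accuracy, precision, recall, F-score and MCC) are standard functions of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' evaluation code; the function names and the confusion counts are hypothetical and chosen only for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (pathogenic = positive class).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "mcc": mcc,
    }


def f_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT taken from the paper.
example = classification_metrics(tp=75, fp=15, tn=190, fn=25)

# Consistency check using only the values reported in the abstract.
reported_f = f_score_from(0.82, 0.75)
```

Applying `f_score_from` to the reported precision (0.82) and recall (0.75) gives approximately 0.78, which agrees with the F-score stated in the abstract.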

Highlights

  • The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago

  • KinMutRF is capable of classifying kinase variation with good performance

  • The SPRING [34] method is based on six functional effect scores calculated by existing methods (SIFT, Polyphen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (Gene Ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations)

Summary

Introduction

The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. A second group of methods (e.g. PMUT [15], SNAP [16], PolyPhen-2 [17], NetDiseaseSNP [18], LS-SNP [19], PhD-SNP [20], MutationTaster [21], VEST [22], SNPs&GO [23], SNPs3D [24], MuD [25], CanPredict [26], CADD [27], PON-P2 [28] and nsSNPAnalyzer [29]) rely on advanced automatic machine learning approaches that integrate prior knowledge in the form of both sequence-based and structure-based features, under the assumption that pathogenic variants will disrupt normal protein function and structural stability. Combining the classifier's predictions with annotations extracted from the literature and other sources facilitates the mechanistic interpretation of the consequences of the variants [45].
