A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

Ruchi Verma, Ulrich Melcher

doi:10.1186/1471-2105-13-s15-s9

Abstract

Background
Members of the phylum Proteobacteria are the most prominent among bacteria causing plant diseases, which diminish the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at early stages. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides by using a machine learning algorithm. The algorithm that we implement is a Support Vector Machine (SVM).

Result
The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed an SVM model based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, not used in training, to assess their validity.

Conclusion
The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
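
The two feature sets named in the abstract can be made concrete with a short sketch. Conventionally, amino acid composition (AAC) summarizes a protein as 20 values, one per standard amino acid, and dipeptide composition (DC) as 400 values, one per ordered pair of adjacent residues; a hybrid model concatenates them into 420 features. The following stdlib-only Python code is a minimal sketch under our own assumptions: compositions are expressed as percentages, dipeptides are counted over overlapping adjacent pairs, and residues outside the 20 standard amino acids are ignored. The function names (`amino_acid_composition`, `dipeptide_composition`, `hybrid_features`) are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product
from typing import List

# The 20 standard amino acids (one-letter codes). Any other symbol (X, B, Z, U, ...)
# is skipped; this handling is an assumption, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> List[float]:
    """20-dim AAC vector: percentage of each standard residue in `seq`."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """400-dim DC vector: percentage of each overlapping adjacent residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are ignored
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


def hybrid_features(seq: str) -> List[float]:
    """420-dim hybrid vector: AAC (20 values) followed by DC (400 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    example = "MKVLAAGKK"  # arbitrary toy peptide, not from the paper's datasets
    print(len(hybrid_features(example)))  # 420
```

Each resulting vector would form one row of an SVM training matrix. Whether the original models scaled these values differently, or which kernel they used, is not stated here and should not be inferred from this sketch.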

Highlights

Members of the phylum Proteobacteria are the most prominent among bacteria causing plant diseases that diminish the quantity and quality of food produced by agriculture.
We developed a Support Vector Machine (SVM) model based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein.
The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89.
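
Accuracy and MCC, the two measures quoted in the highlights, are standard binary-classification statistics computed from a confusion matrix. The sketch below shows how they are conventionally calculated, with MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It assumes proteobacterial proteins are the positive class (label 1) and plant proteins the negative class (label 0); these labels and the helper names are hypothetical. MCC ranges from -1 to +1 and, unlike accuracy, stays informative when the two classes are unbalanced.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating `positive` as the proteobacterial class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # proteobacterial protein predicted as plant
        elif pred == positive:
            fp += 1  # plant protein predicted as proteobacterial
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Percentage of correctly classified sequences."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC in [-1, 1]; returns 0.0 when any marginal sum is zero (undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels (1 = proteobacterial, 0 = plant); not data from the paper.
    truth = [1, 1, 1, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(truth, preds)
    print(counts, accuracy(*counts), matthews_corrcoef(*counts))
```

Returning 0.0 for the undefined case (for example, a model that predicts only one class) is a common convention rather than part of the MCC definition.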

Summary

Introduction

Members of the phylum Proteobacteria are the most prominent among bacteria causing plant diseases that diminish the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at early stages. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. Half of the bacterial species causing major food losses worldwide belong to the phylum Proteobacteria (Figure 1). They are found predominantly in the class Gammaproteobacteria (Xanthomonas, Pseudomonas and Erwinia) and in the class Betaproteobacteria (Ralstonia). Deltaproteobacteria include aerobic genera, and Epsilonproteobacteria include the curved to spirilloid Wolinella spp.


Journal: BMC Bioinformatics
Publication Date: Sep 1, 2012
Citations: 55
License type: CC BY 2.0


