Abstract

Computational methods for predicting the subcellular localization of bacterial proteins play a crucial role in ongoing efforts to annotate the function of these proteins and to suggest potential drug targets. Used in combination with other experimental and computational methods, they can contribute to biomedical research by annotating the proteomes of a wide variety of bacterial species. We use the ngLOC method, a Bayesian classifier that predicts the subcellular localization of a protein based on the distribution of n-grams in a curated dataset of proteins with experimentally determined localizations. Subcellular localization was predicted with an overall accuracy of 89.7% and 89.3% for Gram-negative and Gram-positive bacterial protein sequences, respectively. By applying a confidence score threshold, we improve the precision to 96.6% while covering 84.4% of the Gram-negative data, and to 96.0% while covering 87.9% of the Gram-positive data. We use this method to estimate the subcellular proteomes of ten Gram-negative species and five Gram-positive species, covering an average of 74.7% and 80.6% of the proteome for Gram-negative and Gram-positive species, respectively. The method is useful for large-scale analysis and annotation of the subcellular proteomes of bacterial species. We demonstrate that it has excellent predictive performance while achieving superior proteome coverage compared to other popular methods such as PSORTb and PLoc.
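
To make the idea of an n-gram Bayesian classifier with a confidence threshold concrete, the following is a minimal sketch and not the ngLOC implementation. It assumes a plain multinomial naive Bayes model over overlapping n-grams with add-one smoothing, and it uses the top posterior probability as a stand-in for the confidence score. The class name `NGramNaiveBayes`, the helper `precision_and_coverage`, the toy sequences, and the two class labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping n-grams (substrings of length n) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Toy multinomial naive Bayes over n-grams (illustrative, not ngLOC)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                          # n-gram length (assumed value)
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # per-class n-gram counts
        self.class_totals = Counter()       # total n-grams seen per class
        self.class_docs = Counter()         # training sequences per class
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.class_totals[label] += len(grams)
            self.class_docs[label] += 1
            self.vocab.update(grams)
        return self

    def posteriors(self, seq):
        """Posterior probability of each class given the sequence."""
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab) + 1  # one extra slot for unseen n-grams
        log_scores = {}
        for label in self.class_docs:
            lp = math.log(self.class_docs[label] / n_docs)  # class prior
            denom = self.class_totals[label] + self.alpha * v
            for g in ngrams(seq, self.n):
                lp += math.log((self.counts[label][g] + self.alpha) / denom)
            log_scores[label] = lp
        # Normalize in log space to avoid underflow on long sequences.
        top = max(log_scores.values())
        exp = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}

    def predict(self, seq, threshold=0.0):
        """Return (label or None, confidence); None means 'no prediction'."""
        post = self.posteriors(seq)
        label = max(post, key=post.get)
        conf = post[label]
        return (label if conf >= threshold else None), conf


def precision_and_coverage(model, sequences, labels, threshold):
    """Precision over sequences that pass the threshold, and the fraction kept."""
    kept = correct = 0
    for seq, label in zip(sequences, labels):
        pred, _ = model.predict(seq, threshold)
        if pred is not None:
            kept += 1
            correct += pred == label
    precision = correct / kept if kept else float("nan")
    return precision, kept / len(sequences)


# Hypothetical toy data, not real proteins or real localization classes.
train_seqs = ["MKKLLLAAAL", "MLLLAAALLK", "MSDEEKRDDE", "MEEDDKRSEE"]
train_labels = ["membrane", "membrane", "cytoplasm", "cytoplasm"]

model = NGramNaiveBayes(n=2).fit(train_seqs, train_labels)
label, conf = model.predict("LLLAAALL", threshold=0.9)
print(label, round(conf, 3))
```

Raising `threshold` trades coverage for precision: fewer sequences receive a prediction, but those that do are the ones the model is most certain about. This is the same kind of trade-off the abstract reports, although the figures quoted there come from the authors' method and data, not from this sketch.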
