Using the concept of Chou's pseudo amino acid composition to predict protein solubility: An approach with entropies in information theory

Niu Xiaohui, Li Nana, Xia Jingbo, Chen Dingyan, Peng Yuehua, Xiao Yang, Wei Weiquan, Wang Dongming, Wang Zengzhen

doi:10.1016/j.jtbi.2013.03.010

Abstract

Protein solubility plays a major role in proteomics. Predicting whether a protein will be soluble or will form inclusion bodies is a fundamental problem that has not yet been satisfactorily resolved. To predict protein solubility, almost 10,000 protein sequences were downloaded from NCBI. Highly homologous sequences were then removed with CD-HIT, leaving 5692 sequences. Amino acid and dipeptide compositions extracted from protein sequences are commonly used to predict protein solubility. In this study, entropy in the information-theoretic sense was introduced as an additional predictive factor. Experiments on nine different feature vector combinations drawn from these three kinds of factors were conducted with support vector machines (SVMs) as the prediction engine. Each combination was evaluated by a re-substitution test and a 10-fold cross-validation test. According to the evaluation results, introducing the entropy raised both the accuracy and the Matthews correlation coefficient (MCC). The best combination used amino acid compositions, dipeptide compositions, and their entropies. Its accuracy reached 90.34% with an MCC of 0.7494 in the re-substitution test, and 88.12% with an MCC of 0.7945 in 10-fold cross-validation. In conclusion, the introduction of the entropy significantly improved the performance of the predictive method.
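
The three kinds of factors named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact entropy definition, so this assumes the Shannon entropy of the amino acid and dipeptide frequency distributions; the names `composition`, `shannon_entropy`, `feature_vector`, and `mcc` are hypothetical and not taken from the paper. The SVM training step is omitted.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def composition(counts, keys):
    """Relative frequency of each key; all zeros if nothing was counted."""
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def shannon_entropy(freqs):
    """Assumed entropy: H = -sum(p * log2(p)) in bits, skipping p = 0."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def feature_vector(sequence):
    """20 + 400 composition values followed by their 2 entropies (422 total)."""
    seq = sequence.upper()
    # Residues outside the 20 standard letters are ignored by composition().
    aa = composition(Counter(seq), AMINO_ACIDS)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    dp = composition(pairs, DIPEPTIDES)
    return aa + dp + [shannon_entropy(aa), shannon_entropy(dp)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors of this shape could be passed to any SVM implementation, and dropping or keeping the composition and entropy slices gives feature combinations in the spirit of the nine compared in the study.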

Journal: Journal of Theoretical Biology
Publication Date: Mar 21, 2013
Citations: 31

Similar Papers

Predicting the protein solubility by integrating chaos games representation and entropy in information theory
Niu Xiaohui ... Li Nana
Expert Systems With Applications | VOL. 41
31 Aug 2013

Combing ontologies and dipeptide composition for predicting DNA-binding proteins
Loris Nanni ... Alessandra Lumini
Amino Acids | VOL. 34
04 Jan 2008

Prediction of GABA A receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine
Hassan Mohabatkar ... Abolghasem Esmaeili
Journal of Theoretical Biology | VOL. 281
28 Apr 2011

Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach
Majid Mohammad Beigi ... Hassan Mohabatkar
Journal of Structural and Functional Genomics | VOL. 12
01 Dec 2011
