Amino acid classification based spectrum kernel fusion for protein subnuclear localization.

Suyu Mei, Wang Fei

doi:10.1186/1471-2105-11-s1-s17

Abstract

Background: Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization. To the best of our knowledge, there are only three computational models for protein subnuclear localization thus far. Two of them were based on protein primary sequence only. The first model assumed a homogeneous amino acid substitution pattern across all residue sites of the protein sequence and used BLOSUM62 to encode the k-mers of the sequence. An ensemble of SVMs based on different k-mers produced the final prediction, achieving 50% overall accuracy. This simplifying assumption did not exploit the protein sequence profile and ignored the fact that amino acid substitution patterns are heterogeneous across sites. The second model derived the PsePSSM feature representation from protein sequence by simply averaging the profile PSSM and combined it with the PseAA feature representation to construct a kNN ensemble classifier, Nuc-PLoc, achieving 67.4% overall accuracy. Both models based on protein primary sequence only achieved relatively poor predictive performance. The third model required that GO annotations be available, which restricts its applicability.

Methods: In this paper, we use only the amino acid information of the protein sequence, without any other information, to design a widely applicable model for protein subnuclear localization. We use the K-spectrum kernel to exploit the contextual information around an amino acid and the conserved motif information. Besides expanding the window size, we adopt various amino acid classification approaches to capture diverse aspects of amino acid physicochemical properties. Each amino acid classification generates a series of spectrum kernels based on different window sizes. Thus, (I) window expansion can capture more contextual information and cover size-varying motifs; (II) various amino acid classifications can exploit multi-aspect biological information from the protein sequence. Finally, we combine all the spectrum kernels by simple addition into one single kernel, called SpectrumKernel+, for protein subnuclear localization.

Results: We conduct the performance evaluation experiments on two benchmark datasets: Lei and Nuc-PLoc. Experimental results show that SpectrumKernel+ achieves a substantial performance improvement over the previous models: on the Nuc-PLoc dataset, 83.47% overall accuracy against 67.4% for Nuc-PLoc; on the Lei dataset, 71.23% against 50% for the Lei SVM Ensemble and 66.50% for the Lei GO SVM Ensemble.

Conclusion: The method SpectrumKernel+ can exploit the rich amino acid information of the protein sequence by embedding into implicit size-varying motifs the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches. The kernels derived from diverse amino acid classification approaches and different k-mer sizes are summed together for data integration. Experiments show that SpectrumKernel+ significantly outperforms the existing models for protein subnuclear localization.
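The kernel fusion described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two groupings in `CLASSIFICATIONS` (an identity mapping and a three-class hydropathy/charge split), the window sizes `ks`, the function names, and the cosine normalization applied before summing are all hypothetical choices made for the example. The abstract only states that the kernels are combined by simple addition.

```python
from collections import Counter

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical amino acid classifications, for illustration only.
CLASSIFICATIONS = {
    "identity": {aa: aa for aa in AMINO_ACIDS},
    "hydropathy": {
        **dict.fromkeys("AVLIMFWC", "h"),  # hydrophobic
        **dict.fromkeys("GSTYNQHP", "p"),  # polar / other
        **dict.fromkeys("DEKR", "c"),      # charged
    },
}


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown residues map to 'x'."""
    return "".join(mapping.get(aa, "x") for aa in seq)


def kmer_counts(seq, k):
    """Count every length-k window (k-mer) in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seqs, k):
    """Gram matrix of the k-spectrum kernel: dot product of k-mer count vectors."""
    counts = [kmer_counts(s, k) for s in seqs]
    n = len(seqs)
    gram = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            small, large = sorted((counts[i], counts[j]), key=len)
            value = sum(c * large[kmer] for kmer, c in small.items())
            gram[i, j] = gram[j, i] = value
    return gram


def cosine_normalize(gram):
    """Scale so that K(x, x) = 1 (an assumption, so no single kernel dominates the sum)."""
    diag = np.sqrt(np.diag(gram))
    diag[diag == 0] = 1.0
    return gram / np.outer(diag, diag)


def fused_spectrum_kernel(seqs, classifications, ks=(1, 2, 3)):
    """Sum the spectrum kernels over all classifications and window sizes."""
    total = np.zeros((len(seqs), len(seqs)))
    for mapping in classifications.values():
        reduced = [reduce_sequence(s, mapping) for s in seqs]
        for k in ks:
            total += cosine_normalize(spectrum_kernel(reduced, k))
    return total


if __name__ == "__main__":
    toy = ["MKRLLDE", "MKKLIDE", "GGSPGGS"]  # made-up sequences
    print(fused_spectrum_kernel(toy, CLASSIFICATIONS))
```

Because a sum of positive semidefinite kernels is itself positive semidefinite, the fused matrix can be passed to any classifier that accepts a precomputed Gram matrix.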

Highlights

Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization
The method SpectrumKernel+ can exploit rich amino acid information of protein sequence by embedding into implicit size-varying motifs the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches
The kernels derived from diverse amino acid classification approaches and different sizes of k-mer are summed together for data integration

Summary

Introduction

Prediction of protein localization in subnuclear organelles is more challenging than general protein subcellular localization. The characteristic differences (e.g. amino acid composition, phylogenetic history) among proteins in the nucleus are far less distinct than those among proteins from different macro cell compartments, making it hard to achieve satisfactory predictive performance. Shen H et al. (2007) [2] derived the PsePSSM feature representation from protein sequence by averaging the profile PSSM and combined it with the PseAA feature representation to construct a kNN ensemble classifier, Nuc-PLoc. Nuc-PLoc divided the nucleus into 9 subnuclear locations and achieved 67.4% overall accuracy. This shows that predicting subnuclear localization is more difficult than predicting general subcellular localization.

Journal: BMC Bioinformatics	Publication Date: Jan 1, 2010
Citations: 58	License type: CC BY 2.0
