Multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization

Suyu Mei

doi:10.1016/j.jtbi.2011.10.015

Abstract

Protein sub-organelle localization, e.g. submitochondria localization, appears more challenging than general protein subcellular localization, because determining a protein's micro-level location within an organelle by fluorescent imaging is more difficult. To date, there are very few computational methods for protein submitochondria localization, and the existing sequence-based predictive models show moderate or unsatisfactory performance. Recent research has demonstrated that gene ontology (GO) is an effective protein feature for protein subcellular localization. However, GO information may not be available for novel proteins or sparsely annotated protein subfamilies. To address this problem, we transfer the GO information of homologs to the target protein and propose a multi-kernel transfer learning model for protein submitochondria localization (MK-TLM), which substantially extends our previously published work (gene ontology based transfer learning model for protein subcellular localization, GO-TLM). To reduce the risk of performance overestimation, we evaluate the model more comprehensively in an optimistic case, a moderate case and a pessimistic case, defined by the abundance of the target protein's GO information. Experiments on submitochondria benchmark datasets show that MK-TLM significantly outperforms the baseline models and demonstrates excellent performance for novel mitochondrial proteins and for mitochondrial proteins that belong to subfamilies we know little about.
