Prediction Of Protein Subcellular Localization Research Articles

Background: Advances in sequencing technology over the past decade have resulted in an abundance of sequenced proteins whose function is as yet unknown. As such, computational systems that can automatically predict and annotate protein function are in demand. Most computational systems use features derived from protein sequence or protein structure to predict function. In an earlier work, we demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. We have also shown that the combination of text-based and sequence-based prediction improves the performance of location predictors. Following up on this work, for the Critical Assessment of Function Annotations (CAFA) Challenge, we developed a text-based system that aims to predict molecular function and biological process (using Gene Ontology terms) for unannotated proteins. In this paper, we present the preliminary work and evaluation that we performed for our system, as part of the CAFA challenge.

Results: We have developed a preliminary system that represents proteins using text-based features and predicts protein function using a k-nearest neighbour classifier (Text-KNN). We selected text features for our classifier by extracting key terms from biomedical abstracts based on their statistical properties. The system was trained and tested using 5-fold cross-validation over a dataset of 36,536 proteins. System performance was measured using the standard measures of precision, recall, F-measure and overall accuracy. The performance of our system was compared to two baseline classifiers: one that assigns function based solely on the prior distribution of protein function (Base-Prior) and one that assigns function based on sequence similarity (Base-Seq). The overall prediction accuracy of Text-KNN, Base-Prior, and Base-Seq is 62%, 43%, and 58%, respectively, for molecular function classes, and 17%, 11%, and 28%, respectively, for biological process classes. Results obtained as part of the CAFA evaluation itself on the CAFA dataset are reported as well.

Conclusions: Our evaluation shows that the text-based classifier consistently outperforms the baseline classifier that is based on prior distribution, and typically has comparable performance to the baseline classifier that uses sequence similarity. Moreover, the results suggest that combining text features with other types of features can potentially lead to improved prediction performance. The preliminary results also suggest that while our text-based classifier can be used to predict both the molecular function of a protein and the biological process in which it is involved, the classifier performs significantly better for predicting molecular function than for predicting biological process. A similar trend was observed for other classifiers participating in the CAFA challenge.
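As a rough illustration of the approach described in this abstract, the sketch below classifies a protein by a k-nearest-neighbour vote over bag-of-words text features and computes per-class precision, recall and F-measure. It is not the authors' Text-KNN implementation: the raw term counts, the cosine similarity, the toy training texts, the labels, and all function names are assumptions made for the example, and the abstract's statistical term selection and 5-fold cross-validation are omitted.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query_text, k=3):
    """Return the majority label among the k most similar training texts."""
    query = vectorize(query_text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f(true_labels, predicted_labels, cls):
    """Precision, recall and F-measure for a single class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: (text associated with a protein, function label).
TRAIN = [
    ("kinase phosphorylation atp binding", "kinase activity"),
    ("kinase atp phosphate transfer", "kinase activity"),
    ("dna binding transcription promoter", "DNA binding"),
    ("dna promoter binding helix", "DNA binding"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "atp dependent kinase", k=3))
    print(precision_recall_f(["a", "a", "b"], ["a", "b", "b"], "a"))
```

In a fuller evaluation, the same metric function would be applied to the predictions from each cross-validation fold, and a prior-based baseline would simply predict the most frequent training label for every protein.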

It has long been a dream that theoretical biology could be applied extensively in experimental biology to accelerate our understanding of the sophisticated processes in living organisms. Enzymology offers a bold attempt and an excellent example, in which well-established physical chemistry is used to describe, fit, predict and improve enzyme reactions. Before modern bioinformatics, the combination of theoretical and experimental biology was mainly limited to various classic formulations. The systematic use of graphic rules by Prof. Kuo-Chen Chou and his co-workers has significantly facilitated the treatment of complicated enzyme systems. With the recent rapid progress of bioinformatics, the prediction of protein structures and various protein attributes has been well established by Chou and co-workers, stimulating experimental biology. For example, their recent method for predicting protein subcellular localization (one of the important attributes of proteins) has been extensively applied by scientific colleagues, yielding many new results with thousands of citations. The research of Prof. Chou is characterized by the introduction of novel physical concepts and powerful, elegant mathematical methods into important biomedical problems, a focus he has maintained throughout his career, even when facing enormous difficulties. His efforts over 50 years have greatly helped us to realize the dream of making “theoretical and experimental biology in one”.

Prof. Richard Giegé is well known for his multi-disciplinary research combining physics, chemistry, enzymology and molecular biology. His major focus of study is the identity of tRNAs and their interactions with aminoacyl-tRNA synthetases (aaRS), which are of critical importance to the fidelity of protein biosynthesis. He and his colleagues carried out the first crystallization of a tRNA/aaRS complex, that between tRNAAsp and AspRS from yeast. The determination of the structure of this complex contributed significantly to the understanding of protein-RNA interactions. Through this research, he and his colleagues have also found other biological functions of these small RNAs. In parallel, he has developed appropriate methods for his research, of which protein crystallogenesis, a name he coined, is an excellent example. Macromolecular crystallogenesis has now become a mature science. This contribution has accelerated the development of protein crystallography, stimulating the study of macromolecular structure and function.

Articles published on Prediction Of Protein Subcellular Localization

A novel approach for protein subcellular location prediction using amino acid exposure

Image-based classification of protein subcellular location patterns in human reproductive tissue by ensemble learning global and local features

SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks

Predicting Protein Subcellular Localization Using the Algorithm of Diversity Finite Coefficient Combined with Artificial Neural Network

Predicting Protein Subcellular Localization Using the Algorithm of Increment of Diversity Combined with Weighted K-Nearest Neighbor

Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou's pseudo amino acid compositions

Mining Proteins with Non-Experimental Annotations Based on an Active Sample Selection Strategy for Predicting Protein Subcellular Localization

An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues

Predicting multisite protein subcellular locations: progress and challenges

Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites

Protein localization prediction using random walks on graphs.

Identifying the singleplex and multiplex proteins based on transductive learning for protein subcellular localization prediction

A Compact Hybrid Feature Vector for an Accurate Prediction of Protein Subcellular Location

Multilabel Learning via Random Label Selection for Protein Subcellular Multilocations Prediction

Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge

GOASVM: A subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition

ILoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins

Virus-ECC-mPLoc: A Multi-Label Predictor for Predicting the Subcellular Localization of Virus Proteins with Both Single and Multiple Sites Based on a General Form of Chou's Pseudo Amino Acid Composition

EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou’s PseAAC

Theoretical and experimental biology in one: A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers
