Abstract

Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? This problem is closely tied to the biological function of the pathway in cells and is therefore fundamental to systems biology and proteomics. It is also extremely challenging because of the complexity of pathway systems. To address it, a novel approach was developed to classify query pathways into the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned was formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to perform the prediction. Cross-validation by the jackknife test on a benchmark dataset of 146 regulatory pathways indicated that our method achieved an overall success rate of 78.8% in identifying query pathways among the above six classes, which is a promising outcome. To the best of our knowledge, the current study represents the first effort to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
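The jackknife test mentioned in the abstract holds out each pathway in turn, trains on the remaining ones, and counts how often the held-out pathway is classified correctly. The following is a minimal sketch of that procedure only. The nearest-neighbor classifier, the two-dimensional toy vectors, and all function names are assumptions made for illustration; the abstract does not specify them, and the real feature vectors are 5570-D.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training vector closest to `query`.

    Assumption: a 1-nearest-neighbor rule stands in for whatever
    classifier is actually used; it is chosen here only for brevity.
    """
    best = min(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return train_y[best]


def jackknife_success_rate(X, y):
    """Leave-one-out (jackknife) success rate over the whole dataset."""
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and train on all the others.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: six "pathways" with 2-D vectors and two classes.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
     [1.0, 1.1], [1.1, 0.9], [0.9, 1.0]]
y = ["Metabolism"] * 3 + ["Human Diseases"] * 3

rate = jackknife_success_rate(X, y)
print(f"jackknife success rate: {rate:.1%}")
```

Because every sample is held out exactly once, the jackknife result for a given dataset and classifier is deterministic, unlike randomly partitioned k-fold cross-validation.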

Highlights

  • During the past decade, much information on different organisms has been accumulated at both the genetic and metabolic levels; many specific databases, such as KEGG/LIGAND [1,2,3,4], ENZYME [5], BRENDA [6], EcoCyc and MetaCyc [7,8], have been developed

  • In our previous work [15], we developed a model to predict whether a regulatory pathway can be formed for a system consisting of a certain number of different proteins

  • The minimum redundancy maximum relevance (mRMR) program was run with default parameters, yielding two feature lists: (i) the MaxRel feature list; (ii) the mRMR feature list
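To illustrate how the two feature lists differ, here is a minimal sketch, not the mRMR program itself. It assumes features that are already discrete, uses mutual information for both relevance and redundancy, and applies the difference form of the criterion (relevance minus mean redundancy). The feature names `f1`, `f2`, `f3` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def maxrel_ranking(features, labels):
    """MaxRel list: features sorted by relevance to the class label only."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(rel, key=rel.get, reverse=True)


def mrmr_ranking(features, labels):
    """mRMR list: greedily pick the feature with the highest
    relevance minus mean redundancy with already-selected features."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return rel[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return rel[name] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f2 duplicates f1; f3 is less relevant but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 0, 1, 1, 1, 1, 1],
    "f3": [0, 0, 1, 1, 1, 1, 1, 1],
}

maxrel_list = maxrel_ranking(features, labels)
mrmr_list = mrmr_ranking(features, labels)
print("MaxRel:", maxrel_list)
print("mRMR:  ", mrmr_list)
```

On this toy data the MaxRel list ranks the duplicate `f2` second because it is as relevant as `f1`, while the mRMR list demotes it to last because it adds no information beyond `f1`.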


Summary

Introduction

Much information on different organisms has been accumulated at both the genetic and metabolic levels, and many specific databases, such as KEGG/LIGAND [1,2,3,4], ENZYME [5], BRENDA [6], EcoCyc and MetaCyc [7,8], have been developed. However, biologically meaningful pathways, such as regulatory and metabolic pathways, are still poorly understood. KEGG (Kyoto Encyclopedia of Genes and Genomes) [1,2,3,4] is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. KEGG contains five main databases [4]: (i) KEGG Atlas, (ii) KEGG Pathway, (iii) KEGG Genes, (iv) KEGG Ligand, and (v) KEGG BRITE.

