Abstract

Given a protein-forming system, i.e., a system consisting of a certain number of different proteins, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology and proteomics. During the past decade, a vast amount of information on different organisms, at both the genetic and metabolic levels, has been accumulated and systematically stored in specialized databases such as KEGG, ENZYME, BRENDA, EcoCyc and MetaCyc. These data have made it feasible to address this problem. In this paper, we analyzed known regulatory pathways in humans by extracting biological and graph features from each of 17,069 protein-forming systems, of which 169 are positive, i.e., known regulatory pathways taken from KEGG, and 16,900 are negative, i.e., systems that do not form a biologically meaningful pathway. Each protein-forming system was represented by 352 features, of which 88 are graph features and 264 are biological features. To analyze these features, the “Minimum Redundancy Maximum Relevance” and “Incremental Feature Selection” techniques were used to select a set of 22 optimal features for predicting whether a protein-forming system can form a biologically meaningful pathway. Cross-validation showed that the overall success rate in identifying the positive pathways was 79.88%. It is anticipated that this novel approach and its encouraging, although preliminary, result may stimulate extensive investigations into this important topic.
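The two selection steps named in the abstract can be illustrated with a minimal sketch. It assumes discrete feature values and uses the common "relevance minus mean redundancy" form of the mRMR criterion; the abstract does not state which variant, discretization, or classifier was used, so the function names, the toy data, and the `evaluate` scoring callback below are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering of feature indices.

    columns: list of feature columns, each a list of discrete values.
    At every step, pick the feature that maximizes
    relevance(feature, labels) - mean redundancy(feature, selected).
    """
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    ranked = []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[k]) for k in ranked
        ) / len(ranked)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """Score the top-1, top-2, ... prefixes of the ranking; keep the best.

    evaluate: hypothetical callback mapping a list of feature indices to a
    score, e.g. a cross-validated success rate.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        current = evaluate(subset)
        if current > best_score:
            best_subset, best_score = subset, current
    return best_subset, best_score


# Toy data (hypothetical): feature 1 duplicates feature 0, and feature 2
# carries complementary information about the label.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [a & b for a, b in zip(f0, f2)]

ranked = mrmr_rank([f0, f1, f2], labels)


def toy_evaluate(subset):
    # Stand-in for a cross-validated score: reward the two informative
    # features and apply a small penalty per selected feature.
    return len({0, 2} & set(subset)) - 0.1 * len(subset)


best_subset, best_score = incremental_feature_selection(ranked, toy_evaluate)
```

In this toy case the duplicated feature is ranked last because its redundancy with an already selected feature outweighs its relevance, and the incremental step stops growing the subset once adding a feature no longer improves the score.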

Highlights

  • During the past decade, the continuous development of high-throughput experimental technologies has increased the sizes of large-scale datasets, including both metagenomes and personal genomes; this growth necessitates renewed efforts to develop computational technologies for better biological interpretation of these data

  • KEGG (Kyoto Encyclopedia of Genes and Genomes) [1,2,7] is a widely used knowledge database for the systematic analysis of gene functions in terms of the interactions between genes and molecules; it consists of graphical diagrams of biochemical pathways, including most of the known metabolic pathways and some of the known regulatory pathways

  • Of the 352 features, 88 were graph features, obtained by treating each pathway as a graph, and 264 were derived from the biological properties of the proteins
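To make the idea of treating a pathway as a graph concrete, the sketch below computes a few generic graph descriptors from a node list and an edge list. The text here does not enumerate the 88 graph features that were actually used, so the descriptors, the function name, and the protein identifiers are assumptions chosen only for illustration.

```python
def basic_graph_features(nodes, edges):
    """Generic descriptors of an undirected simple graph.

    nodes: iterable of node identifiers (e.g. proteins).
    edges: iterable of (u, v) pairs between those nodes.
    """
    degree = {v: 0 for v in nodes}
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue  # ignore self-loops and duplicate edges
        seen.add(key)
        degree[u] += 1
        degree[v] += 1
    n, m = len(degree), len(seen)
    max_edges = n * (n - 1) / 2
    return {
        "n_nodes": n,
        "n_edges": m,
        "density": m / max_edges if max_edges else 0.0,
        "mean_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


# Hypothetical four-protein system with one duplicated interaction.
proteins = ["P1", "P2", "P3", "P4"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P1")]
features = basic_graph_features(proteins, interactions)
```

Descriptors of this kind give every system, regardless of its size, a fixed-length numeric vector that can be combined with the biological features.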


Summary

Introduction

The continuous development of high-throughput experimental technologies has increased the sizes of large-scale datasets, including both metagenomes and personal genomes; this growth necessitates renewed efforts to develop computational technologies for better biological interpretation of these data. KEGG BRITE is an ontology database that represents functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as the relationships among them [8,9]. In these databases, experimental knowledge is organized and diagrammed as smaller networks, and web interfaces and visualization tools have been developed to survey and analyze computationally generated global networks [10,11,12].

