Abstract

BackgroundRecent genomic and bioinformatic advances have motivated the development of numerous network models intending to describe graphs of biological, technological, and sociological origin. In most cases the success of a model has been evaluated by how well it reproduces a few key features of the real-world data, such as degree distributions, mean geodesic lengths, and clustering coefficients. Often pairs of models can reproduce these features with indistinguishable fidelity despite being generated by vastly different mechanisms. In such cases, these few target features are insufficient to distinguish which of the different models best describes real world networks of interest; moreover, it is not clear a priori that any of the presently-existing algorithms for network generation offers a predictive description of the networks inspiring them.ResultsWe present a method to assess systematically which of a set of proposed network generation algorithms gives the most accurate description of a given biological network. To derive discriminative classifiers, we construct a mapping from the set of all graphs to a high-dimensional (in principle infinite-dimensional) "word space". This map defines an input space for classification schemes which allow us to state unambiguously which models are most descriptive of a given network of interest. Our training sets include networks generated from 17 models either drawn from the literature or introduced in this work. We show that different duplication-mutation schemes best describe the E. coli genetic network, the S. cerevisiae protein interaction network, and the C. elegans neuronal network, out of a set of network models including a linear preferential attachment model and a small-world model.ConclusionsOur method is a first step towards systematizing network models and assessing their predictability, and we anticipate its usefulness for a number of communities.

Highlights

  • Recent genomic and bioinformatic advances have motivated the development of numerous network models intending to describe graphs of biological, technological, and sociological origin

  • Researchers studying the topology of the Internet [1] and the World Wide Web [2] attempted to summarize these topologies via statistical quantities, primarily the distribution P(k) over nodes of given connectivity or degree k, which was found to be completely unlike that of a "random" or Erdös-Rényi graph

  • We apply our method to three different real data sets: the E. coli genetic network [10], the S. cerevisiae pro

Read more

Summary

Introduction

Recent genomic and bioinformatic advances have motivated the development of numerous network models intending to describe graphs of biological, technological, and sociological origin. As a consequence many mathematicians concentrated on (i) measuring the degree distributions of many technological, sociological, and biological graphs (which generically, it turned out, obeyed such power-law distributions) and (ii) proposing various models of randomly-generated graph topologies which could reproduce these degree distributions (cf [3] for a thorough review). The success of these latter efforts reveals a conundrum for mathematical modeling: a metric which is universal (rather than discriminative) cannot be used for choosing the model which best describes a network of interest. The question posed is one of classification, meaning the construction of an algorithm, based on training data from multiple classes, which can place data of interest within one of the classes with small test loss

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call