Abstract

Motivation: Our work is motivated by an interest in constructing a protein–protein interaction network that captures key features associated with Parkinson’s disease. While there is an abundance of subnetwork construction methods available, it is often far from obvious which subnetwork is the most suitable starting point for further investigation.

Results: We provide a method to assess whether a subnetwork constructed from a seed list (a list of nodes known to be important in the area of interest) differs significantly from a randomly generated subnetwork. The proposed method uses a Monte Carlo approach. As different seed lists can give rise to the same subnetwork, we control for redundancy by constructing a minimal seed list as the starting point for the significance test. The null model is based on random seed lists of the same length as a minimum seed list that generates the subnetwork; in this random seed list the nodes have (approximately) the same degree distribution as the nodes in the minimum seed list. We use this null model to select subnetworks that deviate significantly from random on an appropriate set of statistics and might capture useful information for a real-world protein–protein interaction network.

Availability and implementation: The software used in this paper is available for download at https://sites.google.com/site/elliottande/. The software is written in Python and uses the NetworkX library.

Supplementary information: Supplementary data are available at Bioinformatics online.
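The Monte Carlo test described in the abstract can be illustrated with a minimal sketch. This is not the authors' software: it uses plain adjacency dictionaries instead of NetworkX to stay dependency-free, and the names `degree_matched_seeds`, `monte_carlo_p_value`, `induced_edges`, the `tol` parameter, and the add-one p-value estimator are all assumptions made for illustration. The seed list passed in is assumed to already be a minimal seed list; the construction of that list is not shown.

```python
import random
from collections import defaultdict


def degree_matched_seeds(adj, seeds, rng, tol=0):
    """Draw a random seed list of the same length as `seeds`.

    Each drawn node has a degree within `tol` of the seed it stands in for,
    so the degree distribution is matched (exactly when tol == 0).
    """
    by_degree = defaultdict(list)
    for node, nbrs in adj.items():
        by_degree[len(nbrs)].append(node)

    picked, used = [], set()
    for seed in seeds:
        d = len(adj[seed])
        pool = [n for k in range(d - tol, d + tol + 1)
                for n in by_degree.get(k, []) if n not in used]
        if not pool:
            raise ValueError(f"no unused node with degree near {d}")
        choice = rng.choice(pool)
        picked.append(choice)
        used.add(choice)
    return picked


def monte_carlo_p_value(adj, seeds, sampler, statistic,
                        n_samples=1000, tol=0, rng_seed=0):
    """Compare the observed statistic with its degree-matched null distribution.

    `sampler(adj, seeds)` builds a subnetwork; `statistic(subnetwork)` returns
    a number. The p-value uses the add-one Monte Carlo estimator (an assumption)
    and tests whether the observed value is unusually large.
    """
    rng = random.Random(rng_seed)
    observed = statistic(sampler(adj, seeds))
    null = [statistic(sampler(adj, degree_matched_seeds(adj, seeds, rng, tol)))
            for _ in range(n_samples)]
    exceed = sum(1 for value in null if value >= observed)
    return observed, (1 + exceed) / (1 + n_samples)


def induced_edges(adj, nodes):
    """Toy sampler: the edges of the subgraph induced by `nodes`."""
    keep = set(nodes)
    return {frozenset((u, v)) for u in keep for v in adj[u] if v in keep}


# Usage on a 6-node ring, with "number of induced edges" as the statistic.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
observed, p_value = monte_carlo_p_value(ring, [0, 1, 2], induced_edges, len,
                                        n_samples=200)
```

Passing the sampler and the statistic as arguments keeps the null model separate from any particular subnetwork construction method, so the same test can be reused across sampling schemes and statistics.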

Highlights

  • Network sampling is used in many different fields, such as biology (Lim et al., 2006) and sociology (Bernard et al., 2010; Frank and Snijders, 1994)

  • The null model is based on random seed lists of the same length as a minimum seed list that generates the subnetwork; in this random seed list the nodes have (approximately) the same degree distribution as the nodes in the minimum seed list

  • We find that the networks generated from the expression data seed list under the ‘all shortest paths between seed nodes’ sampling scheme and under the ‘all paths up to length 2’ sampling scheme have significant results under our null model, and may have interesting properties for further analysis for our work on Parkinson’s disease (PD)
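The two sampling schemes named in the highlights can be sketched as follows. The exact definitions are not given in this section, so the code rests on stated assumptions: the network is undirected and stored as an adjacency dictionary, ‘all shortest paths between seed nodes’ is read as every node lying on some shortest path between a pair of seeds, and ‘all paths up to length 2’ is read as seeds plus any node adjacent to two distinct seeds. The function names are hypothetical, and only node sets are returned; which edges to keep is left open.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_shortest_paths_nodes(adj, seeds):
    """Seeds plus every node on a shortest path between some pair of seeds."""
    dist = {s: bfs_distances(adj, s) for s in seeds}
    nodes = set(seeds)
    for s, t in combinations(seeds, 2):
        if t not in dist[s]:
            continue  # s and t lie in different components
        d_st = dist[s][t]
        for v in adj:
            # v is on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
            if v in dist[s] and v in dist[t] and dist[s][v] + dist[t][v] == d_st:
                nodes.add(v)
    return nodes


def paths_up_to_length_2_nodes(adj, seeds):
    """Seeds plus every node adjacent to at least two distinct seeds."""
    seed_set = set(seeds)
    nodes = set(seed_set)
    for v in adj:
        if len(seed_set & set(adj[v])) >= 2:
            nodes.add(v)
    return nodes


# Usage: on the path a - p - q - b the two schemes give different node sets.
path = {"a": {"p"}, "p": {"a", "q"}, "q": {"p", "b"}, "b": {"q"}}
by_shortest = all_shortest_paths_nodes(path, ["a", "b"])
by_length_2 = paths_up_to_length_2_nodes(path, ["a", "b"])
```

In this toy example the shortest-path scheme pulls in both intermediate nodes, while the length-2 scheme returns only the seeds, because no single node is adjacent to both of them.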



Introduction

Network sampling is used in many different fields, such as biology (Lim et al., 2006) and sociology (Bernard et al., 2010; Frank and Snijders, 1994). Protein–protein interaction (PPI) networks are sampled to form subnetworks that are associated with a disease or cellular process of interest; see, e.g., Hwang et al. (2008); Lim et al. (2006); Gao et al. (2011); Goehler et al. (2004); Chuang et al. (2007); Sharma et al. (2015); Ghiassian et al. (2015). An advantage of such sampling is that an in-depth analysis, such as verifying existing links, may be feasible on a small network. Network sampling can also reflect empirical limitations, such as the availability of only partial data for a given network (Bernard et al., 2010; Frank and Snijders, 1994) or the exclusion of vertices that cannot be detected (Salganik, 2006), with consequences for measured network statistics (Kossinets et al., 2006).
