Abstract

Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the number of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work are available at https://nd.edu/~cone/colored_graphlets/.
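For orientation, the homogeneous S3 measure mentioned above can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in WAVE, MAGNA++, or SANA: it assumes the commonly cited definition of S3 (conserved edges divided by the edges of the first network, plus the edges of the second network induced on the aligned nodes, minus the conserved edges), simple undirected graphs, and an injective node mapping. The function name `s3_score` and the toy networks are hypothetical. The heterogeneous extension of S3 is not shown, since its definition is not given in this summary.

```python
def s3_score(edges1, edges2, mapping):
    """Homogeneous S3 of an injective node mapping from G1 into G2.

    edges1, edges2: iterables of 2-tuples (undirected edges, no self-loops).
    mapping: dict sending every node of G1 to a distinct node of G2.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}

    # Edges of G1 carried into G2 by the mapping.
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in e1}
    conserved = mapped & e2

    # Edges of G2 whose endpoints both lie in the image of the mapping.
    image = set(mapping.values())
    induced = {e for e in e2 if e <= image}

    denom = len(e1) + len(induced) - len(conserved)
    return len(conserved) / denom if denom else 0.0


# Toy example: a 3-node path aligned to a triangle with one pendant node.
g1 = [("a", "b"), ("b", "c")]
g2 = [(1, 2), (2, 3), (1, 3), (3, 4)]
alignment = {"a": 1, "b": 2, "c": 3}
score = s3_score(g1, g2, alignment)
```

In this toy case both edges of the path are conserved, but the triangle induced on the aligned nodes has one extra edge, so the score is 2 / (2 + 3 - 2) = 2/3. Penalizing that extra edge is what keeps S3 from rewarding alignments that map a sparse region onto a dense one.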

Highlights

  • Due to advancements of biotechnologies for data collection, increasing amounts of biological network data are becoming available[1,2,3,4]

  • Given node- or edge-colored graphlets, analogous to the graphlet degree vector (GDV) of a node in a homogeneous network, we summarize the extended neighborhood of a node in a heterogeneous network with its node-colored GDV (NCGDV) or edge-colored GDV (ECGDV)

  • We compare each of homogeneous WAVE, MAGNA++, and SANA to its heterogeneous counterpart


Summary

Introduction

Due to advancements of biotechnologies for data collection, increasing amounts of biological network data are becoming available[1,2,3,4]. Global network alignment (GNA) aims to find an overall node mapping between compared networks, which often results in the aligned network regions being large but suboptimally conserved[19,20,21,22,23,24,25,26,27,28,29,30,31]. Both local network alignment (LNA) and GNA have (dis)advantages[32,33]. Because multiple network alignment (MNA) is more computationally complex than pairwise network alignment (PNA)[34], and because current PNA methods are more accurate than current MNA methods[35], we focus on PNA, but our work can be generalized to MNA as well.

