Abstract
In the last decade, many biological networks have been built from the large-scale experimental data produced by rapidly developing high-throughput techniques, as well as from the literature and other sources. However, this huge amount of network data has not been fully utilized because of the limited number of biological network analysis tools. As basic and essential bioinformatics methods, biological network alignment and querying have been applied in many fields, such as predicting new protein-protein interactions (PPI). Although many algorithms have been published, the network alignment and querying problems have not been solved satisfactorily. In this paper, we extended CNetQ, a novel network querying method based on the conditional random fields model, to solve the network alignment problem by adopting an iterative bi-directional mapping strategy. The new method, called CNetA, was compared with four other methods on fifty simulated and three real PPI network alignment instances using four structural and five biological measures. The computational experiments on the simulated data, which were generated from a biological network evolutionary model to validate the effectiveness of network alignment methods, show that CNetA achieves the best accuracy in terms of both nodes and networks. For the real data, CNetA identified larger biologically conserved subnetworks than the structural-dominated methods and larger connected subnetworks than the biological-dominated methods, which suggests that CNetA can better balance the biological and structural similarities. Further, CNetQ and CNetA have been implemented in a new R package, Corbi (http://doc.aporc.org/wiki/Corbi), and freely accessible, easy-to-use web services for CNetQ and CNetA have also been built on the R package. The simulated and real datasets used in this paper are available for download at http://doc.aporc.org/wiki/CNetA/.
Highlights
In the systems biology era, more and more biologists focus on biological systems rather than individual molecules.
Since the true alignments for the simulated data are known, we evaluated the alignment results by the structural measures and by two types of accuracy, computed as the fractions of correctly aligned nodes among the duplicated nodes and among all nodes, respectively.
The three GO domains, biological process, cellular component, and molecular function, are abbreviated as BP, CC, and MF, respectively.
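The two node-level accuracies mentioned in the highlights can be illustrated with a minimal sketch. It assumes an alignment is stored as a dictionary from nodes of one network to nodes of the other; the function name `alignment_accuracy`, the toy node labels, and the choice of which nodes count as "duplicated" are hypothetical illustrations, not the paper's implementation or part of the Corbi package.

```python
def alignment_accuracy(predicted, truth, subset=None):
    """Fraction of nodes whose predicted partner equals the true partner.

    predicted, truth: dicts mapping a node in network 1 to a node in network 2.
    subset: optional collection of network-1 nodes to restrict the evaluation to
            (e.g. the duplicated nodes); defaults to every node in `truth`.
    """
    nodes = list(truth) if subset is None else [n for n in subset if n in truth]
    if not nodes:
        return 0.0
    # An unaligned node (missing from `predicted`) counts as incorrect.
    correct = sum(1 for n in nodes if predicted.get(n) == truth[n])
    return correct / len(nodes)


# Toy data (hypothetical): the true alignment and one predicted alignment.
truth = {"a": "a'", "b": "b'", "c": "c'", "d": "d'"}
predicted = {"a": "a'", "b": "c'", "c": "c'", "d": "d'"}
duplicated = {"b", "c"}  # assumed set of duplicated nodes in the simulation

acc_all = alignment_accuracy(predicted, truth)              # over all nodes
acc_dup = alignment_accuracy(predicted, truth, duplicated)  # over duplicated nodes only
print(acc_all, acc_dup)
```

Restricting the denominator to the duplicated nodes makes the second accuracy more sensitive to errors on that subset, since correctly aligned nodes outside it no longer contribute to the score.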
Summary
In the systems biology era, more and more biologists focus on biological systems rather than individual molecules. Biological networks, such as protein-protein interaction (PPI) networks, are among the most natural and efficient means of studying and modeling complex biological systems. Network alignment has two objectives, biological similarity and structural similarity; based on the different trade-off strategies between them, we categorized the network alignment methods into three groups: structural-dominated, biological-dominated, and balanced. Because biological networks are large, computational complexity becomes the most important issue for network alignment methods. New models that can appropriately balance the biological and structural similarities, and algorithms that can solve large-scale problems efficiently and effectively, are in high demand in systems biology.