Abstract
The generation of large-scale protein-protein interaction (PPI) data has created the need for efficient computational approaches that can discover highly modular, high-quality clusters. These clusters represent protein complexes or functional modules. A number of seed-growth style algorithms exist to identify protein complexes from genome-wide PPI networks. However, these methods lose accuracy when the networks are comparatively large and have complex connectivity. To combat the noise present in these large PPI networks, we propose an improvement to the graph entropy approach, which is one of the seed-growth style algorithms. Graph entropy, a novel information-theoretic definition, is a measure of the structural complexity of a graph; for example, a loss of entropy represents an increase in the modularity of the graph. The original algorithm considers only the connectivity among vertices, whereas the modified definition also considers edge weights. These edge weights are obtained by measuring the semantic similarity of each interacting protein pair. The weighted graph entropy approach is applied to the S. cerevisiae PPI data set from BioGRID. The output clusters are compared with known protein complexes to calculate f-scores, which we use to evaluate the clusters' accuracy. The proposed improvement to the graph entropy approach enhances the quality of the clusters as potential protein complexes when compared to other seed-growth style algorithms.
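The abstract does not state the entropy formula, so the following is only a minimal sketch under stated assumptions. It assumes that each vertex contributes a binary entropy term based on the share of its total edge weight that falls inside the candidate cluster, and that graph entropy is the sum of these terms over all vertices. The function names, the adjacency-dictionary layout, and the toy weights are hypothetical and are not taken from the paper.

```python
import math


def vertex_entropy(p_inner):
    """Binary entropy (in bits) of a vertex, given the weighted share
    of its links that fall inside the cluster (assumed definition)."""
    if p_inner <= 0.0 or p_inner >= 1.0:
        return 0.0  # all links inside or all outside: no uncertainty
    p_outer = 1.0 - p_inner
    return -p_inner * math.log2(p_inner) - p_outer * math.log2(p_outer)


def weighted_graph_entropy(adjacency, cluster):
    """Sum vertex entropies over the whole graph for one candidate cluster.

    adjacency: dict mapping vertex -> dict of neighbor -> edge weight
               (hypothetical layout; weights stand in for semantic similarity).
    cluster:   set of vertices in the candidate cluster.
    """
    total = 0.0
    for vertex, neighbors in adjacency.items():
        weight_sum = sum(neighbors.values())
        if weight_sum == 0:
            continue  # isolated vertex contributes nothing
        inner = sum(w for u, w in neighbors.items() if u in cluster)
        total += vertex_entropy(inner / weight_sum)
    return total


# Toy network: a triangle A-B-C plus one weakly supported edge C-D.
toy = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 0.2},
    "D": {"C": 0.2},
}
print(weighted_graph_entropy(toy, {"A", "B", "C"}))
```

In this toy case, lowering the weight of the C-D edge reduces the entropy of the cluster {A, B, C} relative to the unweighted graph, which illustrates how down-weighting a poorly supported interaction can make a cluster look more modular.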