Abstract

Background: A common task in molecular network analysis is the detection of community structures or modules. Such modules are frequently associated with shared biological functions and are often disrupted in disease. Detection of community structure entails clustering nodes in the graph, and many algorithms apply a clustering algorithm on an input node embedding. Graph representation learning offers a powerful framework to learn node embeddings to perform various downstream tasks such as clustering. Deep embedding methods based on graph neural networks can have substantially better performance on machine learning tasks on graphs, including module detection; however, existing studies have focused on social and citation networks. It is currently unclear if deep embedding methods offer any advantage over shallow embedding methods for detecting modules in molecular networks. Methods: Here, we investigated deep and shallow graph representation learning algorithms on synthetic and real cell-type specific gene interaction networks to detect gene modules and identify pathways affected by sequence nucleotide polymorphisms. We used multiple criteria to assess the quality of the clusters based on connectivity as well as overrepresentation of biological processes. Results: On synthetic networks, deep embedding based on a variational graph autoencoder had superior performance as measured by modularity metrics, followed closely by shallow methods, node2vec and Graph Laplacian embedding. However, the performance of the deep methods worsens when the overall connectivity between clusters increases. On real molecular networks, deep embedding methods did not have a clear advantage and the performance depended upon the properties of the graph and the metrics. Conclusions: Deep graph representation learning algorithms for module detection-based tasks can be beneficial for some biological networks, but the performance depends upon the metrics and graph properties. Across different network types, Graph Laplacian embedding followed by node2vec are the best performing algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call