Abstract
Hypergraphs are generalizations of graphs in which the (hyper)edges can connect any number of vertices. They are powerful tools for representing complex and non-pairwise relationships. However, existing graph computation frameworks cannot accommodate hypergraphs without converting them into graphs, because they do not offer APIs that support (hyper)edges directly. This graph conversion may create excessive replicas and result in very large graphs, causing difficulties in workload balancing. A few tools have been developed for hypergraph partitioning, but they are not general-purpose frameworks for hypergraph processing. In this paper, we propose HyperX, a general-purpose distributed hypergraph processing framework built on top of Spark. HyperX is based on the “Pregel” computation paradigm, which is user-friendly and has been widely adopted by popular graph computation frameworks. To help create balanced workloads for distributed hypergraph processing, we further investigate the hypergraph partitioning problem and propose a novel label propagation partitioning (LPP) algorithm. We conduct extensive experiments using both real and synthetic data. The results show that HyperX achieves an order of magnitude improvement over graph-conversion-based approaches when running hypergraph learning algorithms, in terms of running time, network communication costs, and memory consumption. For hypergraph partitioning, LPP significantly outperforms the baseline algorithms on these measures as well.
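To make the size blow-up from graph conversion concrete, the following is a minimal Python sketch, not HyperX code. It assumes the conversion is a clique expansion (every hyperedge is replaced by pairwise edges among its vertices), which is one common choice; the abstract does not say which conversion the compared approaches use. The function names `clique_expansion` and `incidence_count` and the toy data are hypothetical.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace each hyperedge with pairwise edges among its vertices.

    Assumption: clique expansion stands in for the "graph conversion"
    mentioned in the abstract; other conversions exist.
    """
    edges = set()
    for members in hyperedges:
        # A hyperedge with k distinct vertices yields k * (k - 1) / 2 pairs.
        for u, v in combinations(sorted(set(members)), 2):
            edges.add((u, v))
    return edges


def incidence_count(hyperedges):
    """Number of (vertex, hyperedge) incidences stored when hyperedges are kept as-is."""
    return sum(len(set(members)) for members in hyperedges)


# Toy hypergraph (illustrative data only): two large overlapping hyperedges
# and one small hyperedge.
hyperedges = [range(0, 100), range(50, 150), [0, 149]]

pairwise = clique_expansion(hyperedges)
print("pairwise edges after conversion:", len(pairwise))
print("incidences in the hypergraph:   ", incidence_count(hyperedges))
```

In this toy case, three hyperedges with 202 incidences expand into several thousand pairwise edges, because the expansion grows quadratically with hyperedge size while the direct representation grows linearly.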
More From: IEEE Transactions on Knowledge and Data Engineering