Abstract

Streaming graph data mining has become a significant issue in high performance graph mining due to the increasing appearance of graph data sets as streams. In this paper we propose Acacia-Stream which is a scalable distributed streaming graph database engine developed with X10 programming language. Graph streams are partitioned using a streaming graph partitioner algorithm in Acacia-Stream and streaming graph processing queries are run on the graph streams. The partitioned data sets are persisted on secondary storage across X10 places. We investigate on the use of three different streaming graph partitioner algorithms called hash, Linear Deterministic Greedy, and Fennel algorithms and report their performance. Furthermore, to demonstrate Acacia-Stream's streaming graph processing capabilities we implement streaming triangle counting with Acacia-Stream. We present performance results gathered from Acacia-Stream with different large scale streaming data sets in both horizontal and vertical scalability experiments. Furthermore, we compare streaming graph loading performance of Acacia-Stream with Neo4j and Oracle's PGX graph database servers. From these experiments we observed that Acacia-Stream's Fennel partitioner based graph uploader can upload a 948MB rmat22 graph in 1283.42 seconds which is 38% faster than PGX graph database server and 12.8 times faster than Neo4j database server. Acacia-Stream's Streaming Partitioner's batch size adjustments based optimizations reduced the time used by the network communications almost by half.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.