Abstract
Data from emerging applications, such as cybersecurity and social networking, can be abstracted as graphs whose edges are updated sequentially in the form of a stream. The central challenge of interactive graph stream analytics is responding quickly to end-user queries on graph stream data of a terabyte and beyond. In this paper, a succinct and efficient double index data structure is designed to build the sketch of a graph stream to meet general queries. A single pass stream model, which includes general sketch building, distributed sketch based analysis algorithms and regression based approximation solution generation, is developed, and a typical graph algorithm, triangle counting, is implemented to evaluate the proposed method. Experimental results on power law and normal distribution graph streams show that our method can generate accurate results (mean relative error less than 4%) with high performance. All our methods and code have been implemented in an open source framework, Arkouda, and are available from our GitHub repository, Bader-Research. This work provides the large and rapidly growing Python community with a powerful way to handle terabyte and beyond graph stream data using their laptops.
Highlights
In order to evaluate the accuracy of our method, we conducted experiments on nine normal distribution and 12 power law distribution benchmark graphs with shrink factors
We will use the exact number of triangles in the three sub-sketches to provide the approximate number of triangles in the given graph stream
High productivity means that the end users can use a popular data science language such as Python to explore different graph streams
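As a rough illustration of the exact triangle count that the highlights say is computed on each sub-sketch, the following minimal Python sketch counts triangles in a single pass over an edge stream. It is not the Arkouda implementation; the function name `count_triangles` and the in-memory adjacency sets are assumptions made for this example, and the sketch building and regression steps are not shown.

```python
from collections import defaultdict


def count_triangles(edge_stream):
    """Exact triangle count over an undirected edge stream, in one pass.

    Hypothetical helper for illustration only. Each triangle is counted
    once, at the moment its third edge arrives.
    """
    adj = defaultdict(set)  # vertex -> set of neighbours seen so far
    triangles = 0
    for u, v in edge_stream:
        # Skip self-loops and edges that were already seen.
        if u == v or v in adj[u]:
            continue
        # Every common neighbour of u and v closes one new triangle.
        triangles += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
    return triangles


if __name__ == "__main__":
    # Complete graph on four vertices, streamed edge by edge.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
```

This keeps the whole graph in memory, so it only makes sense on a sketch that is already small enough to fit; the stream itself would not.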
Summary
The sheer volume of a stream could be petabytes or more, as in network traffic analysis in the IPv6 network, which has 2^128 nodes. These applications motivate the challenging problem of designing succinct data structures and real-time algorithms to return approximate solutions. Different stream models [2,3,4,5,6,7], which allow access to the data in only one pass (or multi-pass in semi-streaming algorithms), are developed to answer different queries using a significantly reduced sublinear space. The accuracy of such models’ approximate solutions is often guaranteed by a pair of user specified parameters, ε > 0 and 0 < δ < 1. This literature provides the theoretical foundation to solve the challenging large scale stream data problem.
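The summary does not spell out how the two user specified parameters bound the error. One common reading in the streaming literature, given here as an assumption rather than as the paper's own definition, is a relative error guarantee that holds with high probability. Writing ε for the accuracy parameter, δ for the failure probability, X for the true answer and \hat{X} for the estimate (the last two symbols are introduced only for this illustration):

```latex
\Pr\left[\, \lvert \hat{X} - X \rvert \le \varepsilon X \,\right] \ge 1 - \delta
```

Under this reading, smaller values of ε and δ demand a tighter and more reliable estimate, typically at the cost of more space.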