Abstract

Due to their promise of delivering real-time network insights, today's streaming analytics platforms are increasingly being used in the communications networks where the impact of the insights go beyond sentiment and trend analysis to include real-time detection of security attacks and prediction of network state (i.e., is the network transitioning towards an outage). Current streaming analytics platforms operate under the assumption that arriving traffic is to the order of kilobytes produced at very high frequencies. However, communications networks, especially the telecommunication networks, challenge this assumption because some of the arriving traffic in these networks is to the order of gigabytes, but produced at medium to low velocities. Furthermore, these large datasets may need to be ingested in their entirety to render network insights in real-time. Our interest is to subject today's streaming analytics platforms --- constructed from state-of-the art software components (Kafka, Spark, HDFS, ElasticSearch) --- to traffic densities observed in such communications networks. We find that filtering on such large datasets is best done in a common upstream point instead of being pushed to, and repeated, in downstream components. To demonstrate the advantages of such an approach, we modify Apache Kafka to perform limited native data transformation and filtering, relieving the downstream Spark application from doing this. Our approach outperforms four prevalent analytics pipeline architectures with negligible overhead compared to standard Kafka. (Our modifications to Apache Kafka are publicly available at https://github.com/Esquive/queryable-kafka.git)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.