Abstract

Network connection logs have long been recognized as integral to proper network security, maintenance, and performance management. This paper provides a development of distributed systems and write optimized databases: However, even a somewhat sizable network will generate large amounts of logs at very high rates. This paper explains why many storage methods are insufficient for providing real-time analysis on sizable datasets and examines database techniques attempt to address this challenge. We argue that sufficient methods include distributing storage, computation, and write optimized datastructures (WOD). Diventi, a project developed by Sandia National Laboratories, is here used to evaluate the potential of WODs to manage large datasets of network connection logs. It can ingest billions of connection logs at rates over 100,000 events per second while allowing most queries to complete in under one second. Storage and computation distribution are then evaluated using Elastic-search, an open-source distributed search and analytics engine. Then, to provide an example application of these databases, we develop a simple analytic which collects statistical information and classifies IP addresses based upon behavior. Finally, we examine the results of running the proposed analytic in real-time upon broconn (now Zeek) flow data collected by Diventi at IEEE/ACM Supercomputing 2019.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call