Abstract

Large information systems comprise different interconnected hardware and software components that collectively generate large volumes of data. The run-time analysis of such data involves computationally expensive algorithms and is pivotal to a number of software engineering activities, such as system understanding, diagnostics, and root cause analysis. To increase the performance of run-time analysis for large sets of logged data, we present an approach that allows for the real-time reduction of one or more event streams by utilizing a set of filtering criteria. More specifically, the approach employs a similarity measure that is based on information theory principles and is applied between the features of the incoming events and the features of a set of retrieved or constructed events that we refer to as beacons. The proposed approach is domain and event schema agnostic, can handle infinite data streams using a caching algorithm, and can be parallelized in order to tractably process high-volume, high-frequency, and high-variability data. Experimental results obtained using the KDD’99 and CTU-13 labeled data sets indicate that the approach is scalable and can yield highly reduced sets with high recall values with respect to a use case.
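To make the idea of beacon-based filtering concrete, the following is a minimal sketch, not the method evaluated in the paper. The abstract does not specify the similarity measure, so the sketch assumes a Lin-style information-theoretic similarity, in which the information content of a feature is the negative log of its smoothed frequency in the stream seen so far. The class name `BeaconFilter`, the `threshold` parameter, and the representation of events as flat dictionaries with hashable values are all hypothetical. The caching and parallelization mentioned above are omitted.

```python
import math
from collections import Counter


class BeaconFilter:
    """Keeps events that are sufficiently similar to at least one beacon.

    Illustrative assumption: similarity is Lin-style,
    2 * IC(shared features) / (IC(event) + IC(beacon)),
    with IC(f) = -log p(f) estimated from running feature counts.
    """

    def __init__(self, beacons, threshold=0.5):
        # Each event or beacon is treated as a set of (field, value) pairs.
        self.beacons = [frozenset(b.items()) for b in beacons]
        self.threshold = threshold
        self.counts = Counter()  # how many events contained each feature
        self.total = 0           # number of events seen so far

    def _ic(self, feature):
        # Smoothed probability stays strictly below 1, so IC is positive.
        p = (self.counts[feature] + 1) / (self.total + 2)
        return -math.log(p)

    def similarity(self, a, b):
        denom = sum(self._ic(f) for f in a) + sum(self._ic(f) for f in b)
        if denom == 0:
            return 0.0  # both feature sets are empty
        shared = sum(self._ic(f) for f in a & b)
        return 2 * shared / denom

    def keep(self, event):
        feats = frozenset(event.items())
        # Update the running statistics before scoring the event.
        self.total += 1
        self.counts.update(feats)
        return any(self.similarity(feats, b) >= self.threshold
                   for b in self.beacons)


if __name__ == "__main__":
    # Hypothetical beacon and events, loosely styled after connection records.
    f = BeaconFilter([{"proto": "tcp", "flag": "S0"}], threshold=0.5)
    stream = [
        {"proto": "tcp", "flag": "S0"},
        {"proto": "udp", "flag": "SF"},
    ]
    reduced = [e for e in stream if f.keep(e)]
    print(len(reduced), "of", len(stream), "events kept")
```

Under this assumed measure, rare shared features contribute more to the score than common ones, which is one plausible reading of a similarity "based on information theory principles". Because the feature counts here grow without bound, a real deployment over infinite streams would need to bound that state, which is presumably the role of the caching algorithm the abstract mentions.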
