Streaming data are all around us. From traditional radio systems streaming audio to today’s connected end-user devices continually sending information or accessing services, data flow constantly between nodes across various networks. The demand for appropriate outlier detection (OD) methods in fault detection, special event detection, and the detection and prevention of malicious activity is not only persistent but increasing, especially with recent developments in telecommunication systems such as Fifth Generation (5G) networks, which facilitate the expansion of the Internet of Things (IoT). Selecting a computationally efficient OD method that is adapted to a specific field and accounts for the existence of empirical data, or lack thereof, is non-trivial. This paper presents a thorough survey of OD methods, categorized by the applications in which they are implemented and by the basic assumptions they make about the characteristics of the streaming data, together with a summary of emerging challenges, such as the evolving structure and nature of the data and their dimensionality and temporality. A categorization of commonly used datasets in the context of streaming data is produced to aid data source identification for researchers in this field. Based on this, guidelines for OD method selection are defined, which consider flexibility and sample size requirements and facilitate the design of such algorithms in telecommunications and other industries.