Testing of Streaming Data Clustering Algorithm Effectiveness

Anis Fuatovich Galimyanov, Chulpan Bakievna Minnegalieva, Nurgayaz Farhatovich Garifyanov

doi:10.29042/2020-10-5-124-128

Abstract

This article addresses the task of clustering streaming data. Streaming data processing is becoming increasingly important as the number of devices that produce and process new data grows. Such devices generate unbounded streams of data at very high speed. The article gives examples of such data streams and explains why they need to be processed. Stream clustering algorithms differ from classical algorithms because of the limited RAM of the computing device. Both artificial data sets and experimental observations were chosen for testing the stream algorithms. The observations comprise data from chemical gas sensors and information about network connections in a local network. The WEKA and Massive Online Analysis software packages were selected as the tools for comparing the algorithms. The article describes how to work with this software and demonstrates data preprocessing in WEKA. Several algorithms that work with data streams were tested. Clustering results were evaluated using an external quality measure. The article concludes with graphs showing how this measure changes during stream clustering.

Full Text