Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams.

Jinping Sui, Alexander Jung, Xiang Li, Li Liu, Zhen Liu

doi:10.1109/tcyb.2020.3023973

Open Access

https://doi.org/10.1109/tcyb.2020.3023973

Abstract

In an era of ubiquitous large-scale evolving data streams, data stream clustering (DSC) has received considerable attention because the scale of the data streams far exceeds the capacity of expert human analysts. It has been observed that high-dimensional data are usually distributed in a union of low-dimensional subspaces. In this article, we propose a novel sparse representation-based DSC algorithm, called evolutionary dynamic sparse subspace clustering (EDSSC). It can cope with the time-varying nature of the subspaces underlying evolving data streams, such as subspace emergence, disappearance, and recurrence. The proposed EDSSC consists of two phases: 1) static learning and 2) online clustering. During the first phase, a data structure for storing the statistical summary of data streams, called the EDSSC summary, is proposed, which better addresses the dilemma between two conflicting goals: 1) saving more points for the accuracy of subspace clustering (SC) and 2) discarding more points for the efficiency of DSC. By further proposing an algorithm to estimate the number of subspaces, EDSSC does not need to know this number in advance. In the second phase, a more suitable index, called the average sparsity concentration index (ASCI), is proposed, which dramatically improves the clustering accuracy compared with the conventionally used SCI index. In addition, a subspace evolution detection model based on the Page-Hinkley test is proposed, with which appearing, disappearing, and recurring subspaces can be detected and adapted to. Extensive experiments on real-world data streams show that EDSSC outperforms state-of-the-art online SC approaches.
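To make the Page-Hinkley test mentioned in the abstract concrete, the following is a minimal sketch of the standard test for an upward shift in the mean of a scalar stream. It is not the authors' detection model: the abstract does not say which signal EDSSC monitors, so the monitored quantity, the class name `PageHinkley`, the helper `first_alarm`, and the values of `delta` and `threshold` are all assumptions chosen for illustration.

```python
class PageHinkley:
    """Standard Page-Hinkley test for an upward shift in a stream's mean.

    delta and threshold are illustrative defaults (assumptions), not values
    taken from the EDSSC paper.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # running minimum M_t of m_t

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


def first_alarm(stream, delta=0.005, threshold=5.0):
    """Return the index of the first alarm in the stream, or None."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for t, x in enumerate(stream):
        if detector.update(x):
            return t
    return None


if __name__ == "__main__":
    # Hypothetical monitored signal: stable at 0.1, then jumps to 1.0 at t = 50.
    signal = [0.1] * 50 + [1.0] * 50
    print("first alarm at index:", first_alarm(signal))
```

In a streaming clustering setting, one plausible use (again an assumption) is to feed the detector a per-point quantity such as a representation residual, so that a sustained increase suggests the arriving points no longer fit the currently known subspaces.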

Highlights

High-dimensional data streams are generated at an unprecedented scale in various realms, such as media, communication, finance, and meteorology [1]–[4]
On the ExYaleB data stream, EDSSC achieves 75.01% accuracy and 86.47% normalized mutual information (NMI), compared with 57.14% accuracy and 74.43% NMI for OLRSC, which has the best performance among all baseline algorithms
The goal of this article is to perform data stream clustering (DSC) on evolving high-dimensional data streams, that is, to provide a time-varying subspace clustering (SC) result S_t at each timestamp t that reflects the partition of the received points X_t, such that points belonging to the same subspace are assigned to the same cluster

Summary

Introduction

High-dimensional data streams are generated at an unprecedented scale in various realms, such as media, communication, finance, and meteorology [1]–[4]. These data streams are often high dimensional, unlabeled, large scale, and evolving, which presents huge challenges for data stream clustering (DSC). Representation-based SC (RBSC) approaches have been dominating the field and represent the state of the art. They are based on the hypothesis that each data point in a union of subspaces can be represented as a linear combination of other points, that is, the so-called self-expressiveness property. Popular RBSC approaches include sparse SC (SSC) [1], low-rank representation (LRR) [16], and their variants.
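The self-expressiveness property can be illustrated on toy data. The sketch below is not SSC itself: SSC solves an l1-regularized program over all other points, whereas this example uses a greedy one-atom stand-in that picks the single other point that best reconstructs each point. The data, the `labels` list, and the function names `dot` and `best_single_atom` are hypothetical and exist only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_single_atom(points, i):
    """Find the point j != i that best explains points[i] on its own.

    Returns (j, coeff, residual_norm), where coeff * points[j] is the
    least-squares approximation of points[i] using that single point.
    """
    x = points[i]
    best = None
    for j, d in enumerate(points):
        if j == i:
            continue
        coeff = dot(x, d) / dot(d, d)
        residual = [a - coeff * b for a, b in zip(x, d)]
        res_norm = math.sqrt(dot(residual, residual))
        if best is None or res_norm < best[2]:
            best = (j, coeff, res_norm)
    return best


# Toy data (assumed): points 0-2 lie on the line spanned by (1, 2, 0),
# points 3-5 lie on the line spanned by (0, 1, 1).
points = [
    [1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [-1.5, -3.0, 0.0],
    [0.0, 1.0, 1.0], [0.0, 3.0, 3.0], [0.0, -2.0, -2.0],
]
labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for i in range(len(points)):
        j, coeff, res = best_single_atom(points, i)
        same = labels[i] == labels[j]
        print(f"point {i} is explained by point {j} (same subspace: {same})")
```

Because each toy subspace is one-dimensional, a single point from the same subspace reconstructs the target with a residual that is zero up to rounding, while points from the other subspace cannot. The support of the representation therefore reveals subspace membership, which is the idea that representation-based SC methods build on.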

Journal: IEEE Transactions on Cybernetics
Publication Date: Nov 24, 2020
Citations: 10
License type: CC BY 4.0

Similar Papers

A Clustering Algorithm for Evolving Data Streams Using Temporal Spatial Hyper Cube
Redhwan Al-Amri ... Yahia Baashar
Applied Sciences | VOL. 12
27 Jun 2022

Sparse Subspace Clustering for Evolving Data Streams
Jinping Sui ... Alexander Jung
01 May 2019

Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review
Nur Laila Ab Ghani ... Said Jadid Abdulkadir
Computers, Materials & Continua | VOL. 75
01 Jan 2023

A Systematic Review of Density Grid-Based Clustering for Data Streams
Mustafa Tareq ... Azuraliza Abu Bakar
IEEE Access | VOL. 10
01 Jan 2021
