Dynamic Feature Selection for Clustering High Dimensional Data Streams

Conor Fahy, Shengxiang Yang

doi:10.1109/access.2019.2932308

Open Access

Abstract

Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, many of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric, which is problematic in high-dimensional data where the curse of dimensionality renders distance measurements and any concept of “density” difficult. To address these two challenges, we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high-dimensional data streams. Redundant features are masked and clustering is performed along unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly: previously unimportant features are unmasked and features that lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms, which typically do not have a mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams: two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked.
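
The masking idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `DynamicFeatureMask`, the parameters `k`, `window`, and `refresh`, and in particular the use of per-feature variance over a sliding window as the relevance score. The paper's own scoring methods are not named in this summary, so this is not the authors' implementation. The sketch only shows how a mask might flip when feature relevance drifts; the projected points would then be handed to whichever density-based clustering algorithm is in use.

```python
from collections import deque
from statistics import pvariance
import random


class DynamicFeatureMask:
    """Hypothetical sliding-window feature mask.

    Variance ranking is an illustrative relevance score only,
    not the method evaluated in the paper.
    """

    def __init__(self, n_features, k, window=200, refresh=50):
        self.k = k                          # number of features left unmasked
        self.refresh = refresh              # re-rank every `refresh` points
        self.window = deque(maxlen=window)  # most recent points only
        self.mask = [True] * n_features     # True = unmasked (feature in use)
        self._seen = 0

    def update(self, x):
        """Add one point; periodically re-rank features and rebuild the mask."""
        self.window.append(x)
        self._seen += 1
        if self._seen % self.refresh == 0:
            # Score each feature by its variance over the current window.
            scores = [pvariance(col) for col in zip(*self.window)]
            ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
            keep = set(ranked[: self.k])
            self.mask = [i in keep for i in range(len(scores))]

    def apply(self, x):
        """Project a point onto the currently unmasked features."""
        return [v for v, keep in zip(x, self.mask) if keep]


random.seed(0)
mask = DynamicFeatureMask(n_features=4, k=2, window=100, refresh=50)

# Phase 1: features 0 and 1 carry the variation, 2 and 3 are near-constant.
for _ in range(100):
    mask.update([random.gauss(0, 5), random.gauss(0, 5),
                 random.gauss(0, 0.1), random.gauss(0, 0.1)])
phase1 = list(mask.mask)

# Phase 2 (simulated feature drift): relevance moves to features 2 and 3.
for _ in range(100):
    mask.update([random.gauss(0, 0.1), random.gauss(0, 0.1),
                 random.gauss(0, 5), random.gauss(0, 5)])
phase2 = list(mask.mask)

projected = mask.apply([1.0, 2.0, 3.0, 4.0])  # keeps only unmasked features
```

In this toy stream the mask unmasks features 0 and 1 during the first phase and switches to features 2 and 3 once the window has filled with post-drift points. Comparing `phase1` with `phase2` is one simple way to observe change at the feature level.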

Highlights

Along with time and memory constraints, change is an important consideration in data stream mining
Redundant features are masked and clustering is performed along unmasked, relevant features
The method proposed in this paper aims to address these two challenges: tracking change at the feature level and dynamically clustering in high dimensions

Summary

Introduction

Along with time and memory constraints, change is an important consideration in data stream mining. One possible type of change is concept evolution, which occurs when an entirely new cluster y_m appears in the stream, y_m ∈ Y. Another type of change is concept drift, which occurs if the characteristics of the data change, i.e., if the underlying process generating x changes.

We evaluate three existing static methods for maintaining the dynamic feature mask. Each method is described in the paper along with the clustering algorithms we use to evaluate the proposed dynamic feature mask.

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 21
License type: CC BY 4.0
