Proposing a Dimensionality Reduction Technique With an Inequality for Unsupervised Learning from High-Dimensional Big Data

Hassan Ismkhan, Mohammad Izadi

doi:10.1109/tsmc.2023.3234227

Abstract

Data clustering can be considered one of the most important unsupervised learning tasks. For almost all clustering algorithms, finding the nearest neighbors of a point within a certain radius $r$ (NN-$r$) is a critical step, and for a high-dimensional dataset this step becomes very time-consuming. This article proposes a simple dimensionality reduction (DR) technique: for a point $p$ in $d$-dimensional space, it produces a point $p'$ in $d'$-dimensional space, where $d' \ll d$.
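As a concrete picture of a $d \to d'$ reduction, the sketch below uses a stand-in map. This is an assumption: the abstract does not describe the authors' construction. The $d$ coordinates are split into $d'$ contiguous segments, and each segment is replaced by its sum divided by the square root of its length. The function name `segment_reduce` and the parameter `d_prime` are hypothetical.

```python
import numpy as np


def segment_reduce(X, d_prime):
    """Map each row of X (n x d) to a row of length d_prime.

    Stand-in reduction (assumption, not necessarily the article's method):
    split the d coordinates into d_prime contiguous segments and replace each
    segment by its sum divided by sqrt(segment length).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = X.shape[1]
    if not 1 <= d_prime <= d:
        raise ValueError("d_prime must satisfy 1 <= d_prime <= d")
    # Nearly equal, non-empty, contiguous index segments.
    segments = np.array_split(np.arange(d), d_prime)
    return np.stack(
        [X[:, idx].sum(axis=1) / np.sqrt(len(idx)) for idx in segments], axis=1
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1000))   # five points with d = 1000
Y = segment_reduce(X, 20)        # the same points with d' = 20
print(Y.shape)                   # (5, 20)
```

The map is linear, so it is cheap to apply once to the whole dataset, and it never increases Euclidean distances, which is the kind of property the pruning rule relies on.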
In addition, for any pair of points $p$ and $q$ and their maps $p'$ and $q'$ in the target space, it is proved that $|p, q| \geq |p', q'|$, where $|\cdot, \cdot|$ denotes the Euclidean distance between a pair of points. This property can speed up finding NN-$r$.
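For a segment-sum stand-in map (an illustrative assumption, not necessarily the construction proved in the article), the inequality follows from the Cauchy-Schwarz inequality applied segment by segment. Here the coordinate indices are split into hypothetical disjoint segments $S_1, \dots, S_{d'}$ with sizes $m_s = |S_s|$.

```latex
% Stand-in map (assumption):
p'_s = \frac{1}{\sqrt{m_s}} \sum_{i \in S_s} p_i, \qquad s = 1, \dots, d'.

% Cauchy-Schwarz on one segment: (\sum_i 1 \cdot x_i)^2 \le m_s \sum_i x_i^2, so
(p'_s - q'_s)^2 = \frac{1}{m_s}\Big(\sum_{i \in S_s} (p_i - q_i)\Big)^2
  \le \sum_{i \in S_s} (p_i - q_i)^2 .

% Summing over the disjoint segments:
|p', q'|^2 = \sum_{s=1}^{d'} (p'_s - q'_s)^2
  \le \sum_{s=1}^{d'} \sum_{i \in S_s} (p_i - q_i)^2 = |p, q|^2 .
```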
For a given radius $r$ and a pair of points $p$ and $q$, whenever $|p', q'| > r$, it follows that $|p, q| \geq |p', q'| > r$, so $q$ cannot be in the NN-$r$ of $p$. Using this trick, the NN-$r$ search is sped up. As a case study, the technique is then applied to accelerate $k$-means, one of the best-known unsupervised learning algorithms, in which $d'$ is determined automatically. The proposed NN-$r$ method and the accelerated $k$-means are compared with recent state-of-the-art methods, and both yield favorable results.
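A minimal sketch of the pruning rule, assuming a stand-in contractive map (segment sums scaled by $1/\sqrt{m_s}$; not necessarily the authors' technique). The hypothetical `nn_r` discards candidates whose $d'$-dimensional distance already exceeds $r$ and computes exact $d$-dimensional distances only for the survivors. The hypothetical `assign_with_pruning` applies the same lower bound to the $k$-means assignment step: it visits centroids in order of increasing reduced distance and stops once that bound reaches the best exact distance found so far. The fixed `d_prime = 16` is an assumption; the sketch does not reproduce the article's automatic choice of $d'$.

```python
import numpy as np


def segment_reduce(X, d_prime):
    """Stand-in contractive map (assumption): segment sums / sqrt(length)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    segments = np.array_split(np.arange(X.shape[1]), d_prime)
    return np.stack(
        [X[:, idx].sum(axis=1) / np.sqrt(len(idx)) for idx in segments], axis=1
    )


def nn_r(X, X_red, i, r):
    """Indices j != i with |X[i], X[j]| <= r (hypothetical helper)."""
    # Filter in d' dimensions: |p', q'| > r implies |p, q| > r, so drop q.
    red_dist = np.linalg.norm(X_red - X_red[i], axis=1)
    candidates = np.flatnonzero(red_dist <= r)
    # Exact d-dimensional distances only for the surviving candidates.
    exact = np.linalg.norm(X[candidates] - X[i], axis=1)
    hits = candidates[exact <= r]
    return hits[hits != i]


def assign_with_pruning(X, X_red, C, C_red):
    """k-means assignment step using |x', c'| as a lower bound on |x, c|."""
    labels = np.empty(len(X), dtype=int)
    exact_computations = 0
    for n in range(len(X)):
        lower = np.linalg.norm(C_red - X_red[n], axis=1)
        best_c, best_d = -1, np.inf
        # Visit centroids by increasing lower bound; once a bound reaches the
        # best exact distance so far, no remaining centroid can be closer.
        for c in np.argsort(lower):
            if lower[c] >= best_d:
                break
            d = np.linalg.norm(X[n] - C[c])
            exact_computations += 1
            if d < best_d:
                best_c, best_d = c, d
        labels[n] = best_c
    return labels, exact_computations


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 256))
X_red = segment_reduce(X, 16)          # computed once, reused for every query
neighbors = nn_r(X, X_red, i=0, r=22.0)
C = X[:5]                              # hypothetical initial centroids
labels, n_exact = assign_with_pruning(X, X_red, C, segment_reduce(C, 16))
```

Both functions return the same results as a brute-force search (up to ties in the assignment step), because the reduced distance never exceeds the true distance; how many exact distance computations are saved depends on how tight that bound is for the data at hand.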

More From: IEEE Transactions on Systems, Man, and Cybernetics: Systems

Similar Papers

HB-File: An efficient and effective high-dimensional big data storage structure based on US-ELM
Linlin Ding ... Baoyan Song
Neurocomputing | VOL. 261
16 Feb 2017

An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM
Linlin Ding ... Junchang Xin
01 Jan 2015

QoE-driven big data management in pervasive edge computing environment
Qianyu Meng ... Xiaoming He
Big Data Mining and Analytics | VOL. 1
01 Sep 2018

QoE-Based Big Data Analysis with Deep Learning in Pervasive Edge Environment
Qianyu Meng ... Bo Liu
01 May 2018
