Abstract

Many real-world applications involve high-dimensional data that existing algorithms cannot handle. Feature selection is a critical data-preprocessing step, and its lack of scalability negatively influences both the efficiency and the performance of big data applications. In this research, we developed a new algorithm that reduces the dimensionality of a problem using graph-based analysis, which retains the physical meaning of the original high-dimensional feature space. Most existing feature-selection methods rely on the strong assumption that features are independent of each other; however, if a feature-selection algorithm ignores the interdependencies of the feature space, the selected data fail to correctly represent the original data. We developed a new feature-selection method to address this challenge. Our aim in this research was to examine the dependencies between features and to select the optimal feature set with respect to the original data structure. Another important property of our proposed method is that it can operate even in the absence of class labels. This is a harder problem, one that many feature-selection algorithms fail to address; in that setting, they resort to wrapper techniques, which require a learning algorithm to select features. Notably, our experimental results indicate that the proposed simple ranking method performs better than other methods, independent of the particular learning algorithm used.

Highlights

  • Data processing and decision-making in today’s world have become more complex with the continuously expanding volume of data

  • We find that the vector of probabilities for visiting each node is fixed (see the note after this list)

  • We provide specifics about the data we used and our experimental results
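
The second highlight refers to the stationary distribution of a random walk over the feature graph. As a brief clarifying note (this is standard Markov-chain theory, not notation taken from the paper itself), a fixed visiting-probability vector π satisfies

$$\pi P = \pi, \qquad \sum_{i=1}^{m} \pi_i = 1,$$

where P is the row-stochastic transition matrix of the feature graph and π_i is the long-run probability that the walk visits node (feature) V_i.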

Introduction

Data processing and decision-making in today’s world have become more complex with the continuously expanding volume of data. Big data applications involve data that are far larger and more complex than traditional data-processing applications can handle. Most existing feature-selection algorithms rest on the strong assumption that features are independent of each other and identically distributed. Because these algorithms neglect the structure, or intrinsic dependencies, among features, the selected feature set may not effectively represent the data. Our method instead models the feature space as a graph: if a graph G(V, E) has m nodes V = {V1, V2, ..., Vm} and a set of n edges E = {E1, E2, ..., En}, then node Vi corresponds to the i-th feature and edge Ej represents a pairwise dependency between two features. Learning representations of the nodes in a network or graph while preserving certain properties of the network is beneficial for various analysis tasks and has attracted significant attention in recent years [2].
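
To make the graph construction concrete, the following is a minimal sketch (not the authors' implementation) of graph-based feature ranking: each feature becomes a node, edge weights encode a pairwise dependency measure (absolute Pearson correlation is an assumption here; the paper may use a different measure), and features are ranked by the stationary visiting probabilities of a damped random walk, in the spirit of the fixed probability vector noted in the highlights. The function name rank_features and the damping parameter are illustrative.

```python
import numpy as np

def rank_features(X, damping=0.85, tol=1e-9, max_iter=1000):
    """Rank the columns (features) of X by random-walk centrality
    on a feature-dependency graph."""
    m = X.shape[1]
    # Edge weights: absolute Pearson correlation between feature pairs
    # (an assumed dependency measure, not necessarily the paper's choice).
    W = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(W, 0.0)              # no self-loops
    # Row-normalize to obtain a row-stochastic transition matrix P.
    P = W / W.sum(axis=1, keepdims=True)
    # Power iteration: the visiting-probability vector converges to the
    # fixed point pi = pi @ P (damping keeps the chain irreducible).
    pi = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        new_pi = damping * (pi @ P) + (1.0 - damping) / m
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return np.argsort(pi)[::-1]           # feature indices, most central first
```

For example, keeping the top k ranked features is then a two-liner: cols = rank_features(X)[:k] followed by X_reduced = X[:, cols]. Note that no class labels are used anywhere in this sketch, matching the unsupervised setting described in the abstract.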
