Abstract

Many real-world applications involve high-dimensional data that existing algorithms cannot handle. Feature selection is a critical data-preprocessing step, and its lack of scalability negatively influences both the efficiency and the performance of big data applications. In this research, we developed a new algorithm that reduces the dimensionality of a problem using graph-based analysis, which retains the physical meaning of the original high-dimensional feature space. Most existing feature-selection methods rely on the strong assumption that features are independent of each other; however, if a feature-selection algorithm ignores the interdependencies of the feature space, the selected data fail to correctly represent the original data. We developed a new feature-selection method to address this challenge. Our aim in this research was to examine the dependencies between features and to select the optimal feature set with respect to the original data structure. Another important property of our proposed method is that it can operate even in the absence of class labels. This is a harder problem, one that many feature-selection algorithms fail to address; in that setting, they resort to wrapper techniques, which require a learning algorithm to select features. Notably, our experimental results indicate that the proposed simple ranking method performs better than other methods, independent of the particular learning algorithm used.

Highlights

  • Data processing and decision-making in today’s world have become more complex with the continuously expanding volume of data

  • We find that the vector of probabilities for visiting each node is fixed (see the note after this list)

  • We provide specifics about the data we used and our experimental results
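
The second highlight refers to the stationary distribution of a random walk over the feature graph. As a brief clarifying note (this is standard Markov-chain theory, not notation taken from the paper itself), a fixed visiting-probability vector π satisfies

$$\pi P = \pi, \qquad \sum_{i=1}^{m} \pi_i = 1,$$

where P is the row-stochastic transition matrix of the feature graph and π_i is the long-run probability that the walk visits node (feature) V_i.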

Introduction

Data processing and decision-making in today’s world have become more complex with the continuously expanding volume of data. Big data applications involve data that are far larger and more complex than traditional data-processing applications can handle. Most existing feature-selection algorithms rest on the strong assumption that features are independent of each other and identically distributed. Because these algorithms neglect the structure, or intrinsic dependencies, among features, the selected feature set may not effectively represent the data. Our method instead models the feature space as a graph: if a graph G(V, E) has m nodes V = {V1, V2, ..., Vm} and a set of n edges E = {E1, E2, ..., En}, then node Vi corresponds to the i-th feature and edge Ej represents a pairwise dependency between two features. Learning representations of the nodes in a network or graph while preserving certain properties of the network is beneficial for various analysis tasks and has attracted significant attention in recent years [2].
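
To make the graph construction concrete, the following is a minimal sketch (not the authors' implementation) of graph-based feature ranking: each feature becomes a node, edge weights encode a pairwise dependency measure (absolute Pearson correlation is an assumption here; the paper may use a different measure), and features are ranked by the stationary visiting probabilities of a damped random walk, in the spirit of the fixed probability vector noted in the highlights. The function name rank_features and the damping parameter are illustrative.

```python
import numpy as np

def rank_features(X, damping=0.85, tol=1e-9, max_iter=1000):
    """Rank the columns (features) of X by random-walk centrality
    on a feature-dependency graph."""
    m = X.shape[1]
    # Edge weights: absolute Pearson correlation between feature pairs
    # (an assumed dependency measure, not necessarily the paper's choice).
    W = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(W, 0.0)              # no self-loops
    # Row-normalize to obtain a row-stochastic transition matrix P.
    P = W / W.sum(axis=1, keepdims=True)
    # Power iteration: the visiting-probability vector converges to the
    # fixed point pi = pi @ P (damping keeps the chain irreducible).
    pi = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        new_pi = damping * (pi @ P) + (1.0 - damping) / m
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return np.argsort(pi)[::-1]           # feature indices, most central first
```

For example, keeping the top k ranked features is then a two-liner: cols = rank_features(X)[:k] followed by X_reduced = X[:, cols]. Note that no class labels are used anywhere in this sketch, matching the unsupervised setting described in the abstract.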
