Review of classical dimensionality reduction and sample selection methods for large-scale data processing

Tongfeng Sun, Jiong Zhu, Tianming Liang, Xinzheng Xu, Dong Zheng

doi:10.1016/j.neucom.2018.02.100

Abstract

In the era of big data, data sets with growing numbers of samples and high-dimensional attributes play an important role in fields such as data mining, pattern recognition, and machine learning. Meanwhile, machine learning algorithms are being applied effectively to large-scale data processing. This paper reviews classical dimensionality reduction and sample selection methods based on machine learning algorithms for large-scale data processing. First, it gives a brief overview of the classical sample selection and dimensionality reduction methods. It then considers the applications of these methods and their combinations with classical machine learning methods, such as clustering, random forests, fuzzy sets, and heuristic algorithms, and in particular deep learning methods. Furthermore, the paper introduces application frameworks that combine sample selection and dimensionality reduction in two ways, sequentially and simultaneously; compared with the original models, almost all of these frameworks achieve favorable results when processing large-scale training data. Finally, we conclude that sample selection and dimensionality reduction methods are essential and effective for modern large-scale data processing. In future work, machine learning algorithms, especially deep learning methods, will play a more important role in the processing of large-scale data.

Full Text