Abstract

In the new era of Big Data, an exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learning with millions of dimensions. Nevertheless, like any other family of algorithms, reduction methods require a redesign so that they can work at such magnitudes. In particular, they must be prepared to tackle the explosive combinatorial effects of the "curse of Big Dimensionality" while embracing the benefits of the "blessing" side of dimensionality (weakly correlated features). In this chapter we analyze the problems and benefits derived from the "curse of Big Dimensionality", and how this problem has spread across many fields, such as the life sciences and the Internet. We then survey the contributions that address the large-scale dimensionality reduction problem. Next, as a case study, we examine in depth the design and behavior of one of the most popular selection frameworks in this field. Finally, we review the contributions related to dimensionality reduction in Big Data streams.
