Dimensionality reduction assists the visualization and analysis of large, complex datasets by mapping each data point onto a low-dimensional manifold. However, many dimensionality reduction techniques are computationally infeasible for large datasets. Out-of-sample techniques aim to resolve this difficulty: they apply the dimensionality reduction technique only to a small subset of the data, referred to as landmarks, and determine the embedding coordinates of the remaining points using the landmarks as references. Out-of-sample techniques have also been applied in online settings, where data arrive as a time series. However, existing online out-of-sample techniques use either all previous data points as landmarks or a fixed set of landmarks, and may therefore fail to capture the geometry of the entire dataset when the time series is non-stationary. To address this problem, we propose an online landmark replacement algorithm for out-of-sample techniques that uses geometric graphs and minimal dominating sets on them. We mathematically analyse some properties of the proposed algorithm, focusing in particular on the case in which landmark multi-dimensional scaling is the out-of-sample technique, and test its performance on synthetic and empirical time-series data.
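To make the landmark idea concrete, the following is a minimal sketch of the standard landmark multi-dimensional scaling construction: classical MDS is run on the landmarks only, and every other point is placed from its squared distances to those landmarks. It illustrates only the out-of-sample step with a fixed landmark set, not the landmark replacement algorithm proposed here. The function name `landmark_mds`, its arguments, and the use of NumPy and Euclidean distances are assumptions made for illustration.

```python
import numpy as np


def landmark_mds(landmarks, points, k=2):
    """Sketch of landmark MDS (hypothetical helper, not the paper's code).

    landmarks: (n, d) array of landmark points.
    points:    (m, d) array of points to embed via the landmarks.
    Returns (landmark_coords, point_coords) with shapes (n, k) and (m, k).
    """
    n = landmarks.shape[0]

    # Squared Euclidean distances among the landmarks.
    diff = landmarks[:, None, :] - landmarks[None, :, :]
    d2_landmarks = np.sum(diff ** 2, axis=-1)

    # Classical MDS on the landmarks: double-centre, then eigendecompose.
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ d2_landmarks @ centring
    evals, evecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]
    lam, vecs = evals[top], evecs[:, top]
    if np.any(lam <= 0):
        raise ValueError("need k positive eigenvalues; use fewer dimensions "
                         "or more landmarks")

    landmark_coords = vecs * np.sqrt(lam)  # (n, k)

    # Out-of-sample step: distance-based triangulation against landmarks.
    pseudo_inverse = (vecs / np.sqrt(lam)).T  # (k, n)
    mean_d2 = d2_landmarks.mean(axis=1)       # mean squared distance per landmark
    d2_points = np.sum(
        (points[:, None, :] - landmarks[None, :, :]) ** 2, axis=-1
    )  # (m, n)
    point_coords = -0.5 * (d2_points - mean_d2) @ pseudo_inverse.T  # (m, k)
    return landmark_coords, point_coords
```

In this sketch the expensive eigendecomposition involves only the n landmarks, and each remaining point costs one matrix-vector product. That is also why the choice of landmarks matters: if a non-stationary series drifts away from the region the landmarks cover, the triangulation step has poor references to work from.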