Abstract
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. The main aim is to learn a meaningful low-dimensional embedding of the data. However, most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty. Thus, learning directly from raw data can be misleading and can negatively impact accuracy. In this paper, we propose to model artifacts in training data using probability distributions; each data point is represented by a Gaussian distribution centered at the original data point, with a variance modeling its uncertainty. We reformulate the Graph Embedding framework to make it suitable for learning from distributions, and we study Linear Discriminant Analysis and Marginal Fisher Analysis as special cases. Furthermore, we propose two schemes for modeling data uncertainty based on pairwise distances, in unsupervised and supervised contexts.
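As a concrete illustration of this modeling choice, the sketch below builds a per-point Gaussian representation in Python. It is a minimal sketch, not the paper's method: the k-nearest-neighbour variance estimate stands in for the pairwise-distance schemes, whose exact form is not given in this summary, and the function name `gaussian_uncertainty_model` and parameter `k` are illustrative assumptions.

```python
import numpy as np

def gaussian_uncertainty_model(X, k=5):
    """Represent each data point x_i by a Gaussian N(x_i, sigma2_i * I).

    sigma2_i is a hypothetical unsupervised estimate: the mean squared
    distance to the k nearest neighbours. The paper's actual
    pairwise-distance schemes are not reproduced here.
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    D2 = np.maximum(D2, 0.0)            # guard against tiny negative values
    np.fill_diagonal(D2, np.inf)        # ignore distance of a point to itself

    # Per-point variance = mean of the k smallest squared distances.
    knn_d2 = np.sort(D2, axis=1)[:, :k]
    sigma2 = knn_d2.mean(axis=1)

    # Means (the original points) and per-point isotropic variances.
    return X, sigma2
```

Each training sample is thus kept as its original vector plus a scalar variance; points lying in sparse regions (far from their neighbours) receive larger variances, i.e. higher uncertainty, under this assumed scheme.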
Highlights
With the advancement of data collection processes, high-dimensional data are available for applying machine learning approaches
We propose a novel spectral-based subspace learning framework, called Graph Embedding with Data Uncertainty (GEU), in which input data uncertainties are taken into consideration
In this work, we introduce a novel spectral-based dimensionality reduction framework, Graph Embedding with Data Uncertainty (GEU), that reformulates the Graph Embedding framework to account for input data uncertainties and artifacts
Summary
With the advancement of data collection processes, high-dimensional data are available for applying machine learning approaches. If the provided data are exposed to measurement inaccuracies or artifacts, learning directly from them can lead to a biased or erroneous embedding of the high-dimensional data [5], [6]. Traditional methods, such as LDA and MFA, do not take this into consideration. We therefore represent each data point by a Gaussian distribution centered at the original point, with a variance modeling its uncertainty. To this end, we reformulate the Graph Embedding framework to operate on distributions at the individual data point level, allowing us to determine a mapping from the input data space into a lower-dimensional space by optimizing properties of interest defined over these distributions. Methods formulated under the proposed framework lead to an increased number of projection directions, because the covariances employed to model the uncertainty at the level of the individual data point introduce a regularization term to both scatter matrices.
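The regularization effect described above can be illustrated with a minimal LDA-style sketch, assuming isotropic per-point covariances that enter both scatter matrices additively; the function `uncertainty_lda` and its arguments are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh

def uncertainty_lda(X, y, sigma2, n_components):
    """LDA-style projection where each point x_i carries an isotropic
    covariance sigma2[i] * I modeling its uncertainty.

    Sketch only: the per-point covariances are assumed to add a common
    regularization term to both scatter matrices, as described above.
    """
    n, d = X.shape
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        Sb += len(Xc) * np.outer(mean_c - mean_all, mean_c - mean_all)

    # Uncertainty covariances act as a regularizer on both scatter matrices,
    # making them full rank and allowing additional projection directions.
    R = np.sum(sigma2) * np.eye(d)
    Sw_u = Sw + R
    Sb_u = Sb + R

    # Generalized eigenproblem Sb_u w = lambda * Sw_u w; keep top directions.
    vals, vecs = eigh(Sb_u, Sw_u)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]]
```

Under this assumed formulation, the added term keeps the within-class scatter matrix invertible even when the number of samples is small relative to the dimensionality, which is one way to read the claim about an increased number of projection directions.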