Abstract

In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on a set of additional variables recorded at a subsequent stage with respect to when the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for data of large dimensions. A simulation study and an artificial experiment related to classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.
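The following Python sketch illustrates, in heavily simplified form, the inductive idea described in the abstract: Gaussian class densities are learned from the labelled training sample (an EDDA-like model), and an EM algorithm is then run on the test data in which the learned parameters stay fixed and a single extra Gaussian component, standing for a possible unobserved class, is estimated. It is an illustrative stand-in, not the D-AMDA implementation: it ignores the extra test-set dimensions and the parsimonious covariance constraints, and all function and variable names are assumptions made for this example.

  import numpy as np
  from scipy.stats import multivariate_normal

  rng = np.random.default_rng(0)

  def fit_gaussian_classes(X, y):
      # Learning phase: one Gaussian density per observed class (QDA-style).
      return {k: (X[y == k].mean(axis=0), np.cov(X[y == k], rowvar=False))
              for k in np.unique(y)}

  def inductive_em_extra_class(X_new, params, n_iter=50):
      # Inductive EM on the test data: the known-class parameters stay fixed;
      # only the mixing weights and the extra component (a candidate
      # unobserved class) are updated at each M-step.
      n, d = X_new.shape
      mu_e = X_new.mean(axis=0)
      cov_e = np.cov(X_new, rowvar=False) + 1e-6 * np.eye(d)
      known = list(params)
      w = np.full(len(known) + 1, 1.0 / (len(known) + 1))
      for _ in range(n_iter):
          dens = np.column_stack(
              [multivariate_normal.pdf(X_new, *params[k]) for k in known]
              + [multivariate_normal.pdf(X_new, mu_e, cov_e)])
          resp = w * dens
          resp /= resp.sum(axis=1, keepdims=True)   # E-step: responsibilities
          w = resp.mean(axis=0)                     # M-step: mixing weights
          r = resp[:, -1]
          mu_e = (r[:, None] * X_new).sum(axis=0) / r.sum()
          diff = X_new - mu_e
          cov_e = (r[:, None] * diff).T @ diff / r.sum() + 1e-6 * np.eye(d)
      return resp

  # Toy data: two classes observed in training, the test set contains a third.
  X_tr = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
  y_tr = np.repeat([0, 1], 100)
  X_te = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2)),
                    rng.normal([-4, 6], 1, (50, 2))])
  resp = inductive_em_extra_class(X_te, fit_gaussian_classes(X_tr, y_tr))
  print("points assigned to the candidate unobserved class:",
        int((resp.argmax(axis=1) == 2).sum()))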

Highlights

  • Standard supervised classification approaches assume that all existing classes in the data have been observed during the learning phase

  • Since the Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA) framework adapts to the additional dimensions and classes, all the information available in the variables observed in the test set is exploited for classification: both the variables observed during the training stage and the extra ones present only in the test set

  • The variable selection method tends to correctly identify the relevant variables, especially as the number of Gen variables involved in the estimation of the eigenvalue decomposition discriminant analysis (EDDA) model in the learning phase increases, as in scenario 3(a)


Summary

Introduction

Standard supervised classification approaches assume that all existing classes in the data have been observed during the learning phase. Moreover, the units in the test data may be measured on a set of additional variables, recorded at a stage subsequent to the collection of the learning sample. Examples of this situation are: classification of spectrometry data, where the test data may be measured at a finer resolution than the learning set, with an increased number of wavelengths; classification of time-dependent data, where variables correspond to points in time and observations are recorded continuously, so that one set of observations may have been collected up to a certain time point while another set of units is recorded up to a later one; classification of data where some of the variables of the training set are corrupted and cannot be used to build the classifier, although they are available at the testing stage. In all these scenarios, the classifier needs to adapt to the increasing dimensionality. The combination of unrepresented classes in the training data and additional features in the test set leads to a complex situation where the model built in the learning stage faces two sources of criticality when classifying the new data: unobserved classes and extra variables.
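
As a toy illustration of the dimension problem outlined above (not the method proposed in the paper), the snippet below, on made-up simulated data, contrasts a quadratic discriminant classifier that can only use the variables shared between the training and test sets with an oracle refit that has access to all test-set variables; the accuracy gap is the information an adaptive classifier should try to recover. All quantities and names here are assumptions made for the example.

  import numpy as np
  from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

  rng = np.random.default_rng(1)
  p, q, n = 3, 2, 200   # p shared variables, q extra variables in the test set only

  # Two classes that overlap on the first p variables but separate on the extra q.
  means = {0: np.r_[np.zeros(p), np.zeros(q)],
           1: np.r_[0.5 * np.ones(p), 3.0 * np.ones(q)]}
  X_train = np.vstack([rng.normal(means[k][:p], 1, (n, p)) for k in (0, 1)])
  y_train = np.repeat([0, 1], n)
  X_test_full = np.vstack([rng.normal(means[k], 1, (n, p + q)) for k in (0, 1)])
  y_test = np.repeat([0, 1], n)

  # Learning phase sees only the p shared variables, so at test time the q
  # extra variables must be discarded.
  clf = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
  acc_shared = clf.score(X_test_full[:, :p], y_test)

  # Optimistic upper bound: refit on the labelled test data using all p + q
  # variables, just to show the attainable gap.
  clf_full = QuadraticDiscriminantAnalysis().fit(X_test_full, y_test)
  acc_full = clf_full.score(X_test_full, y_test)

  print(f"accuracy using shared variables only: {acc_shared:.2f}")
  print(f"accuracy if all variables could be used: {acc_full:.2f}")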

