Abstract
Novelty detection methods aim at partitioning the test units into already observed and previously unseen patterns. However, two significant issues arise: there may be considerable interest in identifying specific structures within the novelty, and contamination in the known classes could completely blur the actual separation between manifest and new groups. Motivated by these problems, we propose a two-stage Bayesian semiparametric novelty detector, building upon prior information robustly extracted from a set of complete learning units. We devise a general-purpose multivariate methodology that we also extend to handle functional data objects. We provide insights on the model behavior by investigating the theoretical properties of the associated semiparametric prior. From the computational point of view, we propose a suitable ξ-sequence to construct an independent slice-efficient sampler that takes into account the difference between manifest and novelty components. We showcase our model performance through an extensive simulation study and applications on both multivariate and functional datasets, in which diverse and distinctive unknown patterns are discovered.
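The ξ-sequence mentioned above controls how many mixture components an independent slice-efficient sampler must evaluate at each iteration. As a minimal sketch, assuming the common deterministic geometric choice ξ_j = (1 − κ)κ^j (an illustrative assumption, not necessarily the paper's exact sequence), the slice variable u of a unit determines a finite truncation level:

```python
import numpy as np

# Hedged sketch of the deterministic xi-sequence used by independent
# slice-efficient samplers. The geometric form and kappa = 0.5 are
# illustrative assumptions, not the paper's specification.
kappa = 0.5

def xi(j):
    """Deterministic, strictly decreasing slice sequence xi_j."""
    return (1.0 - kappa) * kappa ** j

def n_components(u):
    """Number of components to evaluate for slice variable u:
    the largest j with xi(j) > u, plus one (at least one)."""
    # xi(j) > u  <=>  j < log(u / (1 - kappa)) / log(kappa)
    j_max = int(np.floor(np.log(u / (1.0 - kappa)) / np.log(kappa)))
    return max(j_max + 1, 1)

# Smaller slice variables force more components to be considered.
print(n_components(0.4))   # only xi(0) = 0.5 exceeds 0.4
print(n_components(0.01))  # xi(5) ~ 0.0156 still exceeds 0.01
```

Because the sequence is deterministic and decreasing, the set of active components is finite for every u > 0, which is what makes the sampler tractable despite the infinite mixture.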
Highlights
Supervised classification techniques aim at predicting a qualitative output for a test set by learning a classifier on a fully-labeled training set.
The novelty term is instead captured via a flexible Dirichlet Process mixture: this modeling choice reflects the lack of knowledge about its distributional properties and overcomes the problematic and unnatural a priori specification of its number of components.
We investigate the properties of the underlying random mixing measure induced by the model specification presented in the previous section.
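The Dirichlet Process mixture used for the novelty term avoids fixing a number of components a priori. A minimal sketch of why, assuming a truncated stick-breaking representation with an illustrative concentration parameter α = 1 and a standard Gaussian base measure (neither taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hedged sketch: truncated stick-breaking draw from a Dirichlet
# Process. alpha and the N(0, 1) base measure are illustrative
# assumptions, not the paper's specification.
alpha = 1.0   # DP concentration parameter
trunc = 50    # truncation level for this sketch only

# Stick-breaking weights: w_j = v_j * prod_{l < j} (1 - v_l)
v = rng.beta(1.0, alpha, size=trunc)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# Component locations drawn from the base measure G_0 = N(0, 1)
mu = rng.normal(0.0, 1.0, size=trunc)

# The weights sum to (almost) one: the novelty density is the mixture
# sum_j w_j * k(x | mu_j), with no fixed number of components.
print(round(w.sum(), 4))
```

The decreasing stick-breaking weights mean that only a data-driven number of components carries appreciable mass, which is exactly what lets the novelty term adapt its complexity.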
Summary
Supervised classification techniques aim at predicting a qualitative output for a test set by learning a classifier on a fully-labeled training set. To this end, classical methods assume that the labeled units are realizations from every sub-group in the target population. Within the model-based family of classifiers, adaptive methods have recently appeared in the literature. Bouveyron (2014) introduces an adaptive classifier in which two algorithms, based respectively on transductive and inductive learning, are devised for inference. Fop et al. (2021) extend the original work of Bouveyron (2014) by accounting for unobserved classes and extra variables in high-dimensional discriminant analysis. Classical model-based classifiers are not robust, as they lack the capability of handling outlying observations in the training set.