Abstract
Several authors have addressed learning a classifier given a mixed labeled/unlabeled training set. These works assume that each unlabeled sample originates from one of the (known) classes. This work considers the scenario in which unlabeled points may belong either to known/predefined classes or to heretofore undiscovered classes. There are several practical situations where such data may arise. We earlier proposed a novel statistical mixture model to fit this mixed data. In this paper we review that method and introduce an alternative model. Our fundamental strategy is to view as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each point. Two types of mixture components are used to explain label presence/absence. "Predefined" components generate both labeled and unlabeled points and assume that labels are missing at random. These components represent the known classes. "Non-predefined" components generate only unlabeled points. In localized regions, they capture data subsets that are exclusively unlabeled. Such subsets may represent an outlier distribution or new classes. The components' predefined/non-predefined natures are data-driven, learned along with the other parameters via an algorithm based on expectation-maximization (EM). Three natural applications are presented: 1) robust classifier design, given a mixed training set with outliers; 2) classification with rejections; and 3) identification of the unlabeled points (and their representative components) that originate from unknown classes, i.e., new class discovery. The effectiveness of our models in discovering purely unlabeled data components (potential new classes) is evaluated on both synthetic and real data sets. Although each of our models has its own advantages, the original model is found to achieve the best class discovery results.