Class Discovery in Galaxy Classification

David Bazell, David J Miller

doi:10.1086/426068

Abstract

In recent years, automated, supervised classification techniques have been fruitfully applied to labeling and organizing large astronomical databases. These methods require off-line classifier training, based on labeled examples from each of the (known) object classes. In practice, only a small batch of labeled examples, hand-labeled by a human expert, may be available for training. Moreover, there may be no labeled examples for some classes present in the data; i.e., the database may contain several unknown classes. Unknown classes may be present because of (1) uncertainty in or lack of knowledge of the measurement process, (2) an inability to adequately survey a massive database to assess its content (classes), and/or (3) an incomplete scientific hypothesis. In recent work, the question of new class discovery in mixed labeled/unlabeled data was formally posed, with a proposed solution based on mixture models. In this work we investigate this approach, propose a competing technique suitable for class discovery in neural networks, and evaluate methods for both classification and class discovery in several astronomical data sets. Our results demonstrate up to a 57% reduction in classification error compared to a standard neural network classifier that uses only labeled data.
