Abstract

Anomaly detection techniques have been shown to help in detecting word-level annotation errors in read-speech corpora for text-to-speech synthesis. In this framework, correctly annotated words are treated as normal examples on which the detection methods are trained. Misannotated words are then taken as anomalous examples that do not conform to the normal patterns learned by the trained detection models. Since it can be hard to collect a sufficient number of examples to train and optimize an anomaly detector, in this paper we investigate the influence of the number of anomalous and normal examples on the detection accuracy of several anomaly detection models: Gaussian-distribution-based models, one-class support vector machines, and a Grubbs' test based model. Our experiments show that the number of examples can be significantly reduced without a large drop in detection accuracy.
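To make the framework concrete, the following is a minimal illustrative sketch of the simplest family mentioned above, a univariate Gaussian-distribution-based detector: it fits a mean and standard deviation on normal (correctly annotated) examples only, then flags a word as anomalous when its feature value falls far from the mean. The feature values and the 3-sigma threshold here are hypothetical placeholders, not the paper's actual features or settings.

```python
import statistics

def fit_gaussian(normal_values):
    # Train on correctly annotated ("normal") examples only,
    # as in the one-class anomaly detection framework.
    mu = statistics.fmean(normal_values)
    sigma = statistics.stdev(normal_values)
    return mu, sigma

def is_anomalous(x, mu, sigma, k=3.0):
    # Flag a word-level feature value as anomalous when it lies
    # more than k standard deviations from the normal-data mean.
    return abs(x - mu) > k * sigma

# Hypothetical per-word feature values (e.g. a duration-like score).
normal = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98]
mu, sigma = fit_gaussian(normal)

print(is_anomalous(1.01, mu, sigma))  # value close to the normal pattern
print(is_anomalous(5.00, mu, sigma))  # value far outside the normal pattern
```

The one-class SVM and Grubbs' test models studied in the paper follow the same pattern (fit on normal data, score conformity), differing only in how the "normal region" is modeled.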
