Abstract

Phonetic annotation of non-native speech is difficult: inter-annotator agreement is low, and the task is time-consuming. Nevertheless, a large-scale non-native speech corpus with error annotation is important for research on second language phonology and computer-assisted pronunciation training. To achieve annotation at scale, we must first clarify what makes error annotation difficult. This study identified factors that predict annotation performance on non-native Mandarin speech from two perspectives. (1) From a cognitive viewpoint, we tested whether human performance can be predicted by a decomposed annotation method, which divides the traditional error annotation task into two subtasks: an error location task and an error description task (Experiment 1). (2) We examined whether features of the annotation task materials (e.g., error severity) can predict annotation performance (Experiment 2). The results of Experiment 1 revealed that the decomposed method leads to a shorter annotation time and a higher consistent hit rate than the traditional method. In Experiment 2, regression analyses showed that error severity and sentence length accounted for 96% of the variance in hit rate, and that sentence length and number of errors accounted for 95.5% of the variance in annotation time. These results suggest that manual annotation performance can be predicted from both cognitive factors and material features.
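The percentages reported for Experiment 2 read as coefficients of determination (R²) from multiple linear regression, although the abstract does not state the exact model. The snippet below is a minimal sketch of how such a figure is computed. It assumes ordinary least squares with an intercept. The function name `r_squared`, the feature columns, and all values are hypothetical illustrations, not the study's data or code.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2 (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept term.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Hypothetical per-sentence features: [error_severity, sentence_length]
X = [[1.0, 8], [2.0, 12], [1.5, 10], [3.0, 15], [2.5, 9], [0.5, 11]]
# Hypothetical hit rates for the same sentences (illustrative only).
y = [0.90, 0.72, 0.81, 0.55, 0.70, 0.88]
print(round(r_squared(X, y), 3))
```

An R² of 0.96 would mean the two predictors jointly explain 96% of the variance in the outcome across the analyzed items.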
