Abstract
Labelled data for training sequence labelling models can be collected from multiple annotators or workers in crowdsourcing. However, these labels could be noisy because of the varying expertise and reliability of annotators. In order to ensure high quality of data, it is crucial to infer the correct labels by aggregating noisy labels. Although label aggregation is a well-studied topic, only a few studies have investigated how to aggregate sequence labels. Recently, neural network models have attracted research attention for this task. In this paper, we explore two neural network-based methods. The first method combines Hidden Markov Models with neural networks while also learning distributed representations of annotators (i.e., annotator embeddings); the second method combines BiLSTMs with autoencoders. The experimental results on three real-world datasets demonstrate the effectiveness of using neural networks for sequence label aggregation. Moreover, our analysis shows that annotator embeddings not only make our model applicable to real-time applications but are also useful for studying the behaviour of annotators.
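To make the aggregation task concrete, the sketch below shows the standard per-token majority-vote baseline that sequence label aggregation methods, including those described above, aim to improve on. This is not the paper's model; the BIO tags and worker annotations are illustrative assumptions.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate noisy sequence labels by per-token majority vote.

    annotations: list of label sequences, one per annotator,
    all of the same length (one label per token).
    Returns the aggregated label sequence.
    """
    seq_len = len(annotations[0])
    aggregated = []
    for t in range(seq_len):
        # Count each annotator's vote for token t and keep the winner.
        votes = Counter(seq[t] for seq in annotations)
        aggregated.append(votes.most_common(1)[0][0])
    return aggregated

# Hypothetical example: three annotators label a 4-token sentence with BIO tags.
workers = [
    ["B-PER", "I-PER", "O", "O"],
    ["B-PER", "O",     "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "B-LOC"],
]
print(majority_vote(workers))  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Majority voting treats every annotator as equally reliable, which is exactly the assumption that model-based aggregation (e.g., weighting annotators via learned embeddings) relaxes.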