Abstract
Active learning is an effective approach to reducing the human effort required to generate labels for machine learning. Its iterative process includes a human annotation step, for which crowdsourcing can be leveraged. Organisations that adopt active learning need the resulting models to perform well. This study aims to identify effective crowdsourcing interaction designs that improve the quality of human annotations and, in turn, the performance of natural language processing (NLP)-based machine learning models. Specifically, the study experimented with four human-centred design techniques: highlight, guidelines, validation and text amount. Based on different combinations of these four design elements, the study developed 15 annotation interfaces and recruited crowd workers to annotate texts with them. The data annotated under each design were used separately to iteratively train a machine learning model. The results show that the highlight and guidelines techniques play an essential role in improving the quality of human labels and therefore the performance of active learning models, whereas the effect of validation and text amount is mixed: they improve model performance in some cases and reduce it in others. The ‘simple’ designs that use only a few design techniques (i.e. D1, D2, D7 and D14) are associated with the top-performing models. The results offer practical guidance for designing crowdsourcing labelling systems used for active learning.
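The abstract does not state how the 15 interfaces were derived from the four design techniques. One plausible reading, offered here only as an assumption, is that each technique is either present or absent and every non-empty combination forms one interface, which gives 2^4 - 1 = 15. The minimal Python sketch below enumerates those combinations. The function name `enumerate_designs` and the `combo-NN` labels are hypothetical, and the ordering is not meant to match the study's D1 to D15 numbering.

```python
from itertools import combinations

# The four design techniques named in the abstract.
TECHNIQUES = ("highlight", "guidelines", "validation", "text amount")


def enumerate_designs(techniques):
    """Return every non-empty subset of techniques, smallest subsets first.

    Assumption: each technique is simply on or off in a given interface.
    The abstract does not confirm this is how the study built its designs.
    """
    designs = []
    for size in range(1, len(techniques) + 1):
        for combo in combinations(techniques, size):
            designs.append(combo)
    return designs


designs = enumerate_designs(TECHNIQUES)
print(len(designs))  # 15

for index, combo in enumerate(designs, start=1):
    # Illustrative labels only; they do not correspond to the study's D1..D15.
    print(f"combo-{index:02d}: {', '.join(combo)}")
```

Under this reading, four of the combinations use a single technique, six use two, four use three, and one uses all four.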