Abstract

Crowdsourcing is an effective method for analyzing large scientific databases. However, data annotation relies on untrained volunteers, making it difficult to control annotation quality. Here, we propose a method for estimating the consistency of the annotations made by human classifiers in citizen science projects. Since the performance of supervised machine learning systems decreases as the level of noise in the training data increases, the method can rank human annotators by the consistency with which they annotate. Because the method uses the accuracy of an automatic classifier trained on these samples, it requires neither ground truth nor data annotated by other citizen scientists. The method makes it possible to reduce the number of annotations required for each sample by identifying the most efficient annotators, and to improve the overall quality of the data by giving higher weights to the classifications of the more consistent annotators. It can also improve the citizen science user experience by providing feedback in real time. Experimental results on a large citizen science project, Galaxy Zoo, using a subset of over $1.1\times 10^6$ annotations made by 4,000 citizen scientists, show a Pearson correlation of 0.966 between the quality estimate provided by the method and the actual performance of the annotators. The method also proved effective in improving the performance of offline statistical consensus methods.
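
The abstract describes the approach only at a high level: train a supervised classifier on the labels produced by a single annotator, and treat the achievable accuracy as a proxy for that annotator's consistency, since noisier labels make the learning task harder. The following is a minimal sketch of that idea, assuming precomputed feature vectors and scikit-learn; the random forest classifier, the cross-validation protocol, and all names (`features`, `annotations`, `consistency_score`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def consistency_score(features, user_annotations, cv=5):
    """Estimate one annotator's consistency as the cross-validated
    accuracy of a classifier trained only on that annotator's labels.

    features         -- 2-D array, features[i] describes sample i
    user_annotations -- list of (sample_index, label) pairs from one
                        citizen scientist (hypothetical format)
    """
    idx = [i for i, _ in user_annotations]
    X = features[idx]
    y = np.array([label for _, label in user_annotations])
    # Less consistent (noisier) labels lower the accuracy a classifier
    # can reach, so the mean CV score ranks annotators by consistency.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=cv).mean()

def rank_annotators(features, annotations):
    """Rank citizen scientists from most to least consistent.
    annotations maps a user id to that user's (index, label) pairs."""
    scores = {u: consistency_score(features, a)
              for u, a in annotations.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because each score is computed from one annotator's labels alone, the ranking needs no ground truth and no overlap with other annotators, consistent with the claim in the abstract; the resulting scores could then be used as weights in a downstream consensus scheme.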
