Abstract
BackgroundIn many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relaying on unreliable annotations obtained from multiple sources.ResultsWe proposed a probabilistic classification algorithm based on labels obtained by multiple noisy annotators. The new algorithm is capable of eliminating annotations provided by novice labellers and of providing a more accurate estimate of the ground truth by consensus labelling according to higher quality annotations. The approach is evaluated on text classification and prediction of protein disorder. Our study suggests that the higher levels of accuracy, effectiveness and performance can be achieved by the new method as compared to alternatives.ConclusionsThe proposed method is applicable for meta-learning from multiple existing classification models and noisy annotations obtained by humans. It is particularly beneficial when many annotations are obtained by novice labellers. In addition, the proposed method can provide further characterization of each annotator that can help in developing more accurate classifiers by identifying the most competent annotators for each data instance.
Highlights
In many biomedical applications, there is a need for developing classification models based on noisy annotations
In computer-aided diagnosis (CAD), many computer-aided image diagnosis systems [5,21,22,23,24] were built from labels assigned by multiple physicians who provide their estimations of the gold standard, which can only be obtained from dangerous surgical operations
A combination of noisy annotations obtained by humans and existing machine-based classification models were integrated
Summary
There is a need for developing classification models based on noisy annotations. Various groups studied the problem of developing classification models based on examples annotated by multiple labellers. Manually labelled data is successfully used together with mathematical models to provide annotator-specific accuracy estimates based on multi-annotator agreement [19,20]. Valizadegan et al [25] developed a probabilistic approach for learning classification models from opinions provided by multiple doctors and applied the approach to Heparin Induced Thrombocytopenia (HIT) electronic health records (EHR). Meta predictors are typically developed relying on disorder/order labelled training datasets. These datasets contain a very small number of proteins which have not already been used for development of the component predictors. Here a meta predictor is constructed in a completely unsupervised process without use of confirmed disorder/order annotations [32]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.