Ground Truth Inference Research Articles

Crowdsourcing provides a means of gathering data from the public in order to infer the ground truth label of an unfamiliar entity. Such data are not used for decision making in their raw form; further processing is first needed to infer the ground truth from the crowdsourced data. This paper presents a detailed comparative analysis of the ground truth inference ability of three clustering algorithms on crowdsourced datasets under different experimental scenarios (initializing centroids and extracting class labels). The algorithms are the self-organizing map (SOM), k-means, and expectation maximization clustering. The three algorithms were evaluated on five datasets: Adult2, weather sentiments, emotion, valence5, and employee review. Four experimental scenarios for inferring the ground truth label from the curated dataset were analysed. The first scenario uses the clustering algorithm alone, relying on the inner workings of the algorithm to predict the ground truth. The second scenario adds a class label extraction mechanism, in which the ground truth label is inferred by performing a further analysis on the clusters produced by the algorithm. In the third scenario, the centroids of the clustering algorithm are pre-initialized by setting the maximum value in each class from the curated data as a centroid, where "centroid" may mean something different depending on the particular algorithm. The fourth scenario combines the second and third scenarios. Experimental results show that the SOM performs best across all the datasets when the weights of its units are pre-initialized. SOM had its best performance on the weather sentiments dataset, recording 92.49% accuracy and a ROC AUC score of 0.88. It also recorded the best overall average accuracy of 50.2% and ROC AUC score of 0.59365 across all the datasets.
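The fourth scenario (pre-initialized centroids plus class label extraction) can be illustrated with a minimal k-means sketch. This is not the paper's implementation. It assumes each item is represented by a vector of per-class vote counts, reads "maximum value in each class" as "the item with the largest count for that class", and labels each cluster by its summed votes. The function names and the toy data are hypothetical.

```python
def init_centroids_by_class_max(points, n_classes):
    # Assumption: the centroid for class c is the item with the most votes for c.
    return [list(max(points, key=lambda p: p[c])) for c in range(n_classes)]


def kmeans(points, centroids, max_iters=100):
    """Plain Lloyd's k-means started from the given centroids."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(len(centroids)),
                key=lambda k: sum((x - y) ** 2 for x, y in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def extract_class_labels(points, assign, n_clusters):
    # Assumption: a cluster's label is the class with the most votes inside it.
    labels = {}
    for k in range(n_clusters):
        members = [pt for pt, a in zip(points, assign) if a == k]
        if members:
            totals = [sum(col) for col in zip(*members)]
            labels[k] = totals.index(max(totals))
    return labels


# Toy data (hypothetical): 6 items, 5 workers each, 2 classes.
# Each row is [votes for class 0, votes for class 1].
votes = [[5, 0], [4, 1], [3, 2], [1, 4], [0, 5], [2, 3]]

centroids = init_centroids_by_class_max(votes, n_classes=2)
assign, _ = kmeans(votes, centroids)
cluster_to_label = extract_class_labels(votes, assign, n_clusters=2)
inferred = [cluster_to_label[a] for a in assign]
print(inferred)
```

Seeding the centroids from the most clear-cut item of each class ties every cluster to a class from the start, which is one plausible reason pre-initialization could help. The extraction step then re-derives the label from the cluster contents rather than trusting the cluster index.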

Crowdsourcing has become increasingly popular because it can be used to solve many challenging question answering tasks that are easy for humans but difficult for computers. Because the quality of users varies, it is important to infer not only the underlying ground truth of these tasks but also the users' ability from the answers they give. This problem is called Ground Truth Inference and has been studied for many years. However, since the answers collected from users may contain sensitive information, ground truth inference raises serious privacy concerns. For this reason, the problem of ground truth inference under the local differential privacy (LDP) model has recently been studied. However, this problem is still not well understood, and even some basic questions remain unsolved. First, the average error of the private estimators with respect to the underlying ground truth is still unknown. Secondly, we do not know whether the ability of each user can be inferred under the LDP model, or what the estimation error is w.r.t. the underlying users' ability. Finally, previous work shows only through experiments that its methods perform better than the private majority voting algorithm; there is still no theoretical result that establishes this superiority formally. In this paper, we partially solve these problems by studying the ground truth inference problem under the local attribute differential privacy (LADP) model, which is a relaxation of the LDP model, and propose a new algorithm called the private Dawid-Skene method, which is motivated by the classical Dawid-Skene method. Specifically, we first provide the estimation errors for both the ability of users and the ground truth, under some assumptions on the problem and provided the algorithm starts with an appropriate initial vector. Moreover, we propose an explicit instance and show that the estimation error of the ground truth achieved by the private majority voting algorithm is always greater than the error achieved by our method.
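For orientation, here is a minimal sketch of the classical, non-private Dawid-Skene method next to plain majority voting. It is not the private algorithm proposed in the paper: it adds no privacy noise and makes no claim about the LADP model. The function names, the smoothing constant, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """labels: dict item -> list of (worker, label). Returns dict item -> label."""
    return {i: Counter(l for _, l in votes).most_common(1)[0][0]
            for i, votes in labels.items()}


def dawid_skene(labels, n_classes, iters=50, smooth=0.01):
    """Classical Dawid-Skene EM: per-worker confusion matrices and item posteriors."""
    items = sorted(labels)
    workers = sorted({w for votes in labels.values() for w, _ in votes})
    # Initialise posteriors from vote proportions (soft majority voting).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in labels[i]:
            counts[l] += 1.0
        post[i] = [c / sum(counts) for c in counts]
    conf = {}
    for _ in range(iters):
        # M-step: class priors and confusion matrices conf[w][true][observed].
        prior = [sum(post[i][k] for i in items) / len(items) for k in range(n_classes)]
        conf = {w: [[smooth] * n_classes for _ in range(n_classes)] for w in workers}
        for i in items:
            for w, l in labels[i]:
                for k in range(n_classes):
                    conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]
        # E-step: posterior over the true class of each item.
        for i in items:
            scores = list(prior)
            for w, l in labels[i]:
                for k in range(n_classes):
                    scores[k] *= conf[w][k][l]
            z = sum(scores)
            post[i] = [s / z for s in scores]
    truth = {i: max(range(n_classes), key=lambda k: post[i][k]) for i in items}
    return truth, conf


# Toy data (hypothetical): workers "a" and "b" answer correctly,
# worker "c" always gives the opposite binary answer.
answers = {
    0: [("a", 0), ("b", 0), ("c", 1)],
    1: [("a", 0), ("b", 0), ("c", 1)],
    2: [("a", 0), ("b", 0), ("c", 1)],
    3: [("a", 1), ("b", 1), ("c", 0)],
    4: [("a", 1), ("b", 1), ("c", 0)],
}

mv_truth = majority_vote(answers)
ds_truth, confusion = dawid_skene(answers, n_classes=2)
print(mv_truth, ds_truth)
```

On this toy data both methods recover the same labels. The difference is that Dawid-Skene also estimates each worker's ability: the confusion matrix learned for worker "c" puts most of its mass off the diagonal, so that worker's answers count as evidence for the opposite class instead of being outvoted.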

Articles published on Ground Truth Inference

Learning from crowds with robust logistic regression

Neighborhood Weighted Voting-Based Noise Correction for Crowdsourcing

A multi-view-based noise correction algorithm for crowdsourcing learning

Learning From Crowds With Multiple Noisy Label Distribution Propagation

Learning from biased crowdsourced labeling with deep clustering

Clustering Based Approach for Ground Truth Inference in Crowdsourced Data

A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning

Improving data and model quality in crowdsourcing using co-training-based noise correction

Inferring ground truth from crowdsourced data under local attribute differential privacy

Improving crowd labeling using Stackelberg models

Improving data and model quality in crowdsourcing using cross-entropy-based noise correction

Resampling-based noise correction for crowdsourcing

Labelling Training Samples Using Crowdsourcing Annotation for Recommendation

Machine Learning with Crowdsourcing: A Brief Summary of the Past Research and Future Directions

Noise correction to improve data and model quality for crowdsourcing

Improving Crowdsourced Label Quality Using Noise Correction

Learning from crowdsourced labeled data: a survey

Multi-Class Ground Truth Inference in Crowdsourcing with Clustering

Imbalanced Multiple Noisy Labeling
