Abstract

In the era of big data, observations of target objects are collected from an ever wider range of data sources. As the number of sources grows, we expect to estimate statistical parameters, such as the population mean, more reliably from the multi-source observations. However, the reliability of the data sources themselves rarely receives attention, because hypothesis testing appears to be an effective tool for deciding whether a given estimate is acceptable. In practice, noisy observations from different unreliable sources may follow distributions with different, unknown parameters. This violates the assumption in hypothesis testing that the observations are identically distributed. As a result, a poor estimate of the population mean may be accepted when hypothesis testing is performed over multi-source observations. To address this issue, we propose a true mean value discovery algorithm that uses multi-source observations to determine whether an estimated population mean should be rejected. The proposed algorithm also estimates the reliability degree of each data source. By removing incorrect observations provided by unreliable sources, we obtain more reliable estimates of the true population means. Experiments on three real-world tasks demonstrate that the proposed method outperforms state-of-the-art approaches.
