Abstract

Information gathered from multiple sources on the Web often exhibits conflicts. This phenomenon motivates the need for truth discovery, which aims to automatically find the true claim among multiple conflicting claims. Existing truth discovery methods are mainly based on iterative updates or probabilistic models. Iterative methods specify rules that govern how credibility flows from sources to claims and then back to sources. However, these manually defined rules tend to be ad hoc and are difficult to adapt and analyze. Probabilistic methods model a few latent factors that affect how sources make claims, such as random choice, guessing, or mistakes. However, these manually defined factors may not reflect the underlying data distributions well. Given these limitations, we propose a new unsupervised model for truth discovery in this paper. Our model first constructs a heterogeneous network that exploits both source-claim and source-source relationships. It then embeds the network into a low-dimensional space through a principled algorithm such that trustworthy sources lie close to true claims and, likewise, unreliable sources lie close to false claims. In this way, truth discovery can be conveniently performed in the embedding space. Compared with existing methods, our model does not need manually defined rules or factors. Rather, it learns the embeddings automatically from data. Experiments on two real-world datasets demonstrate that our model outperforms existing state-of-the-art methods for truth discovery.
