Abstract

In the era of information explosion, large volumes of data are generated through a variety of channels, such as social networks, crowdsourcing platforms and blogs, and conflicts and errors constantly emerge in this data. Truth discovery aims to find trustworthy information in conflicting data by taking source reliability into account. However, most traditional truth discovery approaches are designed only for structured data and cannot extract trustworthy information from unstructured raw text. The major challenges of inferring reliable information from text data stem from its multifactorial property (i.e., an answer may contain multiple different key factors, which may themselves be complex) and from the diversity of word usage (i.e., different words may carry similar semantic information even though they are spelled entirely differently). To address these challenges, we propose an ant colony optimization based truth discovery model for text data. First, the keywords extracted from all answers to a specific question are grouped into a set. Then, we recast the truth discovery problem as a subset optimization problem, and use parallel ant colony optimization to select, from the full keyword set, the correct keywords for each question based on the hypothesis of truth discovery. After that, the answers to each question are ranked by the similarity between the keywords of each user answer and the correct keywords identified by the colony. Experimental results on a real dataset show that, even when the semantic information of the text data is complex, the proposed model can still find trustworthy information in complex answers, in comparison with retrieval-based and state-of-the-art approaches.
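The three steps named in the abstract (pool keywords, select a subset with ant colony optimization, rank answers by similarity) can be illustrated with a minimal Python sketch. This is not the authors' model: the abstract does not specify the objective, the pheromone rule or the similarity measure, so everything here is an assumption made for illustration. In particular, `subset_score` assumes a keyword is more likely correct when more than half of the answers contain it, the ants run sequentially rather than in parallel, `extract_keywords` is a toy tokenizer, and Jaccard overlap on exact words stands in for the semantic similarity the paper would need to handle differently spelled synonyms.

```python
import random
import re

STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it", "by"}

def extract_keywords(answer):
    """Toy keyword extractor (assumption): lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", answer.lower()) if w not in STOPWORDS}

def subset_score(subset, support):
    """Assumed objective: reward keywords found in more than half of the answers."""
    return sum(support[k] - 0.5 for k in subset)

def aco_select(answer_keywords, n_ants=20, n_iters=30, rho=0.1, seed=0):
    """Pick a keyword subset with a simple sequential ant colony search."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*answer_keywords))   # pooled keyword set
    n = len(answer_keywords)
    support = {k: sum(k in kws for kws in answer_keywords) / n for k in vocab}
    pheromone = {k: 0.5 for k in vocab}             # inclusion probability per keyword
    best, best_score = set(), 0.0                   # the empty subset scores 0
    for _ in range(n_iters):
        iter_best, iter_score = set(), float("-inf")
        for _ in range(n_ants):
            # Each ant includes keyword k with probability pheromone[k].
            subset = {k for k in vocab if rng.random() < pheromone[k]}
            s = subset_score(subset, support)
            if s > iter_score:
                iter_best, iter_score = subset, s
        if iter_score > best_score:
            best, best_score = iter_best, iter_score
        for k in vocab:
            # Evaporate, then deposit toward the best subset found so far.
            target = 1.0 if k in best else 0.0
            p = (1 - rho) * pheromone[k] + rho * target
            pheromone[k] = min(0.95, max(0.05, p))  # keep some exploration
    return best, best_score

def jaccard(a, b):
    """Exact-match set overlap; a stand-in for a semantic similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_answers(answer_keywords, correct):
    """Return answer indices ordered from most to least similar to `correct`."""
    sims = [jaccard(kws, correct) for kws in answer_keywords]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)

# Hypothetical answers to one question, invented for this example.
answers = [
    "Fever and cough are common symptoms",
    "Common symptoms: fever, cough, fatigue",
    "It is caused by stress",
]
kws = [extract_keywords(a) for a in answers]
correct, score = aco_select(kws)
ranking = rank_answers(kws, correct)
print(sorted(correct), ranking)
```

Under the assumed objective the best subset is simply the set of majority-supported keywords, so the colony is not strictly needed here; the sketch only shows how a subset-selection search and a similarity-based ranking fit together.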
