Abstract

Web-based question answering (QA) systems are effective at corroborating answers from multiple Web sources. However, the Web also contains false, fabricated, and biased information that can adversely affect the accuracy of answers in Web-based QA systems. Existing solutions focus primarily on finding relevant Web pages but either do not evaluate Web pages’ credibility or evaluate only two to three of seven credibility categories. This research proposes a credibility assessment algorithm that scores credibility using seven categories: correctness, authority, currency, professionalism, popularity, impartiality, and quality, where each category consists of multiple factors. The credibility assessment module is added on top of an existing QA system to score answers based on the credibility of the Web pages from which the answers were taken, and the system ranks answers accordingly. We conducted extensive quantitative tests on 211 factoid questions taken from TREC QA data from 1999-2001. Our findings show that the correctness, professionalism, impartiality, and quality categories significantly improved the accuracy of answers, whereas authority, currency, and popularity played only a minor role. This research aims to help researchers and practitioners use the Web credibility assessment model to improve the accuracy of information systems. Credibility scores should assist Web users in selecting credible information, while also encouraging content creators to focus on publishing credible content.
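The abstract describes aggregating seven credibility categories, each built from multiple factors, into a single per-page score. As an illustration only, the sketch below combines the seven category scores with equal weights; the paper does not state its actual weights or scale here, so the equal weighting, the [0, 1] range, and the function name `credibility_score` are assumptions.

```python
# Hedged sketch of a per-page credibility score over the seven categories
# named in the abstract. Equal weighting and the [0, 1] scale are assumed,
# not taken from the paper.
CATEGORIES = [
    "correctness", "authority", "currency", "professionalism",
    "popularity", "impartiality", "quality",
]

def credibility_score(category_scores: dict[str, float]) -> float:
    """Average the seven per-category scores (each assumed in [0, 1])."""
    return sum(category_scores[c] for c in CATEGORIES) / len(CATEGORIES)
```

In practice each category score would itself be computed from its underlying factors (e.g. domain type for authority, last-modified date for currency); the uniform average above is just the simplest aggregation.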

Highlights

  • Question answering (QA) is a complex form of Information Retrieval (IR) system where the information requested is partially expressed in natural language statements [1]

  • Since the literature on credibility-based Web QA systems is limited, this study reviewed Web Information Systems (IS) that conduct credibility assessment on Web pages

  • Selecting the ideal value of α for the CredoMQA system: different values of the smoothing factor α can be used for generating a credibility-based answer score, where α controls the weighting between AnswerPercentageOnPage and CredibilityScore
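The smoothing step in the last highlight can be sketched as a convex combination of the two signals. The linear form below, the parameter names, and the [0, 1] ranges are assumptions for illustration; the source only states that α controls the weighting between AnswerPercentageOnPage and CredibilityScore.

```python
def answer_score(answer_pct_on_page: float, credibility: float, alpha: float) -> float:
    """Blend the two signals with smoothing factor alpha in [0, 1].

    Hypothetical linear form: alpha weights the answer's prevalence on the
    page, and (1 - alpha) weights the page's credibility score.
    """
    return alpha * answer_pct_on_page + (1 - alpha) * credibility
```

With α = 1 the ranking ignores credibility entirely, and with α = 0 it ranks purely by page credibility, so intermediate values trade the two signals off; the paper's experiments select the ideal α empirically.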



Introduction

QA is a complex form of Information Retrieval (IR) system in which the information requested is partially expressed in natural language statements [1]. This makes QA systems one of the most natural ways of communicating with computers. QA is a complex process that involves multiple domains, including natural language processing (NLP), IR, Information Processing (IP), and machine learning [2]. It is more complex than IR because IR considers complete documents as relevant, whereas in QA only specific portion(s) of text within documents are considered relevant. To find relevant information on the Web, users and systems make use of search engines [6].

