Abstract

The rapid flow of information over the web, together with its ease of access, has heightened fears about the rampant spread of misinformation. This poses a health threat and an unprecedented problem worldwide, affecting people's lives. Addressing it requires the ability to detect misinformation. Recent techniques in this area focus on static models based on feature extraction and classification. However, data may change over time, and its veracity needs to be rechecked as it is updated. The literature lacks models that can handle incremental data, check the veracity of data, and detect misinformation. To fill this gap, the authors propose a novel Veracity Scanning Model (VSM) that detects misinformation in the healthcare domain by iteratively fact-checking content as it evolves over time. In this approach, healthcare web URLs are classified as legitimate or non-legitimate using sentiment analysis as a feature, document similarity measures to fact-check the URLs, and incremental learning to handle the arrival of new data. The experimental results show that the Jaccard distance measure outperformed the other techniques, reaching an accuracy of 79.2% with the Random Forest classifier, whereas the cosine similarity measure reached a lower accuracy of 60.4% with the Support Vector Machine classifier. In addition, when implemented as an algorithm, the Euclidean distance measure achieved accuracies of 97.14% and 98.33% on the training and test data, respectively.
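The three document similarity measures named above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts, which may differ from the preprocessing used in the paper; the function names and the example texts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # actual preprocessing is not specified here.
    return text.lower().split()


def jaccard_distance(a, b):
    # 1 - |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Straight-line distance between the two term-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


# Hypothetical claim and reference text, for illustration only.
claim = "vitamin c cures the common cold"
reference = "vitamin c does not cure the common cold"

print(jaccard_distance(claim, reference))
print(cosine_similarity(claim, reference))
print(euclidean_distance(claim, reference))
```

Jaccard distance and Euclidean distance are dissimilarities (lower means closer to the reference), whereas cosine similarity is higher for closer documents, so any threshold or feature built on them has to account for the direction of each measure.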

Highlights

  • The exponential growth of the internet and the World Wide Web (WWW), together with its ease of access, has led to a rapid flow of information

  • Performance is evaluated using accuracy, precision, recall and F1-score, presented graphically in Fig. 3 through Fig. 6, respectively, for the document similarity measures on various classifiers

  • The Random Forest (RF) classifier outperformed the others with 79.2% accuracy for the Jaccard Distance (JD) measure, followed by the Logistic Regression (LR) classifier with 78.1% accuracy for the JD measure
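The four evaluation metrics listed in the highlights can be computed from a classifier's predictions as in the sketch below. The label encoding (1 for legitimate, 0 for non-legitimate), the function name, and the sample labels are assumptions made for illustration; they are not taken from the paper's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    # Count the four confusion-matrix cells for a binary problem.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = legitimate URL, 0 = non-legitimate URL.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the legitimate and non-legitimate classes are imbalanced, since accuracy alone can look high while one class is mostly misclassified.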


Summary

Introduction

The exponential growth of the internet and the World Wide Web (WWW), together with its ease of access, has led to a rapid flow of information. Convenience, diverse knowledge, and reasonable cost attract internet users to access and share information online, leading to the rapid generation of information [1]. An enormous volume of health and medical material is accessible online. Physicians have been observed to choose the web as a valuable information resource for medical practice, education, and decision support, while patients search the internet for information on diseases, infections, and their symptoms. Sixty-five percent of users prefer the internet for searching health-related topics [2, 3, 4]. It can thus be concluded that users rely heavily on the internet for access to information.

