Abstract

The increasing availability of online content raises several questions about effective access to information. In particular, the possibility for almost anyone to generate content without traditional intermediaries has, on the one hand, led to a process of “information democratization” but, on the other hand, has negatively affected the genuineness of the information disseminated. This issue is particularly relevant for health information, which has an impact at both the individual and the societal level. Laypersons often lack sufficient health literacy when deciding whether to rely on this information, and expert users cannot cope with such a large amount of content. For these reasons, there is a need for automated solutions that can help both experts and non-experts distinguish genuine from non-genuine health information. To contribute to this area, in this paper we study and analyze distinct groups of features and machine learning techniques that can be effective in assessing misinformation in online health-related content, whether in the form of Web pages or social media content. To this aim, and for evaluation purposes, we consider several publicly available datasets that have only recently been generated for assessing health misinformation from different perspectives.

Highlights

  • In contemporary society, access to information plays a crucial role, influencing choices and behaviors both at the level of individuals and communities

  • Based on the literature and on a classification performed in this article, we identify six classes of health misinformation features: (i) textual representation features, i.e., relating to different possible formal representations of the text, (ii) linguistic-stylistic features, i.e., taking into account the presence of different stylistic aspects of the text, (iii) linguistic-emotional features, i.e., identifying aspects of emotional character that transpire from the text, (iv) linguistic-medical features, i.e., related to the presence of specific medical terms within the text, (v) propagation-network features, i.e., taking into account the social network and the way information is propagated on it, and (vi) user-profile features, i.e., related to various metadata connected to user profiles

  • Bi-LSTM(WE): Bidirectional Long Short-Term Memory classifier using only textual representation features; CNN(WE): Convolutional Neural Network classifier using only textual representation features; HPN: Hierarchical Propagation Networks using the propagation-network features, as proposed in [42]; ML(LIWC): ML algorithms using the Linguistic Inquiry and Word Count (LIWC) features proposed in [37]
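The textual feature classes (ii) to (iv) listed above, combined with a classical ML classifier in the spirit of the ML(LIWC) baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicons `MEDICAL_TERMS` and `EMOTION_TERMS`, the feature names, the toy texts, and the helpers `extract_features`, `train_logistic_regression`, and `predict_proba` are all hypothetical, and plain logistic regression is swapped in as one simple choice of ML algorithm. Propagation-network and user-profile features are omitted because they require social-graph and account metadata.

```python
import math
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicons for illustration only; a real system would rely on
# curated resources (e.g., a medical vocabulary or an emotion lexicon).
MEDICAL_TERMS = {"vaccine", "virus", "cancer", "dose", "symptom", "treatment", "cure"}
EMOTION_TERMS = {"fear", "shocking", "amazing", "terrible", "miracle", "panic", "hate"}
FEATURE_NAMES = ["exclam_per_sentence", "caps_ratio", "emotion_ratio", "medical_ratio"]
TOKEN_RE = re.compile(r"[A-Za-z']+")


def extract_features(text: str) -> Dict[str, float]:
    """Compute a few hand-crafted features for one document (illustrative only)."""
    tokens = TOKEN_RE.findall(text)
    lowered = [t.lower() for t in tokens]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(len(sentences), 1)
    return {
        # linguistic-stylistic: exclamation density and all-caps "shouting"
        "exclam_per_sentence": text.count("!") / n_sentences,
        "caps_ratio": sum(1 for t in tokens if len(t) > 1 and t.isupper()) / n_tokens,
        # linguistic-emotional: share of tokens found in the emotion lexicon
        "emotion_ratio": sum(1 for t in lowered if t in EMOTION_TERMS) / n_tokens,
        # linguistic-medical: share of tokens found in the medical lexicon
        "medical_ratio": sum(1 for t in lowered if t in MEDICAL_TERMS) / n_tokens,
    }


def to_vector(text: str) -> List[float]:
    features = extract_features(text)
    return [features[name] for name in FEATURE_NAMES]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that x is misinformation (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


# HYPOTHETICAL toy examples (1 = misinformation-like, 0 = genuine-like).
TOY_TEXTS = [
    "SHOCKING miracle cure!!! Doctors HATE this!",
    "This vaccine is TERRIBLE!!! Panic now!",
    "The vaccine reduced symptom severity in a randomized trial.",
    "Treatment guidelines recommend a second dose after four weeks.",
]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic_regression([to_vector(t) for t in TOY_TEXTS], TOY_LABELS)
    for text in ["Amazing miracle cure!!!", "A trial measured symptom changes."]:
        print(f"{predict_proba(w, b, to_vector(text)):.3f}  {text}")
```

Run as a script, it prints one probability per example sentence. With only four toy training texts, the numbers demonstrate the mechanics of pairing hand-crafted feature classes with a classifier, not any real detection ability; a realistic setup would use the curated lexicons, datasets, and classifiers evaluated in the paper.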


Summary

Introduction

Access to information plays a crucial role, influencing choices and behaviors both at the level of individuals and at the level of communities. Web 2.0 technologies have enabled anyone to play an active role in every stage of the information life cycle, from generation to dissemination, especially through social media platforms. In this context, characterized by “disintermediation” [1,2], it is essential to be able to distinguish genuine information from non-genuine information. This need is amplified for sensitive content that could have extremely negative social repercussions, such as health-related content. People who are not experts in the field are unable to properly assess the genuineness of such claims, both because of their limited cognitive capacities in general [4,5] and because of their insufficient level of health literacy [6].

