CAPS: A Cross-Lingual Methodology for Detecting Misinformation in Estonian Health News

Abstract

Health misinformation poses a significant threat to public health by eroding trust in scientific expertise and diminishing adherence to health guidelines, which together weaken community resilience to preventable diseases. Detecting health misinformation is therefore crucial. However, manual detection requires substantial human effort and expertise, making it impractical at scale, particularly in low-resource settings where technological and linguistic resources are limited. Developing automated techniques for identifying false or misleading claims is essential to ensure timely intervention. Advancing these automated detection methods depends on the development of robust datasets, which enable more accurate modeling and adaptation to specific languages and contexts. To the best of the authors' knowledge, no misinformation detection techniques or datasets have yet been developed specifically for the Estonian language in the health domain. Addressing this gap, the primary objective of this study is to develop a reliable system for generating ground-truth labels for health misinformation in Estonian, thereby contributing to misinformation detection in low-resource settings. Leveraging pre-labeled datasets in English, the proposed Cross-lingual Alignment and Confident Prediction Sampling (CAPS) approach employs a hybrid two-phase methodology involving semantic similarity measurements, manual annotation, classification, and confidence sampling. This methodology enables the efficient generation of misinformation labels with minimal reliance on manual annotation, contributing a valuable resource for advancing misinformation detection in underrepresented languages. The resulting dataset of 8,795 annotated news articles represents a significant advance in health misinformation detection for the Estonian language.
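The abstract describes the pipeline only at a high level. As a rough illustration of the two-phase idea, the Python sketch below transfers labels from English claims to Estonian articles via multilingual sentence embeddings, then trains a classifier and keeps only its confident predictions. The encoder model, both thresholds, and all example texts are assumptions for demonstration, not the authors' actual configuration.

```python
# Illustrative sketch of a CAPS-style two-phase labeling pipeline.
# Model name, thresholds, and texts are assumptions, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical inputs: pre-labeled English claims, unlabeled Estonian articles.
en_texts = ["Vaccines cause autism.", "Washing hands reduces infection risk."]
en_labels = np.array([1, 0])  # 1 = misinformation, 0 = reliable
et_texts = [
    "Vaktsiinid põhjustavad autismi.",
    "Käte pesemine vähendab nakkusohtu.",
    "Uus uuring kinnitab D-vitamiini mõju immuunsusele.",
]

# Phase 1: cross-lingual alignment with a multilingual sentence encoder.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
en_emb = encoder.encode(en_texts)
et_emb = encoder.encode(et_texts)
sims = cosine_similarity(et_emb, en_emb)      # shape: (n_et, n_en)
nearest = sims.argmax(axis=1)
aligned = sims.max(axis=1) >= 0.7             # similarity threshold (assumed)
# In the paper's workflow, low-similarity articles would instead be routed
# to manual annotation rather than discarded.

# Phase 2: train on the seed labels, then keep only confident predictions.
seed_X = np.vstack([en_emb, et_emb[aligned]])
seed_y = np.concatenate([en_labels, en_labels[nearest[aligned]]])
clf = LogisticRegression(max_iter=1000).fit(seed_X, seed_y)
proba = clf.predict_proba(et_emb[~aligned])
confident = proba.max(axis=1) >= 0.9          # confidence threshold (assumed)
print(f"aligned: {aligned.sum()}, confidently auto-labeled: {confident.sum()}")
```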

Similar Papers
  • Research Article
  • Citations: 191
  • 10.1016/j.ipm.2020.102390
Detecting health misinformation in online health communities: Incorporating behavioral features into machine learning based approaches
  • Oct 6, 2020
  • Information Processing & Management
  • Yuehua Zhao + 2 more

  • Research Article
  • Citations: 55
  • 10.2196/38786
Prevalence of Health Misinformation on Social Media: Challenges and Mitigation Before, During, and Beyond the COVID-19 Pandemic: Scoping Literature Review
  • Aug 19, 2024
  • Journal of Medical Internet Research
  • Dhouha Kbaier + 3 more

This scoping review accompanies our research study "The Experience of Health Professionals With Misinformation and Its Impact on Their Job Practice: Qualitative Interview Study." It surveys online health misinformation and is intended to provide an understanding of the communication context in which health professionals must operate. Our objective was to illustrate the impact of social media in introducing additional sources of misinformation that impact health practitioners' ability to communicate effectively with their patients. In addition, we considered how the level of knowledge of practitioners mitigated the effect of misinformation and additional stress factors associated with dealing with outbreaks, such as the COVID-19 pandemic, that affect communication with patients. This study used a 5-step scoping review methodology following Arksey and O'Malley's methodology to map relevant literature published in English between January 2012 and March 2024, focusing on health misinformation on social media platforms. We defined health misinformation as a false or misleading health-related claim that is not based on valid evidence or scientific knowledge. Electronic searches were performed on PubMed, Scopus, Web of Science, and Google Scholar. We included studies on the extent and impact of health misinformation in social media, mitigation strategies, and health practitioners' experiences of confronting health misinformation. Our independent reviewers identified relevant articles for data extraction. Our review synthesized findings from 70 sources on online health misinformation. It revealed a consensus regarding the significant problem of health misinformation disseminated on social network platforms. While users seek trustworthy sources of health information, they often lack adequate health and digital literacies, which is exacerbated by social and economic inequalities. Cultural contexts influence the reception of such misinformation, and health practitioners may be vulnerable, too. The effectiveness of online mitigation strategies like user correction and automatic detection is complicated by malicious actors and politicization. The role of health practitioners in this context is a challenging one. Although they are still best placed to combat health misinformation, this review identified stressors that create barriers to their ability to do this well. Investment in health information management at local and global levels could enhance their capacity for effective communication with patients. This scoping review underscores the significance of addressing online health misinformation, particularly in the postpandemic era. It highlights the necessity for a collaborative global interdisciplinary effort to ensure equitable access to accurate health information, thereby empowering health practitioners to effectively combat the impact of online health misinformation. Academic research will need to be disseminated into the public domain in a way that is accessible to the public. Without equipping populations with health and digital literacies, the prevalence of online health misinformation will continue to pose a threat to global public health efforts.

  • Research Article
  • Citations: 10
  • 10.1016/j.gltp.2021.08.038
Debunking health fake news with domain specific pre-trained model
  • Aug 14, 2021
  • Global Transitions Proceedings
  • Santoshi Kumari + 3 more

  • Research Article
  • Citations: 1
  • 10.61194/jhlqr.v2i2.545
Social Media, Health Misinformation, and Literacy: A Narrative Review of Challenges and Solutions
  • Sep 30, 2022
  • Journal of Health Literacy and Qualitative Research
  • Marshanda Rimadita Nugrahani

Health misinformation on social media has become a pressing public health challenge, particularly among individuals with low digital health literacy. This study examines the relationship between digital health literacy and the spread of misinformation, analyzing systemic factors that contribute to the persistence of misleading health content. A systematic literature review was conducted using academic databases such as PubMed, Scopus, and Google Scholar, with a focus on peer-reviewed studies published in the past decade. The review identifies key demographic, social, and economic determinants influencing digital health literacy and explores the role of social media platforms in misinformation dissemination. Findings reveal that individuals with limited digital health literacy struggle to critically evaluate health-related content, making them more vulnerable to misinformation. Systemic factors, including weak regulatory oversight and social media algorithms prioritizing engagement-driven content, further facilitate the spread of misleading health information. Effective interventions, such as digital literacy education, peer-led initiatives, and collaboration between social media platforms and public health organizations, are crucial in mitigating misinformation. The study highlights the need for targeted policy reforms, improved algorithmic transparency, and community-based health education to enhance digital health literacy and misinformation resilience. Future research should focus on the long-term efficacy of digital health literacy interventions and explore AI-driven solutions for misinformation detection.

  • Research Article
  • Citations: 14
  • 10.1007/s12652-023-04619-4
Automatic detection of health misinformation: a systematic review
  • May 27, 2023
  • Journal of Ambient Intelligence and Humanized Computing
  • Ipek Baris Schlicht + 3 more

The spread of health misinformation has the potential to cause serious harm to public health, from vaccine hesitancy to the adoption of unproven disease treatments. In addition, it could have other effects on society, such as an increase in hate speech towards ethnic groups or medical experts. To counteract the sheer amount of misinformation, there is a need for automatic detection methods. In this paper, we conduct a systematic review of the computer science literature exploring text mining techniques and machine learning methods to detect health misinformation. To organize the reviewed papers, we propose a taxonomy, examine publicly available datasets, and conduct a content-based analysis to investigate analogies and differences among COVID-19 datasets and datasets related to other health domains. Finally, we describe open challenges and conclude with future directions.

  • Research Article
  • Citations: 4
  • 10.51519/journalisi.v6i4.931
Misinformation Detection: A Review for High and Low-Resource Languages
  • Dec 31, 2024
  • Journal of Information Systems and Informatics
  • Seani Rananga + 3 more

The rapid spread of misinformation on platforms like Twitter and Facebook, and in news headlines, highlights the urgent need for effective ways to detect it. Researchers are increasingly using machine learning (ML) and deep learning (DL) techniques to tackle misinformation detection (MID) because of their proven success. However, the task remains challenging due to the complexity of deceptive language, digital editing tools, and the lack of reliable linguistic resources for non-English languages. This paper provides a comprehensive analysis of relevant research, offering insights into advanced techniques for MID. It covers dataset assessments, the importance of using multiple forms of data (multimodality), and different language representations. Applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, the study identified and analyzed literature from 2019 to 2024 across five databases: Google Scholar, Springer, Elsevier, ACM, and IEEE Xplore. The study selected thirty-one papers and examined the effectiveness of various ML and DL approaches, with a focus on performance metrics, datasets, and the challenges of detecting false or misleading information. The findings indicate that most current MID models depend heavily on DL techniques, with approximately 81% of studies preferring these over traditional ML methods. In addition, most studies are text-based, with much less attention given to audio, speech, images, and videos. The most effective models are designed mainly for high-resource languages, with English datasets being the most used (67%), followed by Arabic (14%), Chinese (11%), and others. Less than 10% of the studies focus on low-resource languages (LRLs). The study therefore highlights the need for robust datasets and interpretable, scalable MID models for LRLs. It emphasizes the critical need to prioritize and advance MID research for LRLs across all data types, including text, audio, speech, images, videos, and multimodal approaches. This study aims to support ongoing efforts to combat misinformation and promote a more informed understanding of under-resourced African languages.

  • Research Article
  • 10.1080/10410236.2026.2621231
Fake News Has a New Author, Can You Spot the Lie? Media Cues and the Detection of AI-Generated Health News Through the Lens of the Media Evocation Paradigm
  • Jan 29, 2026
  • Health Communication
  • Zehang Xie + 1 more

As generative artificial intelligence (GAI) increasingly contributes to the creation of news content, its ability to produce authoritative yet fabricated information raises pressing concerns for public trust and misinformation detection. Guided by the media evocation paradigm (MEP), this study examines how source credibility, content style, and communication channel influence users’ ability to detect GAI-generated health fake news. A 2 × 2 × 3 mixed experimental design (N = 120) was employed, in which participants evaluated nine GAI-generated news items across television, newspapers, and social media. Results show that non-authoritative sources, rational framing, and social media platforms significantly enhanced detection accuracy. In contrast, authoritative sources and emotional content in traditional media environments reduced detection rates. A significant three-way interaction reveals that detection accuracy was highest when all three media cues aligned (non-authoritative source, rational style, and social media context). This study extends the MEP to the context of GAI-generated health news and highlights the importance of reflective media processing in how individuals assess information credibility. By identifying how specific combinations of media cues affect fake news detection, the findings offer practical implications for improving public resilience against health misinformation and inform the design of more effective communication strategies in GAI-mediated health contexts.

  • Video Transcripts
  • 10.48448/t9ta-hm27
Drink bleach or do what now? Covid-HeRA: A study of risk-informed health decision making in the presence of COVID19 misinformation
  • May 7, 2022
  • Underline Science Inc.
  • Chengxiang Zhai + 3 more

Given the widespread dissemination of inaccurate medical advice related to the 2019 coronavirus pandemic (COVID-19), such as fake remedies, treatments and prevention suggestions, misinformation detection has emerged as an open problem of high importance and interest for the research community. Several works study health misinformation detection, yet little attention has been given to the perceived severity of misinformation posts. In this work, we frame health misinformation as a risk assessment task. More specifically, we study the severity of each misinformation story and how readers perceive this severity, i.e., how harmful a message believed by the audience can be and what type of signals can be used to recognize potentially malicious fake news and detect refuted claims. To address our research questions, we introduce a new benchmark dataset, accompanied by detailed data analysis. We evaluate several traditional and state-of-the-art models, and show there is a significant gap in performance when applying traditional misinformation models to detect severe misinformation. We conclude with open challenges and future directions.

  • Research Article
  • Citations: 21
  • 10.1609/icwsm.v16i1.19372
Drink Bleach or Do What Now? COVID-HeRA: A Study of Risk-Informed Health Decision Making in the Presence of COVID-19 Misinformation
  • May 31, 2022
  • Proceedings of the International AAAI Conference on Web and Social Media
  • Arkin Dharawat + 3 more

Given the widespread dissemination of inaccurate medical advice related to the 2019 coronavirus pandemic (COVID-19), such as fake remedies, treatments and prevention suggestions, misinformation detection has emerged as an open problem of high importance and interest for the research community. Several works study health misinformation detection, yet little attention has been given to the perceived severity of misinformation posts. In this work, we frame health misinformation as a risk assessment task. More specifically, we study the severity of each misinformation story and how readers perceive this severity, i.e., how harmful a message believed by the audience can be and what type of signals can be used to recognize potentially malicious fake news and detect refuted claims. To address our research questions, we introduce a new benchmark dataset, accompanied by detailed data analysis. We evaluate several traditional and state-of-the-art models, and show there is a significant gap in performance when applying traditional misinformation classification models to this task. We conclude with open challenges and future directions.

  • Research Article
  • Citations: 2
  • 10.1016/j.eswa.2005.01.014
An agent to detect online false and misleading claims for weight loss products in Korea
  • Feb 16, 2005
  • Expert Systems with Applications
  • N Sung + 1 more

  • Research Article
  • 10.36948/ijfmr.2025.v07i06.65117
AI Powered Fake Video and Misinformative Detection
  • Dec 31, 2025
  • International Journal For Multidisciplinary Research
  • Anurag Patel + 4 more

The rapid growth of social media has accelerated the creation and spread of fake videos, deepfakes, and misleading content, posing serious risks to public trust, security, and digital integrity. This research presents an AI-powered system for automated fake video and misinformation detection, combining computer vision, deep learning, and natural language processing to analyze both visual and contextual cues. The proposed framework integrates convolutional neural networks (CNNs) and transformer-based architectures to detect frame-level manipulation patterns, facial inconsistencies, unnatural motion artifacts, and audio-visual mismatches. Additionally, metadata analysis and cross-referencing with verified information sources help identify misinformation embedded within video narratives. Experimental results demonstrate high accuracy in detecting deepfake artifacts and misleading claims across diverse datasets. The system provides a scalable, real-time solution suitable for social media platforms, digital forensics, and content verification agencies. This work contributes to enhancing online safety by offering a reliable AI-driven approach to counter the growing threat of fabricated and misinformative video content.
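The abstract above gives no implementation details. As a hedged sketch of what the frame-level analysis it describes can look like, the snippet below samples video frames with OpenCV and scores each with a CNN. The ResNet-18 backbone with an untrained binary head is a hypothetical stand-in for the paper's (unspecified) fine-tuned models, and the file name is illustrative.

```python
# Minimal sketch of frame-level video screening, as described above.
# The CNN is a hypothetical stand-in: a pretrained ResNet-18 with an
# UNTRAINED binary head, not the paper's fine-tuned models.
import cv2                                    # pip install opencv-python
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 2)  # [real, manipulated]
backbone.eval()

def score_video(path: str, stride: int = 30) -> float:
    """Average per-frame 'manipulated' probability over sampled frames."""
    cap = cv2.VideoCapture(path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:                 # sample every `stride`-th frame
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                logits = backbone(preprocess(rgb).unsqueeze(0))
            scores.append(torch.softmax(logits, dim=1)[0, 1].item())
        idx += 1
    cap.release()
    return sum(scores) / len(scores) if scores else 0.0

# Usage with a hypothetical file:
# print(score_video("suspect_clip.mp4"))
```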

  • Research Article
  • Citations: 1
  • 10.24253/anr.v5i1.33
Nurses facing health misinformation: How to spot scientific articles misuse?
  • Dec 31, 2022
  • Archives of Nursing Research
  • Beatriz Jiménez-Gómez + 5 more

Currently, large amounts of health information, mainly on social media, have led to an infodemic which, together with the vast amount of misleading and inaccurate information that can be accessed, represents a substantial public health issue. Healthcare professionals can help to identify and even prevent the dissemination of such information, and can lead the struggle against it by refuting it. The aim of this paper is therefore to propose a guide for health professionals to detect health misinformation. The model is based on an in-depth analysis assessing the contextualization of the type of scientific document, the possibility of extrapolating the information, causality, and the quality of the scientific evidence provided. Besides asking healthcare professionals to make an effort to prevent the spread of health disinformation, we believe it is essential to offer them tools to detect it easily, with training in research methodology being the main such tool in the fight against misinformation and its negative implications for people's health.

  • Conference Article
  • 10.1145/3472813.3472814
Evaluation of Applied Machine Learning for Health Misinformation Detection via Survey of Medical Professionals on Controversial Topics in Pediatrics
  • May 14, 2021
  • Hamman Samuel + 2 more

In this research, we present an evaluation of a system for detection of health misinformation using applied machine learning. The system incorporates computing automation, information retrieval, and natural language processing in conjunction with evidence-based medicine to generate a veracity score based on consensus from trusted medical knowledge bases. For our study, we pre-computed the veracity scores of controversial topics in pediatrics with our proposed system, and then also solicited evaluations of these topics from medical professionals in the neurodevelopmental field via a quantitative survey. Hence, this work provides a double-blind comparison on the veracity of medical claims between our proposed system's results and medical professionals' responses. The results showed that our system's automated assessment matched professional opinions of medical personnel with 80% precision. The survey also demonstrated the inherent challenge with health misinformation detection, as there was no consensus among the medical professionals for 50% of the controversial statements. Nevertheless, this evaluation shows promising results for using objective trust metrics such as the veracity score, in contrast with subjective trust metrics that rely on potentially biased crowdsourcing, ratings, and pre-trained labelling of data.
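The veracity score itself is not formally defined in this abstract. Below is a minimal sketch of one plausible consensus metric, assuming each trusted knowledge base returns a stance on a claim; the stance encoding, the agreement-ratio formula, and the example claims are all assumptions, not the authors' method.

```python
# Illustrative consensus-based veracity score (NOT the authors' formula):
# the abstract only says scores derive from consensus across trusted
# medical knowledge bases, so this simple agreement ratio is an assumption.
from typing import Dict, List

def veracity_score(stances: List[int]) -> float:
    """Map stances (+1 support, -1 refute, 0 no evidence) to [0, 1]."""
    informative = [s for s in stances if s != 0]
    if not informative:
        return 0.5                        # no evidence either way
    support = sum(1 for s in informative if s == 1)
    return support / len(informative)

# Hypothetical stances retrieved from three trusted knowledge bases
# for two pediatric claims:
claims: Dict[str, List[int]] = {
    "Vaccines cause autism": [-1, -1, -1],
    "Breastfeeding lowers infection risk": [1, 1, 0],
}
for claim, stances in claims.items():
    print(f"{claim!r}: veracity = {veracity_score(stances):.2f}")
```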

  • Research Article
  • 10.22214/ijraset.2025.75362
Context-Aware Multilingual Fake News Detection Using Machine Learning and Genetic Algorithm-Based Feature Selection
  • Nov 30, 2025
  • International Journal for Research in Applied Science and Engineering Technology
  • Nikita Garg

The rapid proliferation of fake news across multilingual digital platforms poses a significant challenge to information reliability and societal trust. Existing detection approaches often focus on monolingual datasets or fail to integrate robust feature selection with context-aware embeddings, limiting their scalability and effectiveness. This study proposes a novel multilingual fake-news detection framework that combines translation-driven label alignment, dense context-aware embeddings via Sentence-BERT (SBERT), and genetic algorithm-based feature selection, followed by evaluation using multiple ensemble and traditional classifiers. The framework is validated on English and Bengali datasets, where Bengali news is translated to English and labels are generated through cosine similarity with the English dataset. By extracting semantically rich embeddings and optimizing feature subsets, the framework effectively reduces dimensionality while retaining discriminative features, enabling enhanced model performance. Experimental results demonstrate that ensemble models, particularly Gradient Boosting and Random Forest, consistently achieve superior accuracy and robustness across languages, with the framework outperforming traditional monolingual and non-optimized approaches. The proposed pipeline addresses the gaps of multilingual alignment, optimization-driven feature selection, and ensemble evaluation in a unified architecture, offering a scalable, language-independent, and interpretable solution for fake-news detection. These findings highlight the potential of integrating cross-lingual semantic understanding and evolutionary optimization for reliable detection of misinformation in diverse linguistic contexts, providing a foundation for future research in low-resource and multilingual settings.
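As a rough illustration of the genetic algorithm-based feature-selection step described above, the sketch below evolves binary masks over feature dimensions and scores each mask by cross-validated accuracy. The synthetic data (standing in for SBERT embeddings), population size, generation count, and mutation rate are assumptions; the paper's actual GA configuration is not given in this abstract.

```python
# Sketch of genetic-algorithm feature selection over embedding dimensions.
# All GA hyperparameters and the synthetic data are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for sentence embeddings of labeled news articles.
X, y = make_classification(n_samples=300, n_features=64, n_informative=8,
                           random_state=0)

def fitness(mask: np.ndarray) -> float:
    """Cross-validated accuracy using only the selected feature dimensions."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))      # 20 random binary masks
for _ in range(10):                                  # 10 generations (assumed)
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep the top half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])            # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02         # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[int(np.argmax([fitness(m) for m in pop]))]
print(f"selected {int(best.sum())}/{X.shape[1]} dims, "
      f"CV accuracy = {fitness(best):.3f}")
```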

  • Research Article
  • Citations: 3
  • 10.1007/s13194-024-00619-z
Vigilant trust in scientific expertise
  • Nov 21, 2024
  • European Journal for Philosophy of Science
  • Hanna Metzen

This paper investigates the value of trust and the proper attitude lay people ought to have towards scientific experts. Trust in expertise is usually considered to be valuable, while distrust is often analyzed in cases where it is harmful. I will draw on accounts from political philosophy and argue that it is not only public trust that is valuable when it comes to scientific expertise, but also public vigilance. Expertise may be distorted in different ways, which cannot be remedied by internal control mechanisms alone. This reveals the importance of some forms of democratic oversight. The proper attitude is vigilant trust in expertise. However, vigilant trust seems to be a contradictory notion: How can one be trusting and watchful at the same time? I will show that it is not, and that trust and vigilance can be compatible to a certain extent. I will do so by distinguishing between different levels of both trust and vigilance. Interestingly, this argument requires acknowledging the value of some forms of distrust in scientific expertise, even if that distrust targets trustworthy experts.
