This paper presents a multimodal dataset capturing fact-checked news coverage of Chile’s constitutional processes from 2019–2023. The collection comprises 300 articles from three sources: Fast Check, Fact Checking UC, and BioBioChile, containing 242,687 words of text and visual content in 168 entries. The dataset implements advanced natural language processing through RoBERTa and computer vision techniques via EfficientNet, with unified multimodal analysis using the CLIP model. Technical validation through clustering analysis and expert review demonstrates the dataset’s effectiveness in identifying narrative patterns within constitutional process coverage. The structured format includes verification metadata, precomputed embeddings, and documented relationships between textual and visual elements. This enables research into how misinformation propagates through multiple channels during significant political events. This paper details the dataset’s composition, collection methodology, and validation while acknowledging specific limitations. This contribution addresses a gap in current research resources by providing verified multimodal content spanning two constitutional processes, supporting investigations in computational social science and misinformation studies.
Read full abstract