MDCNN: A multimodal dual-CNN recursive model for fake news detection via audio- and text-based speech emotion recognition
- Research Article
- 10.1177/14727978251361541
- Jul 19, 2025
- Journal of Computational Methods in Sciences and Engineering
The spread of false news harms both individual practitioners and the media. To improve the efficiency of false news detection, this study constructs a multi-modal news detection model comprising a text encoding module, a contextual semantic encoder, a news propagation encoder, and a false news detection module that integrates semantic features and image recognition. In the experiments, the multi-modal model achieved significantly higher accuracy and F1 scores than unimodal models: accuracy and F1 improved by an average of 7.57% and 7.34% on the POL and GOS datasets, and by 7.20% and 6.38% on the WEIBO and TWITTER datasets. Hyperparameter analysis showed that performance peaked when the parameters r and k were set to their optimal values, and the ablation experiment further validated the importance of the channel attention mechanism and the graph comparison method. These results indicate that multi-modal models hold significant advantages in detecting false news and can effectively exploit information from different modalities to improve detection accuracy, which is valuable for assessing the reliability of news and the credibility of media in society. The work still has limitations: the model may generalize poorly beyond the tested datasets, and its complexity may hinder deployment in resource-constrained environments. Future work will explore simplified versions of the model and test on more diverse datasets to enhance generalization and practicality.
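A minimal sketch of a channel-attention step like the one this abstract credits in its ablation; the squeeze-and-softmax form below is an assumption for illustration, not the paper's exact mechanism:

```python
import math

def channel_attention(channels):
    """A minimal channel-attention sketch (an assumption, not the paper's
    exact mechanism): squeeze each channel to its global mean, turn the
    means into attention weights with a softmax, and rescale channels."""
    means = [sum(c) / len(c) for c in channels]
    m = max(means)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in means]
    s = sum(exps)
    weights = [e / s for e in exps]
    # Reweight every value in each channel by its channel's attention weight
    return [[w * v for v in c] for w, c in zip(weights, channels)]
```

With two identical channels the weights split evenly; a channel with a larger mean activation receives a proportionally larger weight.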
- Research Article
- 10.1609/icwsm.v18i1.31341
- May 28, 2024
- Proceedings of the International AAAI Conference on Web and Social Media
Information spreads quickly through social media platforms, especially fake news with negative or even malicious intent. In recent years, psychological studies have found that explicit reminders of fake news diminish its consequences, so it is crucial to identify a story's authenticity at an early stage. However, existing fake news detection methods either rely on auxiliary information, such as user profiles and event propagation networks, or require sufficient high-quality training data, neither of which suits early detection in real-world settings. A growing share of social media news carries not only natural-language content but also visual content such as images and videos, which opens a new view on early fake news detection through multi-modal data. In this paper, we propose a Multi-modal Prompt Learning framework (MPL) based on the multi-modal pre-trained model CLIP for early detection of fake news. A learnable prompt module adaptively and efficiently generates prompt representations to enrich the semantic context. MPL can be implemented in supervised or few-shot settings. Extensive experiments show that MPL delivers substantial performance and efficiency improvements for the early-stage fake news detection task, performing considerably well against both state-of-the-art supervised multi-modal models and the latest prompt-based few-shot multi-modal models. In particular, MPL's high recall on fake news and high precision on real news, relative to other baselines, support one of its motivations: providing early notification that a story is "maybe real" or "maybe fake" as soon as it is released.
- Conference Article
- 10.1145/3512732.3533583
- Jun 27, 2022
With the advent of the big data era, ubiquitous multi-modal fake news increasingly affects information dissemination and consumption, and measures must be taken to identify it in order to preserve the credibility of news. However, existing single-modal detection models cannot exploit complete multi-modal information, while multi-modal models often fail to fully utilize the original information of each individual modality. To tackle these problems, we propose a novel multi-modal fake news detection method based on a multi-modal classifier ensemble, which combines the advantages of single-modal and multi-modal models. Specifically, we design two single-modal classifiers for text and image inputs, a similarity classifier that calculates feature similarity across the modalities, and an integrity classifier that uses the integral multi-modal information. All classifier outputs are then combined by ensemble learning to increase classification accuracy. Furthermore, we introduce the center loss to reduce intra-class variance: the cross-entropy loss maximizes inter-class variation while the center loss minimizes intra-class variation, enhancing the discriminative ability of the learned news features. Experimental results on both Chinese and English datasets demonstrate that the proposed method outperforms baseline fake news detection approaches.
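The loss combination described above (cross-entropy for inter-class separation plus center loss for intra-class compactness) can be sketched in plain Python; the λ weighting and toy inputs are assumptions for illustration, not the authors' implementation:

```python
import math

def center_loss(features, labels, centers):
    """0.5 * mean squared Euclidean distance from each sample to its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((fi - ci) ** 2 for fi, ci in zip(f, centers[y]))
    return 0.5 * total / len(features)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of each sample's true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def combined_loss(probs, features, labels, centers, lam=0.5):
    # Cross-entropy pushes classes apart; the center term pulls features
    # toward their class centers, shrinking intra-class variance.
    return cross_entropy(probs, labels) + lam * center_loss(features, labels, centers)
```

When every feature already sits on its class center, the center term vanishes and the combined loss reduces to plain cross-entropy.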
- Research Article
- 10.1016/j.heliyon.2023.e20382
- Sep 21, 2023
- Heliyon
A review of fake news detection approaches: A critical analysis of relevant studies and highlighting key challenges associated with the dataset, feature representation, and data fusion
- Research Article
- 10.1016/j.inffus.2023.102172
- Nov 30, 2023
- Information Fusion
QMFND: A quantum multimodal fusion-based fake news detection model for social media
- Research Article
- 10.1186/s12916-025-04316-3
- Aug 15, 2025
- BMC Medicine
Background: Visual identification of interictal epileptiform discharges (IEDs) is expert-biased and time-consuming. Accurate automated IED detection models can facilitate epilepsy diagnosis. This study aims to develop a multimodal IED detection model (vEpiNetV2) and conduct a multi-center validation.
Methods: We constructed a large training dataset for vEpiNetV2 comprising 26,706 IED and 194,797 non-IED 4-s video-EEG epochs from 530 patients at Peking Union Medical College Hospital (PUMCH). The automated IED detection model was built with deep learning on video and electroencephalogram (EEG) features. We proposed a bad-channel removal model and a patient detection method to improve the robustness of vEpiNetV2 for multi-center validation. Performance was verified on a prospective multi-center test dataset, with area under the precision-recall curve (AUPRC) and area under the curve (AUC) as metrics.
Results: To evaluate the model fairly, we constructed a large test dataset containing 149 patients, 377 h of video-EEG data, and 9232 IEDs from PUMCH, the Children's Hospital Affiliated to Shandong University (SDQLCH), and Beijing Tiantan Hospital (BJTTH). Amplitude discrepancies were observed across centers and could be separated by a classifier. vEpiNetV2 demonstrated favorable accuracy for IED detection, achieving AUPRC/AUC values of 0.76/0.98 (PUMCH), 0.78/0.96 (SDQLCH), and 0.76/0.98 (BJTTH), with false positive rates of 0.16–0.31 per minute at 80% sensitivity. Incorporating video features improved precision by 9%, 7%, and 5% at the three centers, respectively; at 95% sensitivity, video features eliminated 24% of false positives in the whole test dataset. While bad channels decreased model precision, video features compensated for this deficiency. Accurate patient detection is essential; incorrect patient detection can degrade overall performance.
Conclusions: The multimodal IED detection model, which integrates video and EEG features, demonstrated high precision and robustness. The large multi-center validation confirmed its potential for real-world clinical application and the value of video features in IED analysis.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-025-04316-3.
- Conference Article
- 10.1109/uemcon53757.2021.9666618
- Dec 1, 2021
During the global pandemic, social media platforms are a primary source of information exchange. Social bots are one of the main sources of misinformation about COVID-19, but do social bots spread fake and real news at the same ratio as human accounts? Can bot detection improve fake news detection on social media platforms? Who presents more fake news in the COVID-19 pandemic, humans or social bots? This work provides preliminary research results based on limited data to answer these questions, and it opens a new perspective on fake news detection and bot detection on online platforms. We use Bidirectional Encoder Representations from Transformers (BERT) to create a new model for fake news detection, use a transfer learning model to detect bot accounts in the COVID-19 dataset, and then apply the resulting features to improve the fake news detection model on that dataset.
- Research Article
- 10.1145/3700748
- Nov 22, 2024
- ACM Computing Surveys
Fake news on social networks is a challenging problem due to the rapid dissemination and volume of information, as well as the ease of creating and sharing content anonymously. Fake news stories are problematic not only for the credibility of online journalism but also for their detrimental real-world consequences. The primary research objective of this study is to identify recent state-of-the-art deep learning methods used to detect fake news in social networks. This article presents a systematic literature review of deep learning-based fake news detection models in social networks, following a rigorous methodology with predefined criteria for selecting studies by deep learning modality. The review covers two types of modality: unimodal models (a single model used for analysis or modeling) and multimodal models (an integration of multiple models). The results reveal the strengths and weaknesses of both approaches, as well as the limitations of datasets for low-resource languages. The article also provides insights into future directions for deep learning models and different fact-checking techniques, and closes with a discussion of fake news detection in the era of large language models in terms of advantages, drawbacks, and challenges.
- Research Article
- 10.1155/2023/8836476
- Jul 3, 2023
- Advances in Multimedia
With the development of online social media, the volume of news of all kinds has exploded. While social media provides a platform for releasing and disseminating news, it also lets fake news proliferate, which poses potential social risks; detecting fake news quickly and accurately is a difficult task. Multimodal fusion models are the current research focus and development trend. However, in terms of content, most existing methods fail to mine the background knowledge hidden in news content and ignore its connection to existing knowledge systems; in terms of propagation, research tends to emphasize only the single chain from the previous communication node, ignoring the intricate propagation chains and the mutual influence among users. To address these problems, this paper proposes a multimodal fake news detection model, A-KWGCN, based on a knowledge graph and a weighted graph convolutional network (GCN). The model fully extracts features from both the news content and the user interactions in its dissemination. On one hand, the model mines relevant knowledge concepts from the news content, links them with knowledge entities in the wiki knowledge graph, and integrates knowledge entities and entity context as auxiliary information. On the other hand, inspired by the "similarity effect" in social psychology, the paper constructs a user interaction network and defines a weighted GCN by computing feature similarity among users to model their mutual influence. On two public datasets, Twitter15 and Twitter16, the model reaches accuracies of 0.905 and 0.930, respectively, and shows more significant advantages than six comparison models on four evaluation metrics. Ablation experiments further verify that the knowledge module and the weighted GCN module play significant roles in fake news detection.
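The similarity-weighted GCN idea above can be sketched as follows; the cosine weighting and row-normalized propagation step are illustrative assumptions, not the A-KWGCN implementation:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two user feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_adjacency(features, edges):
    """Weight each interaction edge by feature similarity ('similarity effect')."""
    n = len(features)
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = cosine_sim(features[i], features[j])
        A[i][j] = A[j][i] = w
    for i in range(n):
        A[i][i] = 1.0  # self-loop so each node keeps its own signal
    return A

def propagate(A, H):
    """One row-normalized propagation step: H' = D^-1 A H."""
    n, d = len(H), len(H[0])
    out = []
    for i in range(n):
        deg = sum(A[i])
        out.append([sum(A[i][j] * H[j][k] for j in range(n)) / deg for k in range(d)])
    return out
```

An edge between dissimilar users gets weight near zero, so their features barely mix during propagation, while similar users reinforce each other.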
- Research Article
- 10.1177/2059204318762650
- Jan 1, 2018
- Music & Science
The acoustic cues that convey emotion in speech are similar to those that convey emotion in music, and recognition of emotion in both of these types of cue recruits overlapping networks in the brain. Given the similarities between music and speech prosody, developmental research is uniquely positioned to determine whether recognition of these cues develops in parallel. In the present study, we asked 60 children aged 6 to 11 years, and 51 university students, to judge the emotions of 10 musical excerpts, 10 inflected speech clips, and 10 affect burst clips. We presented stimuli intended to convey happiness, sadness, anger, fear, and pride. Each emotion was presented twice per type of stimulus. We found that recognition of emotions in music and speech developed in parallel, and adult-levels of recognition develop later for these stimuli than for affect bursts. We also found that sad stimuli were most easily recognised, followed by happiness, fear, and then anger. In addition, we found that recognition of emotion in speech and affect bursts can predict emotion recognition in music stimuli independently of age and musical training. Finally, although proud speech and affect bursts were not well recognised, children aged eight years and older showed adult-like responses in recognition of proud music.
- Research Article
- 10.1609/aaai.v38i16.29771
- Mar 24, 2024
- Proceedings of the AAAI Conference on Artificial Intelligence
Instruction-tuned Large Vision Language Models (LVLMs) have significantly advanced in generalizing across a diverse set of multi-modal tasks, especially Visual Question Answering (VQA). However, generating detailed, visually grounded responses is still a challenging task for these models. We find that even the current state-of-the-art LVLM (InstructBLIP) still produces a staggering 30 percent hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a Multimodal Hallucination Detection Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect consists of 16k fine-grained annotations on VQA examples, making it the first comprehensive multi-modal hallucination detection dataset for detailed image descriptions. Unlike previous work that considers only object hallucination, we additionally annotate unfaithful entity descriptions and relationships. To demonstrate the potential of this dataset for hallucination prevention, we optimize InstructBLIP through our novel Fine-grained Direct Preference Optimization (FDPO). We also train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling (RS). Human evaluation of both FDPO and rejection sampling shows that they reduce hallucination rates in InstructBLIP by 41% and 55%, respectively. Our reward model also generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57%, respectively, and correlates strongly with human-evaluated accuracy scores. The dataset is available at https://github.com/hendryx-scale/mhal-detect.
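Best-of-n rejection sampling with a reward model, as used above, reduces to scoring candidate responses and keeping the best; the toy reward below is a hypothetical stand-in for the learned reward model, not the paper's scorer:

```python
def best_of_n(candidates, reward_model):
    """Best-of-n rejection sampling: generate n candidate responses,
    score each with a reward model, and keep the highest-scoring one."""
    return max(candidates, key=reward_model)

def toy_reward(response, absent_objects=("umbrella", "dog")):
    """Hypothetical reward: penalize each mention of an object that is
    absent from the image (a crude proxy for object hallucination)."""
    return -sum(word in response for word in absent_objects)
```

For example, among captions mentioning an absent umbrella or dog, the one mentioning neither wins.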
- Research Article
- 10.1002/asi.24202
- Apr 5, 2019
- Journal of the Association for Information Science and Technology
Implicit detection of relevance has been approached by many during the last decade. From the use of individual measures to the use of multiple features from different sources (multimodality), studies have shown the feasibility to automatically detect whether a document is relevant. Despite promising results, it is not clear yet to what extent multimodality constitutes an effective approach compared to unimodality. In this article, we hypothesize that it is possible to build unimodal models capable of outperforming multimodal models in the detection of perceived relevance. To test this hypothesis, we conducted three experiments to compare unimodal and multimodal classification models built using a combination of 24 features. Our classification experiments showed that a univariate unimodal model based on the left‐click feature supports our hypothesis. On the other hand, our prediction experiment suggests that multimodality slightly improves early classification compared to the best unimodal models. Based on our results, we argue that the feasibility for practical applications of state‐of‐the‐art multimodal approaches may be strongly constrained by technology, cultural, ethical, and legal aspects, in which case unimodality may offer a better alternative today for supporting relevance detection in interactive information retrieval systems.
- Research Article
- 10.1155/2024/8725832
- May 30, 2024
- Security and Communication Networks
Malicious encrypted traffic detection is a critical component of network security management. Previous detection methods fall into two classes: feature-engineering methods that construct traffic features for classification, and end-to-end methods that feed raw traffic directly into a model to learn features for classification. Both approaches suffer from features that cannot fully characterize the traffic. To this end, this paper proposes a hierarchical multimodal deep learning model (HMMED) for malicious encrypted traffic detection. The model applies the two feature-generation methods to learn payload and header features, respectively, fuses them into final traffic features, and feeds those features into a softmax classifier. In addition, because traditional deep learning depends heavily on training set size and data distribution, yielding models that generalize poorly to unseen encrypted traffic, the proposed model uses a large amount of unlabeled encrypted traffic in a pretraining layer to pretrain a submodel that produces a generic packet payload representation. Test results on the USTC-TFC2016 dataset show that the proposed model effectively addresses the insufficient feature extraction of traditional detection methods and improves the accuracy (ACC) of malicious encrypted traffic detection.
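The late-fusion step described above (concatenate per-modality features, then classify with softmax) can be sketched as follows; the toy weights and feature dimensions are assumptions, not HMMED's learned parameters:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_and_classify(payload_feats, header_feats, W, b):
    """Concatenate per-modality features and classify with a linear
    softmax layer, mirroring the fusion step the abstract describes."""
    x = payload_feats + header_feats  # simple concatenation fusion
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)
```

With zero logits the classifier is indifferent (uniform output); a weight matrix that keys on one feature shifts probability toward that class.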
- Research Article
- 10.32628/ijsrst2184111
- Nov 1, 2021
- International Journal of Scientific Research in Science and Technology
Social media adoption makes it important to verify content authenticity and raise awareness of news that might be fake. A Natural Language Processing (NLP) model is therefore required to identify content properties for language-driven feature generation. The present work extracts grammatical, sentiment, syntactic, and readability features from news content; because language-level features are complex and high-dimensional, this feature extraction also addresses the dimensionality problem. A dropout-layer-based Long Short-Term Memory (LSTM) model for sequential learning achieved better results in fake news detection. The results validate that the extracted linguistic features, when combined, yield better classification accuracy: the proposed dropout-based LSTM model obtained 95.3% accuracy for fake news classification and detection, compared to a sequential neural model baseline.
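The dropout regularization the model relies on can be illustrated with an inverted-dropout sketch in plain Python; this shows the mechanism only, not the paper's LSTM implementation:

```python
import random

def inverted_dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1/(1-p), so the expected activation is unchanged and no
    extra rescaling is needed at inference time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]
```

With p=0 the input passes through untouched; with p=0.5 each surviving unit is doubled so the layer's expected output matches training-free inference.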
- Research Article
- 10.1016/j.eswa.2021.115491
- Jun 30, 2021
- Expert Systems with Applications
A link2vec-based fake news detection model using web search results