Abstract

Fake news (FN) has become a major problem in today's world, owing partly to the widespread use of social media. A wide variety of news organizations and websites post their stories on social media, so it is important to verify that the information posted is genuine and comes from reputable sources. The veracity and sincerity of internet news cannot be fully quantified and remain a challenge. In this study, we present an FNU-BiCNN model for identifying FN and fake URLs by analyzing the correctness of a report and predicting its validity. During data pre-processing, stop-word removal and stemming were applied using NLTK. We then compute TF-IDF features and pass them through LSTM, batch normalization, and dense layers. The WordNet lemmatizer is used for feature selection. Bi-LSTM with ARIMA and CNN are used to train the datasets, and various machine learning techniques are used to classify them. By deriving credibility ratings from textual data, this model develops an ensemble strategy for concurrently learning the representations of news stories, authors, and titles. A voting ensemble classifier was compared with several machine learning algorithms, including SVM, DT, RF, KNN, and Naive Bayes, and achieved the highest accuracy of 99.99%. The classifiers' accuracy, recall, and F1-score were used to assess their performance and efficacy.

Highlights

  • Recent years have seen a rapid rise in the popularity of social networking sites due to greater media coverage

  • Algorithm overview:
    Step 1: Pre-process using the Natural Language Toolkit (NLTK) Porter stemmer algorithm
    Step 2: Extract top features with TF-IDF using LSTM, batch normalization, and dense layers
    Step 3: Select top features using TF-IDF and bag of words
    Step 4: If features are selected:
    Step 5:   Apply WordNet sentiment analysis
    Step 6: Else:
    Step 7:   Repeat Step 1
    Step 8: Train the data using ARIMA + BiLSTM + Convolutional Neural Networks (CNN)
    Step 9: Record the training accuracy and loss
    Step 10: Classify using ML algorithms
    Step 11: Compute the classification metrics
    Step 12: Create a pickle file
    Step 13: Identify the fake news and Uniform Resource Locators (URLs)
    Step 14: End

  • The Natural Language Toolkit (NLTK) is a critical framework for developing programs that interact with data derived from human language in Python programming
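The pre-processing and TF-IDF steps listed above can be sketched in plain Python. The tokenizer, stop-word list, and suffix-stripping rule below are simplified stand-ins for NLTK's tools (the real Porter stemmer applies a much larger rule set), and the sample documents are illustrative only:

```python
import math
import re
from collections import Counter

# Simplified stand-in for NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def stem(word):
    # Naive suffix stripping; NLTK's PorterStemmer applies many more rules.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Lowercase, tokenize, drop stop words, then stem each token.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def tf_idf(docs):
    # docs: list of token lists; returns one {term: weight} dict per document.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

docs = [preprocess("Breaking news about the election"),
        preprocess("Fake stories are spreading on social media")]
weights = tf_idf(docs)
```

In the full pipeline these TF-IDF vectors would then be fed to the LSTM, batch-normalization, and dense layers rather than used directly for classification.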


Summary

INTRODUCTION

Recent years have seen a rapid rise in the popularity of social networking sites due to greater media coverage. Social networking platforms are the preferred news source for many people [1]. It is impossible to accurately measure the reliability of information posted on social media networks [3]. The situation deteriorated as more individuals became aware of fabricated information online; finding such news online is a difficult endeavour. Base classifiers include Support Vector Machine (SVM), KNN, Naive Bayes (NB), Decision Tree (DT), and Random Forest (RF). Together, these classification models provide a better estimate and a combined classifier that beats all others in terms of accuracy and predictability. URL characteristics may be extracted by examining the domain name, and on-site URL features may be combined with text-based characteristics and multi-source credibility ratings to estimate a news item's credibility score.
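The voting ensemble built from the base classifiers above can be illustrated with a minimal hard-voting sketch. The toy prediction lists below are placeholders for the outputs of trained SVM, DT, RF, KNN, and NB models (labels: 1 = fake, 0 = real):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: each base classifier casts one label per sample;
    the label with the most votes wins (ties broken by first seen)."""
    n_samples = len(predictions[0])
    votes = []
    for i in range(n_samples):
        labels = [clf_preds[i] for clf_preds in predictions]
        votes.append(Counter(labels).most_common(1)[0][0])
    return votes

# Toy stand-ins for the five trained base classifiers' predictions.
svm_preds = [1, 0, 1, 0]
dt_preds  = [1, 1, 1, 0]
rf_preds  = [1, 0, 0, 0]
knn_preds = [0, 0, 1, 0]
nb_preds  = [1, 0, 1, 1]

ensemble = majority_vote([svm_preds, dt_preds, rf_preds, knn_preds, nb_preds])
# e.g. sample 0 receives votes 1, 1, 1, 0, 1 -> majority label 1 (fake)
```

With an odd number of base classifiers, hard voting never ties on a binary label, which is one reason five heterogeneous models are a convenient ensemble size.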

Our Contributions
BACKGROUND
PROPOSED FRAMEWORK
Problem Statement
Preprocessing
Feature Extraction
Training Data
RESULTS AND DISCUSSION
CONCLUSION

