In an era in which misleading information can spread quickly, an efficient fake news detection system is critical. This study explores text-based machine learning models with the goal of separating potentially unreliable news stories from legitimate articles in daily newspapers. The project is based on a dataset obtained from Kaggle, which includes article headings, authors, textual content, and labels identifying each article as "Fake News" or "Real News." A methodical approach is used, comprising data preparation, feature engineering, model selection, and hyperparameter tuning, with the aim of maximizing accuracy. The text is tokenized and stemmed, and stop words are removed, before it is converted to numerical features using methods such as TF-IDF and word embeddings. To assess model performance, the dataset is split into training and testing sets. The models considered include logistic regression, Naive Bayes, SVM, and deep learning models such as BERT and GPT. To improve accuracy further, the project also uses ensemble learning strategies.
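The preprocessing and TF-IDF steps named above can be illustrated with a minimal sketch. It is not the study's implementation: the stop-word list, the suffix-stripping stemmer, the example sentences, and the function names `preprocess` and `tfidf` are all assumptions made for illustration, and the abstract does not say which TF-IDF variant was used. The weighting here is the plain form, term frequency times log(N / document frequency).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and"}


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, apply a crude stemmer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stemmed = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Toy stemmer: strip at most one common suffix. This is only a
        # stand-in for a real algorithm such as Porter stemming.
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stemmed.append(tok)
    return stemmed


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical example sentences, not taken from the Kaggle dataset.
docs = [
    "Scientists confirmed the findings in a published study",
    "Shocking secret the scientists are hiding",
]
vectors = tfidf(docs)
```

A term that occurs in every document, such as the stem "scientist" here, receives a weight of zero, which is why TF-IDF favors terms that distinguish one article from another. The resulting vectors would then be passed to a classifier such as logistic regression or Naive Bayes.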