Abstract

Machine learning is a subfield of artificial intelligence (AI) and computer science that uses data and algorithms to imitate how people learn, progressively improving its accuracy. It is an important component of the growing field of data science: statistical methods are used to train algorithms that make classifications or predictions and uncover key insights. Fake news is false or misleading information presented as news, and detecting it is a classification problem. Classification begins with dataset collection, followed by preprocessing, feature selection, dataset training and testing, and finally running the classifier. News consists largely of written text, which is processed using natural language processing (NLP). NLP can analyze large amounts of plain written text and generate insights from it, and it involves methods such as data preprocessing and feature selection. Data preprocessing involves data cleaning, that is, removing incorrect, duplicate, or incomplete data from a dataset. Feature selection is done using the CountVectorizer and the TF-IDF Vectorizer. The dataset is then split for training and testing; drawing both sets from similar data reduces the impact of data inconsistencies. After the model is fitted on the training set, it is tested by making predictions on the test set. A confusion matrix is then used to assess the performance of the classification model on the test data. The primary purpose is to use the Naive Bayes (NB) classifier to build two classification models, one using CountVectorizer and the other using TF-IDF Vectorizer, and to compare their accuracy.
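The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the labels, and all function names are hypothetical, and the count features, TF-IDF weighting, and multinomial Naive Bayes are written from scratch with the Python standard library as stand-ins for the CountVectorizer, TF-IDF Vectorizer, and NB classifier named above. The smoothed IDF formula and L2 normalization are assumptions chosen to resemble common library defaults.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): label 1 = fake, 0 = real.
TRAIN = [
    ("shocking miracle cure doctors hate", 1),
    ("aliens secretly control the government", 1),
    ("miracle pill shocking secret revealed", 1),
    ("government passes new budget bill", 0),
    ("doctors publish study on new vaccine", 0),
    ("city council approves budget for schools", 0),
]
TEST = [
    ("shocking secret miracle revealed", 1),
    ("council passes budget for new schools", 0),
    ("aliens control doctors", 1),
    ("study on vaccine published by city", 0),
]

def count_features(docs):
    """Bag-of-words term counts per document (stand-in for a count vectorizer)."""
    return [Counter(d.lower().split()) for d in docs]

def fit_idf(train_counts):
    """Smoothed inverse document frequency, learned from the training set only."""
    n = len(train_counts)
    df = Counter()
    for c in train_counts:
        df.update(c.keys())
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf_features(counts, idf):
    """TF-IDF weights with L2 normalization; terms unseen in training are dropped."""
    out = []
    for c in counts:
        w = {t: tf * idf[t] for t, tf in c.items() if t in idf}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        out.append({t: v / norm for t, v in w.items()})
    return out

def train_nb(features, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing (alpha)."""
    vocab = {t for f in features for t in f}
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for c in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == c]
        model["prior"][c] = math.log(len(rows) / len(labels))
        totals = Counter()
        for f in rows:
            for t, v in f.items():
                totals[t] += v
        denom = sum(totals.values()) + alpha * len(vocab)
        model["loglik"][c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return model

def predict(model, feature):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, prior in model["prior"].items():
        ll = model["loglik"][c]
        scores[c] = prior + sum(v * ll[t] for t, v in feature.items() if t in model["vocab"])
    return max(scores, key=scores.get)

def confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows are actual labels, columns are predicted labels (order 0, 1)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(m):
    """Fraction of correct predictions: diagonal over total."""
    return (m[0][0] + m[1][1]) / sum(map(sum, m))

def evaluate(use_tfidf):
    """Train on TRAIN, predict on TEST, and return the confusion matrix."""
    train_docs, y_train = zip(*TRAIN)
    test_docs, y_test = zip(*TEST)
    x_train, x_test = count_features(train_docs), count_features(test_docs)
    if use_tfidf:
        idf = fit_idf(x_train)
        x_train, x_test = tfidf_features(x_train, idf), tfidf_features(x_test, idf)
    model = train_nb(x_train, list(y_train))
    return confusion_matrix(y_test, [predict(model, x) for x in x_test])

cm_count = evaluate(use_tfidf=False)
cm_tfidf = evaluate(use_tfidf=True)
print("count features :", cm_count, "accuracy =", accuracy(cm_count))
print("tf-idf features:", cm_tfidf, "accuracy =", accuracy(cm_tfidf))
```

On a corpus this small the two accuracies say nothing about which representation is better; the sketch only shows where the two models differ (the feature weights fed to the same classifier) and how the confusion matrix yields the accuracy being compared.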
