Abstract

Awareness about the features of the internet, easy access to data using mobile, and affordable data facilities have caused a lot of traffic on the internet. Digitization came with a lot of opportunities and challenges as well. One of the important advantages of digitization is paperless transactions, and transparency in payment, while data privacy, fake news, and cyber-attacks are the evolving challenges. The extensive use of social media networks and e-commerce websites has caused a lot of user-generated information, misinformation, and disinformation on the Internet. The quality of information depends upon various stages (of information) like generation of information, medium of propagation, and consumption of information. Content being user-generated, information needs a quality assessment before consumption. The loss of information is also necessary to be examined by applying the machine learning approach as the volume of content is extremely huge. This research work focuses on novel multinomial classification (based on multinoulli distribution) techniques to determine the quality of the information in the given content. To evaluate the information content a single algorithm with some processing is not sufficient and various approaches are necessary to evaluate the quality of content. We propose a novel approach to calculate the bias, for which the Machine Learning model will be fitted appropriately to classify the content correctly. As an empirical study, rotten tomatoes’ movie review data set is used to apply the classification techniques. The accuracy of the system is evaluated using the ROC curve, confusion matrix, and MAP.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call