Abstract

Text classifiers use Natural Language Processing (NLP) techniques to analyse text automatically and assign it to categories based on its content, and applying machine learning within NLP has produced notable results. This work presents a system that analyses and classifies news videos by their audio content using machine learning, helping users identify the genre of a news video without watching it. NLP techniques are used to identify the most correlated unigrams and bigrams, whose TF-IDF (term frequency-inverse document frequency) weights serve as the features for training Multinomial Naive Bayes, Logistic Regression and Support Vector Machine classifiers. The performance of these classifiers in classifying news videos is analysed and presented. For this purpose, a dataset of 25 news videos from the CNN news channel, covering almost five categories, was collected. The classifier models, however, are trained on text news data obtained from BBC news articles. Classifier accuracy is tested both on BBC text news and on text extracted from the news videos. The experimental results show that the Multinomial Naive Bayes classifier outperforms the other models for both noisy and noiseless text input.
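To make the feature and classifier pipeline named above concrete, the following is a minimal, standard-library-only sketch of unigram-plus-bigram TF-IDF features feeding a Multinomial Naive Bayes classifier. It is an illustration under stated assumptions, not the authors' implementation: whitespace tokenisation, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, L2 normalisation and additive smoothing with alpha = 1 are all choices made here for the sketch. The "most correlated" feature-selection step and the Logistic Regression and SVM models are not shown, and the four training sentences and their labels are hypothetical toy data, not drawn from the BBC or CNN datasets.

```python
import math
from collections import Counter, defaultdict


def ngrams(text):
    """Lower-cased whitespace tokens plus adjacent-token bigrams (assumed tokeniser)."""
    tokens = text.lower().split()
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_idf(docs):
    """Smoothed IDF per n-gram: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc)))  # document frequency, not raw counts
    n = len(docs)
    return {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(doc, idf):
    """Raw count x IDF, L2-normalised; n-grams unseen in training are dropped."""
    counts = Counter(g for g in ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over fractional (TF-IDF) feature weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, X, y, vocab):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            totals = defaultdict(float)  # summed feature weight per n-gram in class c
            for x in rows:
                for g, w in x.items():
                    totals[g] += w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                g: math.log((totals[g] + self.alpha) / denom) for g in vocab
            }
        return self

    def predict(self, x):
        # Log prior plus feature-weighted log likelihoods; highest score wins.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][g] for g, w in x.items()
            )
        return max(self.classes, key=score)


# Hypothetical toy corpus, for illustration only.
train = [
    "stocks fell as the market reacted to interest rates",
    "the central bank raised interest rates again",
    "the striker scored twice in the football match",
    "the team won the match after extra time",
]
labels = ["business", "business", "sport", "sport"]

idf = fit_idf(train)
clf = MultinomialNB().fit([tfidf(d, idf) for d in train], labels, vocab=idf)
print(clf.predict(tfidf("bank raises interest rates", idf)))  # business
```

Because the vectoriser drops n-grams not seen in training, text transcribed from a video (the noisy case) is scored only on the vocabulary learned from the text articles; the bigram "interest rates" is one such feature that carries more context than either word alone.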
