Mood classification of lyrics using SentiWordNet

Vipin Kumar, Sonajharia Minz

doi:10.1109/iccci.2013.6466307

Abstract

Being unstructured, text data poses multiple research issues in document classification. Extracting relevant features is the foremost problem in the preprocessing stage. SentiWordNet is an ontology that assigns words numeric scores for their positive and negative aspects. This paper explores the use of SentiWordNet to extract sentiment features from the words in song lyrics. The experiments are carried out on a collection of 185 lyrics, each belonging to one of four classes. The classifiers have been modeled with three algorithms, namely Naive Bayesian (NB), k-Nearest Neighbor (KNN) and Support Vector Machine (SVM), combined with six measures for attribute relevance analysis, namely Principal Component Analysis (PCA), Latent Semantic Analysis (LSA), Chi-Square (CS), Information Gain (IG), GINI Index (GI) and Gain Ratio (GR). The experiments examine the relevance of the sentiment features for classification. Three sentiment features are considered: the ratio of the positive and negative scores, the normalized ratio, and the average of the positive and negative scores. The experimental results indicate that the Naive Bayesian classifier, using the average of the positive and negative scores as the sentiment feature and gain ratio as the feature selection criterion, achieves 78.27% accuracy with the top 10% of the features. The second-best accuracy was achieved by SVM-based classifiers using the same average-score sentiment feature and the top 10% of the features under every feature selection criterion except CS.
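The three sentiment features and the gain-ratio selection of the top 10% of features can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract gives no formulas, so several definitions are assumptions: the ratio (positive over negative score, with a hypothetical smoothing constant `EPS`), the normalized ratio (the positive share of the combined score), and the average. The helper names `sentiment_features`, `gain_ratio` and `top_fraction` are hypothetical. The sketch also assumes that per-word positive and negative scores have already been looked up in SentiWordNet and that feature values are discrete before gain ratio is computed.

```python
import math
from collections import Counter

EPS = 1e-6  # hypothetical smoothing constant so the ratio never divides by zero


def sentiment_features(pos, neg):
    """Return (ratio, normalized ratio, average) for one word.

    pos, neg: SentiWordNet-style positive and negative scores in [0, 1].
    All three formulas are illustrative assumptions, not the paper's definitions.
    """
    ratio = (pos + EPS) / (neg + EPS)                 # positive-to-negative ratio
    total = pos + neg
    normalized = pos / total if total > 0 else 0.5    # positive share of total polarity
    average = (pos + neg) / 2.0                       # average of the two scores
    return ratio, normalized, average


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete feature with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy H(Y | X), weighted by the size of each feature-value group
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond
    split_info = entropy(values)                      # intrinsic information of the feature
    return info_gain / split_info if split_info > 0 else 0.0


def top_fraction(feature_columns, labels, fraction=0.10):
    """Rank features by gain ratio and keep the top `fraction` (at least one)."""
    scores = {name: gain_ratio(col, labels) for name, col in feature_columns.items()}
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage: "love" separates the two moods perfectly, "night" carries no information.
labels = ["happy", "happy", "sad", "sad"]
columns = {"love": [1, 1, 0, 0], "night": [1, 0, 1, 0]}
print(sentiment_features(0.5, 0.25))
print(top_fraction(columns, labels, fraction=0.5))
```

Gain ratio divides information gain by the split information of the feature itself, which penalizes features with many distinct values; calling `top_fraction` with the default `fraction=0.10` mirrors the top-10% setting reported in the abstract.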
