Abstract

Because text data is unstructured, it poses multiple research problems in document classification, and extracting relevant features is the foremost problem in the preprocessing stage. SentiWordNet is an ontology that assigns numeric scores to the positive and negative aspects of words. This paper explores the use of SentiWordNet to extract sentiment features from the words in song lyrics. The experiments are carried out on a collection of 185 lyrics, each belonging to one of four classes. Classifiers are built with three classification algorithms, namely Naive Bayesian (NB), k-Nearest Neighbor (KNN) and Support Vector Machine (SVM), combined with six measures for attribute relevance analysis, namely Principal Component Analysis (PCA), Latent Semantic Analysis (LSA), Chi-Square (CS), Information Gain (IG), Gini Index (GI) and Gain Ratio (GR). The experiments examine how relevant the sentiment features are for classification. Three sentiment features are considered: the ratio of the positive and negative scores, the normalized ratio, and the average of the positive and negative scores. The experimental results indicate that the Naive Bayesian classifier, using the average of the positive and negative scores as the sentiment feature and gain ratio as the feature selection criterion, achieves 78.27% accuracy with the top 10% of the features. The second-best accuracy is achieved by SVM-based classifiers using the average of the positive and negative scores as the sentiment feature and the top 10% of the features, under every feature selection criterion except CS.
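The pipeline summarized above (per-word sentiment features derived from positive and negative scores, gain-ratio ranking, keeping the top 10% of features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give exact formulas, so the smoothing constant `eps`, the normalized ratio `(pos - neg) / (pos + neg)`, ranking words by the gain ratio of their presence or absence, the made-up `LEXICON` standing in for SentiWordNet, and the hypothetical helpers `sentiment_features`, `gain_ratio`, `top_fraction` and `document_vector` are all our own choices.

```python
import math
from collections import Counter

# Hypothetical (positive, negative) scores standing in for SentiWordNet entries.
# Real SentiWordNet scores belong to synsets, not words; collapsing them to
# one pair per word is an assumption of this sketch. Values are made up.
LEXICON = {"love": (0.625, 0.0), "happy": (0.875, 0.0), "sad": (0.0, 0.75), "tears": (0.0, 0.5)}

FEATURES = ("ratio", "normalized", "average")


def sentiment_features(pos, neg, eps=0.01):
    """Return (ratio, normalized ratio, average) for one word's scores.

    The formulas are assumptions: eps smooths the ratio so that a neutral
    word (0, 0) maps to 1.0, and the normalized ratio lies in [-1, 1].
    """
    ratio = (pos + eps) / (neg + eps)
    total = pos + neg
    normalized = (pos - neg) / total if total > 0 else 0.0
    average = total / 2
    return ratio, normalized, average


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def top_fraction(docs, labels, vocab, fraction=0.10):
    """Rank words by the gain ratio of their presence and keep the top fraction."""
    scores = {w: gain_ratio([w in d for d in docs], labels) for w in vocab}
    k = max(1, int(len(vocab) * fraction))
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]


def document_vector(doc, selected, lexicon, feature="average"):
    """Weight each selected word present in doc by one sentiment feature; 0.0 if absent."""
    idx = FEATURES.index(feature)
    return [sentiment_features(*lexicon.get(w, (0.0, 0.0)))[idx] if w in doc else 0.0
            for w in selected]


if __name__ == "__main__":
    # Toy "lyrics" as word sets with made-up labels (not the paper's data).
    docs = [{"love", "happy", "day"}, {"love", "sun"}, {"sad", "tears", "day"}, {"sad", "rain"}]
    labels = ["happy", "happy", "sad", "sad"]
    vocab = sorted(set().union(*docs))
    selected = top_fraction(docs, labels, vocab, fraction=0.3)
    print(selected)  # ['love', 'sad']
    print([document_vector(d, selected, LEXICON) for d in docs])
```

Passing `feature="ratio"` or `feature="normalized"` to `document_vector` yields the other two representations. The resulting vectors could then be handed to any NB, KNN or SVM implementation; the sketch deliberately stops at feature construction and selection.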
