Abstract

Work on sentiment analysis in the news article domain has thus far been limited. This is mainly due to 1) news articles lacking a clearly defined target, 2) the difficulty of separating good and bad news from positive and negative sentiment, and 3) the seeming necessity of, and complexity in, relying on domain-specific interpretations and background knowledge. In this paper we propose, define, experiment with, and evaluate four feature categories, comprising 26 article features, for sentiment analysis. Using five different machine learning methods, we train sentiment classifiers on Norwegian financial internet news articles and achieve classification precision of up to ~71%. This is comparable to the state of the art in other domains and close to the human baseline. Our experiments with different feature subsets show that the category relying on domain-specific sentiment lexica (the 'contextual' category), which is able to capture the jargon and lingo used in Norwegian financial news, is of cardinal importance in classification: these features yield a precision increase of ~21% when added to the other feature categories. When comparing machine learning classifiers, we find that J48 classification trees yield the highest performance, closely followed by Random Forests (RF). This is in line with recent studies and contradicts the outdated conception that Support Vector Machines (SVM) are superior in this domain.

