Classifying Thai news headlines using an artificial neural network

Benjamin Chanakot, Charun Sanrach

doi:10.11591/eei.v12i1.4228

Abstract

This research aimed to measure the effectiveness of Thai news headline classification using an artificial neural network (ANN). The data set comprised 1,200 headlines in four categories: i) political news, ii) sports news, iii) economic news, and iv) crime news. The distribution of headlines was measured using chi-square, information gain, and term frequency-inverse class frequency (TFICF). A default threshold value was set for the terms in the headlines, and cross-validation was then used to partition the data and evaluate the efficiency of a neural network model in classifying the headlines. The evaluation showed that 15-fold data division with TFICF classified headlines most accurately, with an accuracy of 99.60% and an F-measure of 99.05%. Classification also became more accurate as more news headlines were provided as learning data. Likewise, choosing an appropriate threshold value facilitated the selection of appropriate features in the headlines and resulted in more effective and accurate classification. Hence, it can be concluded that headline classification is more accurate when an appropriate amount of learning data is available and an appropriate threshold value is set.
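To make the term-weighting and threshold step concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common TFICF variant, in which a term's count within a class is multiplied by log(number of classes / number of classes containing the term). The function names `tficf` and `select_features`, the toy English tokens, and the threshold of 1.0 are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tficf(docs):
    """Return {label: {term: score}} for docs given as (tokens, label) pairs.

    Assumed variant: score = tf(term, class) * log(n_classes / cf(term)),
    where cf(term) is the number of classes in which the term occurs.
    """
    tf = defaultdict(Counter)
    for tokens, label in docs:
        tf[label].update(tokens)
    n_classes = len(tf)
    cf = Counter()
    for counts in tf.values():
        cf.update(counts.keys())  # each class counts a term at most once
    return {
        label: {t: n * math.log(n_classes / cf[t]) for t, n in counts.items()}
        for label, counts in tf.items()
    }


def select_features(scores, threshold):
    """Keep every term whose score reaches the threshold in at least one class."""
    return sorted(
        {t for per_class in scores.values()
         for t, s in per_class.items() if s >= threshold}
    )


# Hypothetical, pre-tokenized toy headlines (English stand-ins for Thai tokens).
docs = [
    (["election", "vote", "minister"], "politics"),
    (["match", "goal", "team"], "sports"),
    (["market", "price", "team"], "economy"),
    (["police", "arrest", "minister"], "crime"),
]

scores = tficf(docs)
features = select_features(scores, threshold=1.0)  # threshold is illustrative
print(features)
```

In this toy data, terms that occur in only one class score higher than terms shared by two classes, so raising the threshold keeps only the more class-specific terms. The selected terms would then serve as input features for the classifier.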
