Identification of argumentative sentences in Russian scientific and popular science texts

N V Salomatina, I S Pimenov, E A Sidorova

doi:10.1088/1742-6596/2099/1/012025

Abstract

In this study, we analyze how well three machine learning algorithms detect sentences containing argumentation in Russian text. We use a collection of scientific and popular science texts with manually annotated argumentation to evaluate the identification of argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: multinomial naive Bayes (MNB), support vector machine (SVM), and multilayer perceptron (MLP). Texts are represented with the bag-of-words model, and the lemmas of the words in each analyzed sentence serve as classification features. Informative features are selected automatically according to the Variance and χ² criteria, combined with weight-based filtering of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, and the test set contains 180. The MNB algorithm achieves the highest F-measure and recall on almost all feature sets (with maxima of 68.7% and 89%, respectively), while the MLP algorithm shows the best precision for about half of the feature selection variants (with a maximum of 72.5%).
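To make the setup described in the abstract more concrete, the following is a minimal sketch of a bag-of-words multinomial naive Bayes classifier together with the three evaluation measures. It is not the authors' implementation: the function names (`train_mnb`, `predict_mnb`, `precision_recall_f1`), the add-one smoothing parameter `alpha`, and the toy English tokens standing in for Russian lemmas are all assumptions made for illustration, and the feature selection step (Variance, χ², TF*IDF, EMI) is omitted.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on lists of lemmas (bag of words).

    alpha is an additive smoothing constant (an assumption; the abstract
    does not state which smoothing was used).
    """
    classes = sorted(set(labels))
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = {c: Counter() for c in classes}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return classes, log_prior, log_lik


def predict_mnb(model, doc):
    """Return the most probable class for one sentence (a list of lemmas)."""
    classes, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Lemmas that never occurred in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_lik[c][t] for t in doc if t in log_lik[c]
        )
    return max(classes, key=lambda c: scores[c])


def precision_recall_f1(gold, pred, positive=1):
    """Precision, recall, and F-measure for the positive (argumentative) class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (hypothetical): 1 = argumentative sentence, 0 = non-argumentative.
train_docs = [
    ["therefore", "result", "confirm", "hypothesis"],
    ["because", "method", "fail", "therefore", "reject"],
    ["sample", "collect", "in", "spring"],
    ["table", "show", "sample", "size"],
]
train_labels = [1, 1, 0, 0]
test_docs = [
    ["therefore", "hypothesis", "hold"],
    ["sample", "size", "table"],
    ["because", "result", "differ"],
]
test_labels = [1, 0, 1]

model = train_mnb(train_docs, train_labels)
predictions = [predict_mnb(model, doc) for doc in test_docs]
precision, recall, f1 = precision_recall_f1(test_labels, predictions)
print(predictions, precision, recall, f1)
```

In a setting closer to the one described, each sentence would first be lemmatized, and the vocabulary would be reduced by the feature selection criteria before training.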
