Deception detection in Russian texts

Olga Litvinova, John Lyell, Tatiana Litvinova, Pavel Seredin

doi:10.18653/v1/e17-4005

Abstract

Humans are known to detect deception in speech at roughly chance level, so it is important to develop tools that help them detect it. The problem of deception detection has been studied for a long time, but only in the last 10-15 years have methods of computational linguistics been applied to it. In this approach, texts are processed with NLP tools and then classified as deceptive or truthful using machine learning methods. While most research has been performed on English, Slavic languages have never been a focus of deception detection studies. This paper deals with deception detection in Russian narratives. It employs a specially designed corpus containing a truthful and a deceptive text on the same topic from each respondent (N = 113). The texts were processed with Linguistic Inquiry and Word Count (LIWC), the software used in most studies of text-based deception detection, and the list of parameters it computes was extended with specially designed user dictionaries. A variety of text classification methods was employed. The accuracy of the model was found to depend on the author's gender and on the text type (deceptive/truthful).
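LIWC is commercial software and its dictionaries are not reproduced here. As a rough illustration of how dictionary-based parameters are computed, and of how a user dictionary adds new parameters to the standard set, the following is a minimal sketch. It assumes LIWC-style matching (exact entries plus `*` prefix wildcards) and reports each category as a percentage of the total word count (`WC`). The categories `i`, `negate` and `certainty` and their word lists are hypothetical placeholders, not the actual LIWC categories or the user dictionaries designed for this study.

```python
import re
from collections import Counter

# Hypothetical LIWC-style base categories (placeholders, not real LIWC data).
# An entry ending in "*" matches any word with that prefix; others match exactly.
BASE_DICTIONARY = {
    "i": ["я", "меня", "мне", "мной"],
    "negate": ["не", "нет", "никогда"],
}

# Hypothetical user dictionary that adds a new parameter to the base set.
USER_DICTIONARY = {
    "certainty": ["точно", "конечно", "уверен*"],
}

# \w matches Unicode letters in Python 3, so Cyrillic words are kept whole.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def matches(word, entry):
    """LIWC-style match: 'abc*' is a prefix pattern, anything else is exact."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionaries):
    """Return the word count (WC) and, per category, % of tokens it matches."""
    tokens = tokenize(text)
    merged = {}
    for dictionary in dictionaries:  # user dictionaries extend the base one
        merged.update(dictionary)
    counts = Counter()
    for word in tokens:
        for category, entries in merged.items():
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1
    total = len(tokens)
    features = {"WC": total}
    for category in merged:
        features[category] = 100.0 * counts[category] / total if total else 0.0
    return features
```

For example, `category_percentages("Я не уверена, что это точно было так.", [BASE_DICTIONARY, USER_DICTIONARY])` counts 8 words, of which one falls into `i`, one into `negate`, and two (`уверена`, `точно`) into the added `certainty` parameter. Each text then becomes a fixed-length feature vector that a classifier can consume.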

Highlights

Deception is defined as the intentional falsification of truth made to cause a false impression or lead to a false conclusion (Burgoon and Buller, 1994)
It is only recently that methods of modern computational linguistics and data analysis have been employed to address this issue (Newman et al., 2003)
Most studies of automated deception detection have used English texts, treating the evaluation of a narrative's reliability/truthfulness as a text classification task solved with machine learning methods

Summary

Introduction

Deception is defined as the intentional falsification of truth made to cause a false impression or lead to a false conclusion (Burgoon and Buller, 1994). With the growing volume of Internet communication, it is increasingly important to identify deceptive information in short written texts. Most studies of automated deception detection have used English texts, treating the evaluation of a narrative's reliability/truthfulness as a text classification task solved with machine learning methods. Using a corpus in which each author wrote both a truthful and a deceptive text on an identical topic, we found statistically significant differences between the truthful and deceptive texts of the same author. Using the parameters that showed these differences, we propose a new approach to evaluating the reliability/truthfulness of written Russian narratives. The classifier was tested separately on texts written by men and by women.
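Two computations underlie this design: a paired comparison of each parameter between the truthful and deceptive texts of the same author, and a breakdown of classifier accuracy by the author's gender and the text type. The section does not say which statistical test was used, so the sketch below assumes an exact two-sided sign test as one simple paired test; it is not necessarily the authors' method. The classifier itself is not shown: predictions are assumed to come from any trained model. The function names `sign_test_p` and `accuracy_by_group` are hypothetical.

```python
from collections import defaultdict
from math import comb


def sign_test_p(truthful, deceptive):
    """Exact two-sided sign test on paired values (one pair per author).

    truthful[i] and deceptive[i] are the same parameter measured in the
    truthful and deceptive texts of author i. Ties are dropped, as is
    usual for the sign test.
    """
    diffs = [d - t for t, d in zip(truthful, deceptive) if d != t]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for x in diffs if x > 0)
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5), doubled for a two-sided test.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def accuracy_by_group(records):
    """Accuracy per (gender, true label) cell.

    records: iterable of (gender, true_label, predicted_label) tuples,
    where the labels are e.g. "truthful" / "deceptive".
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gender, true_label, predicted in records:
        key = (gender, true_label)
        totals[key] += 1
        hits[key] += int(predicted == true_label)
    return {key: hits[key] / totals[key] for key in totals}
```

Reporting accuracy per (gender, text type) cell rather than a single overall figure is what makes a dependence of accuracy on gender and text type visible at all; a model that is accurate on truthful texts by women but weak on deceptive texts by men would otherwise be hidden by averaging.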

Related Work

Experiments

Findings

Conclusion