Abstract

Motivated by infamous cheating scandals in the media industry, the wine industry, and political campaigns, we address the problem of detecting concealed information in technical settings. We explore acoustic-prosodic and linguistic indicators of information concealment by collecting a unique corpus of professionals practicing for oral exams while concealing information. We reveal subtle signs of concealed information in speech and text, and compare and contrast them with those reported in the deception detection literature, uncovering the link between concealing information and deception. We then present a series of experiments that automatically detect concealed information from text and speech, comparing acoustic-prosodic, linguistic, and individual feature sets across different machine learning models. Finally, we present a multi-task learning framework with acoustic, linguistic, and individual features that outperforms human performance by over 15%.

Highlights

  • In 2018, a cheating scandal (Mobley, 2018) at the world’s most notoriously difficult verbal exam for wine professionals shook the global wine industry (some examiners were found to have leaked answers to candidates beforehand, and all results were invalidated); in 2016, with questions leaking ahead of political campaigns (Wemple, 2016), CNN faced a grave scandal from which only more controversies ensued; in 2000, the notorious potential debate leak (Bruni and Van Natta, 2000) between the Bush and Gore campaigns drew the attention of F.B.I. investigators

  • We collect a unique corpus of speech and text from field experiments for this purpose, and show that our multi-task learning framework, which combines acoustic-prosodic, linguistic, and individual feature sets, outperforms baselines by over 11% and human performance by over 15%

  • The base model with trigrams performed slightly better than the BiLSTM with GloVe, improving F1 scores by 0.50 and 0.31 over bigrams, but the resulting vector dimension does not balance well with those from the Multi-Layer Perceptrons (MLP) and the individual features for gradient propagation


Summary

Introduction

In 2018, a cheating scandal (Mobley, 2018) at the world’s most notoriously difficult verbal exam for wine professionals shook the global wine industry (some examiners were found to have leaked answers to candidates beforehand, and all results were invalidated); in 2016, with questions leaking ahead of political campaigns (Wemple, 2016), CNN faced a grave scandal from which only more controversies ensued; in 2000, the notorious potential debate leak (Bruni and Van Natta, 2000) between the Bush and Gore campaigns drew the attention of F.B.I. investigators. Despite the importance and potential impact of detecting concealed information, research on the problem has been scarce. This is partly because large-scale datasets with ground truth labels of information concealment are difficult to come by: only in rare cases can we verify the existence of concealed information in the wild. Unlike in deception, candidates endowed with critical information experience more confidence, less fear, and potentially lighter cognitive load because of their informational advantage. All of these possible offsets make it challenging to control for potential indicators of concealing information. Can we improve on human performance with a new multimodal dataset, a better understanding of individual differences, and tailored classifiers for audio and text?

How are indicators of concealed information related to those of deception?
Related Work
Blind Tasting Game
Acoustic-prosodic Features and Indicators
Linguistic Features and Indicators
Baseline Models
Deep Learning Models
Multi-task Learning
Measuring Human Performance
Results
