Abstract
There has been increased interest in multimodal language processing, including multimodal dialog, question answering, sentiment analysis, and speech recognition. However, naturally occurring multimodal data is often imperfect due to missing entries, noise corruption, or otherwise degraded modalities. To address these concerns, we present a regularization method based on tensor rank minimization. Our method is based on the observation that high-dimensional multimodal time series data often exhibit correlations across time and modalities, which lead to low-rank tensor representations. The presence of noise or incomplete values, however, breaks these correlations and results in tensor representations of higher rank. We design a model to learn such tensor representations and effectively regularize their rank. Experiments on multimodal language data show that our model achieves good results across various levels of imperfection.
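The abstract does not spell out the regularizer itself; the following is a minimal sketch of the idea, assuming the fused representation is built as a sum of per-timestep rank-1 outer products and that the tensor nuclear norm serves as a convex surrogate for rank. The names z_l, z_v, z_a, and lambda_rank are illustrative, not taken from the paper.

    import torch

    def rank_regularizer(z_l, z_v, z_a):
        # z_l, z_v, z_a: (T, d_l), (T, d_v), (T, d_a) per-timestep modality
        # representations. For a tensor M = sum_t z_l[t] (x) z_v[t] (x) z_a[t],
        # the nuclear norm (a convex surrogate for tensor rank) is upper-bounded
        # by the sum over t of the products of the factor vectors' L2 norms.
        return (z_l.norm(dim=1) * z_v.norm(dim=1) * z_a.norm(dim=1)).sum()

    # Hypothetical training objective: task loss plus the rank penalty.
    # loss = task_loss(pred, label) + lambda_rank * rank_regularizer(z_l, z_v, z_a)

Minimizing this bound discourages high-rank fused tensors, which, by the observation above, tend to arise from noisy or incomplete inputs rather than from clean data.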
Highlights
Analyzing multimodal language sequences spans various fields, including multimodal dialog (Das et al., 2017; Rudnicky, 2005), question answering (Antol et al., 2015; Tapaswi et al., 2015; Das et al., 2018), sentiment analysis (Morency et al., 2011), and speech recognition (Palaskar et al., 2018).
Simple tensors that can be represented as outer products of language, visual, and acoustic representations are low-rank.
We show that T2FN increases the capacity of TFN to capture high-rank tensor representations, which in turn leads to improved prediction performance.
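As a rough illustration of the capacity difference described above: TFN fuses single utterance-level vectors with one outer product, yielding a rank-1 tensor (the original TFN also pads each vector with a constant 1, omitted here), whereas a T2FN-style fusion sums one outer product per timestep, so the fused tensor's rank can grow up to the sequence length T. This is a sketch consistent with the summary, not the authors' exact implementation.

    import torch

    def tfn_fuse(z_l, z_v, z_a):
        # One three-way outer product of utterance-level vectors -> rank-1 tensor.
        return torch.einsum('i,j,k->ijk', z_l, z_v, z_a)

    def t2fn_fuse(z_l_seq, z_v_seq, z_a_seq):
        # Sum of per-timestep outer products: the repeated index t is summed out,
        # so the fused tensor's rank can be as high as the sequence length T.
        return torch.einsum('ti,tj,tk->ijk', z_l_seq, z_v_seq, z_a_seq)

    # Example: T = 20 timesteps, 32-dim features per modality -> (32, 32, 32) tensor.
    M = t2fn_fuse(torch.randn(20, 32), torch.randn(20, 32), torch.randn(20, 32))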
Summary
Analyzing multimodal language sequences spans various fields, including multimodal dialog (Das et al., 2017; Rudnicky, 2005), question answering (Antol et al., 2015; Tapaswi et al., 2015; Das et al., 2018), sentiment analysis (Morency et al., 2011), and speech recognition (Palaskar et al., 2018). These multimodal sequences contain heterogeneous sources of information across the language, visual, and acoustic modalities.