Learning distributed sentence representations for story segmentation

Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng

doi:10.1016/j.sigpro.2017.07.026

Abstract

Traditional sentence representations such as bag-of-words (BOW) and term frequency-inverse document frequency (tf-idf) suffer from data sparsity and may not generalize well. Neural network based representations such as word and sentence vectors are usually trained in an unsupervised way and lack the topic information that is important for story segmentation. In this paper, we propose learning sentence representations by using a deep neural network (DNN) to directly predict the topic class of the input sentence. Because the training is supervised, the learned sentence vectors contain more topic information and are better suited to the story segmentation task. The input of the DNN is a BOW vector computed from a context window. Multiple time resolution BOW and bottleneck features (BNF) are also introduced to improve story segmentation performance. As text data labeled with topic information is limited, we cluster stories into classes and use the class ID as the topic label of the stories for DNN training. We evaluated the proposed sentence representation with the TextTiling and normalized cuts (NCuts) based story segmentation methods on the topic detection and tracking (TDT2) task. Experimental results show that the proposed topical sentence representation outperforms both the BOW baseline and the recently proposed neural network based representations, i.e., word and sentence vectors.
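To make the segmentation step concrete, the following is a minimal sketch, not the authors' implementation. It builds a BOW vector over a context window for each sentence and applies a simplified TextTiling-style depth score to the cosine similarities between adjacent sentences. It assumes that the context-window BOW vector itself stands in for the sentence representation; the DNN, the bottleneck features, and the story clustering described in the abstract are omitted. All function names, the toy sentences, and the vocabulary are hypothetical.

```python
import numpy as np


def bow_vectors(sentences, vocab):
    """Return one term-count vector per sentence over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(sentences), len(vocab)))
    for row, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            if word in index:
                counts[row, index[word]] += 1
    return counts


def context_window_bow(counts, half_width=1):
    """Sum each sentence's BOW vector with its neighbours within half_width."""
    n = counts.shape[0]
    windowed = np.zeros_like(counts)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        windowed[i] = counts[lo:hi].sum(axis=0)
    return windowed


def adjacent_cosine(reps):
    """Cosine similarity between each consecutive pair of rows."""
    a, b = reps[:-1], reps[1:]
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)  # guard against all-zero vectors


def depth_scores(sim):
    """Simplified TextTiling depth: drop from the nearest peak on each side."""
    depths = np.zeros_like(sim)
    for i in range(len(sim)):
        left = i
        while left > 0 and sim[left - 1] >= sim[left]:
            left -= 1
        right = i
        while right < len(sim) - 1 and sim[right + 1] >= sim[right]:
            right += 1
        depths[i] = (sim[left] - sim[i]) + (sim[right] - sim[i])
    return depths


# Toy data (hypothetical): three sentences on one topic, then three on another.
sentences = [
    "stocks market rally",
    "market stocks fall",
    "stocks rally market",
    "rain storm flood",
    "storm rain wind",
    "flood rain storm",
]
vocab = ["stocks", "market", "rally", "fall", "rain", "storm", "flood", "wind"]

reps = context_window_bow(bow_vectors(sentences, vocab), half_width=1)
sim = adjacent_cosine(reps)
depths = depth_scores(sim)
# Gap i lies between sentence i and sentence i + 1.
boundary_gap = int(np.argmax(depths))
```

On this toy input the deepest similarity valley is the gap between the third and fourth sentences, which is where the topic changes. In the approach the abstract describes, the rows of `reps` would instead be the learned topical sentence vectors, and the similarity and depth computation would run on those.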

Journal: Signal Processing
Publication Date: Jul 24, 2017
Citations: 6
