Abstract

The inestimable volumes of multimedia associated with spoken documents that have been made available to the public over the past two decades have brought spoken document understanding and organization to the forefront of research. Among the related subtasks, spoken document indexing, retrieval, and summarization can be regarded as the cornerstones of this research area. Statistical language modeling (LM), which purports to quantify the acceptability of a given piece of text, has long been an interesting yet challenging research area, and LM-based approaches to spoken document processing have enjoyed remarkable empirical success. Motivated by the great importance of and interest in language modeling for various spoken document processing tasks (i.e., indexing, retrieval, and summarization), language modeling forms the backbone of this thesis. In real-world applications, a serious challenge faced by search engines is that queries usually consist of only a few words to express users’ information needs. This thesis starts with a general survey of this practical challenge, and then not only proposes a principled framework that unifies the relationships among several widely used approaches but also extends this family of techniques to spoken document summarization. Next, inspired by the i-vector technique, an i-vector based language modeling framework is proposed for spoken document retrieval and reformulated to represent users’ information needs more accurately. Furthermore, although language models have shown preliminary success in extractive speech summarization, a central challenge facing the LM approach is how to formulate sentence models and accurately estimate their parameters for each sentence in the spoken document to be summarized.
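As a purely illustrative point of reference for the LM-based retrieval setting described above, a standard baseline is the query-likelihood criterion, which scores each document by the probability that a language model estimated from the document generates the query. The sketch below assumes a simple unigram model with Jelinek-Mercer smoothing and an arbitrary interpolation weight `lam`; it is not claimed to match the thesis's exact formulation.

```python
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """Score a document by P(query | doc) under a unigram language model.

    Jelinek-Mercer smoothing interpolates the document model with a
    collection (background) model; `lam` is an assumed, untuned weight.
    `query`, `doc`, and `collection` are lists of word tokens.
    """
    doc_counts = Counter(doc)
    col_counts = Counter(collection)
    score = 1.0
    for w in query:
        p_doc = doc_counts[w] / len(doc) if doc else 0.0
        p_col = col_counts[w] / len(collection)
        score *= lam * p_doc + (1 - lam) * p_col
    return score
```

Because smoothing assigns non-zero probability to query words absent from a document, even the very short queries mentioned above still yield a usable ranking over all documents.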
To address this challenge, the thesis proposes a framework that builds on recurrent neural network language models and a curriculum learning strategy, which shows promise in capturing not only word usage cues but also long-span structural information about word co-occurrence relationships within spoken documents, thus eliminating the strict bag-of-words assumption made by most existing LM-based methods. Lastly, word embedding has become a popular research area owing to its excellent performance in many natural language processing (NLP) tasks; however, as far as we are aware, relatively few studies have investigated its use in extractive text or speech summarization. To this end, the thesis first builds novel and efficient ranking models based on general word embedding methods for extractive speech summarization, and then proposes a novel probabilistic modeling framework for learning word and sentence representations, which not only inherits the advantages of the original word embedding methods but also boasts a clear and rigorous probabilistic foundation.
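As a rough illustration of embedding-based sentence ranking for extractive summarization (a common baseline, not the thesis's actual models), each sentence can be represented by the average of its word vectors and ranked by cosine similarity to the whole-document vector. The toy two-dimensional embeddings and the simple averaging scheme below are assumptions made solely to keep the example self-contained and runnable.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(words, embeddings, dim):
    """Average the embeddings of in-vocabulary words (zero vector if none)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def rank_sentences(sentences, embeddings, dim=2):
    """Return sentence indices ordered by similarity to the document vector."""
    doc_words = [w for s in sentences for w in s]
    doc_vec = sentence_vector(doc_words, embeddings, dim)
    scored = [(cosine(sentence_vector(s, embeddings, dim), doc_vec), i)
              for i, s in enumerate(sentences)]
    return [i for _, i in sorted(scored, reverse=True)]
```

Sentences closest to the document's overall semantic direction rank highest; a real system would use pretrained high-dimensional embeddings and the richer representation-learning models the thesis develops.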

