Abstract

This paper proposes a domain-independent approach for extracting key terms from spoken content based on context and term location information, i.e., the sentence structures. Once trained on data from a sufficient number of different domains, the model can extract key terms in other, unseen domains. This is very attractive because of the unlimited number of domains over the Internet. Its performance degrades only slightly with recognition errors, which makes it particularly useful for spoken content. The basic idea is that sentence structures, i.e., context and term location information, are in general domain independent and remain essentially unchanged under recognition errors. For example, the fact that the key term of the sentence “The subject of this article is primarily about neural networks” is “neural networks” extends to any unseen term other than “neural networks” in any unseen domain, and this regularity is more or less preserved under recognition errors. In the experiments, a model trained with data from five different domains extracted key terms from data in a sixth, unseen domain with very good performance.
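To illustrate the intuition only (the paper's actual model is trained statistically on multi-domain data, not hand-written rules), the following minimal sketch shows how a domain-independent context pattern such as “… is primarily about X” can pick out a key term X that was never seen at training time. The cue patterns and function names here are hypothetical, not from the paper.

```python
import re

# Hypothetical domain-independent cue patterns: the context around the term
# is fixed, but the term itself may be any unseen phrase from any domain.
CUE_PATTERNS = [
    r"is primarily about (?P<term>[\w ]+)",
    r"focuses on (?P<term>[\w ]+)",
]

def extract_key_terms(sentence):
    """Return key terms matched by the domain-independent cue patterns."""
    terms = []
    for pat in CUE_PATTERNS:
        m = re.search(pat, sentence, flags=re.IGNORECASE)
        if m:
            terms.append(m.group("term").strip(" ."))
    return terms

# The example sentence from the abstract:
print(extract_key_terms(
    "The subject of this article is primarily about neural networks."))
# A term from an entirely different, unseen domain, caught by the same pattern:
print(extract_key_terms("This lecture focuses on protein folding."))
```

The same context pattern fires regardless of domain, which is the sense in which sentence structures generalize to unseen domains; the learned model in the paper plays the role of these patterns, with robustness to recognition errors coming from the fact that the surrounding structure usually survives even when individual words are misrecognized.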
