Elimination of person names in spoken documents for privacy protection

Ryo Kawaguchi, Masatoshi Tsuchiya, Seiichi Nakagawa

doi:10.1109/apsipa.2014.7041603

Abstract

Sensor networks that capture multimedia data, including audio, are increasingly common. Unfortunately, such data cannot be released for public use because they contain privacy-sensitive information such as person and location names. Person name extraction (PNE), a widely investigated research topic, is an effective technique for addressing this problem. However, traditional PNE and PNE for privacy protection differ in one important respect: traditional PNE often misses out-of-vocabulary (OOV) person names, that is, names that do not occur in the training corpus, whereas PNE for privacy protection must also cover them. To address this, this study proposes a two-stage method: the first stage performs speech recognition with a language model modified to over-extract person names, including OOV person names, and the second stage filters the over-extracted candidates with a support vector machine (SVM). Experiments show that the method is effective in detecting and eliminating person names, and listening tests show that its performance in removing person names is promising.
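The two-stage idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the language-model modification, the SVM features, or the kernel. Here stage 1 is represented only by its output (tokens already flagged as candidate names), and stage 2 is a linear SVM decision function whose weights, bias, and two features (`honorific_follows`, `is_common_word`) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    text: str
    # Stage 1 output (assumed): the recognizer flagged this token as a
    # possible person name, deliberately erring on the side of over-extraction.
    is_candidate: bool = False
    # Hypothetical stage 2 features: (honorific_follows, is_common_word).
    features: Sequence[float] = (0.0, 0.0)


# Hypothetical parameters of an already-trained linear SVM.
WEIGHTS = (1.5, -2.0)
BIAS = -0.25


def svm_decision(features: Sequence[float]) -> float:
    """Linear SVM decision value w . x + b; >= 0 means 'person name'."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def eliminate_person_names(tokens: List[Token], mask: str = "<NAME>") -> List[str]:
    """Mask candidates that the filter accepts; keep everything else."""
    output = []
    for tok in tokens:
        if tok.is_candidate and svm_decision(tok.features) >= 0:
            output.append(mask)
        else:
            output.append(tok.text)
    return output


# Example: "Tanaka" and "market" were both over-extracted by stage 1;
# the filter keeps only "Tanaka" as a person name.
utterance = [
    Token("yesterday"),
    Token("Tanaka", is_candidate=True, features=(1.0, 0.0)),
    Token("san"),
    Token("visited"),
    Token("the"),
    Token("market", is_candidate=True, features=(0.0, 1.0)),
]
masked = eliminate_person_names(utterance)
```

The design choice this illustrates is the division of labour: the first stage favours recall so that OOV names are not missed, and the second stage restores precision by rejecting false candidates. In a spoken document, the masked tokens would correspond to audio segments to be removed or replaced.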
