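Simultaneous Speech Recognition Based on Automatic Missing-Feature Mask Generation Integrated with Sound Source Separation
(音源分離との統合によるミッシングフィーチャマスク自動生成に基づく同時発話音声認識)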

Shunichi Yamamoto, Mikio Nakano, Kazunori Komatani, Kazuhiro Nakadai, Jean-Marc Valin, Hiroshi Tsujino, Hiroshi G Okuno, Tetsuya Ogata

doi:10.7210/jrsj.25.92

Abstract

Our goal is to realize a humanoid robot capable of recognizing simultaneous speech. A humanoid robot in real-world environments usually hears a mixture of sounds, so three capabilities are essential for robot audition: sound source localization, sound source separation, and recognition of the separated sounds. The interface between sound source separation and speech recognition is particularly important. In this paper, we designed such an interface by applying Missing Feature Theory (MFT). In this method, spectral sub-bands distorted by sound source separation are detected in the input speech as missing features, and these missing features are masked during recognition so that they do not degrade recognition. This makes the method more flexible when noise changes dynamically and drastically. The most important issue is how to detect the distorted spectral sub-bands. To address it, we used a speech feature appropriate for MFT-based automatic speech recognition (ASR) and developed automatic missing feature mask generation. As the speech feature, we used the Mel-Scale Log Spectral (MSLS) feature instead of the Mel-Frequency Cepstral Coefficients (MFCC) commonly used for ASR. We also presented a method that generates missing feature masks automatically by using information from sound source separation. To evaluate the method, we implemented it on the humanoid robot SIG2 and performed experiments on recognizing three simultaneous isolated words. Our method outperformed conventional ASR using the MSLS feature.
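The abstract does not spell out the feature pipeline or the mask rule, so the following is only a minimal Python sketch of the three ideas it names, under stated assumptions. The mel-scale log spectral feature is taken here to be simply the log of mel filterbank energies (the authors' MSLS may add further normalization). The mask is binary, and the `leakage` estimate and `threshold` rule are hypothetical stand-ins for "information from sound source separation", not the authors' exact criterion. MFT-style recognition is illustrated by marginalizing masked bands out of a diagonal-Gaussian score. All function names are illustrative, not from the paper.

```python
import math
from typing import Sequence


def msls(mel_energies: Sequence[float], floor: float = 1e-10) -> list[float]:
    """Simplest form of a mel-scale log spectrum: the log of each mel filterbank
    energy, kept in the spectral domain (MFCC would additionally apply a DCT)."""
    return [math.log(max(e, floor)) for e in mel_energies]


def missing_feature_mask(separated: Sequence[float],
                         leakage: Sequence[float],
                         threshold: float = 1.0) -> list[int]:
    """Binary mask per mel band: 1 = reliable, 0 = missing.

    ASSUMPTION (hypothetical rule, not the paper's exact one): a band is
    reliable when the separated-speech energy exceeds `threshold` times the
    leakage energy that the separation stage is assumed to estimate.
    """
    return [1 if s > threshold * n else 0 for s, n in zip(separated, leakage)]


def masked_log_likelihood(x: Sequence[float], mask: Sequence[int],
                          mean: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-Gaussian log-likelihood that marginalizes out missing bands:
    only dimensions with mask == 1 contribute to the score."""
    ll = 0.0
    for xi, mi, mu, v in zip(x, mask, mean, var):
        if mi:
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return ll


if __name__ == "__main__":
    # Hypothetical 4-band example: bands 2 and 4 are dominated by leakage.
    separated = [4.0, 0.5, 2.0, 0.1]
    leakage = [1.0, 1.0, 1.0, 1.0]
    mask = missing_feature_mask(separated, leakage)
    features = msls(separated)
    score = masked_log_likelihood(features, mask, [1.0] * 4, [1.0] * 4)
    print(mask, round(score, 3))
```

The sketch also suggests why a spectral-domain feature suits MFT: a distortion confined to one mel band affects only one MSLS dimension, which the mask can drop. The DCT used for MFCC would instead spread that distortion across every cepstral coefficient.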

Published in: Journal of the Robotics Society of Japan

Similar Papers

Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory
S Yamamoto ... T Ogata
23 Jun 2015

Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net
Yui Sudo ... Kazuhiro Nakadai
11 Jan 2021

Making a robot recognize three simultaneous sentences in real-time
S Yamamoto ... J.-M Valin
01 Jan 2004

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition
Keisuke Nakamura ... Hiroshi G Okuno
Advanced Robotics | VOL. 27
01 Aug 2013