LSTM‐based Japanese speaker identification using an omnidirectional camera and voice information

Etsuro Nakamura, Satoshi Hirose, Yoichi Kageyama

doi:10.1002/tee.23555

Abstract

Many companies in Japan want to make their meetings more efficient, an objective that computational methods can help address. One approach under consideration is to streamline minute‐taking with voice‐recognition technology. An automated system that identifies individual speakers by voice, and so records which speaker uttered which text, would substantially improve the efficiency of the process. Several speaker‐identification methods have been proposed in prior work on the automatic generation of meeting minutes. However, such methods require multiple microphones to capture the audio signals. Speakers can instead be identified at a lower preparation cost with an omnidirectional camera, but doing so requires hundreds of hours of training data. In this study, we propose a speaker‐identification method that uses only tens of minutes of video as training data. Evaluation results show that the proposed method identifies speakers with maximum and average success rates of 93.0% and 87.2%, respectively. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.

Journal: IEEJ Transactions on Electrical and Electronic Engineering
Publication Date: Jan 29, 2022
Citations: 4

