Abstract
Many companies in Japan want to make their meetings more efficient, an objective that computational methods can help address. One approach under consideration is to streamline minute‐taking with voice‐recognition technology. An automated system that identifies individual speakers by voice, and so labels each transcribed utterance with the person who spoke it, would substantially improve the efficiency of this process. Several speaker‐identification methods have been proposed in prior work on the automatic generation of meeting minutes, but they require multiple microphones to capture the audio signals. Speakers can instead be identified at a lower preparation cost by using an omnidirectional camera; however, this approach requires hundreds of hours of training data. In this study, we propose a speaker‐identification method that uses only tens of minutes of video as training data. Evaluation results show that the proposed method identifies speakers with maximum and average success rates of 93.0% and 87.2%, respectively. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
More From: IEEJ Transactions on Electrical and Electronic Engineering