Abstract

Many companies in Japan want to make their meetings more efficient, an objective that computational methods can help address. One approach is to streamline minute‐taking with voice‐recognition technology. An automated system that identifies individual speakers by voice and annotates each transcribed utterance with its speaker would substantially improve the efficiency of this process. Several speaker‐identification methods have been proposed in prior work on the automatic generation of meeting minutes; however, these methods require multiple microphones to capture the audio signals. Speakers can instead be identified at a lower preparation cost by using an omnidirectional camera, but this approach requires hundreds of hours of training data. In this study, we propose a speaker‐identification method that uses only tens of minutes of video as training data. Evaluation results show that the proposed method identifies speakers with maximum and average success rates of 93.0% and 87.2%, respectively. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.

