Abstract

Speech recognition is one of the major research fields in AI-based natural language processing. IBM Watson is a representative speech recognition tool: from a voice signal, it automatically generates not only the recognized words but also the speaker ID and the timing information of each word, including its start and end times. However, IBM Watson's accuracy is limited. It easily produces incorrect recognition output when the audio signal contains noise, especially in movies, where background music and special sound effects are mixed together with speech. Some previous studies addressed this problem using the IBM Watson API, but they assumed that a speaker pronunciation time DB had already been properly built. Building such a DB is not easy and is costly. In this paper, to resolve this problem, we introduce an efficient method to build and update the speaker pronunciation time DB in real time.
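The abstract does not describe how the speaker pronunciation time DB is structured or updated. The following is a minimal sketch, assuming the DB stores, for each speaker, the average pronunciation time (end time minus start time) of each word. It also assumes that recognition results arrive as (speaker ID, word, start time, end time) records, like the per-word output described above. Under these assumptions, a real-time update can be an incremental running mean, so no past samples need to be stored. All names here (`WordTiming`, `PronunciationTimeDB`, `update`, `expected_duration`) are hypothetical; they are not part of the IBM Watson API and are not the paper's method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class WordTiming:
    """One recognized word with speaker label and timing (hypothetical record format)."""
    speaker: int
    word: str
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class _RunningMean:
    """Running mean that can be updated one sample at a time."""
    count: int = 0
    mean: float = 0.0

    def add(self, x: float) -> None:
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.mean += (x - self.mean) / self.count


class PronunciationTimeDB:
    """Hypothetical per-speaker table of average word pronunciation times."""

    def __init__(self) -> None:
        # Key: (speaker ID, lower-cased word) -> running mean of durations
        self._stats: Dict[Tuple[int, str], _RunningMean] = defaultdict(_RunningMean)

    def update(self, item: WordTiming) -> None:
        """Fold one recognition result into the DB as soon as it arrives."""
        duration = item.end - item.start
        if duration <= 0:
            return  # skip empty or malformed timings
        self._stats[(item.speaker, item.word.lower())].add(duration)

    def expected_duration(self, speaker: int, word: str) -> Optional[float]:
        """Average duration seen so far, or None if this speaker/word is unseen."""
        stats = self._stats.get((speaker, word.lower()))  # .get does not insert defaults
        return stats.mean if stats is not None else None


if __name__ == "__main__":
    db = PronunciationTimeDB()
    for w in [
        WordTiming(0, "hello", 1.0, 1.4),
        WordTiming(0, "Hello", 5.0, 5.6),
        WordTiming(1, "hello", 2.0, 2.3),
    ]:
        db.update(w)
    print(db.expected_duration(0, "hello"))
```

A running mean keeps memory constant per (speaker, word) pair, which suits streaming input; a real system might instead weight recent samples more heavily or reject outliers caused by noisy recognition.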
