Abstract

This paper discusses a spoken command and control interface that acquires spoken language from user demonstrations. The user trains the system by uttering a command and then demonstrating the required action through an alternative interface. From the demonstration, the system extracts a bag-of-semantic-concepts representation that indicates which semantic concepts are present. In previous work, we proposed a method for learning words for these concepts by linking the bag-of-semantic-concepts representation to a bag-of-features representation of the acoustics. That method discards word order, yet in many cases word order is needed to determine the correct action. In this paper, vocabulary acquisition based on nonnegative matrix factorization (NMF) is trained jointly with a hidden Markov model (HMM), so that the bag-of-concepts representation can serve as weak supervision for HMM learning. The resulting model makes better use of timing information and retains word order, which makes it possible to learn both vocabulary and grammar. The proposed system is tested on several command and control tasks, and for unimpaired speech it outperforms the system based solely on vocabulary acquisition.
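
As a rough illustration of the earlier bag-of-features approach only (not the joint NMF and HMM training proposed here), the sketch below stacks a concept-label matrix on top of an acoustic feature matrix and factorizes both with NMF, then estimates concept activations for new utterances from acoustics alone. Everything in it is an assumption made for illustration: the synthetic data, the matrix sizes, the Frobenius-norm multiplicative updates, and the helper names `nmf_multiplicative` and `decode_concepts` do not come from the paper.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative X as W @ H using Frobenius-norm multiplicative updates.

    Hypothetical helper: the cost function used in the paper may differ.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


def decode_concepts(V_new, W_sem, W_ac, n_iter=200, seed=1, eps=1e-9):
    """Estimate concept activations for new utterances from acoustics only.

    The acoustic dictionary W_ac stays fixed; only the activations H are
    updated, and the semantic dictionary W_sem maps them back to concepts.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W_ac.shape[1], V_new.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W_ac.T @ V_new) / (W_ac.T @ W_ac @ H + eps)
    return W_sem @ H


# Synthetic toy data (assumed): each concept has one fixed acoustic pattern.
rng = np.random.default_rng(42)
n_concepts, n_features, n_utts = 4, 30, 60
patterns = rng.random((n_features, n_concepts))
A = (rng.random((n_concepts, n_utts)) < 0.4).astype(float)         # bag of concepts
V = patterns @ A + 0.01 * rng.random((n_features, n_utts))         # bag of features

# Train on the stacked matrix so each dictionary column ties a concept
# part (top rows) to an acoustic part (bottom rows).
X = np.vstack([A, V])
W, H, errors = nmf_multiplicative(X, rank=n_concepts)
W_sem, W_ac = W[:n_concepts], W[n_concepts:]

# Decode a few utterances from their acoustics alone.
A_hat = decode_concepts(V[:, :5], W_sem, W_ac)
print(A_hat.shape, errors[0], errors[-1])
```

Because each utterance is reduced to a single column of feature counts, nothing in this factorization can distinguish two commands that contain the same words in a different order, which is the limitation that the joint training with an HMM is meant to address.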
