From science fiction to science fact: A Smart-House interface using speech technology and a photo-realistic avatar

T J Moir, G L Filho

doi:10.1109/mmvip.2008.4749555

Abstract

This paper explores the problems of speech recognition in a (sometimes) noisy environment. An adaptive acoustic beamformer based on the Griffiths-Jim method is proposed. It creates a hot-spot: speech is received within a geometrically defined boundary and rejected outside it. This will be shown to give a certain amount of noise immunity and to improve the signal-to-noise ratio for the second stage, the speech recognition engine. The recognition engine used has a limited vocabulary, which gives rise to an excellent hit-rate and requires less training than an unlimited vocabulary. A limited vocabulary is sufficient for a good many applications where devices such as lighting, TV and radio are switched in a Boolean fashion. In addition to speech recognition, good-quality speech synthesis is necessary to feed back information about the house to the end-user. This technology has improved vastly within the last decade, and it will be shown that a head-and-shoulders avatar that is both photo-realistic and has an appealing personality vastly enhances the experience of a speech interface. The paper explores these technologies and investigates the convergence of many of them in the current Massey smart-office.
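
As an illustration of the Griffiths-Jim idea mentioned in the abstract, the following is a minimal two-microphone sketch and not the authors' implementation. It assumes a target arriving identically at both microphones (broadside), an interferer that reaches the second microphone one sample later and attenuated by half, an NLMS weight update, and no delay in the main path. The names `griffiths_jim` and `simulate`, the sine-wave stand-in for speech, and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def griffiths_jim(x1, x2, taps=8, mu=0.05, eps=1e-8):
    """Two-microphone Griffiths-Jim style canceller with an NLMS update.

    Returns (fixed, out): the fixed beamformer output and the adaptive output.
    Illustrative sketch only; all parameters are assumed values.
    """
    w = [0.0] * taps       # adaptive filter weights
    u_hist = [0.0] * taps  # recent blocking-branch samples, newest first
    fixed, out = [], []
    for a, b in zip(x1, x2):
        d = 0.5 * (a + b)  # fixed (sum) beamformer: passes the broadside target
        u = a - b          # blocking branch: cancels the broadside target
        u_hist = [u] + u_hist[:-1]
        noise_est = sum(wi * ui for wi, ui in zip(w, u_hist))
        e = d - noise_est  # beamformer output, also the NLMS error signal
        norm = eps + sum(ui * ui for ui in u_hist)
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u_hist)]
        fixed.append(d)
        out.append(e)
    return fixed, out


def simulate(n=4000, seed=0):
    """Return (fixed_mse, adaptive_mse) against the clean target, second half only."""
    rng = random.Random(seed)
    s = [math.sin(2 * math.pi * 0.05 * k) for k in range(n)]  # stand-in for speech
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]               # interfering noise
    x1 = [s[k] + v[k] for k in range(n)]
    # Assumed geometry: interferer is delayed by one sample and halved at mic 2.
    x2 = [s[k] + (0.5 * v[k - 1] if k > 0 else 0.0) for k in range(n)]
    fixed, out = griffiths_jim(x1, x2)
    half = n // 2  # skip the first half so the adaptive filter has settled

    def mse(y):
        return sum((y[k] - s[k]) ** 2 for k in range(half, n)) / (n - half)

    return mse(fixed), mse(out)


if __name__ == "__main__":
    fixed_mse, adaptive_mse = simulate()
    print("fixed beamformer MSE:", fixed_mse)
    print("adaptive output MSE:", adaptive_mse)
```

Under these assumptions the adaptive output should track the target more closely than the fixed sum beamformer alone, because the blocking branch contains only the interferer and the adaptive filter learns to subtract its leakage from the main path. A practical array would need more microphones, steering delays to place the hot-spot, and some protection against target leakage into the blocking branch.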

Full Text