Current trends in smart assistants and smart home automation are attracting the interest of both consumers and researchers. Speech-enabled virtual assistants (so-called smart speakers) offer a wide range of network-oriented services and, in some cases, can connect to smart environments, enriching them with new and effective user interfaces. However, such devices also reveal new needs and several weaknesses. In particular, they are faceless and blind assistants: they cannot show a face, and therefore an emotion, and they cannot 'see' the user. As a result, the interaction is impaired and, in some cases, ineffective. One of the goals of artificial intelligence is the realization of natural dialogue between humans and machines. In recent years, dialogue systems, also known as interactive conversational systems, have become one of the fastest-growing areas of AI. Many companies have used dialogue-system technology to build various kinds of virtual personal assistants (VPAs) for their application areas, including Apple's Siri and Amazon Alexa. To overcome these problems, this project combines a number of advanced techniques. The proposed assistant is powerful, resource-efficient, interactive, and customizable. We use multi-modal dialogue systems and screen projection, processing two or more combined user input modes, including speech, image, video, touch, manual gestures, and body movements, in order to design the next generation of VPAs. This new generation of VPAs can enhance human-machine interaction through technologies such as gesture recognition, image/video recognition, speech recognition, a dialogue system, a conversational knowledge base, and a general knowledge base. Furthermore, such a VPA can be applied in areas such as education, medical assistance, assistive systems for people with disabilities, home automation, and security access control.
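To make the multi-modal idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a dialogue manager might fuse the outputs of separate speech, gesture, and vision recognizers before choosing a response. All class, attribute, and label names (MultiModalInput, DialogueManager, "wave", etc.) are hypothetical placeholders introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical container for the outputs of the per-modality recognizers.
@dataclass
class MultiModalInput:
    speech_text: Optional[str] = None      # result of a speech recognizer
    gesture_label: Optional[str] = None    # result of a gesture recognizer
    image_label: Optional[str] = None      # result of an image/video recognizer

class DialogueManager:
    """Fuses whichever modalities are available and picks a response (toy rules)."""

    def respond(self, user_input: MultiModalInput) -> str:
        # Prefer explicit speech, then fall back to gestures, then to vision.
        if user_input.speech_text:
            return f"You said: '{user_input.speech_text}'. Looking that up..."
        if user_input.gesture_label == "wave":
            return "Hello! How can I help you?"
        if user_input.image_label:
            return f"I can see a {user_input.image_label} in front of the camera."
        return "Sorry, I did not catch that. Could you repeat?"

if __name__ == "__main__":
    dm = DialogueManager()
    # Simulated fused inputs: a greeting gesture, then a spoken command.
    print(dm.respond(MultiModalInput(gesture_label="wave")))
    print(dm.respond(MultiModalInput(speech_text="turn on the lights")))
```

In a real system each recognizer would run on live audio, video, or touch streams and the fusion logic would be learned rather than rule-based; the sketch only shows the overall flow of combining two or more input modes into a single dialogue turn.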