Abstract

Smart devices, such as smartphones, voice assistants, and social robots, offer users a range of input modalities, e.g., speech, touch, gestures, and vision. In recent years, advances in processing these input channels (e.g., automated speech, face, and gesture recognition, dialog generation, and emotion expression) have enabled more natural interaction experiences for users. However, several important challenges must be addressed to create these experiences. One challenge is that most smart devices lack sufficient computing resources to execute Artificial Intelligence (AI) techniques locally. Another is that users expect responses in near real time when they interact with these devices. Moreover, users want to switch seamlessly between devices and services at any time and from anywhere, and they expect personalized, privacy-aware services. To address these challenges, we design and develop a cloud-based middleware (CMI) that helps developers build multi-modal interaction applications and easily integrate them with AI services. The middleware allows services developed by different producers with different protocols, and smart devices with different capabilities and protocols, to be integrated easily. In CMI, applications stream data from devices to cloud services for processing and consume the results; streaming from multiple devices to multiple services (and vice versa) is supported. CMI provides an integration framework that decouples services from devices and lets application developers concentrate on "interaction" instead of AI techniques. We provide simple examples to illustrate the conceptual ideas incorporated in CMI.
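
The abstract describes but does not specify CMI's programming interface. As a purely illustrative Python sketch (all names and topics are hypothetical, not the paper's API), the following shows the decoupling idea in miniature: devices publish input streams to named topics, AI services subscribe to those topics, and results flow back the same way, so neither side needs to know the other's protocol or identity.

    from collections import defaultdict
    from typing import Callable, Dict, List

    class Broker:
        """In-process stand-in for CMI's cloud message routing (hypothetical)."""
        def __init__(self) -> None:
            self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

        def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
            self._subscribers[topic].append(handler)

        def publish(self, topic: str, message: dict) -> None:
            for handler in self._subscribers[topic]:
                handler(message)

    broker = Broker()

    # Stand-in "speech recognition" service: consumes audio chunks from any
    # device and publishes a transcript to that device's result topic.
    def speech_service(message: dict) -> None:
        transcript = f"recognized({message['audio']})"  # placeholder for a real ASR call
        broker.publish(f"results/{message['device_id']}", {"transcript": transcript})

    broker.subscribe("speech/input", speech_service)

    # A device listens for results addressed to it, without knowing which
    # service produced them.
    broker.subscribe("results/robot-1", lambda m: print("robot-1 got:", m))

    # The device streams one input chunk; the broker fans it out to services.
    broker.publish("speech/input", {"device_id": "robot-1", "audio": "hello.wav"})

In a real deployment the broker would be a networked cloud component (for example, MQTT or WebSocket based) rather than in-process, so that resource-constrained devices can offload AI processing and multiple devices and services can stream to one another, as the abstract describes.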
