MMDP: A Mobile-IoT Based Multi-Modal Reinforcement Learning Service Framework

Puming Wang, Xue Li, Jintao Li, Xiaokang Zhou, Laurence T. Yang

doi:10.1109/tsc.2020.2964663

Abstract

With the development of GPS technology, a new Mobile Internet of Things (M-IoT) is emerging that senses the city's rhythm and pulse day and night, collecting large-scale city data. There is an urgent need for an innovative M-IoT service system that can handle such large-scale, heterogeneous data. To address this problem, this article proposes, from a data perspective, a Mobile-IoT based multi-modal reinforcement learning service framework with three highlights: (i) an Action-aware High-order Transition Tensor (AHTT) that fuses the heterogeneous data from M-IoTs into a unified form; (ii) a Multi-modal Markov Decision Process (MMDP) that models multi-modal reinforcement learning for the M-IoT service framework; and (iii) a Tensor Policy Iteration algorithm (TPIA) that solves for the optimal tensor policy. Because tensors are used throughout, the multi-modal relations of the context information are preserved while the optimal policy is being solved. The proposed M-IoT service system provides more personalized service for taxi drivers. The experimental results show that most taxi drivers earn more revenue by following the tensor policy.
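The abstract does not spell out how AHTT or TPIA are defined, so the following is only a minimal sketch under stated assumptions: it runs standard policy iteration on a toy MDP whose state has two modes (a hypothetical city zone `l` and time slot `t`), with the action-dependent transitions stored as a high-order tensor `P[l][t][a][l2][t2]`. The function names `multimode_policy_iteration` and `toy_mdp`, the two-zone layout, the fares, and the moving cost are all hypothetical illustrations, not the authors' TPIA or data.

```python
import itertools


def multimode_policy_iteration(P, R, gamma=0.9, tol=1e-9):
    """Generic policy iteration over a state indexed by two modes (l, t).

    Hypothetical layout: P[l][t][a][l2][t2] is the probability of moving from
    (l, t) to (l2, t2) under action a; R[l][t][a] is the expected reward.
    """
    L, T, A = len(R), len(R[0]), len(R[0][0])
    states = list(itertools.product(range(L), range(T)))
    policy = {s: 0 for s in states}  # start with action 0 in every state

    def q_value(V, l, t, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[l][t][a] + gamma * sum(
            P[l][t][a][l2][t2] * V[(l2, t2)] for (l2, t2) in states)

    while True:
        # Policy evaluation: in-place sweeps of the Bellman expectation equation.
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for (l, t) in states:
                v = q_value(V, l, t, policy[(l, t)])
                delta = max(delta, abs(v - V[(l, t)]))
                V[(l, t)] = v
            if delta < tol:
                break
        # Policy improvement: switch only on strict improvement to avoid cycling.
        stable = True
        for (l, t) in states:
            q = [q_value(V, l, t, a) for a in range(A)]
            best = max(range(A), key=lambda a: q[a])
            if q[best] > q[policy[(l, t)]] + 1e-12:
                policy[(l, t)] = best
                stable = False
        if stable:
            return policy, V


def toy_mdp():
    """Hypothetical 2-zone x 2-slot taxi toy: action 0 = stay, 1 = move zones."""
    L, T, A = 2, 2, 2
    fare = [1.0, 5.0]   # hypothetical revenue per step in each zone
    move_cost = 0.5     # hypothetical cost of relocating
    P = [[[[[0.0] * T for _ in range(L)] for _ in range(A)]
          for _ in range(T)] for _ in range(L)]
    R = [[[0.0] * A for _ in range(T)] for _ in range(L)]
    for l in range(L):
        for t in range(T):
            for a in range(A):
                l2 = l if a == 0 else 1 - l
                P[l][t][a][l2][(t + 1) % T] = 1.0  # deterministic; time advances
                R[l][t][a] = fare[l] - (move_cost if a == 1 else 0.0)
    return P, R


P, R = toy_mdp()
policy, V = multimode_policy_iteration(P, R, gamma=0.9)
print(policy)  # {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}
```

In this toy setting the resulting policy moves drivers out of the low-fare zone and keeps them in the high-fare zone, with state values of about 45.5 and 50. Keeping the transition indexed by every mode, rather than flattening it into one state id up front, is what loosely mirrors the abstract's point that tensors preserve multi-modal relations; a real implementation would likely use vectorized tensor contractions instead of Python loops.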

Full Text