Abstract

Owing to their outstanding performance, convolutional neural networks (CNNs) have rapidly become the standard approach in recent years for processing 2D data, such as that found in image recognition and classification. Recent research shows that CNN models can handle 1D data, such as temporal sequences (e.g., speech and text), with similarly high performance. This motivated us to apply convolutional networks to modeling human semantic trajectories and predicting future locations. Our work consists of three parts. The first part evaluates the performance of a standard spatial CNN in comparison with a vanilla feed-forward network, a recurrent network, and a long short-term memory (LSTM) network at two different semantic representation levels. In the second part, we explore in depth the impact of the kernel size and propose a multi-channel convolutional approach based on kernels of varied size. Finally, the third part investigates the depthwise factorization of the convolutional layer with regard to training time and test accuracy. Overall, our results show that convolutional networks are able to outperform the other architectures, with the number of channels and the kernel size being the most significant hyperparameters.
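
To make the second and third parts more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a trajectory is given as a sequence of embedding vectors, that each kernel size forms its own branch followed by global max pooling, and that the depthwise factorization refers to the common depthwise-separable form (a per-channel convolution followed by a 1x1 pointwise convolution). All function names, sizes, and the random data are hypothetical, and biases are ignored in the parameter counts.

```python
import random


def conv1d_valid(seq, kernel):
    """Slide one kernel over a sequence of embedding vectors (no padding).

    seq:    list of T vectors, each of dimension D
    kernel: list of k weight vectors, each of dimension D
    Returns T - k + 1 ReLU activations.
    """
    k = len(kernel)
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        s = sum(w * x
                for row_w, row_x in zip(kernel, window)
                for w, x in zip(row_w, row_x))
        out.append(max(0.0, s))  # ReLU
    return out


def multi_kernel_features(seq, kernels_by_size):
    """One branch per kernel size; global max pooling per filter; concatenate."""
    features = []
    for k in sorted(kernels_by_size):
        for kernel in kernels_by_size[k]:
            features.append(max(conv1d_valid(seq, kernel)))
    return features


def conv_params(k, c_in, c_out):
    """Weights of a standard 1D convolution (assumed form, no bias)."""
    return k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise-separable 1D convolution (assumed form, no bias)."""
    return k * c_in + c_in * c_out


# Hypothetical toy data: 8 time steps, 4-dimensional location embeddings.
rng = random.Random(0)
T, D, FILTERS = 8, 4, 3
seq = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
kernels = {
    k: [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(k)]
        for _ in range(FILTERS)]
    for k in (2, 3, 5)
}
feats = multi_kernel_features(seq, kernels)
print(len(feats))
print(conv_params(3, 64, 128), separable_conv_params(3, 64, 128))
```

Under these assumptions, each kernel size contributes one pooled value per filter, so the concatenated feature vector grows with both the number of kernel sizes and the number of channels. The parameter comparison illustrates why a depthwise factorization might reduce training cost.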
