Abstract

In this paper, we propose a deep learning approach for smartphone user identification based on analyzing motion signals recorded by the accelerometer and the gyroscope during a single tap gesture performed by the user on the screen. We transform the discrete 3-axis signals from the motion sensors into a gray-scale image representation, which is provided as input to a convolutional neural network (CNN) that is pre-trained for multi-class user classification. In the pre-training stage, we benefit from data collected from many different users, with multiple samples per user. After pre-training, we use our CNN as a feature extractor, generating an embedding associated with each single tap on the screen. The resulting embeddings are used to train a binary Support Vector Machine (SVM) model in a few-shot user identification setting, i.e., requiring only 20 taps on the screen during the registration phase. We compare our identification system based on CNN features with two baseline systems, one that employs handcrafted features and another that employs recurrent neural network (RNN) features. All systems are based on the same classifier, namely SVM. To pre-train the CNN and the RNN models for multi-class user classification, we use a different set of users than the set used for few-shot user identification, ensuring a realistic scenario. The empirical results demonstrate that our CNN model yields a top accuracy of 90.75% in multi-class user classification and a top accuracy of 96.72% in few-shot user identification. We thus believe that our system is ready for practical use, having a better generalization capacity than both baselines. We also conduct experiments showing that the binary SVM provides better results than the one-class SVM, even though the negative samples added during training do not belong to the attackers (who are known only at test time).
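As a concrete illustration of the signal-to-image step described above, the sketch below resamples the six motion channels of a tap recording (3 accelerometer axes plus 3 gyroscope axes) to a fixed width and min-max normalizes them into an 8-bit gray-scale image. This is our own illustrative code, not the authors' implementation; the exact image layout, resolution, and normalization used in the paper may differ.

```python
import numpy as np

def signals_to_grayscale(acc, gyro, width=100):
    """Stack 3-axis accelerometer and gyroscope signals into a
    6-row gray-scale image (one row per axis), resampled to a
    fixed width and normalized to the [0, 255] range.
    Illustrative sketch only; the paper's exact layout may differ."""
    channels = np.vstack([acc, gyro])  # shape: (6, n_samples)
    # Linearly resample each channel to a fixed width.
    x_old = np.linspace(0.0, 1.0, channels.shape[1])
    x_new = np.linspace(0.0, 1.0, width)
    resampled = np.stack([np.interp(x_new, x_old, ch) for ch in channels])
    # Min-max normalize all channels jointly to 8-bit gray-scale.
    lo, hi = resampled.min(), resampled.max()
    image = (resampled - lo) / (hi - lo + 1e-8) * 255.0
    return image.astype(np.uint8)

# Example: a synthetic tap recording of 120 samples per axis.
rng = np.random.default_rng(0)
acc = rng.normal(size=(3, 120))
gyro = rng.normal(size=(3, 120))
img = signals_to_grayscale(acc, gyro)  # shape (6, 100), dtype uint8
```

A real pipeline would feed `img` (possibly after resizing or repeating rows) to the CNN input layer.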

Highlights

  • Nowadays, common mobile device authentication mechanisms such as PINs, graphical passwords and fingerprint scans offer limited security

  • We present an approach based on pre-trained convolutional neural network (CNN) features that identifies users by analyzing data recorded by the motion sensors incorporated in mobile devices while the user performs a single tap gesture on the screen

  • Our approach is based on transforming the discrete signals from motion sensors into a gray-scale image representation, which is provided as input to a CNN that is pre-trained on a multi-class user classification task
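The few-shot enrollment step of the pipeline above (a binary SVM trained on roughly 20 embeddings of the enrolled user plus negative embeddings from other, non-attacker users) can be sketched with a minimal hinge-loss linear SVM. The embedding dimension, sample counts, and hyperparameters below are illustrative assumptions, and the paper presumably relies on an off-the-shelf SVM implementation rather than this hand-rolled stand-in.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimal linear SVM trained by subgradient descent on the
    regularized hinge loss (a stand-in for a library SVM)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-ins for CNN embeddings: 20 enrollment taps from
# the genuine user (label +1) and 40 taps from other users (-1).
rng = np.random.default_rng(1)
dim = 8
pos = rng.normal(loc=2.0, scale=0.5, size=(20, dim))
neg = rng.normal(loc=-2.0, scale=0.5, size=(40, dim))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(20), -np.ones(40)])

w, b = train_linear_svm(X, y)
# This separable toy set should be classified (near) perfectly.
accuracy = (np.sign(X @ w + b) == y).mean()
```

At test time, a new tap would be embedded by the CNN and scored with `np.sign(w @ embedding + b)` to accept or reject the identity claim.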

Introduction

Common mobile device authentication mechanisms such as PINs, graphical passwords and fingerprint scans offer limited security. These mechanisms are susceptible to guessing (or spoofing in the case of fingerprint scans) and to side channel attacks [1] such as smudge [2], reflection [3], [4] and video capture attacks [5]–[7]. A fundamental limitation of PINs, passwords, and fingerprint scans is that these mechanisms require explicit user interaction. One-time or continuous user identification based on the data collected by the motion sensors of a mobile device is an actively studied task [8]–[22] that emerged after the integration of motion sensors into commonly used mobile devices.
