Abstract

Biometric recognition has been an extensively researched field in recent years due to the growth of its applications in daily activities. State-of-the-art work in biometrics proposes the implementation of multimodal systems that employ two or more traits to increase the security of the system, since it is more difficult for an impostor to acquire, falsify, or forge multiple samples of different traits from an enrolled user. In this paper, we propose a Deep Learning bimodal network that combines the voice and face modalities. Voice features were extracted with a SincNet architecture, and face image features were extracted with a set of convolutional layers. The feature vectors of both modalities are combined within the network by one of two methods: averaging or concatenation. The averaged/concatenated vector is further processed by a fully connected layer to output a bimodal vector that contains discriminatory information about an individual. For the identification task, the bimodal vector is fed to a fully connected layer with the softmax function. The verification task is performed by matching the bimodal vector against a template to obtain a score that is used to either accept or reject a user's identity. We compared the results yielded by the two fusion methods on both recognition tasks. Both methods achieved an accuracy as high as 99% in the identification task and an Equal Error Rate (EER) as low as 0.14% for verification. These results were obtained by combining the BIOMEX-DB and VidTimit databases.
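The two fusion strategies described above can be sketched as follows. This is a minimal, hypothetical illustration of averaging versus concatenation of modality embeddings followed by a dense layer and a cosine-similarity verification score; all function names, dimensions, and values here are illustrative assumptions, not the paper's actual implementation.

```python
import math

def fuse_average(voice_vec, face_vec):
    """Element-wise average of two equal-length feature vectors (same dimensionality out)."""
    assert len(voice_vec) == len(face_vec)
    return [(v + f) / 2.0 for v, f in zip(voice_vec, face_vec)]

def fuse_concat(voice_vec, face_vec):
    """Concatenation: fused vector has the combined dimensionality of both inputs."""
    return list(voice_vec) + list(face_vec)

def fully_connected(x, weights, bias):
    """Plain dense layer, y = Wx + b (activation omitted for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cosine_score(a, b):
    """Matching score between a bimodal vector and an enrolled template;
    compared against a threshold to accept or reject a claimed identity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy example with 3-dimensional modality embeddings.
voice = [0.2, 0.4, 0.6]
face = [0.6, 0.2, 0.4]

avg = fuse_average(voice, face)   # length 3
cat = fuse_concat(voice, face)    # length 6
```

Note that averaging preserves the per-modality embedding size, while concatenation doubles the input dimension of the subsequent fully connected layer; this is the main structural trade-off between the two fusion methods.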
