Abstract

This paper presents ear recognition models constructed with Deep Residual Networks (ResNet) of various depths. Because the number of available ear images is relatively limited, we propose three different transfer learning strategies to address the ear recognition problem, either by using the ResNet architectures as feature extractors or by employing end-to-end system designs. First, we take models pretrained on specific visual recognition tasks, initialize the network weights from them, and train the fully-connected layer on the ear recognition task. Second, we fine-tune entire pretrained models on the training part of each ear dataset. Third, we use the output of the penultimate layer of the fine-tuned ResNet models as features to feed SVM classifiers. Finally, we build ensembles of networks with various depths to enhance the overall system performance. Extensive experiments are conducted to evaluate the obtained models using ear images acquired under constrained and unconstrained imaging conditions from the AMI, AMIC, WPUT and AWE ear databases. The best performance is obtained by averaging ensembles of fine-tuned networks, which achieve recognition accuracies of 99.64%, 98.57%, 81.89%, and 67.25% on the AMI, AMIC, WPUT, and AWE databases, respectively. To facilitate the interpretation of the obtained results and explain the performance differences on each ear dataset, we apply the Guided Grad-CAM technique, which provides visual explanations to unravel the black-box nature of deep models. The provided visualizations highlight the most relevant and discriminative ear regions exploited by the models to differentiate between individuals. Based on our analysis of the localization maps and visualizations, we argue that our models make correct predictions when they treat the geometrical structure of the ear shape as a discriminative region, even under mild head rotations and in the presence of hair occlusion and accessories. However, severe head movements and low-contrast images have a negative impact on the recognition performance.
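
The averaging ensemble mentioned above can be illustrated with a minimal sketch. It assumes that each network outputs a per-class probability vector for every image and that the ensemble takes the unweighted mean of those vectors before picking the most likely class; the abstract does not state these details, so they are assumptions. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ensemble(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class probabilities over several models.

    prob_sets[m][i][c] is the probability that model m assigns to class c
    for image i. All models must score the same images and classes.
    """
    n_models = len(prob_sets)
    n_images = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    averaged = []
    for i in range(n_images):
        row = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict(avg_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest averaged probability per image."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of images whose predicted identity matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy outputs of three hypothetical networks for two images and three
# identities (illustrative values only, not results from the paper).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]

ensemble_predictions = predict(average_ensemble([model_a, model_b, model_c]))
```

In this toy case the first network alone would assign the second image to identity 1, while the averaged scores favor identity 2, which shows how averaging can outvote a single network's mistake.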

Highlights

  • Personal identification based on biological characteristics, including physiological or behavioral modalities, has established itself as the most convenient means of reliable and fast recognition of individuals

  • In order to build confidence in the decisions made by our recognition system, we provide visual explanations, interpret what deep learning models learn and visualize how they make their predictions

  • In this paper we present three different transfer learning strategies applied to deep Residual Networks (ResNet) architectures to learn discriminative features from ear images


Summary

Introduction

Personal identification based on biological characteristics, including physiological (e.g., face, iris, retina, fingerprints) or behavioral (e.g., voice, signature, gait, gesture) modalities, has established itself as the most convenient means of reliable and fast recognition of individuals. Biometric systems based on physiological characteristics are found to have a high level of reliability because they are robust against stress effects and remain relatively stable throughout the life of individuals. Research has opened up new biometrics for personal identification, such as the ear shape. A number of studies have been conducted to explore the unique characteristics of human ears as an appealing alternative or addition to common biometrics. Compared to conventional biometric modalities such as faces and fingerprints, ears provide some.
