Abstract

In this work, we present a novel visual perception-inspired local description approach as a preprocessing step for deep learning. With the ongoing growth of visual data, efficient image descriptors are becoming increasingly important. Several local point-based description methods were developed in past decades, before the highly accurate and popular deep learning methods such as convolutional neural networks (CNNs) emerged. The method presented in this work combines a novel local description approach inspired by the Gestalt laws with deep learning, and thereby benefits from both worlds. To evaluate our method, we conducted several experiments on datasets from various forensic application domains, e.g., makeup-robust face recognition. Our results show that the proposed approach is robust against overfitting and that little image information is needed to classify image content with high accuracy. Furthermore, we compared our experimental results with state-of-the-art description methods and found that our method is highly competitive; for example, it outperforms a conventional CNN in terms of accuracy in the domain of makeup-robust face recognition.

Highlights

  • Deep learning is a predominant method in visual information retrieval today

  • We present a novel local description approach inspired by the Gestalt laws as a preprocessing step for deep learning

Introduction

Deep learning is a predominant method in visual information retrieval today. Although it is typically applied at the pixel level, there are good reasons to combine deep learning with signal processing-based feature extraction in order to create a powerful visual media analysis scheme. We present a novel local description approach inspired by the Gestalt laws as a preprocessing step for deep learning, and we fuse it with the CNN approach to build an even more powerful image recognition system. It turns out that feeding the output of our method into a CNN makes the image recognition process more accurate and robust against overfitting in our application domain of makeup-robust face recognition. This is due to the heavily compressed and content-rich image description produced by our approach.
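The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual method: the `gestalt_descriptor` function, its patch statistics standing in for Gestalt-law grouping (proximity via non-overlapping patches, similarity via per-patch statistics), and all sizes are illustrative assumptions.

```python
import numpy as np

def gestalt_descriptor(image, patch=4):
    """Hypothetical stand-in for a Gestalt-law-inspired local descriptor:
    group pixels by proximity (non-overlapping patches) and summarize each
    group by similarity statistics, yielding a heavily compressed description."""
    h, w = image.shape
    feats = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            p = image[y:y + patch, x:x + patch]
            feats.append([p.mean(), p.std()])
    return np.asarray(feats, dtype=np.float32)

def preprocess_for_cnn(images):
    """Stack per-image descriptors into a batch tensor a CNN could consume
    in place of raw pixels."""
    return np.stack([gestalt_descriptor(im) for im in images])

# Toy batch of 8 grayscale 32x32 images.
imgs = np.random.rand(8, 32, 32)
batch = preprocess_for_cnn(imgs)
print(batch.shape)  # (8, 64, 2): 64 patches per image, 2 statistics each
```

The point of the sketch is the interface, not the descriptor itself: the CNN no longer sees 1024 pixels per image but a 64x2 compressed description, which is the kind of input reduction the paper credits for the improved robustness against overfitting.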
