Abstract

Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. (e) We further provide additional results for the problem of facial part segmentation. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmarks.

Highlights

  • THIS work is on localizing a predefined set of fiducial points on objects of interest which can typically undergo non-rigid deformations like the human body or face

  • Work based on Convolutional Neural Networks (CNNs) has revolutionized landmark localization, demonstrating results of remarkable accuracy even on the most challenging datasets for human pose estimation [1], [2], [3] and face alignment [4]

  • They behave as simple filters deciding when a certain value should be passed or not. This allows the input to pass through the layer with little modifications, sometimes blocking “good features” and hurting the overall performance by a noticeable amount. This is problematic for the task of landmark localization, where a high level of detail is required for successful localization

Read more

Summary

Introduction

THIS work is on localizing a predefined set of fiducial points on objects of interest which can typically undergo non-rigid deformations like the human body or face. Work based on Convolutional Neural Networks (CNNs) has revolutionized landmark localization, demonstrating results of remarkable accuracy even on the most challenging datasets for human pose estimation [1], [2], [3] and face alignment [4]. This work is on highly accurate and robust yet efficient and lightweight landmark localization using binarized CNNs. Our work is inspired by recent results of binarized CNN architectures on image classification [5], [6]. Our work is inspired by recent results of binarized CNN architectures on image classification [5], [6] Contrary to these works, we are the first to study the effect of neural network binarization on fine-grained tasks like landmark localization. We are interested only in the latter one which was designed to reduce the number of parameters and keep the

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.