Abstract

While recent advances in deep learning have led to significant improvements in facial expression classification (FEC), a major challenge that remains a bottleneck for the widespread deployment of such systems is their high architectural and computational complexity. This is especially challenging given the operational requirements of various FEC applications, such as safety, marketing, learning, and assistive living, where real-time operation on low-cost embedded devices is desired. Motivated by this need for a compact, low-latency, yet accurate system capable of performing FEC in real-time on low-cost embedded devices, this study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy, where human experience is combined with machine meticulousness and speed in order to craft a deep neural network design catered toward real-time embedded usage. To the best of the authors' knowledge, this is the first deep neural network architecture for facial expression recognition to leverage machine-driven design exploration in its design process, and it exhibits unique architectural characteristics, such as high architectural heterogeneity and selective long-range connectivity, not seen in previous FEC network architectures. Two variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy. Experimental results on the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks achieved accuracy comparable to state-of-the-art FEC networks while requiring significantly fewer parameters. Furthermore, we demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g., >25 FPS and >70 FPS at 15 and 30 W, respectively) and high energy efficiency (e.g., >1.7 images/sec/watt at 15 W) on an ARM embedded processor, further illustrating the efficacy of EmotionNet Nano for deployment on embedded devices.
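
The architectural characteristics mentioned above, high architectural heterogeneity and selective long-range connectivity, can be made concrete with a small illustration. The PyTorch sketch below is not the published EmotionNet Nano architecture; all block widths, kernel sizes, and the placement of the long-range connection are hypothetical, chosen only to show what a heterogeneous, selectively connected design can look like in code.

# Hedged sketch (not the actual EmotionNet Nano architecture): a minimal PyTorch
# module illustrating heterogeneous convolution blocks (mixed kernel sizes and
# channel widths) and a single, selective long-range skip connection that
# bypasses several intermediate blocks.
import torch
import torch.nn as nn

class HeterogeneousBlock(nn.Module):
    """Convolution block whose kernel size and width vary from block to block."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class TinyFECNet(nn.Module):
    """Illustrative 7-class facial expression classifier (hypothetical layout)."""
    def __init__(self, num_classes=7):
        super().__init__()
        # Heterogeneity: blocks deliberately differ in kernel size and width.
        self.block1 = HeterogeneousBlock(1, 16, kernel_size=3)
        self.block2 = HeterogeneousBlock(16, 24, kernel_size=5)
        self.block3 = HeterogeneousBlock(24, 24, kernel_size=3)
        self.block4 = HeterogeneousBlock(24, 32, kernel_size=3)
        # Selective long-range connectivity: one projection carries block1
        # features directly to the input of block4, skipping blocks 2-3.
        self.long_range = nn.Conv2d(16, 24, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        f1 = self.block1(x)
        f3 = self.block3(self.block2(f1))
        f4 = self.block4(f3 + self.long_range(f1))  # long-range skip joins here
        return self.fc(self.pool(f4).flatten(1))

# Example: logits = TinyFECNet()(torch.randn(1, 1, 48, 48))  # -> shape (1, 7)

The point of the sketch is that, instead of a uniform stack of identical blocks with dense skip connections everywhere, kernel sizes and channel counts vary per block and only a single long-range connection is introduced where it is (hypothetically) most useful.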

Highlights

  • Facial expression classification (FEC) is an area of computer vision that has benefited significantly from rapid advances in machine learning, which have enabled data collections comprising a diversity of facial expressions captured from different individuals to be leveraged to learn classifiers that differentiate between facial expression types.

  • Motivated by the desire to design deep neural network architectures catered toward real-time embedded facial expression recognition, in this study we explore the efficacy of a human-machine collaborative design strategy that combines human experience and ingenuity with the raw speed and meticulousness of machine-driven design exploration, in order to find the optimal balance between accuracy and architectural and computational complexity (a toy sketch of this trade-off follows these highlights).

  • It can be observed that both the EmotionNet Nano-A and Nano-B networks achieve strong classification accuracy, with EmotionNet Nano-A in particular achieving accuracy comparable to the highest-performing state-of-the-art networks, which are more than an order of magnitude larger.
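
As a loose illustration of the accuracy-versus-complexity balancing referred to in the second highlight, the toy sketch below scores already-evaluated candidate designs by rewarding validation accuracy and penalizing parameter count, then keeps the best trade-off. It is a generic selection step under assumed weights, not the generative machine-driven design exploration actually used to create EmotionNet Nano; the candidate names and numbers are made up.

# Toy trade-off selection over candidate designs (hypothetical, not the
# authors' method): each candidate is (name, validation accuracy, parameters).
def select_best(candidates, accuracy_weight=1.0, size_weight=0.3):
    def score(candidate):
        _, accuracy, param_count = candidate
        # Reward accuracy, penalize complexity (measured in millions of parameters).
        return accuracy_weight * accuracy - size_weight * (param_count / 1e6)
    return max(candidates, key=score)

# Made-up candidates for illustration only:
candidates = [
    ("large-baseline", 0.97, 21.3e6),
    ("candidate-a",    0.96, 0.23e6),
    ("candidate-b",    0.90, 0.11e6),
]
print(select_best(candidates))  # favours the compact design with near-top accuracy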

Summary

Introduction

Facial expression classification (FEC) is an area of computer vision that has benefited significantly from the rapid advances in machine learning, which have enabled data collections comprising a diversity of facial expressions captured from different individuals to be leveraged to learn classifiers for differentiating between different facial expression types. Even though the performance of deep learning-based FEC systems continues to rise, widespread deployment of such systems remains limited, with one of the biggest hurdles being the high architectural and computational complexity of the deep neural networks that drive them. This hurdle is particularly limiting for real-time embedded scenarios, where low-latency operation is required on low-cost embedded devices. Assistive devices, for example, must leverage small, low-cost embedded processors yet provide low enough latency to give real-time feedback to the user. Another example is in-car driver monitoring (Jeong and Ko, 2018), where an FEC system would record the driver, determine their current mental state, and warn them if their awareness level is deteriorating. For those relying on software assistance for social purposes, this information must be provided with minimal delay in order to keep a conversation alive and avoid discomfort for both parties.
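
The real-time and energy-efficiency figures quoted in the abstract are expressed as frames per second and images per second per watt at fixed device power budgets (15 W and 30 W). As a rough illustration only, the sketch below times single-image inference in a loop and divides throughput by an assumed power budget; the benchmark function, the 48x48 grayscale input size, and the fixed 15 W figure are assumptions for illustration, not the authors' measurement protocol, and real energy figures should come from the device's own power monitoring.

# Hedged benchmarking sketch (assumptions: a PyTorch model, 48x48 grayscale
# input, and a power budget supplied by the caller). Times single-image
# inference and reports FPS plus an images/sec/watt proxy for energy efficiency.
import time
import torch

def benchmark(model, input_shape=(1, 1, 48, 48), num_frames=500, power_watts=15.0):
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(10):              # warm-up iterations (not timed)
            model(x)
        start = time.perf_counter()
        for _ in range(num_frames):      # timed single-image inference loop
            model(x)
        elapsed = time.perf_counter() - start
    fps = num_frames / elapsed
    return fps, fps / power_watts        # (frames/sec, images/sec/watt proxy)

# Example with the hypothetical TinyFECNet sketched earlier:
# fps, efficiency = benchmark(TinyFECNet())
# print(f"{fps:.1f} FPS, {efficiency:.2f} images/sec/watt at an assumed 15 W")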

