Lercpose: Learned Ranking and Contrastive Loss for Robust Head Pose Estimation
In this paper, we present a method for robust head pose estimation via carefully designed loss functions. We propose that exploiting the relationship between the predicted yaw, pitch, and roll values and the features of a head pose estimation network is crucial for making robust predictions. To achieve this, we formulate novel loss functions that ensure robustness and generalization of the network predictions. We report results on two public datasets, AFLW2000-3D and BIWI, demonstrating that the proposed method outperforms state-of-the-art 2D head pose estimation algorithms by a margin of up to 10%. We will release the source code at https://github.com/soni-H/lercpose/ upon acceptance of the paper.
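The abstract does not specify the loss functions themselves. As a rough illustration only, a pairwise ranking term over predicted angles combined with a contrastive term over network features might look like the following sketch (all function names, margins, and weightings here are assumptions, not the paper's actual formulation):

```python
import numpy as np

def pairwise_ranking_loss(pred, target, margin=0.1):
    """Hinge-style ranking term: if target[i] > target[j], the
    prediction pred[i] should exceed pred[j] by at least `margin`."""
    loss, pairs = 0.0, 0
    n = len(pred)
    for i in range(n):
        for j in range(n):
            if target[i] > target[j]:
                loss += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return loss / max(pairs, 1)

def contrastive_loss(feat_a, feat_b, similar, margin=1.0):
    """Pull features of similar-pose pairs together; push
    dissimilar pairs at least `margin` apart."""
    d = np.linalg.norm(feat_a - feat_b)
    return d ** 2 if similar else max(0.0, margin - d) ** 2
```

In a full training loop such terms would typically be added to a standard regression loss (e.g. MSE on yaw, pitch, roll) with tunable weights; those weights are likewise an assumption here.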
- Conference Article
10
- 10.1109/afgr.2008.4813466
- Sep 1, 2008
We developed a fast and robust head pose and gaze estimation system. This system can detect facial feature points and estimate 3D pose angles and gaze direction under various conditions, including changes in facial expression and partial occlusion. The system needs only one face image as input and requires no special devices such as blinking LEDs or a stereo camera. Moreover, no calibration process is needed. It achieves 95% head pose estimation accuracy and 81% gaze estimation accuracy (with an error margin of 15 degrees). The processing time is approximately 15 ms/frame (Pentium 4, 3.2 GHz). The acceptable range of facial pose is within ±60 degrees in yaw (left-right) and within ±30 degrees in pitch (up-down).
- Book Chapter
8
- 10.1007/978-3-642-37484-5_14
- Jan 1, 2013
Real-time accurate head pose estimation is required for several applications. Methods based on 2D images might not provide accurate and robust head pose measurements due to large head pose variations and illumination changes. Robust and accurate head pose estimation can be achieved by integrating intensity and depth information. In this paper we introduce a head pose estimation system that employs random forests and tensor regression algorithms. The former allow the modeling of large head pose variations using large sets of training data, while the latter allow the estimation of more accurate head pose parameters. The combination of the above mentioned methods results in more robust and accurate predictions for large head pose variations. We also study the fusion of different sources of information (intensity and depth images) to determine how their combination affects the performance of a head pose estimation system. The efficiency of the proposed framework is tested on the Biwi Kinect Head Pose dataset, where it is shown that the proposed methodology outperforms typical random forests.
- Conference Article
4
- 10.1145/2522848.2522864
- Dec 9, 2013
Head pose is an important indicator of a person's attention, gestures, and communicative behavior, with applications in human-computer interaction, multimedia, and vision systems. Robust head pose estimation is a prerequisite for spontaneous facial biometrics-related applications. However, most previous head pose estimation methods do not consider facial expression and hence are more likely to be influenced by it. In this paper, we develop saliency-guided 3D head pose estimation on 3D expression models. We address the problem of head pose estimation based on a generic model and saliency-guided segmentation on a Laplacian fairing model. We propose to perform mesh Laplacian fairing to remove noise and outliers from the 3D facial model. The salient regions are detected and segmented from the model. Salient-region Iterative Closest Point (ICP) then registers the test face model with the generic head model. The algorithms for pose estimation are evaluated on both static and dynamic 3D facial databases. Overall, the extensive results demonstrate the effectiveness and accuracy of our approach.
- Book Chapter
3
- 10.1007/3-540-45453-5_32
- Jan 1, 2001
In this paper, a robust head pose estimation algorithm is presented. In contrast with other approaches, the proposed algorithm adopts a textured polygonal model generated from two orthogonal views for accurate head pose estimation. To achieve robust estimation under varying illumination, the local correlation coefficient is taken as the similarity measure. The tracking is further improved by modeling head dynamics with Kalman filtering. Preliminary simulation results indicate that the proposed algorithm can reliably estimate the head pose under large rotation angles with varying illumination, and the average estimation errors are all below 4 degrees.
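The Kalman filtering step mentioned above can be sketched generically: a constant-velocity filter smoothing one noisy pose angle over time. The noise parameters below are illustrative defaults, not values from the paper.

```python
import numpy as np

def kalman_track(measurements, q=1e-3, r=1e-1):
    """1-D constant-velocity Kalman filter smoothing a noisy angle
    sequence (e.g. yaw over time). q/r are illustrative process and
    measurement noise levels."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])               # observe angle only
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([measurements[0], 0.0])     # state: [angle, velocity]
    P = np.eye(2)
    out = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

Tracking the full head pose would run one such filter per angle (or a joint 6-DoF state); this sketch shows only the recursion.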
- Research Article
46
- 10.1109/tmm.2021.3114551
- Jan 1, 2022
- IEEE Transactions on Multimedia
Recently, applying deep learning to no-reference image quality assessment (NR-IQA) has received significant attention. Especially in the last five years, an increasing interest has been drawn to the studies of rank learning since it can help mitigate the problem of small IQA datasets. However, on one hand, existing rank learning is not suitable for the authentically distorted images due to the lack of generated rank samples. On the other hand, the output of existing rank loss functions is uncontrollable, resulting in reduced performance. Motivated by these two limitations, we propose a novel rank learning based NR-IQA method, termed controllable list-wise ranking IQA (CLRIQA) in this paper. To be specific, we first present an imaging-heuristic approach, in which the over- and under-exposure is formulated as an inverse of the Weber-Fechner law, and fusion strategy and compression are adopted, to simulate the authentic distortion and generate the rank image samples. These samples are label-free yet associated with quality ranking information. Then we design a controllable list-wise ranking (CLR) loss function by setting an upper and lower bound of rank range and introducing an adaptive margin to tune rank interval. Finally, both the generated rank samples and proposed CLR are used to pre-train a convolutional neural network. Moreover, to obtain a more accurate prediction model, we take advantage of the IQA datasets to fine-tune the pre-trained network further. Various experiments are conducted on the IQA benchmark datasets, and experimental results demonstrate the effectiveness of the proposed CLRIQA method. 
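The CLR loss itself is not given in the abstract. A toy sketch of a list-wise ranking loss with a bounded output range and an adaptive margin, in the spirit described, could look like this (all constants and the exact margin schedule are assumptions):

```python
import numpy as np

def clr_loss(scores, lower=0.0, upper=10.0, base_margin=0.5):
    """Illustrative controllable list-wise ranking loss. `scores` are
    predicted qualities for a list ordered from worst to best.
    Consecutive items must be separated by an adaptive margin, and
    all scores are penalized for leaving [lower, upper]."""
    loss = 0.0
    n = len(scores)
    for i in range(n - 1):
        margin = base_margin * (i + 1) / n          # adaptive margin
        loss += max(0.0, margin - (scores[i + 1] - scores[i]))
    # bound the output range so the loss stays controllable
    loss += np.sum(np.maximum(0.0, lower - scores))
    loss += np.sum(np.maximum(0.0, scores - upper))
    return loss
```

A correctly ordered, in-range list incurs zero loss; violations of order or range are penalized.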
The source code and network model can be downloaded at https://github.com/GZHU-DVL/CLRIQA.
- Conference Article
9
- 10.1145/1180995.1181026
- Nov 2, 2006
We developed a fast and robust head pose and gaze estimation system. This system can detect facial points and estimate 3D pose angles and gaze direction under various conditions including facial expression changes and partial occlusion. We need only one face image as input and do not need special devices such as blinking LEDs or stereo cameras. Moreover, no calibration is needed. The system shows a 95% head pose estimation accuracy and 81% gaze estimation accuracy (when the error margin is 15 degrees). The processing time is about 15 ms/frame (Pentium4 3.2 GHz). Acceptable range of facial pose is within a yaw (left-right) of ±60 degrees and within a pitch (up-down) of ±30 degrees.
- Research Article
- 10.3745/kipstb.2007.14-b.4.311
- Aug 31, 2007
- The KIPS Transactions:PartB
This paper presents a vision-based 3D facial expression animation technique and system that provide robust 3D head pose estimation and real-time facial expression control. Previous research on vision-based 3D face animation has focused on facial expression generation itself rather than on the head motion tracking that reflects facial movement. However, head motion tracking is a critical issue to be solved for developing realistic facial animation. In this research, we developed an integrated animation system that performs 3D head motion tracking and facial expression control at the same time. The proposed system consists of three major phases: face detection, 3D head motion tracking, and facial expression control. For face detection, we use a non-parametric HT skin color model and template matching to detect the facial region efficiently in each video frame.
For 3D head motion tracking, we exploit a cylindrical head model that is projected onto the initial head motion template. Given an initial reference template of the face image and the corresponding head motion, the cylindrical head model is created and the full head motion is tracked based on the optical flow method. For facial expression cloning we utilize a feature-based method. The major facial feature points are detected using the geometric structure of the face with template matching and tracked by optical flow. Since the locations of the varying feature points combine head motion and facial expression information, the animation parameters describing the variation of the facial features are obtained from the geometrically transformed frontal head pose image. Finally, facial expression cloning is done by a two-step fitting process. The control points of the 3D model are moved by applying the animation parameters to the face model, and the non-feature points around the control points are deformed using Radial Basis Function (RBF) interpolation. The experiments show that the developed vision-based animation system can create realistic facial animation with robust head pose estimation and facial variation from input video images.
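The RBF deformation step can be illustrated generically: given displacements at the control points, a Gaussian RBF interpolant (the kernel choice and width here are for illustration, not from the paper) propagates them to the remaining vertices.

```python
import numpy as np

def rbf_deform(controls, displacements, points, sigma=1.0):
    """Propagate control-point displacements to other vertices via
    Gaussian RBF interpolation. `controls` is (m, d) control-point
    positions, `displacements` is (m, d), `points` is (n, d)."""
    def phi(d):
        return np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    # pairwise distances between control points -> kernel matrix
    D = np.linalg.norm(controls[:, None, :] - controls[None, :, :], axis=-1)
    W = np.linalg.solve(phi(D), displacements)        # RBF weights
    # evaluate the interpolant at the query points
    Dp = np.linalg.norm(points[:, None, :] - controls[None, :, :], axis=-1)
    return phi(Dp) @ W
```

The Gaussian kernel matrix is positive definite for distinct control points, so the interpolant reproduces the control-point displacements exactly.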
- Conference Article
36
- 10.1109/fg.2017.90
- May 1, 2017
Accurate and robust 3D head pose estimation is important for face related analysis. Though high accuracy has been achieved by previous works based on 3D morphable model (3DMM), their performance drops with extreme head poses because such models usually only represent the frontal face region. In this paper, we present a robust head pose estimation framework by complementing a 3DMM model with an online 3D reconstruction of the full head providing more support when handling extreme head poses. The approach includes a robust online 3DMM fitting step based on multi-view observation samples as well as smooth and face-neutral synthetic samples generated from the reconstructed 3D head model. Experiments show that our framework achieves state-of-the-art pose estimation accuracy on the BIWI dataset, and has robust performance for extreme head poses when tested on natural interaction sequences.
- Conference Article
3
- 10.1109/iranianmvip.2013.6780014
- Sep 1, 2013
Head pose estimation is an important preprocessing step in many computer vision and pattern recognition systems, such as face recognition. Compared to face detection and recognition, which have been widely used in computer vision systems, head pose estimation has fewer proposed systems and generic solutions. In this paper we propose a novel approach for robust human head pose estimation using the contourletSD transform. We first apply the contourletSD transform to the images, then create a feature vector by computing the gray-level co-occurrence matrix (GLCM) of each contourlet sub-band. Linear discriminant analysis (LDA) is used for dimensionality reduction of the feature vector. Finally, we classify the obtained feature vectors using Support Vector Machine (SVM), K-nearest Neighbor (KNN), and hierarchical decision tree (HDT) classifiers, separately. Experimental results on the FERET database demonstrate that the proposed method is more robust than previous methods for human head pose estimation.
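The GLCM feature step can be illustrated with a minimal implementation. The paper computes GLCMs per contourlet sub-band; this sketch applies the same co-occurrence statistics to a plain quantized image for one pixel offset.

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one offset (dx, dy), plus
    the contrast and energy statistics commonly derived from it."""
    # quantize intensities to `levels` gray levels
    q = (img.astype(np.float64) / (img.max() + 1e-12) * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    glcm /= glcm.sum()                      # normalize to a distribution
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)  # local intensity variation
    energy = np.sum(glcm ** 2)              # texture uniformity
    return contrast, energy
```

In the full pipeline such statistics from every sub-band would be concatenated into the feature vector passed to LDA and then to the classifiers.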
- Conference Article
5
- 10.1109/ijcb.2011.6117529
- Oct 1, 2011
In this paper, a new ℓ1-graph regularized semi-supervised manifold learning (LRSML) method is proposed for the robust human head pose estimation problem. The manifold is constructed under the Biased Manifold Embedding (BME) framework, which computes a biased neighborhood of each point in the feature space with ℓ1-graph regularization. The construction of the ℓ1-graph is unsupervised, harnessing no data label information, and uncovers the underlying ℓ1-norm-driven sparse reconstruction relationship of each sample. LRSML is more robust to noise and has the potential to convey more discriminative information than conventional manifold learning methods. Furthermore, utilizing both labeled and unlabeled information improves the pose estimation accuracy and generalization capability. Numerous experiments show the superiority of our method over several state-of-the-art methods on a publicly available dataset.
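The ℓ1-norm sparse reconstruction underlying an ℓ1-graph can be sketched with a generic ISTA (iterative shrinkage-thresholding) solver; the regularization strength and iteration count below are illustrative, and the paper's actual solver is not specified here.

```python
import numpy as np

def sparse_weights(x, D, lam=0.1, steps=500):
    """Sparse reconstruction of sample x from dictionary D (columns
    are the other samples) via ISTA. The nonzero entries of the
    returned weight vector define the sample's edges in an l1-graph."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ w - x)           # gradient of 0.5*||Dw - x||^2
        w = w - grad / L
        # soft-thresholding enforces the l1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)
    return w
```

Each sample's weight vector over all other samples, assembled row by row, yields the (asymmetric) adjacency of the ℓ1-graph.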
- Conference Article
5
- 10.1109/iccvw.2009.5457450
- Sep 1, 2009
This paper presents a head pose and facial feature estimation technique that works over a wide range of pose variations without a priori knowledge of the appearance of the face. Using simple LK trackers, head pose is estimated by Levenberg-Marquardt (LM) pose estimation using the feature tracking as constraints. Factored sampling and RANSAC are employed to both provide a robust pose estimate and identify tracker drift by constraining outliers in the estimation process. The system provides both a head pose estimate and the position of facial features and is capable of tracking over a wide range of head poses.
- Conference Article
99
- 10.1109/icpr.2004.87
- Aug 23, 2004
Head tracking and pose estimation are usually considered as two sequential and separate problems: pose is estimated on the head patch provided by a tracking module. However, precision in head pose estimation is dependent on tracking accuracy which itself could benefit from the head orientation knowledge. Therefore, this work considers head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values.
- Conference Article
67
- 10.1109/icpr.2004.1333754
- Jan 1, 2004
Head tracking and pose estimation are usually considered as two sequential and separate problems: pose is estimated on the head patch provided by a tracking module. However, precision in head pose estimation is dependent on tracking accuracy which itself could benefit from the head orientation knowledge. Therefore, this work considers head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values.
- Conference Article
2
- 10.1109/icip.2014.7025679
- Oct 1, 2014
Estimating head pose automatically and robustly is a crucial problem in many visual applications. To solve it, we propose in this work a novel and simple face image descriptor (the block energy map) and, based on it, complete schemes for automatic and robust head pose estimation using support vector regression and Gaussian process regression, respectively. The proposed descriptor and schemes contrast with many previously published ones that rely on manual assistance to locate the face in an input image and/or are sensitive to factors such as identity and misalignment. Experimental results demonstrate the superiority of the proposed descriptor and schemes.
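The block energy map is not defined in the abstract. One plausible minimal reading (an assumption, not the authors' definition) is a grid of per-block intensity energies:

```python
import numpy as np

def block_energy_map(img, block=8):
    """Illustrative block energy map: partition the image into
    block x block cells and take the mean squared intensity
    (energy) of each cell as one feature."""
    h, w = img.shape
    h, w = h - h % block, w - w % block            # crop to a multiple of block
    cells = img[:h, :w].reshape(h // block, block, w // block, block)
    return (cells.astype(np.float64) ** 2).mean(axis=(1, 3)).ravel()
```

The resulting feature vector would then be fed to a regressor (SVR or Gaussian process regression in the schemes described) mapping appearance to pose angles.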
- Research Article
19
- 10.1016/j.patcog.2011.06.007
- Jul 19, 2011
- Pattern Recognition
Model free head pose estimation using stereovision