A collaborative multi-task learning framework is proposed that integrates 3D face alignment and head pose estimation using RGB and sparse depth input under a frontal face constraint. Most previous works rely on dense depth images to exploit spatial information, but high-resolution dense depth images are not always available owing to the limitations of the image capture device. In addition, existing multi-task learning methods suffer from insufficient learning because they do not bridge the semantic gap between the two tasks. In this letter, 3D face alignment and head pose estimation are integrated into a collaborative multi-task learning framework supervised by a frontal face constraint. Sparse depth is also incorporated into the network to provide additional facial geometric information. A UV position map is generated to perform face alignment, and a head pose vector is output to predict the head orientation. The method is evaluated on the AFLW2000-3D dataset, where it achieves the lowest errors on both tasks compared with state-of-the-art methods, demonstrating its superiority.
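To illustrate how a frontal face constraint could couple the two tasks, the following is a minimal sketch rather than the formulation used in this letter, which the abstract does not specify. It assumes that the head pose vector holds yaw, pitch and roll in radians, that the rotation is composed as Rz(roll) Ry(yaw) Rx(pitch), and that the constraint penalises the mean distance between the predicted 3D points, rotated back by the inverse of the predicted pose, and a frontal reference shape. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(r):
    """Transpose a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [[r[j][i] for j in range(3)] for i in range(3)]


def rotate(r, p):
    """Apply the 3x3 matrix r to the 3D point p."""
    return [sum(r[i][k] * p[k] for k in range(3)) for i in range(3)]


def euler_to_rotation(yaw, pitch, roll):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); this convention is an assumption."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))


def frontal_constraint_loss(posed_points, pose, frontal_points):
    """Mean distance between de-rotated predicted points and a frontal reference.

    posed_points:   predicted 3D points in the posed (camera) frame
    pose:           predicted (yaw, pitch, roll) in radians
    frontal_points: reference 3D points of the same face in frontal view
    """
    r_inv = transpose(euler_to_rotation(*pose))
    total = 0.0
    for p, f in zip(posed_points, frontal_points):
        total += math.dist(rotate(r_inv, p), f)
    return total / len(posed_points)
```

Under these assumptions the loss vanishes only when the predicted geometry and the predicted pose are mutually consistent, so an error in either output is penalised through the other; this is one way in which a shared constraint can tie the two tasks together.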