Abstract. Visual perception and human body recognition are fundamental capabilities for effective and safe interaction between humans and systems built on artificial intelligence (AI) and computer vision in real-world scenarios. Recent developments in AI and computer vision have brought major advances in human body recognition technology. However, the technology is still at an early stage of its product lifecycle. 3D pose estimation is the task of identifying the three-dimensional locations of human body joints from images or videos. Although it is widely used in areas such as human motion analysis and robotics, it remains difficult because of challenges such as depth ambiguity and the scarcity of robust datasets. Over the past decade, numerous methods have been developed, many of them based on deep learning, significantly improving performance on existing benchmarks. A comprehensive literature review of this field is crucial for future development. However, existing reviews have mainly concentrated on traditional techniques, leaving a need for a comprehensive examination of deep learning-based methods. This paper presents an overview of current deep learning-based 3D pose estimation algorithms, outlining their advantages and limitations. It also covers commonly used benchmark datasets and methods for analyzing human poses in unlabeled field images, and provides a comparative analysis. Finally, it offers insights to aid the design of future models and algorithms.