Abstract

Human pose estimation (HPE) is a fundamental problem in computer vision and underpins applied research in many fields, including virtual fitting, fashion analysis, behavior analysis, human-computer interaction, and assisted pedestrian detection. The goal of HPE is to use image processing and machine learning methods to locate the joints of the people in an image and identify the type of each joint. HPE faces two main difficulties. First, the complexity of human images requires the model to learn a highly nonlinear mapping, which is extremely difficult. Second, learning such a mapping requires a high-complexity model, and a high-complexity model incurs substantial computational overhead. In this context, this paper studies transformer-based 3D HPE. We review domestic and international research on HPE, which provides the theoretical basis for the transformer-based 3D HPE model designed in this paper. We describe the technical principles and optimization schemes of CNNs and transformers and propose a transformer-based 3D HPE model. Using two datasets, COCO and MPII, we performed a series of experiments to find the best parameters for model development and then evaluated the model's performance. The experimental results suggest that the proposed method outperforms the other compared methods on both datasets. Our model achieves an average precision (AP) of up to 79% on the COCO dataset and a PCKh-0.5 score of 81.5% on the MPII dataset.
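
The two reported metrics follow standard benchmark definitions, sketched below for illustration only; this is not the authors' evaluation code, and the function names `pckh` and `oks` are hypothetical. The sketch assumes 2D keypoints given as (x, y) tuples with a per-joint visibility flag. For PCKh-0.5 (MPII), a joint counts as correct when its prediction lies within 0.5 times the person's head segment length of the ground truth; the head size is assumed to be supplied by the caller. COCO AP is built on object keypoint similarity (OKS), computed here with the person's segment area as s² and caller-supplied per-keypoint constants k_i. The full AP step (matching detections to ground truth and averaging precision over OKS thresholds 0.50 to 0.95) is omitted.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pckh(pred: Sequence[Sequence[Point]],
         gt: Sequence[Sequence[Point]],
         head_sizes: Sequence[float],
         visible: Sequence[Sequence[bool]],
         alpha: float = 0.5) -> float:
    """Hypothetical PCKh helper: fraction of visible joints predicted within
    alpha * head size of the ground truth (alpha = 0.5 gives PCKh-0.5)."""
    correct = 0
    total = 0
    for p_joints, g_joints, head, vis in zip(pred, gt, head_sizes, visible):
        threshold = alpha * head  # per-person tolerance scaled by head size
        for (px, py), (gx, gy), v in zip(p_joints, g_joints, vis):
            if not v:
                continue  # unannotated joints are excluded from the score
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


def oks(pred: Sequence[Point],
        gt: Sequence[Point],
        visible: Sequence[bool],
        area: float,
        kappas: Sequence[float]) -> float:
    """Hypothetical OKS helper for one person:
    mean over visible keypoints of exp(-d_i^2 / (2 * s^2 * k_i^2)),
    with s^2 taken as the person's segment area (assumption as stated above)."""
    total = 0.0
    count = 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if not v:
            continue  # only labeled keypoints contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0
```

For example, with a head size of 10 the PCKh-0.5 tolerance is 5 pixels, so a prediction 5 pixels from its ground-truth joint is counted as correct and one 6 pixels away is not. OKS equals 1 for a perfect prediction and decays smoothly with distance, which is why COCO reports AP over a range of OKS thresholds rather than a single cut-off.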

