Abstract

The SimpleBaseline model achieves high accuracy in human pose estimation with a simple network structure, but it lacks fusion of information across layers and across spatial scales. In this paper, we propose DLSAnet, which fuses layer and spatial information effectively. DLSAnet uses DLA (Deep Layer Aggregation) as its backbone, a network with excellent feature-extraction capability in object detection. In addition, we introduce a modified spatial pyramid pooling (SPP) module that pools multi-scale local-area features and concatenates them, allowing the network to learn object features more comprehensively. Specifically, a four-branch SPP module replaces a single-branch SPP module connected through a single skip layer. This design effectively alleviates the slow decrease of the loss late in training. Experiments show that DLSAnet achieves higher accuracy than SimpleBaseline.
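To make the multi-scale pooling idea concrete, here is a minimal NumPy sketch of a four-branch SPP block in the style popularized by YOLOv3-SPP. The input feature map is kept as one branch and concatenated along the channel axis with three stride-1 max-pooled copies, so spatial size is preserved while the channel count grows fourfold. The kernel sizes (5, 9, 13), the identity branch, and the function names `max_pool_same` and `spp_four_branch` are assumptions for illustration; the abstract does not specify DLSAnet's actual branch configuration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding over a (C, H, W) feature map."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd to preserve spatial size")
    x = np.asarray(x, dtype=float)
    p = k // 2
    # Pad the spatial axes with -inf so padded cells never win the max.
    padded = np.pad(x, ((0, 0), (p, p), (p, p)),
                    mode="constant", constant_values=-np.inf)
    # windows has shape (C, H, W, k, k): one k x k patch per output location.
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))
    return windows.max(axis=(-2, -1))


def spp_four_branch(x, kernel_sizes=(5, 9, 13)):
    """Hypothetical four-branch SPP: identity plus three max-pooled branches.

    Branches are concatenated on the channel axis, so a (C, H, W) input
    becomes a (4C, H, W) output when three kernel sizes are given.
    """
    x = np.asarray(x, dtype=float)
    branches = [x] + [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    feat = np.random.default_rng(0).standard_normal((2, 16, 16))
    out = spp_four_branch(feat)
    print(out.shape)  # (8, 16, 16)
```

Larger kernels give each output location a wider receptive field, which is how a single block mixes local and more global context; in a real network this block would typically be followed by a convolution that reduces the 4C channels back down.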
