UAVs are multifunctional aircraft that are small, low-cost, and easy to control, and they are used in many fields, including gardening, plant protection, mapping, logistics, and the military. To make human-machine interaction more convenient, users can control UAVs with dynamic gestures. The conventional approach trains convolutional neural networks directly on images as the system input in order to control the UAV. However, this approach discards the temporal information between frames, which leads to poor training results or heavy reliance on computing resources. This paper therefore proposes an efficient processing strategy: for simple gesture tasks, an optimized frame extraction algorithm is used for image-based gesture recognition; for complex gesture tasks, compressed video is used as the system input for video-based gesture recognition. Training results on the action recognition datasets UCF-101 and HMDB-51 verify the efficiency of compressed video in gesture recognition tasks, and the approach can be applied to dynamic gesture control of UAVs.