Abstract

Objective: To use a deep learning network model to identify key content in videos. Methodology: After a review of the computer vision literature, features of the target video were extracted by a deep learning network combined with a time-series data augmentation method. The data augmentation preprocessing method and the spatio-temporal feature extraction performed on the video by the LI3D network are described. Accuracy, precision, and recall were used as evaluation indices. Results: The three indicators increased from 0.85, 0.88, and 0.84 to 0.89, 0.90, and 0.88, respectively, showing that the LI3D network model maintains a high recall rate together with high accuracy after data augmentation. The accuracy and loss curves of the training phase show that the accuracy of the network is greatly improved compared with I3D. Conclusion: The experiments show that the LI3D model is more stable and converges faster. Comparing the accuracy and loss curves during training of LI3D, LI3D-LSTM, and LI3D-BiLSTM, the LI3D-BiLSTM model converges fastest. Level of evidence II; Therapeutic studies - investigation of treatment results.
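The three evaluation indices named in the abstract (accuracy, precision, recall) are standard confusion-matrix metrics. As a minimal sketch, the counts below are hypothetical and chosen only for illustration; they are not the paper's data:

```python
# Illustrative computation of the three evaluation indices from binary
# confusion counts: tp = true positives, fp = false positives,
# tn = true negatives, fn = false negatives.
def evaluation_indices(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction of all correct predictions
    precision = tp / (tp + fp)                  # correct among predicted positives
    recall = tp / (tp + fn)                     # correct among actual positives
    return accuracy, precision, recall

# Hypothetical counts, not taken from the paper's experiments.
acc, prec, rec = evaluation_indices(tp=88, fp=10, tn=90, fn=12)
```

With these made-up counts, accuracy is 178/200 = 0.89 and recall is 88/100 = 0.88, matching the scale of the figures reported above.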
