Automatic human action recognition remains a challenging problem in computer vision: it requires extracting and classifying the spatial features of individual poses across video frames. Human activity can be described as the temporal variation of the human body. Capturing and analyzing multimedia dance content benefits the preservation of cultural heritage, video recommendation systems, tutoring systems for learners, and more. Indian classical dance (ICD) classification is a fascinating problem because of the intricacy of its body postures, and it provides a stage for exploring diverse computer vision and deep learning concepts. With changing learning styles, automated teaching solutions have become unavoidable across all disciplines, from classical instruction to online forums. Furthermore, ICD is an essential part of a rich cultural heritage that should by all means be documented and preserved. Although several classification approaches exist for two-dimensional dance images, in this study we address the problem of dance-style classification for Indian dance, which involves intricate poses such as full-body rotation and self-occlusion, and we propose a three-phase deep learning technique. First, we extract the key joint positions of the dancer from every video frame using a pre-trained model, the TensorFlow MobileNet architecture, which lets us estimate the body pose in each frame. Next, the detected action features are correlated and identified using cosine similarity. Finally, we classify the dance pose from the extracted salient features by training a Convolutional Neural Network – Long Short-Term Memory (CNN-LSTM) network on the training dataset.
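The cosine-similarity matching step above can be sketched minimally as follows. This is an illustrative sketch, not the authors' implementation: the 17-keypoint pose shape and the frame-to-frame comparison are assumptions for demonstration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened pose keypoint vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 17-keypoint (x, y) poses from two consecutive frames:
# a nearly unchanged pose should score close to 1.0.
pose_t = np.random.rand(17, 2)
pose_t1 = pose_t + 0.01 * np.random.rand(17, 2)
print(cosine_similarity(pose_t, pose_t1))
```

A score near 1.0 indicates the two frames show essentially the same pose, so similar action features can be grouped before classification.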
Rather than feeding a raw chain of joints to the sequence learner, which ordinarily degrades network performance, we construct a feature vector of flow, angles between anchor joints, and normalized distance vectors that capture the proximity structure of the skeleton. Modeling the kinematic relationships among body joints across frames via pose estimation thus yields a better representation of the spatio-temporal dependencies. The proposed CNN-LSTM architecture for dance classification is compared with a Multilayer Perceptron (MLP), a Long-term Recurrent Convolutional Network (LRCN), Conv3D, and LSTM on criteria such as F1-score, accuracy, recall, AUC, and precision. The proposed CNN-LSTM classifier attains 98.53% accuracy, 99.04% precision, 98.47% recall, a 99.12% AUC score, and a 98.72% F1-score during testing. During training, it attains 94.01% accuracy, 93.11% precision, 94.76% recall, a 96.06% AUC score, and a 93.51% F1-score.
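The angle and normalized-distance components of such a feature vector can be sketched as below. This is a minimal illustration under assumed conventions (2D keypoints, joint index 0 as the anchor, max-distance normalization); the paper's exact feature construction may differ.

```python
import numpy as np

def joint_angle(a, anchor, b):
    """Angle in radians at `anchor` formed by joints a and b."""
    v1 = np.asarray(a, dtype=float) - np.asarray(anchor, dtype=float)
    v2 = np.asarray(b, dtype=float) - np.asarray(anchor, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def normalized_distances(keypoints, anchor_idx=0):
    """Distance of every joint to an anchor joint, scaled into [0, 1]."""
    kp = np.asarray(keypoints, dtype=float)
    d = np.linalg.norm(kp - kp[anchor_idx], axis=1)
    return d / (d.max() + 1e-8)

# Example: elbow angle from shoulder-elbow-wrist coordinates (hypothetical values).
angle = joint_angle(a=(1.0, 0.0), anchor=(0.0, 0.0), b=(0.0, 1.0))
print(np.degrees(angle))  # 90.0
```

Because the distances are normalized per frame, the features are invariant to the dancer's scale in the image, which is one reason such vectors tend to train sequence models better than raw joint coordinates.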