Objective: To address occlusion in human skeleton extraction and simplify the pixel matching associated with the human skeleton structure for efficient Indian Sign Language (ISL) recognition and translation.

Methods: This paper presents the Occlusion-Resistant STHCN (OSTHCN) model, which tackles the occlusion problem in human skeleton extraction for effective ISL recognition and translation. The model incorporates Skeleton Occupancy Likelihood Map estimation using B-spline curves to enhance skeleton extraction. Because fingers and hands occlude one another, the extracted skeleton consists of disconnected skeletal subgraphs. Each observed skeleton is therefore represented as a probability distribution along an ellipsoidal contour originating from the skeleton's central points. A heuristic technique estimates occluded skeletons using a 3D probability map over an occupancy grid, in which each voxel indicates the likelihood of containing skeleton. The occupancy distribution is updated from observed branch clusters across image sequences, and occluded skeletons are detected by finding minimum-cost paths. Finally, maximally connected subgraphs are merged into a main graph via minimum-cost path searches in the 3D likelihood map, enabling prediction of occluded skeleton parts for ISL recognition and translation.

Findings: The OSTHCN model achieved an accuracy of 96.74% on the ISL Continuous Sign Language Translation Recognition (ISL-CSLTR) dataset, outperforming existing prediction models.

Novelty: The model employs a unique occlusion-handling strategy for skeleton extraction: it estimates occluded parts and integrates connected subgraphs via minimum-cost path searches, yielding more precise skeleton parts and higher accuracy for ISL recognition and translation.

Keywords: Sign Language, Graph Convolutional Neural Network, Ellipsoidal Contour, 3D Likelihood Map, Occupancy Probability
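The merging step described above searches for minimum-cost paths through the 3D occupancy likelihood map to bridge disconnected skeletal subgraphs. A minimal sketch of that idea is a Dijkstra search over a voxel grid where each step costs the negative log-likelihood of the voxel entered, so paths prefer high-likelihood skeleton voxels. The grid shape, likelihood values, and endpoint voxels below are illustrative assumptions, not values from the paper.

```python
import heapq
import math

def min_cost_path(likelihood, start, goal):
    """Dijkstra over a 3D voxel grid.

    Step cost into a voxel is -log(likelihood), so traversing
    high-likelihood skeleton voxels is cheap. Returns the voxel
    path from start to goal and its total cost.
    """
    nx, ny, nz = len(likelihood), len(likelihood[0]), len(likelihood[0][0])
    eps = 1e-9  # keeps -log finite for zero-likelihood voxels
    cost = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        c, v = heapq.heappop(heap)
        if v == goal:
            # Reconstruct the path by walking parents back to start.
            path = [v]
            while v in parent:
                v = parent[v]
                path.append(v)
            return path[::-1], c
        if c > cost.get(v, math.inf):
            continue  # stale heap entry
        x, y, z = v
        for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
            n = (x + dx, y + dy, z + dz)
            if 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz:
                nc = c - math.log(likelihood[n[0]][n[1]][n[2]] + eps)
                if nc < cost.get(n, math.inf):
                    cost[n] = nc
                    parent[n] = v
                    heapq.heappush(heap, (nc, n))
    return None, math.inf

# Toy 2x2x3 grid: the x=0, y=0 column has high skeleton likelihood.
grid = [[[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]],
        [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]]
path, total = min_cost_path(grid, (0, 0, 0), (0, 0, 2))
print(path)  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```

The returned path supplies candidate voxels for the occluded skeleton segment between two subgraph endpoints; in the paper's pipeline this search is applied per pair of disconnected subgraphs before merging them into the main graph.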