Sign language recognition, an essential interface between the hearing and deaf-mute communities, still suffers from high false-positive rates and high computational cost, even when advanced deep learning techniques are used. We propose a stacked encoded model that combines artificial intelligence (AI) with the Internet of Things (IoT) and refines feature extraction and classification to address these challenges. A lightweight backbone model performs preliminary feature extraction, and stacked autoencoders further refine the extracted features. The approach harnesses the scalability of big data and shows notable improvements in accuracy, precision, recall, and F1-score, as well as in computational complexity. We evaluate the model on the ArSL2018 benchmark dataset, where it outperforms state-of-the-art approaches. An ablation study with pre-trained convolutional neural network (CNN) models further confirms its efficacy across all evaluation metrics. Our work paves the way for the sustainable development of high-performing, IoT-based sign language recognition applications.
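
For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class classifier. It is illustrative only: the function name `macro_metrics`, the toy labels, and the choice of macro averaging (an unweighted mean over classes) are assumptions made here, not details taken from this work.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption of this sketch; a weighted or
    micro average would give different numbers on imbalanced data.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical two-class toy example; the label names are placeholders.
    truth = ["alef", "alef", "beh", "beh"]
    preds = ["alef", "beh", "beh", "beh"]
    print(macro_metrics(truth, preds))
```

A false positive for one class is always a false negative for another in this single-label setting, which is why a high false-positive rate lowers precision for the predicted class and recall for the true class at the same time.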