Communication has been central to human existence, society, and globalization for millennia. Applications of Speech Recognition (SR) technologies include biometric evaluation, security, safety, medical care, and smart cities. Most research has focused primarily on English, leaving low-resource languages such as Uzbek largely unaddressed. This study examines the efficacy of peer feedback and ASR feedback in pronunciation training assisted by wireless mobile networks. It proposes an ASR model based on a Deep Neural Network (DNN) and Hidden Markov Model (HMM), built as a hybrid Connectionist Temporal Classification (CTC)-attention network, to recognize Uzbek words and their variants. The proposed method reduces training time and improves SR accuracy by efficiently employing the CTC objective function within attention modeling. The results were assessed by both linguistic experts and native speakers on an Uzbek database compiled for this research. The data were gathered through a pronunciation assessment and a discussion, and participants also received classroom instruction. Test results indicate that the proposed method achieved a word error rate of 13.1% using 210 hours of recordings as the training dataset for the Uzbek language. The proposed technique can significantly improve students' pronunciation quality and may motivate them to engage in pronunciation learning.
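The abstract does not state the exact form of the joint training criterion; a minimal sketch, assuming the standard multi-task formulation of hybrid CTC-attention training with a hypothetical interpolation weight $\lambda \in [0, 1]$ (not specified in this work), is

$$\mathcal{L}_{\mathrm{MTL}} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1-\lambda)\,\mathcal{L}_{\mathrm{attention}},$$

where $\mathcal{L}_{\mathrm{CTC}}$ is the CTC loss and $\mathcal{L}_{\mathrm{attention}}$ is the attention-based sequence-to-sequence loss; a larger $\lambda$ places more weight on the monotonic CTC alignment, which is commonly associated with faster convergence during training.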