With increasing globalization, second-language learning has attracted growing attention. To improve English oral proficiency, this study investigates a computer-aided online learning system for English oral pronunciation. A denoising autoencoder (DAE) is integrated into the system to build a simplified end-to-end recurrent neural network (RNN) for deep-learning-based pronunciation detection and diagnosis. The study first collected and preprocessed oral pronunciation data from English learners, including speech signal enhancement and noise reduction. Next, an RNN model with Long Short-Term Memory (LSTM) at its core was constructed to capture the time-series characteristics of pronunciation, and the DAE was used to extract features and reduce the influence of background noise, improving the recognition of pronunciation features. In addition, web crawler technology was used to collect a large amount of oral pronunciation data from non-native English learners, from which an English oral corpus containing pronunciation errors was constructed. To simulate real conditions, white noise and pink noise were artificially added to the corpus, which was then divided into training and testing sets at a ratio of 60% to 40%. The results showed that, under white noise, the system’s classification accuracy on the training and testing sets was 78.97% and 94.01%, respectively; under pink noise, it was 76.19% and 94.03%, respectively. The system’s error detection accuracy for vowel and consonant pronunciation was 88.91% and 91.68%, respectively, and its error correction accuracy was 90.67% and 91.96%, respectively. In summary, the results indicate that the proposed computer-aided online learning system for English oral pronunciation, based on a DAE-enhanced end-to-end RNN, effectively improves learning efficiency.
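The noise-augmentation and data-splitting steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the 10 dB signal-to-noise ratio, the Voss-McCartney approximation of pink noise, the synthetic sine-tone "corpus", and all function names (`white_noise`, `pink_noise`, `add_noise`, `split_60_40`) are assumptions made only for illustration. Only the two noise types and the 60%/40% split come from the text.

```python
import math
import random


def white_noise(n, rng):
    """Gaussian white noise: independent samples with a flat spectrum."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pink_noise(n, rng, rows=8):
    """Approximate pink (1/f) noise with a Voss-McCartney style scheme:
    sum several white sources, refreshing source k every 2**k samples."""
    sources = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (2 ** k) == 0:
                sources[k] = rng.gauss(0.0, 1.0)
        out.append(sum(sources))
    return out


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB (assumed value)."""
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * v for c, v in zip(clean, noise)]


def split_60_40(items, rng):
    """Shuffle a copy of `items` and split it 60% training / 40% testing."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 6 // 10
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)
# Hypothetical stand-in for the corpus: ten 440 Hz tones sampled at 16 kHz.
corpus = [[math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
          for _ in range(10)]
noisy_white = [add_noise(u, white_noise(len(u), rng), snr_db=10) for u in corpus]
noisy_pink = [add_noise(u, pink_noise(len(u), rng), snr_db=10) for u in corpus]
train, test = split_60_40(noisy_white, rng)
```

Scaling the noise to a fixed signal-to-noise ratio keeps the corruption level comparable across utterances of different loudness, which is one common way to make white-noise and pink-noise conditions comparable.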