Computer-Aided Language Learning (CALL) frameworks have become popular because of their adaptability, which lets students refine their language abilities at their own pace. Although much research has sought to improve CALL systems, identifying suitable features for targeted Mandarin mispronunciation detection remains an open research area. In this work, an acoustic model that combines a convolutional recurrent neural network (CRNN, i.e. CNN + LSTM) with connectionist temporal classification (CTC) is used to convert acoustic signals into pinyin label sequences. The model was trained on multiple Chinese speech data sets, and the pronunciation error rates for initials and vowels were then obtained. Through linguistic classification of these errors, four main types of pronunciation error related to the pronunciation habits of native Spanish speakers were identified. Based on this analysis, instructive suggestions for international Chinese teaching with Mandarin CALL are put forward from several perspectives. The system itself still has room for further improvement.
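To illustrate how a CTC output layer can yield a pinyin label sequence, the following is a minimal sketch of greedy (best-path) CTC decoding: per-frame label predictions are collapsed by merging consecutive repeats and then dropping the blank symbol. It is not the implementation used in this work. The blank index, the toy vocabulary `VOCAB`, and the function names are assumptions made only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol

# Hypothetical toy vocabulary; the actual pinyin label set is not specified here.
VOCAB = {1: "ni3", 2: "hao3"}


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks (CTC best-path rule)."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


def ids_to_pinyin(label_ids, vocab=VOCAB):
    """Map decoded label ids to pinyin strings using the toy vocabulary."""
    return [vocab[i] for i in label_ids]


# Per-frame argmax labels, as a CRNN might emit them for a short utterance.
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0]
print(ids_to_pinyin(ctc_greedy_decode(frames)))
```

A blank between two identical labels keeps them separate, so `[1, 1, 0, 1]` decodes to two occurrences of label 1 rather than one. This is what allows repeated syllables to survive decoding.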