In tone languages such as Mandarin Chinese, the same syllable carries different meanings when produced with different tones. In previous work, Mandarin tone recognition based on Mel-frequency cepstral coefficients (MFCCs) and a convolutional neural network (CNN) outperformed a conventional neural network that used manually edited F0 data. The present study explored Mandarin tone recognition based on spectrograms instead of MFCCs. Unsupervised feature learning was applied directly to the unlabeled spectrograms with a denoising autoencoder (dAE). The model then convolved the labeled spectrograms with the learnt “sound features” to produce a set of feature maps. A dataset of 4500 monosyllabic words collected from 125 children was used to evaluate recognition performance. Compared with methods based on MFCCs, the spectrogram-based approach has more parameters to train. As a result, the new model might better capture the statistical distribution of the original data. The new approach, with unsupervised feature learning, could therefore perform even better than previous methods based on MFCCs or on extracted F0 information. The advantages and shortcomings of various approaches to lexical tone recognition will be discussed. [Work supported in part by NIH NIDCD Grant No. R15-DC014587.]
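The two-stage idea described above (learn features from unlabeled spectrogram patches with a denoising autoencoder, then convolve spectrograms with those features) can be illustrated with a minimal NumPy sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the square patch size, the number of hidden units, masking noise, a tied-weight autoencoder with a sigmoid encoder and linear decoder, plain gradient descent, and the hypothetical function names `sample_patches`, `train_dae`, and `feature_maps`. A random array stands in for a real spectrogram.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sample_patches(spec, patch=5, n=500, seed=0):
    """Draw n random patch x patch windows from a 2-D spectrogram, flattened to rows."""
    rng = np.random.default_rng(seed)
    windows = sliding_window_view(spec, (patch, patch)).reshape(-1, patch * patch)
    return windows[rng.integers(0, len(windows), size=n)]


def train_dae(patches, n_hidden=8, noise=0.3, lr=0.1, epochs=50, seed=0):
    """Tied-weight denoising autoencoder (assumed design, not the authors' model).

    Inputs are corrupted with masking noise; the network is trained to
    reconstruct the clean patches under a squared-error loss.
    """
    rng = np.random.default_rng(seed)
    n, d = patches.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(d)
    for _ in range(epochs):
        mask = rng.random(patches.shape) > noise    # zero out ~noise of the inputs
        x_noisy = patches * mask
        h = sigmoid(x_noisy @ W + b_h)              # encoder
        x_hat = h @ W.T + b_o                       # linear decoder, tied weights
        err = x_hat - patches                       # compare with the clean input
        d_pre = (err @ W) * h * (1.0 - h)           # backprop through the sigmoid
        grad_W = (x_noisy.T @ d_pre + err.T @ h) / n  # encoder + decoder paths
        W -= lr * grad_W
        b_h -= lr * d_pre.mean(axis=0)
        b_o -= lr * err.mean(axis=0)
    return W, b_h


def feature_maps(spec, W, b_h, patch=5):
    """Slide every learned filter over the spectrogram ('valid' positions only)."""
    windows = sliding_window_view(spec, (patch, patch))
    rows, cols = windows.shape[:2]
    flat = windows.reshape(rows, cols, patch * patch)
    return sigmoid(flat @ W + b_h)                  # shape: (rows, cols, n_hidden)


# Stand-in data: a random 32 x 40 "spectrogram" (frequency bins x time frames).
spec = np.random.default_rng(42).random((32, 40))
W, b_h = train_dae(sample_patches(spec))
maps = feature_maps(spec, W, b_h)
```

In this sketch each column of `W` plays the role of one learned “sound feature”, and `maps` holds one feature map per hidden unit. In a full recognizer these maps would typically be pooled and passed to a supervised classifier trained on the labeled tones, a stage the sketch leaves out.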