Non-negative matrix factorization (NMF) was introduced by Paatero and Tapper in 1994 and it was a general way of reducing the dimension of the matrix with non-negative entries. Non-negative matrix factorization is very useful in many data analysis applications such as character recognition, text mining, and others. This paper aims to study the application in Chinese character recognition using non-negative matrix factorization. Python was used to carry out the LU factorization and non-negative matrix factorization of a Chinese character in Boolean Matrix. Preliminary analysis confirmed that the data size of and and are chosen for the NMF of the Boolean matrix. In this project, one hundred printed Chinese characters were selected, and all the Chinese characters can be categorized into ten categories according to the number of strokes , for . The Euclidean distance between the Boolean matrix of a Chinese character and the matrix after both LU factorization and NMF is calculated for further analysis. Paired t-test confirmed that the factorization of Chinese characters in the Boolean matrix using NMF is better than the LU factorization. Finally, ten handwritten Chinese characters were selected to test whether the program is able to identify the handwritten and the printed Chinese characters. Experimental results showed that 70% of the characters can be recognized via the least Euclidean distance obtained. NMF is suitable to be applied in Chinese character recognition since it can reduce the dimension of the image and the error between the original Boolean matrix and after NMF is less than 5%.
Read full abstract