Abstract

Most traditional postcode recognition systems implicitly assume that the 10 numerals (0–9) are evenly distributed. This assumption is far from reasonable, because the distribution of 0–9 in the postcodes of a country or a city is generally imbalanced: some numerals appear in many more postcodes than others. In this paper, we study cost-sensitive neural network classifiers to address the class imbalance problem in postcode recognition. Four methods are considered in training the neural networks: cost-sampling, cost-convergence, rate-adapting and threshold-moving. Cost-sampling adjusts the distribution of the training data so that the costs of classes are conveyed explicitly by the number of their instances. Cost-convergence and rate-adapting are applied in the training phase by modifying the training algorithm of the neural network. Threshold-moving raises the probability estimates of expensive classes so that samples with higher costs are less likely to be misclassified. Experiments are conducted on 10,702 postcode images using five cost matrices based on the distribution of numerals in postcodes. The results suggest that cost-sensitive learning is indeed effective for class-imbalanced postcode analysis and recognition. They also show that cost-sampling with a proper cost matrix outperforms the other methods in this application.
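
To make two of the four methods named above more concrete, the following is a minimal sketch and not the paper's implementation. It assumes each class has a single scalar cost, that threshold-moving multiplies each class probability by its cost and renormalises, and that cost-sampling sets per-class training counts proportional to cost. The function names, the cost values and the three-class example are all hypothetical.

```python
def threshold_moving(probs, class_costs):
    """Rescale class probabilities by per-class cost, then renormalise.

    Assumed formulation: p*_i = c_i * p_i / sum_j(c_j * p_j).
    """
    weighted = [p * c for p, c in zip(probs, class_costs)]
    total = sum(weighted)
    return [w / total for w in weighted]


def predict(probs, class_costs):
    """Return the index of the class with the largest cost-adjusted probability."""
    adjusted = threshold_moving(probs, class_costs)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def cost_sampling_counts(total_instances, class_costs):
    """Target number of training instances per class, proportional to cost.

    Assumed scheme: the resampled training set gives each class a share
    of the data equal to its share of the total cost.
    """
    cost_sum = sum(class_costs)
    return [round(total_instances * c / cost_sum) for c in class_costs]


# Hypothetical three-class example (the paper itself uses the ten numerals).
network_output = [0.5, 0.3, 0.2]   # probability estimates from a trained network
costs = [1.0, 2.0, 1.0]            # class 1 is assumed to be the expensive class

plain_choice = max(range(3), key=network_output.__getitem__)
cost_aware_choice = predict(network_output, costs)
targets = cost_sampling_counts(100, costs)
```

In this example the unadjusted network picks class 0, while the cost-adjusted decision moves to the expensive class 1; the resampling targets allocate half of the 100 instances to that class.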
