Improved parcel sorting by combining automatic speech and character recognition

Amriteshwar Singh, John H L Hansen, Abhijeet Sangwan

doi:10.1109/espa.2012.6152444

Publication Date: Jan 1, 2012

Citations: 5

Affiliation: The University of Texas at Dallas

Abstract

Automatic postal sorting systems have traditionally relied on optical character recognition (OCR) technology. While OCR systems perform well on flat mail items such as envelopes, their performance deteriorates for parcels. In this study, we propose a new multimodal solution for parcel sorting that combines automatic speech recognition (ASR) with OCR to deliver better performance. Our approach estimates the confidence of the OCR output and optionally uses the ASR output when OCR confidence is low. In particular, we propose a Levenshtein edit distance (LED) based measure to compute OCR confidence. Based on this measure, we develop a dynamic fusion strategy that forms its final decision from (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The proposed system is evaluated on speech and image data collected in real-world conditions. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 90.2%, a substantial improvement over the ASR-only (81%) and OCR-only (80.6%) systems. This work leverages OCR and ASR technologies together to improve address recognition for parcels.
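
The abstract does not spell out how the LED-based confidence or the fusion rule is computed, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes a lexicon of valid zip codes, defines OCR confidence as one minus the normalized edit distance to the nearest lexicon entry, and switches between the three decision modes using two thresholds. The names `ocr_confidence` and `fuse`, the thresholds `HIGH` and `LOW`, and the sample lexicon are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest(hyp: str, lexicon: list[str]) -> tuple[str, int]:
    """Return the lexicon entry closest to hyp and its edit distance."""
    best = min(lexicon, key=lambda z: levenshtein(hyp, z))
    return best, levenshtein(hyp, best)


def ocr_confidence(hyp: str, lexicon: list[str]) -> float:
    """Assumed measure: 1 - normalized distance to the nearest valid zip code."""
    best, dist = nearest(hyp, lexicon)
    return max(0.0, 1.0 - dist / max(len(hyp), len(best), 1))


# Hypothetical thresholds; the abstract gives no values.
HIGH, LOW = 0.9, 0.5


def fuse(ocr_hyp: str, asr_hyp: str, lexicon: list[str]) -> tuple[str, str]:
    """Pick a zip code and report which of the three modes produced it."""
    conf = ocr_confidence(ocr_hyp, lexicon)
    if conf >= HIGH:
        # (i) OCR alone: trust the nearest valid zip code to the OCR string.
        return nearest(ocr_hyp, lexicon)[0], "ocr"
    if conf <= LOW:
        # (ii) ASR alone: OCR is too unreliable to contribute.
        return asr_hyp, "asr"
    # (iii) Combination: the valid zip code closest to both hypotheses.
    combined = min(
        lexicon,
        key=lambda z: levenshtein(ocr_hyp, z) + levenshtein(asr_hyp, z),
    )
    return combined, "ocr+asr"


lexicon = ["75080", "75081", "75252", "10001"]
clean = fuse("75080", "75081", lexicon)    # OCR matches a valid zip exactly
noisy = fuse("7S0B0", "75080", lexicon)    # two OCR character errors
failed = fuse("XXXXX", "10001", lexicon)   # OCR output is unusable
```

In this sketch a clean OCR read is accepted without consulting ASR, a moderately damaged read is resolved jointly with the spoken zip code, and an unusable read falls back to ASR entirely. A real system would tune the thresholds on held-out data and would likely weight the two recognizers rather than summing raw distances.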
