Abstract

Automatic postal sorting systems have traditionally relied on optical character recognition (OCR) technology. While OCR systems perform well for flat mail items such as envelopes, their performance deteriorates for parcels. In this study, we propose a new multimodal solution for parcel sorting that combines automatic speech recognition (ASR) technology with OCR to deliver better performance. Our multimodal approach is based on estimating the confidence of the OCR output, and then optionally using the ASR system output when the OCR results show low confidence. In particular, we propose a Levenshtein edit distance (LED) based measure to compute OCR confidence. Based on this OCR confidence measure, we develop a dynamic fusion strategy that forms its final decision from (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The proposed system is evaluated on speech and image data collected in real-world conditions. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 90.2%, a substantial improvement over the ASR-only (81%) and OCR-only (80.6%) systems. This advancement represents an important contribution that leverages OCR and ASR technologies to improve address recognition on parcels.
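The three-way fusion described above can be illustrated with a minimal sketch. The abstract does not specify how the LED-based confidence is computed or how the outputs are combined, so everything beyond the Levenshtein distance itself is an assumption made for illustration: the function names, the list of valid zip codes, the normalization, the two thresholds `high` and `low`, and the combination rule are all hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def ocr_confidence(ocr_zip: str, valid_zips: Sequence[str]) -> float:
    """ASSUMPTION: confidence is 1 minus the normalized edit distance
    from the OCR string to the nearest valid zip code."""
    best = min(levenshtein(ocr_zip, z) for z in valid_zips)
    longest = max(len(ocr_zip), max(len(z) for z in valid_zips))
    return 1.0 - best / longest if longest else 0.0


def fuse(ocr_zip: str, asr_zip: str, valid_zips: Sequence[str],
         high: float = 0.9, low: float = 0.5) -> str:
    """ASSUMPTION: hypothetical thresholds select one of three modes."""
    conf = ocr_confidence(ocr_zip, valid_zips)
    if conf >= high:
        return ocr_zip   # (i) OCR output alone
    if conf < low:
        return asr_zip   # (ii) ASR output alone
    # (iii) combination: the valid zip code closest to both outputs
    return min(valid_zips,
               key=lambda z: levenshtein(z, ocr_zip) + levenshtein(z, asr_zip))


valid = ["75080", "75081", "10001"]
clean = fuse("75080", "75081", valid)    # OCR matches a valid code exactly
noisy = fuse("7S08O", "75080", valid)    # two misread characters
failed = fuse("XXXXX", "10001", valid)   # OCR output unusable
```

In this sketch, an OCR string that exactly matches a valid code is accepted as is, a partially misread string is reconciled with the ASR hypothesis, and an unusable string defers entirely to ASR. The thresholds would need to be tuned on held-out data, and the sketch assumes a non-empty list of valid codes.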
