Abstract
We introduce an algorithm for text detection and localization ("spotting") that is computationally efficient and produces state-of-the-art results. Our system uses multi-channel MSERs to detect a large number of promising regions, then subsamples these regions using a clustering approach. Representatives of region clusters are binarized and then passed on to a deep network. A final line grouping stage forms word-level segments. On the ICDAR 2011 and 2015 benchmarks, our algorithm obtains an F-score of 82% and 83%, respectively, at a computational cost of 1.2 seconds per frame. We also introduce a version that is three times as fast, with only a slight reduction in performance.
Accepted Version (
Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have