Detection and Tracking of Text from Video Using MSER and SIFT

M Manasa Devi, M Seetha, D Srinivasa Rao, S Viswanada Raju

doi:10.1007/978-3-030-24318-0_82

Abstract

Text that appears in a scene or is explicitly added to video can provide an important additional source of indexing information, as well as clues for interpreting the video’s structure and for classification. Automated text extraction from static sources speeds up work in offices, libraries, banks, and many other places. Text extraction can be performed with a variety of methods, depending on the requirements of the system and the level of accuracy needed. In this paper, we present and implement two popular algorithms, Maximally Stable Extremal Regions (MSER) and Scale Invariant Feature Transform (SIFT), for detecting and tracking text in digital video. We analyze the results with respect to the accuracy of text detection and tracking in videos. Experimental results show that SIFT is 80% more accurate than MSER in detecting and tracking text for extraction from video. Drawbacks of the two algorithms are also identified. The paper further describes the modifications that can be made to existing text extraction procedures by applying deep-learning-based recurrent convolutional neural networks (CNN) to address the drawbacks of the two techniques. CNNs exploit local spatial coherence in the input (often images), which allows them to use fewer weights because some parameters are shared. This sharing, which takes the form of convolutions, makes them especially well suited to extracting relevant information at low computational cost.
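The weight-sharing argument can be made concrete with a small parameter count. The sketch below is illustrative only and is not taken from the paper: the image size (64 x 64 x 3), the 3 x 3 kernel, the 32 output units or channels, and the helper names `dense_params` and `conv_params` are all assumptions chosen for the example.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened image.

    Every output unit has its own weight for every input pixel and channel.
    """
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases of a 2-D convolutional layer.

    The same kernel is reused at every spatial position, so the count
    does not depend on the image height or width.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical sizes, chosen only for illustration.
H, W, C = 64, 64, 3

dense = dense_params(H, W, C, out_units=32)
conv = conv_params(3, 3, C, out_c=32)

print("fully connected:", dense)  # 393248
print("convolutional:  ", conv)   # 896
```

Under these assumed sizes the convolutional layer needs several hundred times fewer parameters than the fully connected one, and its count stays the same if the frame resolution grows, which is the low-cost property the abstract refers to.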

Full Text