Wavelet Band-pass Filters for Matching Multiple Templates in Real-time

Yue Wu, Premkumar Natarajan, Joseph Noonan, Pradeep Natarajan, Rohit Prasad

doi:10.5244/c.25.90

Abstract

Many applications in image processing and computer vision require finding a particular template in an image or a video, a task known as template matching. Given a template and an input, the matching algorithm finds the region of interest (ROI) that most closely matches the template under some similarity measure. Depending on how similarity is measured, template matching methods can be roughly classified into two groups: 1) patch matching schemes, such as the sum of absolute differences (SAD), the sum of squared differences (SSD) [1], or cross correlation (XCORR), where the similarity measure relies directly on pixel information from the patch of interest; and 2) feature matching schemes, such as invariant features [4] and bags of features [5], where the similarity measure relies on features describing the template and the frame.

Patch matching methods are not robust, especially when noise, skew, or errors occur. They are also slow, because computing the similarity score at every possible location requires an expensive sliding window search. Several techniques have been explored for accelerating such matching methods, including early rejection and correlation techniques [1]. However, the computational cost can still be unaffordable when the frame size is large. In applications, other techniques such as frame differencing are typically used to reduce the search space.

Feature matching methods process the template and describe it with features that are ideally invariant to rotation, skew, noise, etc. In many cases, however, the more complicated similarity model results in a higher computational cost, and sliding window search remains a costly stage for such methods. While there are known algorithms for fast search of object instances in an image using branch-and-bound techniques, methods of this type have two crucial limitations in our particular problem. First, they require a large number of training samples for each class to learn robust classifiers. Second, interest point detectors like SIFT [5] typically do not generate a sufficient number of feature points, because of the small size of the provided logo, large homogeneous regions, and degradations.

Wavelet-based approaches have been used extensively in object detection and recognition. In [3], image histograms based on wavelet coefficients are collected in bins and used for classifying logos. In [6], wavelet coefficients are used directly and trained for pedestrian detection. In [8], wavelet coefficients are selected to form rotation-invariant features by using the angular-radial transform. However, matching logos within frames using [3, 6, 8] still requires expensive window searching, and these methods are thus not appropriate for real-time processing.

In this paper, we propose a new matching method using wavelet-based band-pass filters (WBPFs). Instead of a direct distance measurement that requires an expensive window search, similarity is measured indirectly in two stages. In the offline template processing stage (see Figure 1), a template is automatically described by a set of three directional WBPFs, through which only the salient wavelet frequency components of the template are allowed to pass. In the online frame processing stage (see Figure 2), a frame is transformed to the wavelet domain and its sub-bands are filtered with the corresponding template WBPFs. Finally, the detection is made with respect to the region of the densest responses under spatial constraints [2, 4].

We show that the proposed template matching system has a very low computational cost: it is 50 times faster than the correlation-based SSD [1] and 10 times faster than the orthogonal Haar transform (OHT) based SSD [7]. Further, the proposed method does not trade off accuracy, since the use of sub-template information makes it robust to skew and camera view changes. Experimental results demonstrate our method for real-time logo detection in broadcast videos.
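
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet, how "salient" components are selected, or the spatial constraints, so the code below assumes a single-level Haar transform, treats the largest-magnitude template coefficients in each of the three detail sub-bands as the pass band, and locates the densest response region with a plain box sum. The names `haar_dwt2`, `build_band_pass`, `detect`, `keep`, and `win` are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar transform; returns (LL, LH, HL, HH)."""
    img = np.asarray(img, dtype=float)
    # Crop to even height and width so the image splits into 2x2 blocks.
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # local average
    lh = (a + b - c - d) / 2.0   # top row minus bottom row
    hl = (a - b + c - d) / 2.0   # left column minus right column
    hh = (a - b - c + d) / 2.0   # diagonal difference
    return ll, lh, hl, hh

def build_band_pass(template, keep=0.25):
    """Offline stage (assumed form): one [lo, hi] magnitude band per
    detail sub-band, covering the top `keep` fraction of the template's
    coefficient magnitudes."""
    bands = []
    for sub in haar_dwt2(template)[1:]:
        mag = np.abs(sub).ravel()
        lo = np.quantile(mag, 1.0 - keep)
        bands.append((lo, mag.max()))
    return bands

def detect(frame, bands, win):
    """Online stage (assumed form): filter each frame sub-band with the
    matching band, then return the top-left (row, col), in frame pixels,
    of the window with the most passing coefficients."""
    subs = haar_dwt2(frame)[1:]
    resp = np.zeros(subs[0].shape)
    for sub, (lo, hi) in zip(subs, bands):
        mag = np.abs(sub)
        resp += (mag >= lo) & (mag <= hi)
    # Window size in sub-band coordinates (half the pixel size).
    h, w = win[0] // 2, win[1] // 2
    # Integral image gives every box sum in constant time per position.
    ii = np.pad(resp, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    density = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    r, c = np.unravel_index(np.argmax(density), density.shape)
    return 2 * r, 2 * c
```

Calling `detect(frame, build_band_pass(template), win=template.shape)` returns a candidate location without computing a per-window distance to the template, which is the property the abstract emphasizes. The frame is transformed and thresholded once, and the only windowed operation is a box sum over the binary responses.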
