Abstract

AdaBoost has proven to be a successful statistical learning method for concept detection, offering strong discrimination and generalization performance. However, training a concept detector with boosting is computationally expensive, especially on large-scale datasets. The bottleneck of the training phase is selecting the best weak learner from a massive pool of candidates. Traditional approaches to weak classifier selection run in O(NT) time, with N examples and T learners. In this paper, we treat best learner selection as a nearest neighbor search problem in function space rather than feature space. With the help of the Locality Sensitive Hashing (LSH) algorithm, the search for the best learner can be sped up to O(NL) time, where L is the number of buckets in the LSH index. In our experiments, L (~600) is much smaller than T (~500,000). In addition, by studying the distribution of weak learners and candidate query points, we present an efficient method that partitions the weak learner points and the feasible region of query points as uniformly as possible, achieving significant improvements in both recall and precision over the random projections used in the traditional LSH algorithm. Experimental results show that our method significantly reduces training time while remaining comparable in performance to state-of-the-art methods.
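As a rough illustration of the idea (not the authors' implementation), the Python sketch below contrasts exhaustive O(NT) weak learner selection with an LSH-based lookup in function space. Each weak learner is represented by its ±1 prediction vector on the N training examples, and the AdaBoost query point is the weighted label vector, so that maximizing the inner product with it minimizes weighted error. The hashing shown is plain random projection, i.e., the baseline the paper claims to improve on; all names and parameter choices (`num_bits`, `lsh_key`, the toy sizes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N examples, T candidate weak learners.
# Each weak learner is a point in "function space": its +/-1
# prediction vector on the N training examples.
N, T = 200, 5000
H = rng.choice([-1.0, 1.0], size=(T, N))   # predictions of all weak learners
y = rng.choice([-1.0, 1.0], size=N)        # training labels
w = np.full(N, 1.0 / N)                    # AdaBoost example weights

def query_point(y, w):
    """Weighted target in function space: for +/-1 predictions, the
    learner maximizing H[t] @ (w * y) has the lowest weighted error."""
    return w * y

def exhaustive_select(H, q):
    """O(NT): scan every weak learner."""
    return int(np.argmax(H @ q))

# --- Minimal random-projection LSH index over the learner vectors ---
num_bits = 12                              # hash length (illustrative choice)
P = rng.standard_normal((num_bits, N))     # random projection directions

def lsh_key(x):
    """Sign pattern of the projections, used as a bucket key."""
    return tuple((P @ x > 0).astype(int))

buckets = {}
for t in range(T):
    buckets.setdefault(lsh_key(H[t]), []).append(t)

def lsh_select(H, q):
    """Roughly O(N * bucket size): scan only learners colliding
    with the query's bucket, falling back to a full scan if empty."""
    candidates = buckets.get(lsh_key(q), range(T))
    scores = [H[t] @ q for t in candidates]
    return list(candidates)[int(np.argmax(scores))]

print("exhaustive:", exhaustive_select(H, query_point(y, w)))
print("lsh       :", lsh_select(H, query_point(y, w)))
```

At each boosting round only the example weights w change, so the index over the learner vectors is built once and reused, which is where the claimed O(NT) to O(NL) saving comes from.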
