Abstract
We investigate the fine-grained object classification problem of determining the make, model, and year of a car from a video. To this end, we introduce CarVideos, a new dataset for fine-grained object classification in videos. CarVideos contains over a million video frames annotated with bounding boxes around the visible cars as well as the specific year, make, and model of each car. We implemented several state-of-the-art methods for object classification in videos and compared them on the dataset to establish a baseline performance level for future research. We also introduce a novel approach to fine-grained object classification in videos that combines a Single Shot MultiBox Detector (SSD) with a single-stream multi-region convolutional neural network (CNN). Our experiments show that this method significantly outperforms previous methods in accuracy on the dataset, including Temporal Segment Networks (TSN) and 3D Convolutional Networks, which are state-of-the-art for human action recognition in videos.