Abstract
Convolutional Neural Networks (CNNs) have recently been applied to video classification, where various methods for combining the appearance (spatial) and motion (temporal) information from video clips have been considered. The most common method for combining the spatial and temporal information is to average the prediction scores at the softmax layer. Inspired by the Mycin uncertainty system for combining production rules in expert systems, this paper proposes using the Mycin formula for decision fusion in two-stream convolutional neural networks. Based on the intuition that spatial information is more useful than temporal information for video classification, this paper also proposes multiplication and asymmetrical multiplication for decision fusion, aiming to better combine the spatial and temporal information in two-stream convolutional neural networks. The experimental results show that (i) both spatial and temporal information are important, but the decision from the spatial stream should dominate, with the decision from the temporal stream playing a complementary role, and (ii) the proposed asymmetrical multiplication method for decision fusion significantly outperforms both the Mycin method and the averaging method.

Keywords: Deep learning; Video classification; Action recognition; Convolutional neural networks; Decision fusion
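The fusion rules named in the abstract can be illustrated with a minimal sketch that operates on per-class softmax scores from the two streams. The averaging and multiplication rules follow directly from their names, and the Mycin rule uses the standard combination for two non-negative certainty factors, s + t - s*t. The abstract does not define asymmetrical multiplication, so the form below (damping the temporal score with an exponent below 1) is only an assumption made for illustration, and the function names, the `temporal_exponent` parameter, and the example scores are all hypothetical.

```python
def average_fusion(spatial, temporal):
    # Baseline: mean of the two streams' per-class softmax scores.
    return [(s + t) / 2 for s, t in zip(spatial, temporal)]


def mycin_fusion(spatial, temporal):
    # Mycin combination of two non-negative certainty factors: s + t - s*t.
    return [s + t - s * t for s, t in zip(spatial, temporal)]


def multiplication_fusion(spatial, temporal):
    # Symmetric element-wise product of the two streams' scores.
    return [s * t for s, t in zip(spatial, temporal)]


def asymmetric_multiplication_fusion(spatial, temporal, temporal_exponent=0.5):
    # ASSUMPTION: the abstract does not give this formula. Here an exponent
    # below 1 flattens the temporal scores so the spatial stream dominates.
    return [s * (t ** temporal_exponent) for s, t in zip(spatial, temporal)]


def predict(scores):
    # Index of the class with the highest fused score.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores for three classes; the two streams disagree.
spatial = [0.6, 0.3, 0.1]
temporal = [0.2, 0.7, 0.1]

for fuse in (average_fusion, mycin_fusion, multiplication_fusion,
             asymmetric_multiplication_fusion):
    print(fuse.__name__, predict(fuse(spatial, temporal)))
```

With these made-up scores, the three symmetric rules follow the more confident temporal stream, while the assumed asymmetric rule keeps the spatial stream's choice. That mirrors the intuition stated in the abstract but says nothing about the paper's measured results.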