Abstract
We propose a feature pruning method that eliminates irrelevant or redundant features in the video sequence based on the spatio-temporal neighborhood of each feature. Furthermore, our framework assumes that human movements are highly correlated with sound emissions, so we employ canonical correlation analysis to determine the correlation between audio and visual features before fusing them. We evaluate the proposed method’s performance using two datasets, one containing political speeches and the other human interactions from TV shows. The experimental results demonstrate the superiority of our approach compared to several baseline and alternative methods for recognizing human behavior.
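The correlation-weighted fusion step can be sketched in the simplest setting: for one-dimensional audio and visual feature sequences, the canonical correlation reduces to the absolute Pearson correlation. The weighting-before-concatenation scheme below is an illustrative assumption for exposition, not the paper's exact formulation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fuse(audio, visual):
    """Fuse two 1-D feature sequences, weighting each by their
    canonical correlation (= |Pearson r| in the 1-D case) before
    concatenation. Hypothetical scheme for illustration only."""
    r = abs(pearson(audio, visual))
    return [r * a for a in audio] + [r * v for v in visual]
```

In the full method the features are multidimensional, so CCA would instead solve for projection vectors that maximize the correlation between the projected audio and visual features; this 1-D sketch only conveys the intuition that strongly correlated modalities contribute more to the fused representation.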