Abstract

Interpretability plays a vital role in understanding complex deep learning models by providing transparency and insights. It addresses the black-box nature of these models, aids in human-in-the-loop systems, enhances model development, and supports education. However, existing interpretability algorithms in computer vision primarily target images, leaving a gap for video applications. In this article, we emphasize the importance of interpretability in video-based Human Action Recognition (HAR). We extend existing 2D interpretability techniques to the 3D domain, specifically focusing on saliency maps and gradient-weighted class activation maps. The proposed interpretability system is then employed in the analysis of a well-known HAR dataset to better understand action recognition in videos.
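To make the idea of a 3D saliency map concrete, below is a minimal sketch (not the authors' exact pipeline) of vanilla gradient saliency for a video clip. It assumes a PyTorch 3D CNN, here torchvision's r3d_18, as a stand-in HAR model; the clip shape and model choice are illustrative assumptions only.

```python
import torch
from torchvision.models.video import r3d_18

# Stand-in HAR model: a 3D ResNet. Untrained here to keep the sketch
# self-contained; swap in pretrained weights for a real analysis.
model = r3d_18(weights=None).eval()

# Dummy clip: batch x channels x frames x height x width.
clip = torch.rand(1, 3, 16, 112, 112, requires_grad=True)

# Forward pass and pick the top-scoring action class.
logits = model(clip)
target_class = logits.argmax(dim=1).item()

# Backpropagate the class score to the input voxels.
logits[0, target_class].backward()

# 3D saliency volume: max absolute gradient over colour channels,
# giving one importance value per (frame, row, column) location.
saliency = clip.grad.abs().max(dim=1)[0].squeeze(0)
print(saliency.shape)  # torch.Size([16, 112, 112])
```

Unlike a 2D saliency map, the result is a spatio-temporal volume, so importance can be inspected both within individual frames and across time; an analogous extension applies to Grad-CAM by pooling gradients over the 3D feature maps of a chosen convolutional layer.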
