Abstract
In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which have the potential to transport harmful payloads or cause significant damage, we present AV-FDTI, an innovative Audio-Visual Fusion system designed for Drone Threat Identification. AV-FDTI leverages the fusion of audio and omnidirectional camera feature inputs, providing a comprehensive solution to enhance the precision and resilience of drone classification and 3D localization. Specifically, AV-FDTI employs a CRNN network to capture vital temporal dynamics within the audio domain and utilizes a pretrained ResNet50 model for image feature extraction. Furthermore, we adopt a visual information entropy and cross-attention-based mechanism to enhance the fusion of visual and audio data. Notably, our system is trained based on automated Leica tracking annotations, offering accurate ground truth data with millimeter-level accuracy. Comprehensive comparative evaluations demonstrate the superiority of our solution over the existing systems. In our commitment to advancing this field, we will release this work as open-source code and wearable AV-FDTI design, contributing valuable resources to the research community.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.