Abstract

In recent years, deep neural networks have achieved significant success in 3D point cloud recognition tasks. However, these models still degrade substantially when the input data is corrupted, so improving their robustness and generalization ability is crucial. In this work, we propose a novel framework that combines GPT and CLIP models to enhance the robustness of existing point cloud classification models. The framework has two main modules: the Text-Image Fusion Module, which comprises a GPT-Driven TextGen Processor and FocalView Projection, and the Dual-Path Intelligent Adapter Module. First, within the Text-Image Fusion Module, the GPT-Driven TextGen Processor leverages GPT-4’s capabilities to generate detailed textual descriptions tailored to the intricacies of point clouds, while FocalView Projection dynamically selects viewpoints based on attention maps, enhancing the two-dimensional representations of three-dimensional point clouds. Second, the Dual-Path Intelligent Adapter Module performs fine-tuning and feature adaptation by combining internal and external adapters. Additionally, during fine-tuning, we employ a variant of Projected Gradient Descent (PGD) adversarial training, named VPGD, to increase the model’s resilience to adversarial perturbations. Our approach achieves state-of-the-art results on robust 3D point cloud recognition benchmarks such as ModelNet40-C and ScanObjectNN-C.
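For readers unfamiliar with PGD, the following is a minimal, dependency-free sketch of the standard L-infinity PGD attack step that adversarial training builds on. It is not the VPGD variant, whose details the abstract does not give. The toy classifier (mean pooling followed by a linear layer), the function names, and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(points, weights, label):
    """Cross-entropy loss of a toy classifier and its gradient w.r.t. the points.

    Hypothetical model: mean-pool the N x 3 points, then apply a 3 x C linear layer.
    """
    n = len(points)
    num_classes = len(weights[0])
    mean = [sum(p[d] for p in points) / n for d in range(3)]
    logits = [sum(mean[d] * weights[d][c] for d in range(3))
              for c in range(num_classes)]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    # d(loss)/d(logit_c) = p_c - 1[c == label]
    dlogits = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    # Mean pooling gives every point the same gradient: W @ dlogits / n.
    g = [sum(weights[d][c] * dlogits[c] for c in range(num_classes)) / n
         for d in range(3)]
    return loss, [list(g) for _ in points]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_attack(points, weights, label, eps=0.05, alpha=0.01, steps=10):
    """Standard L-infinity PGD: signed gradient ascent on the loss, then projection."""
    adv = [list(p) for p in points]
    for _ in range(steps):
        _, grad = loss_and_grad(adv, weights, label)
        for i, (clean, g) in enumerate(zip(points, grad)):
            for d in range(3):
                stepped = adv[i][d] + alpha * sign(g[d])
                # Project back into the eps-ball around the clean coordinate.
                adv[i][d] = min(max(stepped, clean[d] - eps), clean[d] + eps)
    return adv
```

In adversarial training, perturbed inputs such as the output of `pgd_attack` would be fed back into the training loss in place of, or alongside, the clean inputs. The sketch starts from the clean points; PGD is also commonly run from a random starting point inside the eps-ball.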
