3D-COCO: Extension of MS-COCO Dataset for Scene Understanding and 3D Reconstruction
We introduce 3D-COCO, an extension of the original MS-COCO [1] dataset that provides 3D models and 2D-3D alignment annotations. 3D-COCO was designed to support computer vision tasks such as 3D reconstruction and object detection, configurable with textual, 2D image, and 3D CAD model queries. We complete the existing MS-COCO [1] dataset with 28K 3D models collected from ShapeNet [2] and Objaverse [3]. Using an IoU-based method, we match each MS-COCO [1] annotation with the best-fitting 3D models to provide a 2D-3D alignment. Releasing 3D-COCO as open source is a first, and it should pave the way for new research on 3D-related topics. The dataset and its source code are available at https://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/
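The abstract does not spell out the IoU-based matching. As a minimal sketch, assume each candidate 3D model has already been rendered into a binary silhouette in the same frame as the MS-COCO annotation mask; matching then reduces to picking the candidate whose silhouette has the highest mask IoU with the annotation. The helpers `mask_iou` and `best_model`, the pixel-set mask representation, and the rendering step are all hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: a binary mask as the set of (row, col) pixels that are "on".
Pixel = Tuple[int, int]


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two binary masks; 0.0 when both are empty."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def best_model(annotation_mask: Set[Pixel],
               rendered_silhouettes: Dict[str, Set[Pixel]]) -> Tuple[Optional[str], float]:
    """Return (model_id, iou) of the candidate whose rendered silhouette best
    overlaps the annotation mask, or (None, 0.0) if no candidate overlaps at all."""
    best_id: Optional[str] = None
    best_score = 0.0
    for model_id, silhouette in rendered_silhouettes.items():
        score = mask_iou(annotation_mask, silhouette)
        if score > best_score:
            best_id, best_score = model_id, score
    return best_id, best_score


if __name__ == "__main__":
    # A 4x4 annotation and two hypothetical rendered candidates.
    annotation = {(r, c) for r in range(4) for c in range(4)}
    candidates = {
        "model_a": {(r, c) for r in range(4) for c in range(2)},     # IoU 8/16
        "model_b": {(r, c) for r in range(1, 5) for c in range(4)},  # IoU 12/20
    }
    print(best_model(annotation, candidates))  # prints ('model_b', 0.6)
```

Since the abstract mentions matching each annotation with the best 3D *models* (plural), a top-k variant that sorts candidates by IoU instead of keeping only the maximum would be an equally plausible reading.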
- Book Chapter
- 10.1007/978-981-19-0019-8_20
- Jan 1, 2022
Potholes on roads are one of the major causes of road accidents, as well as of wear and tear on vehicles. Various methods have been implemented to address this problem, ranging from manual reporting to authorities, to vibration-based sensors, to 3D reconstruction using laser imaging. However, these methods have limitations such as high setup cost, risk during detection, or no provision for night vision. In this work, we use the Mask R-CNN model to detect potholes, as it provides exceptional segmentation results. We synthetically generate a pothole dataset, annotate it, apply data augmentation, and perform transfer learning on top of a Mask R-CNN model pre-trained on the MS-COCO dataset. The resulting support system was tested in varying lighting and weather conditions and performed well under them.
Keywords: Pothole detection, Transfer learning, Mask R-CNN
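The abstract does not list which augmentations were used. One detail that matters for any instance-segmentation model such as Mask R-CNN is that geometric augmentations must transform the image, its instance mask, and its bounding box together, whereas photometric ones touch only the pixels. The sketch below assumes a horizontal flip and a brightness scaling (the latter chosen only because the system was evaluated under varying lighting); the function names, the list-of-rows image format, and the `(x_min, y_min, x_max, y_max)` box convention are hypothetical.

```python
from typing import List, Tuple

# Hypothetical formats: images and masks as lists of pixel rows,
# boxes as (x_min, y_min, x_max, y_max) in pixel-edge coordinates.
Grid = List[List[int]]
Box = Tuple[float, float, float, float]


def hflip_sample(image: Grid, mask: Grid, box: Box) -> Tuple[Grid, Grid, Box]:
    """Mirror an image, its instance mask, and its box left-to-right, consistently."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_mask = [row[::-1] for row in mask]
    x_min, y_min, x_max, y_max = box
    # A box spanning [x_min, x_max) maps to [width - x_max, width - x_min).
    flipped_box = (width - x_max, y_min, width - x_min, y_max)
    return flipped_image, flipped_mask, flipped_box


def scale_brightness(image: Grid, factor: float) -> Grid:
    """Photometric augmentation: scale 8-bit intensities and clip to [0, 255].
    Masks and boxes are unchanged by this transform."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]
```

For example, a mask occupying columns 1 to 2 of a 3-pixel-wide image, with box `(1, 0, 3, 2)`, ends up in columns 0 to 1 with box `(0, 0, 2, 2)` after the flip.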
- Research Article
- 10.62051/ijcsit.v4n3.46
- Dec 23, 2024
- International Journal of Computer Science and Information Technology
This paper proposes an unsupervised deep learning model for homography estimation, addressing limitations of traditional feature-based methods and supervised learning approaches. By using reprojection error as the optimization objective, the model eliminates the need for labeled data while achieving precise homography estimation. The framework comprises a feature extraction module, a feature difference module, and a homography regression network. Extensive experiments on the MS-COCO and HPatches datasets show that the proposed model achieves a mean reprojection error (MRE) of 3.67 pixels and an accuracy (ACC) of 88.3%, closely approaching the performance of supervised methods such as DeepHomography while significantly outperforming classical SIFT + RANSAC. The model's lightweight design keeps inference efficient, requiring only 12 ms per estimation, which makes it suitable for real-time applications. Ablation studies validate the effectiveness of key components such as the feature difference module and the regularization loss, highlighting their contributions to performance. Compared to traditional methods, the proposed approach is more robust to varying lighting conditions, viewpoint changes, and noise. Moreover, it removes the dependency on labeled data, reducing application costs and barriers to adoption. This unsupervised framework offers a practical and efficient solution for homography estimation, with potential for broader applications in multi-view geometry and 3D reconstruction tasks.
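The abstract reports a mean reprojection error (MRE) without defining it. A common definition, assumed here rather than taken from the paper, maps each source point through the estimated 3x3 homography $H$ (dividing by the homogeneous coordinate $w$) and averages the Euclidean distances to the ground-truth target points, often the four corners of an image patch. The helper names below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]  # row-major 3x3 homography


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map (x, y) through H in homogeneous coordinates and dehomogenize."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return (u / w, v / w)


def mean_reprojection_error(H: Matrix3,
                            src: Sequence[Point],
                            dst: Sequence[Point]) -> float:
    """Average Euclidean distance between H-projected src points and dst points."""
    if len(src) != len(dst) or not src:
        raise ValueError("src and dst must be non-empty and of equal length")
    total = sum(math.dist(apply_homography(H, s), d) for s, d in zip(src, dst))
    return total / len(src)
```

Because a homography is only defined up to scale, `apply_homography([[2, 0, 0], [0, 2, 0], [0, 0, 2]], (3, 4))` returns `(3.0, 4.0)`, the same as the identity; the division by `w` is what makes the error invariant to that scale.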