Single-Panorama Classification of 3D Objects Using Horizontally Stacked Dilated Convolutions
This paper presents a single-image approach for classifying 3D objects represented as meshes. Our method centers a virtual spherical camera at the object’s centroid and casts omnidirectional rays. Then, it computes local geometry information at each ray’s first and last intersection points, generating a single multi-channel equirectangular (ERP) image per object. We propose a convolutional block named Horizontally Stacked Dilated Convolution (HSDC) to handle ERP distortions and introduce a classifier built upon these blocks. Our experiments on popular datasets show that the results produced by our method are competitive with or better than state-of-the-art voxel- and point-based methods, and the best among single-view approaches. Code is available at https://github.com/rmstringhini/HSDCNet.
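The spherical-camera step described above can be made concrete with a small sketch of how an equirectangular pixel maps to a ray direction cast from the centroid. This is an illustrative assumption about the parameterization (function name and angle conventions are ours, not the paper's code):

```python
import math

def erp_ray_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray direction
    cast from the object's centroid. Longitude spans [-pi, pi],
    latitude spans [pi/2, -pi/2] from the top row to the bottom."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)
```

Intersecting each such ray with the mesh twice (first and last hit) would then fill the channels of the ERP image.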
- Supplementary Content
- 10.2312/2632808
- Apr 18, 2019
The number of available 3D objects has grown over the last decades, and we can expect it to grow much further in the future. 3D objects are also becoming more and more accessible to non-expert users. The growing amount of available 3D data is welcome for everyone working with this type of data, as the creation and acquisition of many 3D objects is still costly. However, the vast majority of available 3D objects are present only as pure polygon meshes. We arguably cannot assume that meta-data and additional semantics are delivered together with 3D objects created by non-experts or produced by automatic 3D scanning of real objects. For this reason, content-based retrieval and classification techniques for 3D objects have been developed. Many systems address the completely unsupervised case. However, previous work has shown that the performance of these tasks can be greatly increased by using any kind of prior knowledge. In this thesis I use procedural models as prior knowledge. Procedural models describe the construction process of a 3D object instead of explicitly describing the components of its surface. These models can expose parameters of the construction process to generate variations of the resulting 3D object. Procedural representations are present in many domains, as these implicit representations are far superior to explicit representations in terms of content generation, flexibility, and reusability. Therefore, using a procedural representation has the potential to outclass other approaches in many respects. The usage of procedural models in 3D object retrieval and classification is not widely researched, as this powerful representation can be arbitrarily complex to create and handle. In the 3D object domain, procedural models are mostly used for highly regularized structures such as buildings and trees.
However, procedural models can greatly improve 3D object retrieval and classification, as this representation is able to offer a persistent and reusable full description of a type of object. This description can be used for queries and class definitions without any additional data. Furthermore, the initial classification can be improved further by using a procedural model: a procedural model makes it possible to completely parameterize an unknown object and to identify further characteristics of different class members. The only drawback is that the manual design and creation of specialized procedural models is itself very costly. In this thesis I concentrate on the generalization and automation of procedural models for application in 3D object retrieval and classification. For this generalization and automation, I propose to offer different levels of interaction to the user, to meet varying needs for control and automation. This thesis presents new approaches for different levels of automation: the automatic generation of a procedural model from a single exemplary 3D object; the semi-automatic creation of a procedural model with a sketch-based modeling tool; and the manual definition of a procedural model with a restricted variation space. The second important step is the insertion of parameters into the procedural model to define the variations of the resulting 3D object. For this step I also propose several possibilities for the optimal level of control and automation: an automatic parameter detection technique; a semi-automatic deformation-based insertion; and an interface for manually inserting parameters by choosing one of the offered insertion principles. It is also possible to manually insert parameters into the procedures if the user needs full control at the lowest level.
To enable the usage of procedural models directly in 3D object retrieval and classification techniques, I propose descriptor-based and deep-learning-based approaches. Descriptors measure the dissimilarity of 3D objects. By using descriptors as the comparison mechanism, we can define the distance between procedural models and other objects and order them by similarity. The procedural models are sampled and compared to retrieve an optimal object retrieval list. We can also directly use procedural models as the data basis for retraining a convolutional neural network. By training on a set of procedural models, we can directly classify new unknown objects without any further large learning database. Additionally, I propose a new multi-layered parameter estimation approach using three different comparison measures to parameterize an unknown object. Hence, an unknown object is not only classified with a procedural model; the approach is also able to gather new information about the characteristics of the object by using the procedural model for its parameterization. As a result, the combination of procedural models with the tasks of 3D object retrieval and classification leads to a meta-concept of a holistically seamless system for defining, generating, comparing, identifying, retrieving, recombining, editing, and reusing 3D objects.
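The core idea of a parameterized procedural model, a construction process with exposed parameters that yields object variations, can be illustrated with a deliberately toy generator (the object choice and parameter names are hypothetical, not from the thesis):

```python
def procedural_table(leg_height=0.7, top_width=1.2, top_depth=0.8):
    """Toy procedural model: a construction process with exposed
    parameters, returning parts as axis-aligned boxes given by
    (min_corner, max_corner) tuples."""
    top_thickness = 0.05
    leg_size = 0.06
    parts = []
    # table top sits on the legs
    parts.append(((0.0, 0.0, leg_height),
                  (top_width, top_depth, leg_height + top_thickness)))
    # four legs, one at each corner of the top
    for x in (0.0, top_width - leg_size):
        for y in (0.0, top_depth - leg_size):
            parts.append(((x, y, 0.0),
                          (x + leg_size, y + leg_size, leg_height)))
    return parts
```

Sampling such a generator over its parameter ranges is what produces the object variations used for retrieval lists or as training data.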
- Research Article
48
- 10.1186/s13673-020-00228-8
- May 7, 2020
- Human-centric Computing and Information Sciences
With the wide application of Light Detection and Ranging (LiDAR) in the collection of high-precision environmental point cloud information, three-dimensional (3D) object classification from point clouds has become an important research topic. However, the characteristics of LiDAR point clouds, such as unstructured distribution, disordered arrangement, and large amounts of data, typically result in high computational complexity and make it very difficult to classify 3D objects. Thus, this paper proposes a Convolutional Neural Network (CNN)-based 3D object classification method using the Hough space of LiDAR point clouds to overcome these problems. First, object point clouds are transformed into Hough space using a Hough transform algorithm, and then the Hough space is rasterized into a series of uniformly sized grids. The accumulator count in each grid is then computed and input to a CNN model to classify 3D objects. In addition, a semi-automatic 3D object labeling tool is developed to build a LiDAR point clouds object labeling library for four types of objects (wall, bush, pedestrian, and tree). After initializing the CNN model, we apply a dataset from the above object labeling library to train the neural network model offline through a large number of iterations. Experimental results demonstrate that the proposed method achieves object classification accuracy of up to 93.3% on average.
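The transform-and-rasterize step described above can be sketched for the 2D case: each point votes along a sinusoid in (theta, rho) Hough space, and the space is discretized into a uniform accumulator grid whose counts feed the CNN. This is a minimal illustration with assumed grid sizes, not the paper's implementation:

```python
import math

def hough_accumulator(points, n_theta=36, n_rho=20, rho_max=10.0):
    """Rasterize the Hough space of 2D points into a uniform grid of
    accumulator counts (nested list indexed as grid[theta][rho])."""
    grid = [[0] * n_rho for _ in range(n_theta)]
    for (x, y) in points:
        for t in range(n_theta):
            theta = t * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # map rho in [-rho_max, rho_max] to a grid index
            r = int((rho + rho_max) / (2 * rho_max) * n_rho)
            if 0 <= r < n_rho:
                grid[t][r] += 1
    return grid
```

Collinear points accumulate in the same cell, so structured objects such as walls produce strong, compact peaks in the grid.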
- Conference Article
14
- 10.24963/ijcai.2020/443
- Jul 1, 2020
Three-dimensional (3D) object classification is widely involved in various computer vision applications, e.g., autonomous driving and simultaneous localization and mapping, and has attracted much attention in the community. However, solving 3D object classification by directly employing 3D convolutional neural networks (CNNs) generally suffers from high computational cost. Besides, existing view-based methods do not fully explore the content relationships between views. To this end, this work proposes a novel multi-view framework by jointly using multiple 2D-CNNs to capture discriminative information with relationships as well as a new multi-view loss fusion strategy, in an end-to-end manner. Specifically, we utilize multiple 2D views of a 3D object as input and integrate the intra-view and inter-view information of each view through the view-specific 2D-CNN and a series of modules (outer product, view pair pooling, 1D convolution, and fully connected transformation). Furthermore, we design a novel view ensemble mechanism that selects several discriminative and informative views to jointly infer the category of a 3D object. Extensive experiments demonstrate that the proposed method is able to outperform current state-of-the-art methods on 3D object classification. More importantly, this work provides a new way to improve 3D object classification from the perspective of fully utilizing well-established 2D-CNNs.
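The view-ensemble idea, keeping only the most informative views and fusing their predictions, can be sketched as follows. The scoring and fusion rule here (top-k by score, averaged logits) is a plausible simplification of the mechanism the abstract describes, not the authors' exact design:

```python
def view_ensemble(view_logits, view_scores, k=3):
    """Select the k most informative views by score and average their
    class logits to infer the category. Returns (class index, fused
    logits)."""
    ranked = sorted(range(len(view_scores)),
                    key=lambda i: view_scores[i], reverse=True)
    chosen = ranked[:k]
    n_classes = len(view_logits[0])
    fused = [sum(view_logits[i][c] for i in chosen) / k
             for c in range(n_classes)]
    return fused.index(max(fused)), fused
```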
- Research Article
20
- 10.1364/ao.43.000442
- Jan 10, 2004
- Applied Optics
We address three-dimensional (3D) object classification with computational holographic imaging. A 3D object can be reconstructed at different planes by use of a single hologram. We apply principal component and Fisher linear discriminant analyses based on Gabor-wavelet feature vectors to classify 3D objects measured by digital interferometry. Experimental and simulation results are presented for regional filtering concentrated at specific positions and for overall grid filtering. The proposed technique substantially reduces the dimensionality of the 3D classification problem. To the best of our knowledge, this is the first report on the use of the proposed technique for 3D object classification.
- Research Article
- 10.1504/ijcvr.2019.10025713
- Jan 1, 2019
- International Journal of Computational Vision and Robotics
Since the advent of 3D sensors such as the Kinect camera, 3D object models and point clouds have become frequently used in many areas. One of the most important is 3D object recognition and classification in robotic applications. This type of sensor, like human vision, allows generating an object model from a field of view, or even a complete 3D object model by combining several individual Kinect frames. In this work, we propose a new feature-learning-based object classification approach using point cloud library (PCL) detectors and descriptors and deep belief networks (DBNs). Before developing the classification approach, we evaluate 3D descriptors by proposing a new pipeline that uses the L2 distance and a recognition threshold. 3D descriptors are computed on different datasets in order to identify the best descriptors. Subsequently, these descriptors are used to learn robust features in the classification approach using DBNs. We evaluate the performance of these contributions on two datasets: Washington RGB-D and our own real 3D object dataset. The results show that the proposed approach outperforms advanced methods by approximately 5% in terms of accuracy.
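The descriptor-evaluation pipeline described above, L2 distance plus a recognition threshold, can be sketched minimally (the descriptor vectors and threshold value are placeholders, not from the paper):

```python
def retrieve_by_descriptor(query, database, threshold):
    """Rank database descriptors by L2 distance to the query and keep
    only matches whose distance falls below the recognition
    threshold. `database` maps object names to descriptor vectors."""
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    hits = sorted((l2(query, desc), name) for name, desc in database.items())
    return [(name, dist) for dist, name in hits if dist <= threshold]
```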
- Research Article
24
- 10.1364/osac.1.000373
- Sep 14, 2018
- OSA Continuum
We propose a framework for three-dimensional (3D) object recognition and classification in very low illumination environments using convolutional neural networks (CNNs). 3D images are reconstructed using 3D integral imaging (InIm) with conventional visible spectrum image sensors. After imaging the low light scene using 3D InIm, the 3D reconstructed image has a higher signal-to-noise ratio than a single 2D image, which is a result of 3D InIm being optimal in the maximum likelihood sense for read-noise dominant images. Once 3D reconstruction has been performed, the 3D image is denoised and regions of interest are extracted to detect 3D objects in a scene. The extracted regions are then fed into a CNN, which was trained under low illumination conditions using 3D InIm reconstructed images, to perform object recognition. To the best of our knowledge, this is the first report of utilizing 3D InIm and convolutional neural networks for 3D training and 3D object classification under very low illumination conditions.
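The SNR gain mentioned above comes from the averaging at the heart of computational InIm reconstruction: combining N registered elemental images reduces the standard deviation of (independent, zero-mean) read noise by a factor of sqrt(N). A minimal sketch of the averaging step, ignoring the per-plane pixel shifts a real reconstruction would apply:

```python
def average_elemental_images(images):
    """Average already-registered elemental images (nested lists of
    pixel values). For read-noise dominant captures, averaging N
    images cuts the noise standard deviation by sqrt(N)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(w)]
            for r in range(h)]
```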
- Conference Article
16
- 10.1109/cgiv.2004.1
- Jul 26, 2004
This paper proposes a method for recognition and classification of 3D objects using 2D moments and an HMLP network. The 2D moments are calculated from 2D intensity images taken by multiple cameras arranged according to a multiple-views technique. 2D moments are commonly used for 2D pattern recognition. However, the current study shows that, with some adaptation to the multiple-views technique, 2D moments are sufficient to model 3D objects. In addition, the simplicity of 2D moment calculation reduces the processing time for feature extraction and thus decreases the recognition time. The 2D moments were then fed into a neural network for classification of the 3D objects. In the current study, a hybrid multi-layered perceptron (HMLP) network is proposed to perform the classification. Two distinct groups of objects, polyhedral and free-form, were used to assess the performance of the proposed method. The recognition results show that the proposed method successfully classified the 3D objects with an accuracy of up to 100%.
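The 2D moment features underpinning the method above are standard image moments; a central moment mu_pq, for example, is translation-invariant because it is taken about the intensity centroid. A minimal sketch (the particular moment orders fed to the HMLP are not specified in the abstract):

```python
def central_moment(image, p, q):
    """Central 2D image moment mu_pq of a grayscale image given as a
    nested list of intensities; translation-invariant shape
    feature."""
    m00 = sum(sum(row) for row in image)
    m10 = sum(x * v for y, row in enumerate(image) for x, v in enumerate(row))
    m01 = sum(y * v for y, row in enumerate(image) for x, v in enumerate(row))
    xc, yc = m10 / m00, m01 / m00  # intensity centroid
    return sum((x - xc) ** p * (y - yc) ** q * v
               for y, row in enumerate(image) for x, v in enumerate(row))
```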
- Research Article
76
- 10.1109/tmm.2019.2943740
- Oct 3, 2019
- IEEE Transactions on Multimedia
In this paper, we propose the multi-view saliency guided deep neural network (MVSG-DNN) for 3D object retrieval and classification. This method mainly consists of three key modules. First, the module of model projection rendering is employed to capture multiple views of one 3D object. Second, the module of visual context learning applies basic Convolutional Neural Networks for visual feature extraction of individual views and then employs the saliency LSTM to adaptively select the representative views based on multi-view context. Finally, with this information, the module of multi-view representation learning can generate the compiled 3D object descriptors with the designed classification LSTM for 3D object retrieval and classification. The proposed MVSG-DNN has two main contributions: 1) It can jointly realize the selection of representative views and the similarity measure by fully exploiting multi-view context; 2) It can discover the discriminative structure of multi-view sequences without constraints of specific camera settings. Consequently, it can support flexible 3D object retrieval and classification for real applications by avoiding the required camera settings. Extensive comparison experiments on ModelNet10, ModelNet40, and ShapeNetCore55 demonstrate the superiority of MVSG-DNN against state-of-the-art methods.
- Research Article
10
- 10.48161/qaj.v4n2a557
- Jun 30, 2024
- Qubahan Academic Journal
The development of 3D scanning technologies has made it possible to obtain an increasing amount of data about the external world, which leads to the need for efficient methods of processing acquired data to recognize objects. Traditional approaches face accuracy, speed, and reliability problems due to the complexity and diversity of object shapes, sizes, and degrees of detail, and the presence of noise and artifacts in the data. Therefore, our goal was to improve object recognition efficiency. It is first necessary to determine the method of obtaining geometric and topological parameters. In the paper it is proposed to use the Laplace-Beltrami method, which allows distances and angles between points within a given area to be calculated. Next, it is necessary to determine which parameters will be used to analyze the obtained geometric data. We propose the use of three spectral descriptors: the Heat Kernel Signature (HKS), the Wave Kernel Signature (WKS), and the spectral graph wavelet descriptor (SGWT). Then, we develop a high-accuracy recognition method based on spectral and topological invariants processed using a convolutional neural network. The parameters of the descriptors are calculated and then passed through the neural network, resulting in the classification of the object. In summary, the structure of the proposed method comprises the computation of the Laplace-Beltrami spectrum, the construction of spectral distribution maps, and the subsequent processing of this information using a neural network. After analyzing the results, we found that the proposed method has a recognition time of 0.9 s and a recognition accuracy of 97%. It was shown how much more effective the use of the three descriptors was compared to using each one individually. An example of object recognition using the proposed method was also given. The methodology outlined in this paper utilizes machine learning to achieve high levels of accuracy in the classification of different objects.
The effectiveness of the proposed method enhances existing recognition systems and opens new opportunities for their application in various fields, including robotics, agriculture, and navigation. This research demonstrates considerable potential for further development and application in the field of agriculture, underscoring the continued necessity for research in this area.
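Of the spectral descriptors named above, the HKS has a particularly compact form: HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2, where (lambda_i, phi_i) is the (truncated) Laplace-Beltrami eigendecomposition. A minimal sketch, assuming the spectrum has already been computed:

```python
import math

def heat_kernel_signature(eigenvalues, eigenvectors, vertex, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2, evaluated at
    one vertex for a list of diffusion times. `eigenvectors[i]` is the
    eigenfunction phi_i sampled over the mesh vertices."""
    return [sum(math.exp(-lam * t) * vec[vertex] ** 2
                for lam, vec in zip(eigenvalues, eigenvectors))
            for t in times]
```

Small t captures local geometry and large t captures global shape, which is why HKS values at several time scales make a useful per-vertex feature vector.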
- Research Article
26
- 10.3390/app10196735
- Sep 26, 2020
- Applied Sciences
Point clouds have been widely used in three-dimensional (3D) object classification tasks, e.g., people recognition in unmanned ground vehicles. However, the irregular data format of point clouds and the large number of parameters in deep learning networks affect the performance of object classification. This paper develops a 3D object classification system using a broad learning system (BLS) with a feature extractor called VB-Net. First, raw point clouds are voxelized. Through this step, irregular point clouds are converted into regular voxels that are easily processed by the feature extractor. Then, a pre-trained VoxNet is employed as a feature extractor to extract features from the voxels. Finally, those features are used for object classification by the applied BLS. The proposed system is tested on the ModelNet40 and ModelNet10 datasets. The average recognition accuracy was 83.99% and 90.08%, respectively. Compared to deep learning networks, the time consumption of the proposed system is significantly decreased.
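The voxelization step, turning an irregular point cloud into a regular grid, can be sketched as a bounding-box normalization followed by index quantization (grid resolution here is an illustrative choice, not necessarily VoxNet's):

```python
def voxelize(points, grid=32):
    """Convert an irregular 3D point cloud into a set of occupied
    voxel indices by normalizing its bounding box to [0, grid)^3."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    scale = [max(maxs[i] - mins[i], 1e-9) for i in range(3)]
    occupied = set()
    for p in points:
        # clamp the upper boundary into the last voxel
        idx = tuple(min(int((p[i] - mins[i]) / scale[i] * grid), grid - 1)
                    for i in range(3))
        occupied.add(idx)
    return occupied
```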
- Research Article
5
- 10.1127/1432-8364/2013/0172
- Jun 1, 2013
- Photogrammetrie - Fernerkundung - Geoinformation
Due to the increasing availability of large unstructured point clouds from laser scanning and photogrammetry, there is a growing demand for automatic evaluation methods. Given the complexity of the underlying problems, several new methods resort to using semantic knowledge, in particular for object detection and classification support. In this paper, we present a novel approach which makes use of advanced algorithms and benefits from intelligent knowledge management strategies for the processing of 3D point clouds and object classification in a scanned scene. In particular, our method extends the use of semantic knowledge to all stages of the processing, including the guidance of the 3D processing algorithms. The complete solution consists of a multi-stage, iterative concept based on three factors: the modeled knowledge, the package of algorithms, and the classification engine.
- Research Article
16
- 10.1109/access.2019.2947245
- Jan 1, 2019
- IEEE Access
The rapid development of 3D techniques has led to a dramatic increase in 3D data. Scalable and effective 3D object retrieval and classification algorithms have become essential for large-scale 3D object management. One critical problem of view-based 3D object retrieval and classification is how to exploit the relevance and discrimination among multiple views. In this paper, we propose a multi-view hierarchical fusion network (MVHFN) for these two tasks. This method mainly contains two key modules. First, the module of visual feature learning applies 2D CNNs to extract the visual features of multiple views rendered around the specific 3D object. Then, the proposed multi-view hierarchical fusion module is employed to fuse the multiple view features into a compact descriptor. This module can not only fully exploit the relevance among multiple views through an intra-cluster multi-view fusion mechanism, but also discover the content discrimination through an inter-cluster multi-view fusion mechanism. Experimental results on two public datasets, i.e., ModelNet40 and ShapeNetCore55, show that our proposed MVHFN outperforms the current state-of-the-art methods in both the 3D object retrieval and classification tasks.
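The two-level fusion described above, combining views within a cluster first and then combining cluster descriptors, can be sketched with simple pooling operators. The choice of max-pooling within clusters and averaging across them is an illustrative assumption, not the paper's exact fusion functions:

```python
def hierarchical_fuse(view_feats, cluster_of):
    """Intra-cluster then inter-cluster fusion: max-pool the view
    feature vectors within each cluster, then average the resulting
    cluster descriptors into one compact descriptor."""
    clusters = {}
    for feat, c in zip(view_feats, cluster_of):
        clusters.setdefault(c, []).append(feat)
    dim = len(view_feats[0])
    # intra-cluster fusion: element-wise max over each cluster's views
    pooled = [[max(f[d] for f in feats) for d in range(dim)]
              for feats in clusters.values()]
    # inter-cluster fusion: element-wise mean over cluster descriptors
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]
```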
- Research Article
- 10.36001/phmap.2023.v4i1.3602
- Sep 4, 2023
- PHM Society Asia-Pacific Conference
This study presents a monitoring method that utilizes 3D object classification to accurately detect mechanical and electrical components of a wind turbine by combining a geometric and statistic feature extractor (GSFE) with a multi-view approach. The proposed monitoring method also detects outliers after executing object detection to localize overheat faults in these components with fused optical or infrared/LiDAR measurements. The proposed method has three key characteristics. First, the proposed outlier detection allocates the two extremes of normal and faulty clusters by using a 2D object classification/detection model or by measuring the standard deviation of temperature with sensor-fused measurements. Specifically, the outlier detection with sensor-fused measurements extracts the position coordinates and temperature data to localize overheat faults, effectively detecting an overheated component. Second, the GSFE utilizes a group sampling approach to extract local geometric feature information from neighboring point clouds, aggregating normal vectors and standard deviations. This ensures high object classification accuracy. Third, a multi-view approach focuses on updating local geometric and statistic features through a graph convolution network, improving the accuracy and robustness of object classification. The proposed outlier detection is verified through overheat/fire field tests. The effectiveness of the proposed 3D object classification method is also validated using a virtual wind turbine nacelle CAD dataset and a public CAD dataset named ModelNet40. Consequently, the proposed method is practical and effective for monitoring fire and overheated components, because it can accurately detect critical components with only a few virtual datasets, and gathering big data for training a neural network is extremely difficult.
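The standard-deviation-based overheat detection mentioned above amounts to flagging temperature measurements that lie far above the population mean. A minimal sketch, with the k = 2 threshold chosen for illustration only:

```python
def overheat_outliers(temps, k=2.0):
    """Flag measurements more than k standard deviations above the
    mean temperature as overheat candidates; returns their indices."""
    n = len(temps)
    mean = sum(temps) / n
    std = (sum((t - mean) ** 2 for t in temps) / n) ** 0.5
    return [i for i, t in enumerate(temps) if t > mean + k * std]
```

In the fused-measurement setting, each flagged index would carry its position coordinates as well, localizing the overheated component.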
- Research Article
28
- 10.1016/j.imavis.2021.104265
- Aug 12, 2021
- Image and Vision Computing
Dense graph convolutional neural networks on 3D meshes for 3D object segmentation and classification