Surface descriptors, which represent the surface characteristics of an image numerically, are fundamental elements of many vision applications. Although traditional surface descriptors, whether handcrafted or learned with machine learning techniques, have been applied widely, they still have difficulty handling the large amounts of noise and variance found in 3D data. To resolve this difficulty, recent studies have applied deep learning techniques to the development of surface descriptors. Unlike techniques that rely on a complete 3D CAD model or previously known mesh information of the object, our approach targets robotic applications, in which such information is difficult to preload. In this paper, we propose a new 3D surface descriptor that requires neither preloaded topological information about the objects nor mesh construction, which can fail for new or previously unknown objects. Further, we propose a voxel representation that adapts to the density of the points, resolving the problem of varying densities in point cloud data. Finally, we adopt domain-adversarial learning, which leads the network to learn features that are discriminative for similarity measurement while remaining invariant to point density. We gathered approximately 5,000 point-cloud images of objects along with their position and orientation information. We then constructed approximately half a million pairs of point clouds covering identical and different parts of the objects, labeled true and false, respectively. These pairs were used to train the 3D surface descriptors with a domain-adversarial Siamese convolutional neural network (SCNN). The results indicate that the proposed descriptor outperforms the other descriptors considered.
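
The true/false pair labeling described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that each point-cloud image comes with an object pose (R, t) such that p_cam = R p_obj + t, and that two patches show the same object part when their centres, mapped back into the object frame, lie within a small radius of each other. The function names, the pose convention, and the 0.01 radius are all hypothetical.

```python
import math


def to_object_frame(point, rotation, translation):
    """Map a camera-frame point into the object frame: R^T (p - t).

    Assumes the (hypothetical) pose convention p_cam = R p_obj + t.
    """
    d = [point[k] - translation[k] for k in range(3)]
    # Component i of R^T d is column i of R dotted with d.
    return [sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3)]


def label_pair(center_a, pose_a, center_b, pose_b, radius=0.01):
    """Return True if two patch centres map to the same object part.

    `radius` is an assumed tolerance, in the same length unit as the points.
    """
    q_a = to_object_frame(center_a, *pose_a)
    q_b = to_object_frame(center_b, *pose_b)
    return math.dist(q_a, q_b) <= radius


def rot_z(angle):
    """Rotation matrix about the z axis, used here only to build example poses."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Under these assumptions, two patches observed from different viewpoints receive the label true when they cover the same location on the object, and false otherwise. Each labeled pair would then serve as one training example for the Siamese network.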