In recent years, technological advancements in face recognition have sparked numerous research efforts and have opened up a variety of applications in fields such as security, access control, and identity verification. The accuracy of two-dimensional (2D) face recognition degrades significantly under extreme or poor illumination. Further, its vulnerability to spoofing makes it a poor choice for security applications. These problems can be mitigated with three-dimensional (3D) face recognition. However, 3D data comes with its own set of challenges: the resources and computational power required to collect and process it are substantial. Most recent progress in this area has been achieved by training deep neural networks on large datasets, which is computationally costly and time-consuming. To address these issues, instead of using 3D face data directly, we propose the use of a 2.5D representation of 3D face data along with registered 2D face images, which is relatively easy to work with in terms of computational power and time requirements. This paper proposes a robust face recognition approach using multi-modal data (2.5D face images along with 2D face images) and transfer learning. The proposed approach is built on ResNet-34 and Siamese network models. The ResNet-34 network is first trained on 2D face images. Then, by reusing the pretrained ResNet-34 model, we perform transfer learning to produce a network that can make accurate predictions on 2.5D images. The final recognition outcome is obtained by fusing the results on the 2D and 2.5D data. The proposed approach has been validated on the University of Notre Dame 3D face dataset (ND-Collection D). The experimental analysis shows the effectiveness of the proposed technique.
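
The fusion of the 2D and 2.5D results described above could be sketched as a score-level fusion of per-modality similarity scores. The sketch below is illustrative only: the cosine-similarity matcher, the weighted-sum rule, and the weight `alpha` are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse_scores(score_2d, score_25d, alpha=0.5):
    """Weighted-sum fusion of the 2D and 2.5D match scores.

    alpha is an assumed equal weighting; in practice it would be
    tuned on validation data.
    """
    return alpha * score_2d + (1.0 - alpha) * score_25d

def identify(probe_2d, probe_25d, gallery):
    """Return the gallery identity with the highest fused score.

    gallery: dict mapping identity -> (embedding_2d, embedding_25d),
    where embeddings would come from the 2D-trained and
    transfer-learned 2.5D networks, respectively.
    """
    best_id, best_score = None, float("-inf")
    for identity, (g2d, g25d) in gallery.items():
        score = fuse_scores(cosine_similarity(probe_2d, g2d),
                            cosine_similarity(probe_25d, g25d))
        if score > best_score:
            best_id, best_score = identity, score
    return best_id, best_score
```

In this framing, each modality contributes an independent match score and the final decision is taken on their weighted combination, so a modality degraded by illumination or noise does not dominate the outcome.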