Human action recognition has attracted considerable research interest owing to its wide range of applications. However, it remains a challenging task because of noisy and occluded data, viewpoint variations, differences in body size, and similar factors. Most existing methods rely on a single data type, which limits recognition performance. To improve performance, we propose a new approach to human action recognition that uses two data types: depth images and skeleton joints. Two descriptors are developed for action representation: the Differential Depth Motion History Image for depth maps and the Motion Kinematic Joint Descriptor for skeleton joints. To obtain a discriminative feature set, we train three different Convolutional Neural Network models and fuse their results for the final action classification. Experiments on two public datasets indicate that the proposed approach outperforms state-of-the-art methods.