Summary

Human activity recognition (HAR) has attracted researchers' interest because of the growing demand for automated monitoring applications. Developing efficient HAR algorithms remains an open research problem because of challenges such as inter‐ and intra‐class variation, diverse lighting conditions, viewpoint changes, and complex object motion. Convolutional neural network (CNN) based methods have achieved significant improvements in HAR. However, CNN implementations have the drawback that they require substantial computational resources because of their large number of learnable parameters. To overcome this drawback, we propose a simple and computationally efficient deep CNN architecture that uses multi‐layer information fusion for HAR. We explore the impact of information fusion at the intermediate layers of the network, since each convolutional layer hierarchically extracts information about the objects in the video frames at a different level of abstraction. We first design a simple and computationally efficient deep CNN architecture and then introduce a feature fusion strategy that integrates the complementary information from its intermediate layers. The proposed architecture is fine‐tuned and trained from scratch with raw RGB data. A softmax classifier is used at the last layer of the network for activity classification. The benefits of the proposed architecture over standard deep learning architectures are its high computational efficiency and its reduced demand for computational resources. To demonstrate the effectiveness of the proposed method, we performed extensive experiments on publicly available datasets. The experimental results demonstrate its superiority over existing state‐of‐the‐art methods.
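The idea of fusing intermediate‐layer information before a softmax classifier can be illustrated with a minimal sketch. The summary does not specify how the fusion is performed, so everything below is an assumption made for illustration: the use of global average pooling and concatenation as the fusion operation, the three hypothetical layer shapes, the random weights, and all function names (`global_average_pool`, `fuse_layers`, `classify`). It is not the proposed architecture.

```python
import math
import random


def global_average_pool(feature_map):
    """Reduce a feature map (channels x height x width) to one mean per channel."""
    pooled = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def fuse_layers(feature_maps):
    """Assumed fusion rule: concatenate the pooled descriptors of every layer."""
    fused = []
    for feature_map in feature_maps:
        fused.extend(global_average_pool(feature_map))
    return fused


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear layer followed by softmax over the activity classes."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def random_feature_map(rng, channels, height, width):
    """Stand-in for the output of a convolutional layer (hypothetical values)."""
    return [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


rng = random.Random(0)

# Hypothetical outputs of three convolutional layers for one frame:
# deeper layers have more channels and a smaller spatial size.
layer_outputs = [
    random_feature_map(rng, 4, 8, 8),
    random_feature_map(rng, 8, 4, 4),
    random_feature_map(rng, 16, 2, 2),
]

fused = fuse_layers(layer_outputs)  # 4 + 8 + 16 pooled values

num_classes = 5  # illustrative number of activity classes
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = classify(fused, weights, biases)
print(len(fused), len(probs))
```

In this sketch the classifier sees one descriptor per channel from every layer, so shallow and deep features both contribute to the prediction while the fused vector stays small. A trained network would learn the convolutional filters and classifier weights rather than sampling them at random.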