Autonomous systems play a vital role in creating ambient environments, but comprehensive human-robot interaction requires that the system interpret and identify specific human behaviors. The demand for recognizing human activity from video has increased significantly across diverse domains, including medicine, intelligent video surveillance, human-robot interaction, and ambient-assisted environments. Deep learning has emerged as a leading technique for human activity recognition from video because it learns discriminative features from the input automatically. This research proposes a novel approach to recognizing activities in videos by combining the VGG-16 convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) network: VGG-16 extracts spatial features from individual frames, and the LSTM models the temporal dependencies across them. To assess the effectiveness of the proposed method, experiments are conducted on the widely used UCF-50 benchmark dataset. The experimental results demonstrate that the VGG-16 and LSTM-based model outperforms state-of-the-art methods for activity recognition, achieving higher accuracy and robustness across various activity categories.
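Since the abstract names the components but not their wiring, the following minimal sketch shows one common way such a CNN-LSTM pipeline is assembled in TensorFlow/Keras: a pre-trained VGG-16 serves as a frozen per-frame feature extractor, a TimeDistributed wrapper applies it across the sampled frame sequence, and an LSTM aggregates the temporal information before classification. The frame count, frame resolution, LSTM width, and training settings below are illustrative assumptions, not the paper's reported hyperparameters; only the 50-class output matches UCF-50.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 16          # assumed number of frames sampled per clip
FRAME_SIZE = (224, 224)  # VGG-16's standard input resolution
NUM_CLASSES = 50         # UCF-50 contains 50 activity categories

# Pre-trained VGG-16 as a per-frame spatial feature extractor;
# global average pooling yields a 512-d vector per frame.
vgg16 = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(*FRAME_SIZE, 3),
)
vgg16.trainable = False  # freeze the CNN; train only the temporal head

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, *FRAME_SIZE, 3)),
    # Apply VGG-16 to every frame -> (NUM_FRAMES, 512) feature sequence.
    layers.TimeDistributed(vgg16),
    # LSTM models temporal dependencies across the frame features.
    layers.LSTM(256),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Freezing the convolutional backbone keeps training tractable on a dataset of UCF-50's size; fine-tuning the upper VGG-16 blocks is a common variant when more data or regularization is available.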