Human motion prediction is essential for safe and effective human-robot interaction, but modeling the intricate spatio-temporal dynamics inherent in human movement remains challenging. While recent methods have advanced prediction accuracy, they rely on complex neural architectures such as Graph Convolutional Networks and Recurrent Neural Networks, which demand extensive hyperparameter tuning. To overcome this limitation, we propose a streamlined multi-stage layer-incremental perceptron (MSLP) architecture that achieves competitive results with far fewer parameters, improving efficiency. The MSLP eschews the complexity of prevalent networks and instead takes a staged refinement approach, predicting intricate motions with a lightweight model. This simple yet effective design enables nuanced learning of spatio-temporal relationships without exhaustive tuning. Specifically, the MSLP incorporates two key components: a multi-channel feature extraction and enhancement block (MFE-block) and a temporal feature extraction block (TFE-block). The MFE-block strengthens the representation of each action node by integrating multi-dimensional action features, and the TFE-block then captures the contextual relationships between actions over time. Together, the MFE-block and TFE-block allow the MSLP to model the complex spatial and temporal dynamics of human movement with a streamlined architecture and minimal parameter tuning. When evaluated on established datasets including Human3.6M, CMU, 3DPW, and AMASS, the proposed MSLP achieves accuracy gains of 16.7%-67.0% over existing state-of-the-art techniques. Additionally, the MSLP significantly reduces the number of parameters by 35.7%-99.5% compared to prior architectures. We find that the proposed MSLP significantly improves both long-term prediction accuracy and the generalization capability of the model.
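To make the two-component design concrete, the following is a minimal sketch of how an MLP-only, multi-stage predictor of this kind could be organized. It is not the authors' implementation: the class names, layer widths, activation choice, residual connections, and the last-frame padding scheme are all assumptions introduced here for illustration; only the overall structure (an MFE-style block mixing features across joints, a TFE-style block mixing features across frames, stacked over several refinement stages) follows the description above.

```python
import torch
import torch.nn as nn

class MFEBlock(nn.Module):
    """Hypothetical spatial block: enhances each pose's representation by
    mixing features across the joint-coordinate dimension."""
    def __init__(self, num_joints: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(num_joints)
        self.mlp = nn.Sequential(
            nn.Linear(num_joints, hidden), nn.GELU(), nn.Linear(hidden, num_joints)
        )

    def forward(self, x):            # x: (batch, frames, joints)
        return x + self.mlp(self.norm(x))   # residual refinement

class TFEBlock(nn.Module):
    """Hypothetical temporal block: captures contextual relationships
    between poses by mixing features across the frame dimension."""
    def __init__(self, num_frames: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(num_frames)
        self.mlp = nn.Sequential(
            nn.Linear(num_frames, hidden), nn.GELU(), nn.Linear(hidden, num_frames)
        )

    def forward(self, x):            # x: (batch, frames, joints)
        x = x.transpose(1, 2)        # (batch, joints, frames)
        x = x + self.mlp(self.norm(x))
        return x.transpose(1, 2)

class MSLP(nn.Module):
    """Multi-stage stack: each stage applies one MFE-style and one
    TFE-style block, incrementally refining the motion estimate."""
    def __init__(self, num_frames=60, num_joints=66, stages=4, hidden=256):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(MFEBlock(num_joints, hidden), TFEBlock(num_frames, hidden))
            for _ in range(stages)
        )

    def forward(self, x):            # x: padded motion sequence (batch, frames, joints)
        for stage in self.stages:
            x = stage(x)
        return x

# Usage sketch (assumed setup): pad the observed history by repeating the last
# frame into the future slots, then let the stages refine the padded sequence.
model = MSLP()
history = torch.randn(8, 50, 66)                                        # 50 observed frames
padded = torch.cat([history, history[:, -1:].repeat(1, 10, 1)], dim=1)  # 60 frames total
future = model(padded)[:, -10:]                                         # 10 predicted frames
```

A design of this shape keeps the parameter count low because every component is a small fully connected layer over either the joint or the frame axis, with no recurrence or graph convolution, which is consistent with the efficiency and minimal-tuning claims made in the abstract.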