This study introduces an approach that integrates an attention-fused multi-scale CNN with a Transformer encoder to achieve precise multi-step heave motion prediction for active compensation control in marine equipment. The dual-path CNN captures local features of the heave motion signal at multiple scales, while the Transformer encoder, enhanced with point-wise convolution, captures its global characteristics. In addition, a novel optimization objective that combines a slope factor-optimized Huber loss with a maximum square distance loss is proposed so that predictions better approximate the original signal at extreme points. The proposed model is trained and tested on simulated data under multiple sea conditions. In a typical active compensation scenario (Hs = 3.5 m), it outperforms the baseline methods, achieving an RMSE of 0.0083 m and an MAE of 0.0066 m, with the upper whisker of the prediction-error box plot below 0.02 m. These results satisfy the accuracy requirements of active compensation systems. A generalization test on data collected aboard a real ship shows that the model also performs well after fine-tuning. Its memory efficiency and fast inference speed further make it suitable for deployment in real-world applications.
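To make the idea of the combined objective concrete, the following is a minimal NumPy sketch and not the formulation used in this study. It assumes the standard Huber loss with a fixed threshold `delta` in place of the slope factor-optimized variant, interprets the maximum square distance term as the largest squared error within each prediction window, and uses a hypothetical mixing weight `weight`. The names `huber`, `combined_loss`, `rmse`, and `mae` are illustrative only.

```python
import numpy as np


def huber(err, delta=1.0):
    """Standard element-wise Huber loss (assumed stand-in for the paper's variant)."""
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2                    # used where |err| <= delta
    linear = delta * (abs_err - 0.5 * delta)      # used where |err| > delta
    return np.where(abs_err <= delta, quadratic, linear)


def combined_loss(y_true, y_pred, delta=1.0, weight=0.5):
    """Mean Huber loss plus a weighted worst-case squared error per window.

    y_true, y_pred: arrays of shape (batch, horizon).
    delta and weight are hypothetical hyperparameters, not values from the paper.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    huber_term = huber(err, delta).mean()
    # Assumed reading of "maximum square distance": the largest squared
    # error in each prediction window, averaged over the batch.
    max_sq_term = np.max(err ** 2, axis=-1).mean()
    return huber_term + weight * max_sq_term


def rmse(y_true, y_pred):
    """Root-mean-square error over all predicted points."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error over all predicted points."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


# Toy example: one window with a four-step horizon and one large miss.
y_true = np.zeros((1, 4))
y_pred = np.array([[0.1, -0.2, 0.0, 2.0]])
loss_value = combined_loss(y_true, y_pred, delta=1.0, weight=0.5)
```

Under these assumptions, the Huber term limits the influence of large errors on the average, while the maximum term penalizes the single worst point in each window, which is one plausible way to encourage a closer fit at signal extremes.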