Variations in music intensity and pitch can significantly affect sleep, and relaxing music can help alleviate insomnia. Traditional EEG analysis methods struggle to capture temporal features and to manage the complexity of time-series data. To address these challenges, we propose an improved Temporal Network model integrated with a Temporal Self-Attention (TSA) mechanism. The model enhances the capture of time-step features in EEG signals by transforming one-dimensional EEG data into two-dimensional tensors, enabling the extraction of multi-scale temporal features. Experiments on the DEAP dataset demonstrate significant performance improvements, with 84.26% accuracy, 90.06% precision, and 97.82% recall, outperforming current models. Our model effectively analyzes the impact of relaxing music on different sleep stages, providing insights for improving sleep quality. Future work will optimize the model structure, incorporate additional deep learning techniques, and extend the approach to multiple physiological signals, aiming to enhance its clinical value for sleep disorder diagnosis and personalized music therapy.
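As a rough illustration of the two ideas named above, folding a one-dimensional EEG sequence into a two-dimensional tensor and applying self-attention across time steps, the following NumPy sketch uses a toy signal, a hypothetical folding period, and random projections in place of learned weights; it is not the paper's actual architecture, only a minimal sketch under those assumptions.

```python
import numpy as np

def reshape_to_2d(signal, period):
    # Fold a 1-D sequence of length T into a (T // period, period) tensor,
    # so rows expose inter-period variation and columns intra-period variation.
    n = len(signal) // period
    return signal[: n * period].reshape(n, period)

def temporal_self_attention(X, d_k=8, seed=0):
    # Scaled dot-product self-attention over time steps (rows of X).
    # Random projections stand in for learned Q/K/V weight matrices.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each time step becomes a weighted mix of all steps

eeg = np.sin(np.linspace(0, 20 * np.pi, 1024))  # toy stand-in for an EEG trace
X = reshape_to_2d(eeg, period=128)              # 2-D tensor of shape (8, 128)
out = temporal_self_attention(X)                # attended features, shape (8, 8)
print(X.shape, out.shape)
```

In a trained model the projections would be learned parameters and the folding period would be chosen to match salient rhythms in the signal; the sketch only shows how the reshape exposes multi-scale structure for the attention step to weight.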